Convert Unicode to ASCII

Convert Unicode to ASCII with transliteration (é → e, ñ → n), replace, or strip modes. Per-character breakdown. Free, offline, client-side, secure.

Convert Unicode text to ASCII via transliteration (the smart way) - café becomes cafe, naïve becomes naive, ñoño becomes nono. Four modes (Transliterate / Replace / Strip / Mixed). Per-character grid shows exactly what happened to each character.

How to Use Convert Unicode to ASCII

  1. Paste your Unicode text. Anything goes - accented Latin (café), Greek (Ελληνικά), Cyrillic (Привет), CJK (日本語), emoji (😀), mathematical symbols. Multiple lines are processed as a batch (one TSV row per input line).
  2. Choose a conversion mode. Transliterate (default, smart) - uses Unicode NFD normalization + combining-mark strip to convert é → e, ñ → n, ü → u. Replace - every non-ASCII char becomes a placeholder (default ?, customizable). Strip - non-ASCII characters disappear entirely. Mixed - transliterate when possible, fall back to replace or strip for chars without a Latin form (like Greek, Cyrillic, emoji).
  3. Customize the replace character. Visible when mode is Replace or Mixed-with-Replace. Default is ? but you can use any 1-3 char ASCII string. Common choices: _, -, [?].
  4. Read the per-character grid. One row per input character showing: the character (with substitute glyphs for whitespace - · for space, and a visible marker for newline), its codepoint (U+00E9), the output character (or "(removed)" if stripped), and the action taken (kept / transliterated / replaced / stripped). Color-coded so you can scan for problem characters quickly.
  5. Special exception handling. Some characters don't decompose to ASCII via NFD but have well-known transliterations: ß → ss, Æ → AE, æ → ae, Œ/œ → OE/oe, Ø/ø → O/o, Ł/ł → L/l, © → (c), dashes → -, curly quotes → straight quotes. About 25 exceptions covering the most common Unicode chars.
  6. Batch mode (multiple lines). Each input line becomes one row in a TSV with columns: Input / Output / ASCII / Transliterated / Replaced / Stripped. Useful for processing a column of names from a database export where you want stats per row.
  7. Read the stats. Total chars in vs out (transliterated chars like ß → ss add to output length), how many were pure ASCII (kept as-is), how many transliterated, how many replaced, how many stripped. Quick way to see whether your data is mostly Latin (high transliterated count) or has many non-Latin scripts (high stripped/replaced count).
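
The grid and stats from steps 4 and 7 can be approximated with a short sketch. This is an illustration only, not the tool's actual code: the function names `breakdown` and `stats`, the row field names, and the choice of Mixed-with-Replace behavior are all assumptions, and the exception table from step 5 is left out for brevity.

```javascript
// Combining Diacritical Marks block, U+0300 to U+036F.
const COMBINING_MARKS = /[\u0300-\u036f]/g;
const isAscii = (s) => /^[\x00-\x7f]*$/.test(s);

// Format a code point the way the grid shows it, e.g. U+00E9.
const codepointLabel = (ch) =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");

// Build one row per input character (hypothetical row shape).
// Array.from iterates by code point, so an emoji yields one row, not two.
function breakdown(text, placeholder = "?") {
  return Array.from(text, (ch) => {
    const row = { char: ch, codepoint: codepointLabel(ch) };
    if (isAscii(ch)) return { ...row, output: ch, action: "kept" };
    const base = ch.normalize("NFD").replace(COMBINING_MARKS, "");
    if (base !== "" && isAscii(base)) {
      return { ...row, output: base, action: "transliterated" };
    }
    // No Latin form: this sketch always falls back to the placeholder.
    return { ...row, output: placeholder, action: "replaced" };
  });
}

// Tally actions and output length. "stripped" stays 0 here because
// this sketch never strips; the counter mirrors the stats listed above.
function stats(rows) {
  const counts = { kept: 0, transliterated: 0, replaced: 0, stripped: 0 };
  let outputLength = 0;
  for (const row of rows) {
    counts[row.action] += 1;
    outputLength += row.output.length;
  }
  return { inputLength: rows.length, outputLength, ...counts };
}

const rows = breakdown("é😀a");
console.log(rows.map((r) => `${r.codepoint} ${r.action}`).join(", "));
// U+00E9 transliterated, U+1F600 replaced, U+0061 kept
```

Counting by code point rather than by UTF-16 unit is a design choice in this sketch; it keeps the row count in line with what a reader would call "characters".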

Frequently Asked Questions

How does transliteration actually work?

For each non-ASCII character, the tool calls char.normalize('NFD') which decomposes precomposed characters into base + combining marks (é = e + ◌́). Then it strips the combining marks (Unicode range U+0300-U+036F) and keeps only the base ASCII characters. Result: é → e, ñ → n, ü → u, ç → c. Plus 25+ special-case exceptions for characters that don’t decompose cleanly (ß → ss, æ → ae).
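
A minimal sketch of that pipeline, assuming a small hand-picked exception table. The names `EXCEPTIONS`, `transliterateChar`, and `transliterate` are hypothetical, and the table holds only a subset of the characters this page lists; the tool's real table may differ.

```javascript
// Hypothetical exception table: characters with no NFD decomposition to ASCII.
const EXCEPTIONS = new Map([
  ["ß", "ss"], ["Æ", "AE"], ["æ", "ae"], ["Œ", "OE"], ["œ", "oe"],
  ["Ø", "O"], ["ø", "o"], ["Ł", "L"], ["ł", "l"], ["©", "(c)"],
]);

// Combining Diacritical Marks block, U+0300 to U+036F.
const COMBINING_MARKS = /[\u0300-\u036f]/g;

function transliterateChar(ch) {
  if (ch.codePointAt(0) < 0x80) return ch; // already ASCII
  if (EXCEPTIONS.has(ch)) return EXCEPTIONS.get(ch);
  // Decompose into base + combining marks, then drop the marks.
  const base = ch.normalize("NFD").replace(COMBINING_MARKS, "");
  // Keep the result only if what remains is ASCII; otherwise there is no Latin form.
  return /^[\x00-\x7f]*$/.test(base) ? base : "";
}

function transliterate(text) {
  // Array.from iterates by code point, so emoji are not split into surrogate halves.
  return Array.from(text, transliterateChar).join("");
}

console.log(transliterate("café naïve ñoño")); // cafe naive nono
console.log(transliterate("Straße"));          // Strasse
```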

What about non-Latin scripts like Greek, Cyrillic, or CJK?

They have no NFD decomposition to Latin (because they’re not Latin), so transliteration produces empty output for them. In pure Transliterate mode, these characters are silently stripped. In Mixed mode, you choose the fallback: strip (default) or replace with the placeholder char. So café Привет with Transliterate → cafe (note trailing space where Cyrillic was). With Mixed + replace → cafe ??????.
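
The Mixed-mode fallback could look roughly like the sketch below. The function name `convertMixed` and its `fallback` and `placeholder` parameters are assumptions made for illustration, and the exception table is omitted to keep the example short.

```javascript
const COMBINING_MARKS = /[\u0300-\u036f]/g;
const isAscii = (s) => /^[\x00-\x7f]*$/.test(s);

// Returns the ASCII form of one character, or null when it has no Latin form.
function latinForm(ch) {
  if (isAscii(ch)) return ch;
  const base = ch.normalize("NFD").replace(COMBINING_MARKS, "");
  return isAscii(base) ? base : null;
}

// fallback is "strip" or "replace"; placeholder is used only for "replace".
function convertMixed(text, fallback = "strip", placeholder = "?") {
  return Array.from(text, (ch) => {
    const latin = latinForm(ch);
    if (latin !== null) return latin;
    return fallback === "replace" ? placeholder : "";
  }).join("");
}

// JSON.stringify makes the trailing space visible.
console.log(JSON.stringify(convertMixed("café Привет")));            // "cafe "
console.log(JSON.stringify(convertMixed("café Привет", "replace"))); // "cafe ??????"
```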

Which characters get special exceptions?

About 25 characters that have well-known transliterations but don’t decompose via NFD: ß → ss, Æ/æ → AE/ae, Œ/œ → OE/oe, Ø/ø → O/o, Ð/ð → D/d, Þ/þ → Th/th, Ł/ł → L/l. Currency: € → EUR, £ → GBP, ¥ → YEN. Symbols: © → (c), ® → (r), ™ → (tm). Punctuation: em dashes to -, ellipsis to ..., curly quotes to straight quotes, angle quotes (« ») to << / >>.

What does the “Strip” mode do for ASCII chars?

Strip only removes non-ASCII characters. Pure ASCII characters (a-z, A-Z, 0-9, punctuation, spaces) are always preserved in every mode. So Hello é World with Strip → Hello  World (note two spaces where the é was). With Replace → Hello ? World. With Transliterate → Hello e World.
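
Strip and Replace need no normalization at all, so a single regular expression is enough to sketch them. The helper names `stripNonAscii` and `replaceNonAscii` are hypothetical and not taken from the tool.

```javascript
// Matches any single code point outside the 7-bit ASCII range.
// The u flag makes the regex treat an emoji as one character, not two surrogates.
const NON_ASCII = /[^\x00-\x7f]/gu;

const stripNonAscii = (text) => text.replace(NON_ASCII, "");

// A function replacer avoids special handling of "$" in the placeholder.
const replaceNonAscii = (text, placeholder = "?") =>
  text.replace(NON_ASCII, () => placeholder);

// JSON.stringify makes the doubled space visible.
console.log(JSON.stringify(stripNonAscii("Hello é World")));   // "Hello  World"
console.log(JSON.stringify(replaceNonAscii("Hello é World"))); // "Hello ? World"
```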

Why might transliteration lose information?

Some languages encode meaning in diacritics. Spanish año (year) and ano (anus) differ only by the tilde – transliteration loses that distinction. Same for many Slavic, Vietnamese, and Yoruba words. The tool isn’t trying to preserve linguistic meaning, just to produce ASCII output for systems that can’t handle Unicode. Use thoughtfully – if your data is human-readable text in a language sensitive to diacritics, the output may be ambiguous to native speakers.

Can I configure the replacement character?

Yes. Default is ? but you can type any 1-3 ASCII characters. Common alternatives: _ (underscore for filenames), - (dash), [?] (bracketed to make replacements scannable), # (hash mark). The replacement applies in Replace mode and in Mixed mode when fallback is “Replace”.

What’s the input size limit?

200,000 characters. Beyond that, the per-character grid would render slowly (each row is a DOM node) and the conversion itself starts to lag. The grid display caps at 256 rows with a “… N more characters” note, but the conversion processes the full input – only the grid display is truncated.

Does it preserve text length?

Depends on mode. Replace preserves length when the placeholder is a single character (1 char in = 1 char out); a longer placeholder such as [?] lengthens the output. Strip shortens – non-ASCII chars disappear. Transliterate usually preserves length but some exceptions expand: ß → ss adds a character, © → (c) adds two. Stats show “outputLength” so you can verify before using the result somewhere length-sensitive.

Is my text uploaded anywhere?

No. All NFD normalization, combining mark stripping, exception lookup, and TSV emission run in your browser. Open DevTools → Network and confirm zero requests fire – even when you Convert or Download. Safe for sensitive text, personally identifying data, or anything you’d rather not send to a third-party converter.

Does it work offline?

Yes. Total bundle is about 18 KB. Load once, disconnect, keep using. The Unicode normalization is part of the JavaScript standard library (built into the browser) – no remote dependencies. Useful for processing legacy data on airgapped systems or in offline environments.