Convert UTF-8 to ASCII
Decode UTF-8 byte representations (\xHH, \uHHHH, %HH, hex, decimal, HTML entities) to text with ASCII folding (transliterate/replace/strip). Free, client-side.
- Runs in your browser
- Nothing uploaded
- Free, no sign-up
Decode UTF-8 byte representations (\xHH, \uHHHH, %HH, raw hex / decimal, HTML numeric entities) into proper text, then choose how to handle non-ASCII characters: keep them, transliterate (café → cafe), replace with ?, or strip. Decoding uses the browser's TextDecoder, so multi-byte characters decode correctly.
How to Convert UTF-8 to ASCII
- Paste your UTF-8-encoded byte representation. The format is auto-detected; supported inputs are \xHH, \uHHHH, %HH, raw hex (e.g., 48 65 6C), raw decimal (72 101 108), and HTML numeric entities (&#72; or &#x48;).
- Pick how to handle non-ASCII characters after decoding. Transliterate strips diacritics via Unicode NFD normalization (café → cafe) and is best for fuzzy ASCII output. Keep leaves them as Unicode. Replace substitutes ?. Strip removes them entirely.
- The parsed bytes are run through TextDecoder('utf-8', {fatal: true}), so a \xC3\xA9 sequence correctly becomes é (one codepoint from two bytes), not Ã© (the old version's mojibake).
- Read the per-character grid: every output character is labelled ASCII / non-ASCII along with its codepoint and the fold-mode result. This is useful for confirming that a transliteration matches your expectations.
- Swap to encode plain text back to UTF-8 bytes in any of 6 output formats (\xHH / \uHHHH / %HH / hex / decimal / HTML entities for non-ASCII only).
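To make the per-character grid concrete, here is a minimal JavaScript sketch of how such a breakdown could be built. The function names (breakdown, transliterateChar) and the row fields are illustrative assumptions, not the tool's actual code, and only the transliterate fold is shown.

```javascript
// Hypothetical per-character breakdown; field names are illustrative only.
function transliterateChar(ch) {
  // NFD splits é into e + U+0301; then drop combining marks U+0300-U+036F.
  // Characters with no decomposition (e.g. CJK) pass through unchanged here.
  return ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

function breakdown(text) {
  const rows = [];
  // for...of iterates by code point, so an astral character such as an
  // emoji stays in one row instead of splitting into two surrogate halves.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const ascii = cp <= 0x7f;
    rows.push({
      char: ch,
      codepoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      ascii,
      folded: ascii ? ch : transliterateChar(ch),
    });
  }
  return rows;
}

console.table(breakdown("café"));
```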
Frequently Asked Questions
How does the tool handle multi-byte UTF-8 sequences?
They are combined before any ASCII folding happens. A sequence like \xC3\xA9 (the two bytes for é) goes through TextDecoder('utf-8'), which merges multi-byte sequences into single Unicode codepoints, so é is treated as one character, not as the mojibake pair Ã©.
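A minimal sketch of the difference, using the same TextDecoder API. The parseHexEscapes helper and its regex are hypothetical; the mojibake line imitates a byte-per-character (Latin-1) reading via String.fromCharCode.

```javascript
// Hypothetical helper: turn a C-style escape string like "\xC3\xA9" into bytes.
function parseHexEscapes(input) {
  const matches = [...input.matchAll(/\\x([0-9a-fA-F]{2})/g)];
  return Uint8Array.from(matches, (m) => parseInt(m[1], 16));
}

const bytes = parseHexEscapes("\\xC3\\xA9"); // Uint8Array [0xC3, 0xA9]

// UTF-8 decoding merges the two bytes into one code point.
const decoded = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
console.log(decoded, decoded.length); // é 1

// Mapping each byte to its own code point (a Latin-1 reading) yields mojibake.
const mojibake = String.fromCharCode(...bytes);
console.log(mojibake); // Ã©
```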
What does “transliterate” actually do?
Unicode NFD normalization decomposes accented characters into a base letter plus combining marks (é = e + U+0301 combining acute accent), then we drop everything in the combining-mark range (U+0300-U+036F). Result: café → cafe, naïve → naive, résumé → resume. It doesn't transliterate non-Latin scripts (Cyrillic, CJK, Arabic); those characters still need the replace fallback (?).
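The NFD step can be sketched in a few lines with the standard String.prototype.normalize. This illustrates the technique described above; it is not the tool's source.

```javascript
// Decompose, then remove the Combining Diacritical Marks block (U+0300-U+036F).
function transliterate(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(transliterate("café"));   // cafe
console.log(transliterate("naïve"));  // naive
console.log(transliterate("Привет")); // Привет (no Latin base letter, stays non-ASCII)
```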
What input formats are auto-detected?
Six: \xHH C-style escapes, \uHHHH JS/Python escapes, %HH URL encoding, raw space-separated hex, raw space-separated decimal, and HTML numeric entities (&#72; decimal or &#x48; hex). Detection runs in that priority order, and the first pattern that matches decides the format. If your input mixes formats, the whole input is parsed under that one format's rules, which may fail.
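The priority rule can be pictured as an ordered list of patterns. The regexes and format names below are hypothetical, since the tool's real patterns aren't documented; they only illustrate how first-match-wins behaves.

```javascript
// Hypothetical detection table, ordered like the FAQ's priority list.
const FORMATS = [
  ["\\xHH", /\\x[0-9a-fA-F]{2}/],
  ["\\uHHHH", /\\u[0-9a-fA-F]{4}/],
  ["%HH", /%[0-9a-fA-F]{2}/],
  ["hex", /^\s*[0-9a-fA-F]{2}(?:\s+[0-9a-fA-F]{2})*\s*$/],
  ["decimal", /^\s*\d{1,3}(?:\s+\d{1,3})*\s*$/],
  ["html", /&#(?:[xX][0-9a-fA-F]+|\d+);/],
];

function detectFormat(input) {
  // First match wins, so mixed input is parsed entirely by one format.
  for (const [name, re] of FORMATS) {
    if (re.test(input)) return name;
  }
  return null;
}

console.log(detectFormat("48 65 6C"));   // hex
console.log(detectFormat("72 101 108")); // decimal
console.log(detectFormat("\\xC3 %A9"));  // \xHH (the %A9 is then misparsed)
```

With these illustrative patterns, an all-two-digit decimal input such as 72 65 73 would be claimed by the hex rule first, which is exactly the kind of ambiguity a fixed priority order has to settle.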
What if my UTF-8 is truncated?
The decoder runs in fatal mode and throws. For example, a lone \xC3 (a lead byte expecting a continuation byte) fails with “Invalid UTF-8 byte sequence”. The old version would silently produce Ã (Latin-1 byte 0xC3), which masked the underlying corruption.
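A sketch of strict versus lenient decoding with the standard TextDecoder API. The decodeUtf8Strict wrapper and its message are assumptions modelled on the error quoted above; per the WHATWG Encoding Standard, a fatal decoder throws a TypeError, while the default decoder substitutes U+FFFD.

```javascript
// Strict decoder: any malformed or truncated sequence throws a TypeError.
function decodeUtf8Strict(bytes) {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (err) {
    // Hypothetical wrapper: surface a friendlier message, keep the original error.
    throw new Error("Invalid UTF-8 byte sequence", { cause: err });
  }
}

const truncated = new Uint8Array([0xc3]); // lead byte, continuation missing

try {
  decodeUtf8Strict(truncated);
} catch (e) {
  console.log(e.message); // Invalid UTF-8 byte sequence
}

// The default (non-fatal) decoder hides the problem behind U+FFFD instead.
const lenient = new TextDecoder("utf-8").decode(truncated);
console.log(lenient === "\uFFFD"); // true
```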
Why is \uHHHH treated as a codepoint instead of bytes?
Because \u escapes are by definition Unicode codepoints (16-bit values per the JS/Python convention), not bytes. The tool converts each \uHHHH to the corresponding Unicode character, then encodes the resulting text as UTF-8 internally before ASCII folding: \u00E9 → é → 2 UTF-8 bytes (C3 A9) → fold per chosen mode.
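A minimal sketch of that pipeline, assuming a simple regex for the escapes. Treating each \uHHHH as a UTF-16 code unit (String.fromCharCode) means a surrogate pair such as \uD83D\uDE00 recombines into one emoji; that is a choice of this sketch, not documented tool behaviour.

```javascript
// Expand \uHHHH escapes as UTF-16 code units (assumed regex, not the tool's).
function unescapeU(input) {
  return input.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const text = unescapeU("caf\\u00e9");        // "café"
const utf8 = new TextEncoder().encode(text); // 5 bytes: 0x63 0x61 0x66 0xC3 0xA9
console.log(text, utf8.length);              // café 5
```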
How does HTML entity decoding work?
Both &#NN; (decimal codepoint) and &#xHH; (hex codepoint) are extracted via regex and translated codepoint by codepoint. Named entities like &copy; are NOT supported here; use the dedicated HTML entity decoder for those.
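The numeric-entity pass could look like the following sketch. The regex and the rule of leaving out-of-range values untouched are assumptions, not the tool's documented behaviour; named entities such as &copy; simply don't match and pass through.

```javascript
// Decode &#NN; and &#xHH; via String.fromCodePoint (sketch, not the tool's source).
function decodeNumericEntities(input) {
  return input.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave values beyond U+10FFFF as-is rather than throwing a RangeError.
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}

console.log(decodeNumericEntities("caf&#233; &#x48;i &copy;")); // café Hi &copy;
```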
What’s the difference between “replace” and “strip”?
Both remove non-ASCII content, but replace preserves character positions (café → caf?, 4 chars) while strip shortens output (café → caf, 3 chars). Replace is safer when downstream code depends on character positions.
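Both modes come down to one regular expression over everything outside U+0000-U+007F; this sketch is illustrative, not the tool's code. The u flag makes the pattern match whole code points, so here an emoji becomes a single ? rather than two (one per UTF-16 surrogate).

```javascript
// Replace keeps one placeholder per non-ASCII code point; strip drops it.
const replaceNonAscii = (s) => s.replace(/[^\x00-\x7F]/gu, "?");
const stripNonAscii = (s) => s.replace(/[^\x00-\x7F]/gu, "");

console.log(replaceNonAscii("café")); // caf?
console.log(stripNonAscii("café"));   // caf
```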
Is my text uploaded?
No. Everything runs in your browser, and the tool itself is about 22 KB.
What’s the input cap?
200,000 characters. Longer inputs are rejected with an explicit error instead of freezing the tab.
Can I encode the result back to UTF-8 bytes?
Yes: click Swap. Choose from 6 output formats: \xHH, \uHHHH, URL-encoded (%HH), raw hex, raw decimal, and HTML entities (which leave ASCII alone and entity-encode only non-ASCII characters for safe HTML embedding).
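A hypothetical encoder for the Swap direction, built on the standard TextEncoder. The format keys, uppercase hex, space separator for raw hex/decimal, per-UTF-16-unit \uHHHH output, and hex entity style are all assumptions rather than the tool's documented settings.

```javascript
// Hypothetical six-format encoder; format keys are illustrative names.
function encodeText(text, format) {
  const bytes = [...new TextEncoder().encode(text)]; // UTF-8 bytes as numbers
  const hex2 = (b) => b.toString(16).toUpperCase().padStart(2, "0");
  switch (format) {
    case "xHH":
      return bytes.map((b) => "\\x" + hex2(b)).join("");
    case "pct":
      return bytes.map((b) => "%" + hex2(b)).join("");
    case "hex":
      return bytes.map(hex2).join(" ");
    case "decimal":
      return bytes.join(" ");
    case "uHHHH": {
      // \u escapes carry UTF-16 code units, not UTF-8 bytes.
      let out = "";
      for (let i = 0; i < text.length; i++) {
        out += "\\u" + text.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
      }
      return out;
    }
    case "html":
      // ASCII passes through; each non-ASCII code point becomes &#xHEX;.
      return Array.from(text, (ch) => {
        const cp = ch.codePointAt(0);
        return cp <= 0x7f ? ch : "&#x" + cp.toString(16).toUpperCase() + ";";
      }).join("");
    default:
      throw new Error("Unknown format: " + format);
  }
}

console.log(encodeText("café", "xHH"));  // \x63\x61\x66\xC3\xA9
console.log(encodeText("café", "html")); // caf&#xE9;
```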
Related Tools
ASCII to UTF-8 Bytes Converter →
Encode ASCII to UTF-8 bytes - hex (\x48), decimal, percent (%48), binary, or octal,…
ASCII Case Converter →
Transform any text into Uppercase, Lowercase, Title Case, or Sentence Case, safely formatting paragraphs…
Convert Arbitrary Base to ASCII →
Decode numeric strings in any base (2-36) to ASCII text - space, comma, or…
Convert ASCII to Arbitrary Base →
Convert ASCII text to numeric strings in any base (2-36) -…
Convert ASCII to Base64 →
Encode text to Base64 - Unicode, URL-safe variant, optional 76-char line wrap. Free, client-side,…
Convert ASCII to Bytes →
Convert text to UTF-8 byte values - decimal, hex, binary, octal, with JSON or…
ASCII to Decimal Converter →
Convert text to decimal ASCII codes or Unicode code points - strict, byte, UTF-8,…
ASCII to Hexadecimal Converter →
Convert ASCII to hexadecimal - code points or UTF-8 bytes, uppercase, 0x-prefix, JSON-array output.…
ASCII to HTML Entities Converter →
Convert ASCII to HTML entities - safe XSS escape, numeric decimal, or numeric hex.…
ASCII to Image Converter →
Render ASCII art as a PNG - choose font size, family, colours,…
ASCII to Lowercase Converter →
Convert ASCII to lowercase - locale-aware for Turkish, Azerbaijani, and Lithuanian. Free, client-side, instant,…
ASCII to Morse Code Converter →
Encode ASCII to Morse code with Web Audio playback, adjustable WPM, and copy-ready…