
Convert UTF-8 to ASCII

In short

Decode UTF-8 byte representations (\xHH, \uHHHH, %HH, hex, decimal, HTML entities) to text with ASCII folding (transliterate/replace/strip). Free, client-side.

  • Runs in your browser
  • Nothing uploaded
  • Free, no sign-up

Decode UTF-8 byte representations (\xHH, \uHHHH, %HH, raw hex / decimal, HTML numeric entities) into proper text, then choose how to handle non-ASCII characters: keep them, transliterate (café → cafe), replace with ?, or strip. Decoding uses the browser's TextDecoder, so multi-byte characters decode correctly.


How to Use Convert UTF-8 to ASCII

  1. Paste your UTF-8-encoded byte representation. The format is auto-detected: \xHH, \uHHHH, %HH, raw hex (e.g., 48 65 6C), raw decimal (72 101 108), and HTML numeric entities (&#72; or &#x48;) are supported.
  2. Pick how to handle non-ASCII characters after decoding. Transliterate strips diacritics via Unicode NFD normalization (café → cafe) and is best for fuzzy ASCII output. Keep leaves them as Unicode. Replace uses ?. Strip removes them entirely.
  3. The decoded bytes are run through TextDecoder('utf-8', {fatal: true}), so a \xC3\xA9 sequence correctly becomes é (one codepoint from two bytes), not Ã© (the old version's mojibake).
  4. Read the per-character grid: every output character is labelled ASCII / non-ASCII along with its codepoint and the fold-mode result. Useful for confirming a transliteration matches your expectations.
  5. Swap to encode plain text back to UTF-8 bytes in any of 6 output formats (\xHH / \uHHHH / %HH / hex / decimal / HTML entities for non-ASCII only).
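
The per-character grid from step 4 can be approximated with a few lines of JavaScript. This is only a sketch of the idea: the helper name describeCharacters and the row shape are hypothetical, not the tool's actual code.

```javascript
// Hypothetical helper: label each character of already-decoded text.
function describeCharacters(text) {
  const rows = [];
  // for...of walks the string by code point, so characters outside
  // the Basic Multilingual Plane are not split into surrogate halves.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    rows.push({
      char: ch,
      codePoint: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
      isAscii: codePoint <= 0x7f,
    });
  }
  return rows;
}

const grid = describeCharacters('caf\u00E9');
console.log(grid);
```

Each row carries the character, its codepoint and an ASCII flag; a fold-mode column could be added by running the chosen fold over row.char.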

Frequently Asked Questions

How does the tool handle multi-byte UTF-8 sequences?

They are combined before any ASCII folding happens. A sequence like \xC3\xA9 (the two bytes for é) goes through TextDecoder('utf-8'), which merges multi-byte sequences into single Unicode codepoints, so é is treated as one character, not the mojibake pair Ã©.
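
To see the difference, here is a minimal sketch that parses \xHH escapes into bytes and decodes them both ways. TextDecoder is the standard browser API; parseHexEscapes is a hypothetical helper written for this illustration.

```javascript
// Hypothetical helper: collect the byte values from "\xHH" escapes.
function parseHexEscapes(input) {
  const bytes = [];
  for (const match of input.matchAll(/\\x([0-9A-Fa-f]{2})/g)) {
    bytes.push(parseInt(match[1], 16));
  }
  return Uint8Array.from(bytes);
}

// The text "\x63\x61\x66\xC3\xA9" (backslashes doubled for the JS literal).
const bytes = parseHexEscapes('\\x63\\x61\\x66\\xC3\\xA9');

// UTF-8 decoding merges C3 A9 into the single code point U+00E9.
const decoded = new TextDecoder('utf-8', { fatal: true }).decode(bytes);

// Treating every byte as its own Latin-1 character yields mojibake.
const latin1 = String.fromCharCode(...bytes);

console.log(decoded, decoded.length); // café 4
console.log(latin1, latin1.length);   // cafÃ© 5
```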

What does “transliterate” actually do?

Unicode NFD normalization decomposes accented characters into base + combining marks (é = e + U+0301 acute), then we drop everything in the combining-mark range (U+0300-U+036F). Result: café → cafe, naïve → naive, résumé → resume. It does not transliterate non-Latin scripts (Cyrillic, CJK, Arabic); for those, use the replace mode, which substitutes ?.
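
The same two steps can be written directly with the standard String.prototype.normalize method. The function name stripDiacritics is hypothetical; this is a sketch of the technique described above, not the tool's source.

```javascript
// Hypothetical helper: decompose, then drop combining marks U+0300-U+036F.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(stripDiacritics('caf\u00E9'));       // cafe
console.log(stripDiacritics('na\u00EFve'));      // naive
console.log(stripDiacritics('r\u00E9sum\u00E9')); // resume

// Letters with no decomposition pass through unchanged, which is why
// non-Latin scripts still need the replace or strip mode afterwards.
console.log(stripDiacritics('Привет'));
```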

What input formats are auto-detected?

Six: \xHH C-style escapes, \uHHHH JS/Python escapes, %HH URL-encoded bytes, raw space-separated hex, raw space-separated decimal, and HTML numeric entities (&#72; decimal or &#x48; hex). Detection runs in that priority order, and the first regex that matches wins. If your input mixes formats, the whole input is parsed by the rules of the first format detected, which may fail.
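
A first-match-wins detector of this kind might look like the sketch below. The priority order follows the answer above, but the regular expressions, the format labels and the name detectFormat are all assumptions made for illustration.

```javascript
// Hypothetical detectors in the documented priority order.
// The exact patterns the tool uses are not known; these are guesses.
const DETECTORS = [
  ['xHH', /\\x[0-9A-Fa-f]{2}/],
  ['uHHHH', /\\u[0-9A-Fa-f]{4}/],
  ['percent', /%[0-9A-Fa-f]{2}/],
  ['hex', /^\s*[0-9A-Fa-f]{2}(\s+[0-9A-Fa-f]{2})*\s*$/],
  ['decimal', /^\s*\d{1,3}(\s+\d{1,3})*\s*$/],
  ['entity', /&#(x[0-9A-Fa-f]+|\d+);/],
];

// Return the label of the first pattern that matches, or null.
function detectFormat(input) {
  for (const [name, pattern] of DETECTORS) {
    if (pattern.test(input)) return name;
  }
  return null;
}

console.log(detectFormat('\\xC3\\xA9'));  // xHH
console.log(detectFormat('72 101 108'));  // decimal
console.log(detectFormat('\\xC3 %A9'));   // xHH (mixed input: first match wins)
```

Under this ordering an ambiguous input such as 48 65 72 is read as hex, because the hex pattern is tried before the decimal one.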

What if my UTF-8 is truncated?

The decoder runs in fatal mode and throws. For example, a lone \xC3 (a lead byte that expects a continuation byte) errors with “Invalid UTF-8 byte sequence”. The old version would silently produce Ã (Latin-1 byte 0xC3), which masks the underlying corruption.
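
The behaviour of fatal mode is easy to check with the standard TextDecoder API. In the sketch below the variable names are illustrative, and the error text quoted above is presumably the tool's own wording rather than the message the API produces.

```javascript
const strict = new TextDecoder('utf-8', { fatal: true });
const lenient = new TextDecoder('utf-8'); // fatal defaults to false

// "c" followed by a lone lead byte with no continuation byte.
const truncated = Uint8Array.of(0x63, 0xc3);

let strictError = null;
try {
  strict.decode(truncated);
} catch (err) {
  strictError = err; // fatal mode raises a TypeError
}

// Non-fatal mode substitutes U+FFFD (the replacement character) instead.
const lenientResult = lenient.decode(truncated);

console.log(strictError instanceof TypeError); // true
console.log(lenientResult === 'c\uFFFD');      // true
```

Catching the TypeError and rethrowing it with a friendlier message is one plausible way to arrive at an error like the one described.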

Why is \uHHHH treated as a codepoint instead of bytes?

Because \u escapes are by definition Unicode codepoints (16-bit values per JS/Python convention), not bytes. The tool converts each \uHHHH to the corresponding Unicode character, then encodes the resulting text as UTF-8 internally before ASCII folding. For example: \u00E9 → é → 2 UTF-8 bytes (C3 A9) → folded per the chosen mode.
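
As a rough sketch of that path, the snippet below turns \uHHHH escapes into characters and then asks the standard TextEncoder for the UTF-8 bytes. The helper name decodeUnicodeEscapes is hypothetical.

```javascript
// Hypothetical helper: each \uHHHH becomes one 16-bit code unit.
// Two escapes forming a surrogate pair combine into one character.
function decodeUnicodeEscapes(input) {
  return input.replace(/\\u([0-9A-Fa-f]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const text = decodeUnicodeEscapes('caf\\u00E9'); // the text caf\u00E9
const utf8 = new TextEncoder().encode(text);

console.log(text);             // café
console.log(Array.from(utf8)); // [ 99, 97, 102, 195, 169 ]
```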

How does HTML entity decoding work?

Both &#NN; (decimal codepoint) and &#xHH; (hex codepoint) are extracted via regex and translated codepoint-by-codepoint. Non-numeric named entities like &copy; are NOT supported here; use the dedicated HTML entity decoder for those.
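
A regex-based numeric entity decoder can be sketched as follows. The pattern and the name decodeNumericEntities are assumptions; leaving out-of-range values untouched is a choice made for this example, not documented behaviour.

```javascript
// Hypothetical helper: decode &#NN; and &#xHH; and leave everything else.
function decodeNumericEntities(input) {
  return input.replace(/&#(?:[xX]([0-9A-Fa-f]+)|(\d+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Values beyond U+10FFFF are not valid code points; keep the raw text.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities('&#72;&#x48;')); // HH
console.log(decodeNumericEntities('caf&#233;'));   // café
console.log(decodeNumericEntities('&copy;'));      // &copy; (named, untouched)
```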

What’s the difference between “replace” and “strip”?

Both remove non-ASCII content, but replace preserves character positions (café → caf?, 4 chars) while strip shortens output (café → caf, 3 chars). Replace is safer when downstream code depends on character positions.
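
The two modes differ by a single line, as this sketch shows. The function names are hypothetical, and emitting one ? per code point (rather than per UTF-16 unit or per byte) is an assumption.

```javascript
// Hypothetical helper: swap each non-ASCII code point for "?".
function replaceNonAscii(text) {
  return Array.from(text, (ch) => (ch.codePointAt(0) <= 0x7f ? ch : '?')).join('');
}

// Hypothetical helper: drop non-ASCII code points entirely.
function stripNonAscii(text) {
  return Array.from(text).filter((ch) => ch.codePointAt(0) <= 0x7f).join('');
}

console.log(replaceNonAscii('caf\u00E9')); // caf?
console.log(stripNonAscii('caf\u00E9'));   // caf
```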

Is my text uploaded?

No. Everything runs in the browser. About 22 KB.

What’s the input cap?

200,000 characters. Long inputs throw an explicit error instead of freezing the tab.

Can I encode the result back to UTF-8 bytes?

Yes. Click Swap, then choose any of 6 output formats: \xHH, \uHHHH, URL-encoded, raw hex, raw decimal, and HTML-entity (which leaves ASCII alone and only entity-encodes non-ASCII for safe HTML embedding).
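
For reference, the encoding direction could be sketched like this. The function encodeAs and its format labels are hypothetical, and details such as uppercase hex digits, decimal (rather than hex) entities, and escaping ASCII in \uHHHH mode are assumptions.

```javascript
// Hypothetical helper: render text in one of six output formats.
function encodeAs(text, format) {
  const bytes = Array.from(new TextEncoder().encode(text));
  const hex2 = (b) => b.toString(16).toUpperCase().padStart(2, '0');
  switch (format) {
    case 'xHH':
      return bytes.map((b) => '\\x' + hex2(b)).join('');
    case 'uHHHH':
      // One escape per UTF-16 code unit, not per UTF-8 byte.
      return text
        .split('')
        .map((u) => '\\u' + u.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'))
        .join('');
    case 'percent':
      return bytes.map((b) => '%' + hex2(b)).join('');
    case 'hex':
      return bytes.map(hex2).join(' ');
    case 'decimal':
      return bytes.join(' ');
    case 'entity':
      // ASCII is left alone; only non-ASCII code points become entities.
      return Array.from(text, (ch) => {
        const cp = ch.codePointAt(0);
        return cp <= 0x7f ? ch : '&#' + cp + ';';
      }).join('');
    default:
      throw new Error('Unknown format: ' + format);
  }
}

console.log(encodeAs('caf\u00E9', 'hex'));    // 63 61 66 C3 A9
console.log(encodeAs('caf\u00E9', 'entity')); // caf&#233;
```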
