Convert Hex to UTF-8

Decode hex to UTF-8 with per-character structural breakdown - shows 1/2/3/4-byte patterns, code points (U+XXXX), and chars. Free, offline, client-side.

Decode hex bytes to UTF-8 text, with an optional structural breakdown showing how each character is encoded - 1-byte ASCII, 2-byte Latin/Greek/Cyrillic, 3-byte BMP, or 4-byte supplementary plane. Reveals the Unicode code point (U+XXXX) for each character.

How to Use Convert Hex to UTF-8

  1. Paste your hex bytes in any format - continuous (C3A9), space-separated, with 0x prefixes, in C-array braces ({0x48, 0x65}), or any combination. Separators are stripped automatically.
  2. Pick an output mode. "Text only" shows just the decoded string. "Text + structural breakdown" adds a per-character panel showing how each character is encoded in UTF-8 - perfect for learning the format or debugging encoding bugs.
  3. Toggle Strict if you want malformed byte sequences to fail instead of being replaced with U+FFFD. Lenient (default) substitutes replacements and keeps going.
  4. Press Convert (or Ctrl+Enter / Cmd+Enter). Auto-convert fires 200 ms after each keystroke so results update as you type.
  5. Read the stats line: total bytes, chars, and the distribution by UTF-8 byte length (e.g., "5 bytes · 5 chars · 5× 1-byte" for pure ASCII; "4 bytes · 1 char · 1× 4-byte" for an emoji).
  6. Inspect the breakdown: each entry shows the hex bytes that make up the character, how many bytes were used, the Unicode code point (U+XXXX), and the rendered char.
  7. Copy or Download: Copy puts the decoded text on your clipboard; Download saves a .txt file (UTF-8 encoded).
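
The input handling in steps 1 and 3 can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: `parseHex` and `hexToUtf8` are hypothetical names, and the sketch assumes that every non-hex character counts as a separator.

```javascript
// Hypothetical helper: normalise loosely formatted hex into a Uint8Array.
function parseHex(input) {
  const digits = input
    .replace(/\b0x/gi, "")        // drop 0x / 0X prefixes
    .replace(/[^0-9a-f]/gi, "");  // drop spaces, commas, braces, and other separators
  if (digits.length % 2 !== 0) {
    throw new Error("hex must have even number of digits");
  }
  const bytes = new Uint8Array(digits.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(digits.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Hypothetical wrapper: strict = true maps to TextDecoder's fatal option.
function hexToUtf8(input, strict = false) {
  return new TextDecoder("utf-8", { fatal: strict }).decode(parseHex(input));
}
```

With these definitions, `hexToUtf8("{0x48, 0x65}")` returns "He", and `parseHex("ABC")` throws because the digit count is odd.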

Frequently Asked Questions

What does the structural breakdown show?

For each character in the decoded text, the breakdown panel shows: the raw hex bytes that make it up (e.g., C3 A9), the byte count (1, 2, 3, or 4), the Unicode code point (U+00E9), and the rendered character (é). This exposes the UTF-8 encoding structure – useful for learning, debugging encoding bugs, or verifying that multi-byte characters are being parsed correctly.

What are UTF-8’s 1/2/3/4-byte patterns?

UTF-8 is a variable-width encoding. Bytes 0x00-0x7F are complete 1-byte sequences (identical to 7-bit ASCII). 2-byte sequences start with 110xxxxx and cover U+0080-U+07FF (Latin extended, Greek, Cyrillic, Hebrew, Arabic). 3-byte sequences start with 1110xxxx and cover U+0800-U+FFFF (the rest of the Basic Multilingual Plane – most living scripts – excluding the surrogate range U+D800-U+DFFF). 4-byte sequences start with 11110xxx and cover U+10000-U+10FFFF (the supplementary planes – emoji, rare scripts). Every byte after the first in a multi-byte sequence is a continuation byte of the form 10xxxxxx.
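
As a rough illustration of these lead-byte patterns, the sketch below classifies a byte by its high bits. The function name `utf8SequenceLength` is made up for this example, and the check covers bit patterns only.

```javascript
// Expected total length of a UTF-8 sequence, judged from its lead byte alone.
// Returns 0 for bytes that cannot start a sequence (10xxxxxx or 11111xxx).
// Note: this checks bit patterns only; it does not reject lead bytes such as
// C0, C1, or F5-F7, which match a pattern but never occur in valid UTF-8.
function utf8SequenceLength(leadByte) {
  if ((leadByte & 0x80) === 0x00) return 1; // 0xxxxxxx
  if ((leadByte & 0xe0) === 0xc0) return 2; // 110xxxxx
  if ((leadByte & 0xf0) === 0xe0) return 3; // 1110xxxx
  if ((leadByte & 0xf8) === 0xf0) return 4; // 11110xxx
  return 0;
}
```

For example, `utf8SequenceLength(0xC3)` returns 2 and `utf8SequenceLength(0xF0)` returns 4, while a continuation byte such as 0x80 returns 0.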

How is this different from “Hex to String” or “Hex to Text”?

All three decode hex to text. Hex to UTF-8 (this tool) is UTF-8-focused and adds the per-character structural breakdown – best for understanding how UTF-8 works. Hex to String offers a full encoding selector (UTF-8, UTF-16, Latin-1, ASCII). Hex to Text focuses on parsing hex-dump output (xxd, hexdump -C, Wireshark). Pick whichever matches your use case.

What’s a “code point” and why show it?

A Unicode code point is the unique integer assigned to a character – “A” is U+0041, “é” is U+00E9, “☃” is U+2603, “😀” is U+1F600. UTF-8 is one of several ways to represent these code points as bytes; showing the code point separately from the bytes helps you cross-reference with Unicode charts (unicode.org/charts) or debug character-substitution issues.
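
To see code points outside the tool, something like the following should work in any modern JavaScript runtime. The helper name `codePoints` is an assumption for this example.

```javascript
// Format each character's code point in U+XXXX notation.
function codePoints(text) {
  // Spreading a string walks it by code point, not by UTF-16 code unit,
  // so a supplementary-plane character stays in one piece.
  return [...text].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}
```

Calling `codePoints("A\u00E9\u2603\u{1F600}")` returns `["U+0041", "U+00E9", "U+2603", "U+1F600"]`, matching the four examples above.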

How does the tool detect invalid UTF-8?

The structural decoder validates each sequence against UTF-8’s rules: correct lead byte, correct number of continuation bytes (starting with 10xxxxxx), no overlong encodings (e.g., encoding U+0041 as C1 81 instead of 41), no surrogate-range code points (U+D800-U+DFFF are invalid in UTF-8). Anything that fails these checks gets flagged with a red border and a replacement marker.
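
A hand-rolled decoder that applies these checks might look like the sketch below. It is an assumption-laden illustration rather than the tool's code: `decodeAt` and `breakdown` are hypothetical names, and after an error the sketch resynchronises one byte at a time, so the number of flagged entries can differ from the number of U+FFFD characters `TextDecoder` would emit.

```javascript
// Decode one UTF-8 sequence starting at bytes[i].
// Returns { codePoint, length } on success or { error, length: 1 } on failure.
function decodeAt(bytes, i) {
  const b0 = bytes[i];
  if (b0 <= 0x7f) return { codePoint: b0, length: 1 };

  let length, cp, min; // min = smallest code point that legitimately needs this length
  if ((b0 & 0xe0) === 0xc0) { length = 2; cp = b0 & 0x1f; min = 0x80; }
  else if ((b0 & 0xf0) === 0xe0) { length = 3; cp = b0 & 0x0f; min = 0x800; }
  else if ((b0 & 0xf8) === 0xf0) { length = 4; cp = b0 & 0x07; min = 0x10000; }
  else return { error: "invalid lead byte", length: 1 };

  for (let k = 1; k < length; k++) {
    const b = bytes[i + k];
    if (b === undefined) return { error: "truncated sequence", length: 1 };
    if ((b & 0xc0) !== 0x80) return { error: "bad continuation byte", length: 1 };
    cp = (cp << 6) | (b & 0x3f); // append the 6 payload bits
  }
  if (cp < min) return { error: "overlong encoding", length: 1 };
  if (cp >= 0xd800 && cp <= 0xdfff) return { error: "surrogate code point", length: 1 };
  if (cp > 0x10ffff) return { error: "code point out of range", length: 1 };
  return { codePoint: cp, length };
}

// Build per-character entries: hex bytes, byte count, code point, rendered char.
function breakdown(bytes) {
  const entries = [];
  for (let i = 0; i < bytes.length; ) {
    const r = decodeAt(bytes, i);
    const hex = Array.from(bytes.slice(i, i + r.length), (b) =>
      b.toString(16).toUpperCase().padStart(2, "0")
    ).join(" ");
    if (r.error) {
      entries.push({ hex, error: r.error });
    } else {
      entries.push({
        hex,
        byteCount: r.length,
        codePoint: "U+" + r.codePoint.toString(16).toUpperCase().padStart(4, "0"),
        char: String.fromCodePoint(r.codePoint),
      });
    }
    i += r.length;
  }
  return entries;
}
```

For the bytes 41 C3 A9 this yields two entries (U+0041 and U+00E9), while C1 81 is reported as an overlong encoding followed by a stray continuation byte.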

Why do emoji look like 4 bytes?

Most common emoji (😀 U+1F600, 🎉 U+1F389, 👍 U+1F44D) are in Unicode's supplementary planes (U+10000 and above), which UTF-8 encodes as 4 bytes. So F0 9F 98 80 is a single emoji character – 4 bytes on the wire, 1 character on screen. Some older symbols (☃ snowman U+2603) are in the Basic Multilingual Plane and only need 3 bytes.
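
A quick way to check these byte counts yourself is to encode a single character with `TextEncoder`. The helper name `utf8ByteLength` below is illustrative only.

```javascript
const encoder = new TextEncoder();

// Number of bytes UTF-8 needs for the given string.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

const grinning = String.fromCodePoint(0x1f600); // U+1F600, supplementary plane
const snowman = String.fromCodePoint(0x2603);   // U+2603, Basic Multilingual Plane
```

Here `utf8ByteLength(grinning)` is 4 and `utf8ByteLength(snowman)` is 3. Note that `grinning.length` is 2, because JavaScript strings count UTF-16 code units rather than characters or UTF-8 bytes.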

What’s Strict mode for?

Validating input integrity. Strict mode tells the browser’s `TextDecoder` to throw on malformed UTF-8 instead of silently substituting U+FFFD. Use it when you need to confirm data is well-formed (e.g., verifying a file’s encoding before storing it). Lenient mode is the default because it’s more useful for best-effort decoding of potentially-corrupt data.
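
The difference between the two modes can be reproduced directly with `TextDecoder` and its standard `fatal` option. The sample bytes and variable names below are chosen for illustration.

```javascript
// "Hi" followed by 0xFF, a byte that never appears in valid UTF-8.
const malformed = Uint8Array.of(0x48, 0x69, 0xff);

// Lenient: the invalid byte becomes U+FFFD and decoding continues.
const lenient = new TextDecoder("utf-8").decode(malformed);

// Strict: with fatal: true, decode() throws instead of substituting.
let strictError = null;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(malformed);
} catch (err) {
  strictError = err;
}
```

After this runs, `lenient` is "Hi" followed by one U+FFFD replacement character, and `strictError` holds the exception raised by the strict decoder.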

What happens to odd-length hex?

Error: “hex must have even number of digits”. Each byte requires 2 hex digits; an odd total means at least one byte is cut off. Check your input for missing digits or mid-byte truncation.

Is my hex uploaded anywhere?

No. All decoding runs client-side via the browser’s native `TextDecoder` API. No network requests fire during conversion, no server stores or logs your data. You can verify with your browser’s Network tab. The tool works offline after the initial page load.

How do I go from UTF-8 text back to hex?

Use our “Text to Hex” or “UTF-8 to Hex” converter. In code: `[...new TextEncoder().encode(str)].map(b => b.toString(16).padStart(2, '0')).join(' ')`. The round-trip is lossless for any valid UTF-8 text.
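
To check the round-trip claim on your own data, a small pair of helpers along these lines may be enough. `textToHex` and `hexToText` are hypothetical names, and `hexToText` assumes its input consists of two-digit hex pairs.

```javascript
// Text -> space-separated lowercase hex of its UTF-8 bytes.
function textToHex(str) {
  return [...new TextEncoder().encode(str)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join(" ");
}

// Hex pairs -> text; fatal: true makes malformed UTF-8 throw rather than pass silently.
function hexToText(hex) {
  const pairs = hex.match(/[0-9a-f]{2}/gi) || [];
  const bytes = Uint8Array.from(pairs, (pair) => parseInt(pair, 16));
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}
```

For instance, `textToHex("\u00E9")` gives "c3 a9", and `hexToText(textToHex(s)) === s` holds for well-formed strings. One caveat: a JavaScript string containing a lone surrogate is not well-formed, and `TextEncoder` replaces that surrogate with U+FFFD, so such strings do not survive the trip unchanged.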