Convert UTF-8 to Hex

Convert UTF-8 text to hexadecimal (5 formats, case, grouping). Bidirectional. Emoji-safe. Free, client-side, instant, secure.

Convert UTF-8 text to hexadecimal bytes in 5 formats - continuous, space-separated, colon-separated, 0x-prefixed, and C-style array. Bidirectional, multi-byte safe. Note: UTF-8 has no endianness, so there is no byte-order option to worry about.

How to Use Convert UTF-8 to Hex

  1. Paste UTF-8 text - ASCII, accented, CJK, emoji.
  2. Pick a format: space is human-readable; continuous is compact; colon matches MAC-address style; 0x-prefixed matches most language hex literals; C-style array is ready to paste into source code.
  3. Choose uppercase or lowercase for A-F.
  4. Optionally group bytes: 2 joins bytes into 4-digit words, 4 into 8-digit dwords, etc.
  5. Swap to decode: the parser strips braces, commas, 0x prefixes, colons, and whitespace; what's left must be hex digits. The resulting bytes go through a fatal-mode TextDecoder, so invalid UTF-8 throws an error instead of being silently replaced.

Frequently Asked Questions

How does UTF-8 to hex work?

Each UTF-8 byte (0-255) becomes a 2-character hex token (00-FF). ASCII characters take 1 byte → 2 hex chars; accented Latin letters 2 bytes → 4 hex chars; most CJK characters 3 bytes → 6 hex chars; most emoji 4 bytes → 8 hex chars per code point (emoji built from several code points take more).
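As a minimal sketch of the byte-to-hex step described above (the helper name toHexTokens is hypothetical and not taken from the tool's source), the standard TextEncoder can be asked for the UTF-8 bytes and each byte rendered as two hex digits:

```javascript
// Sketch only: toHexTokens is an illustrative helper, not the tool's actual code.
function toHexTokens(text) {
  // TextEncoder always produces UTF-8; the result is a Uint8Array.
  const bytes = new TextEncoder().encode(text);
  // Each byte (0-255) becomes exactly two uppercase hex digits.
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0"));
}

console.log(toHexTokens("A"));         // [ '41' ]                    1 byte
console.log(toHexTokens("\u00E9"));    // [ 'C3', 'A9' ]              2 bytes (é)
console.log(toHexTokens("\u4E2D"));    // [ 'E4', 'B8', 'AD' ]        3 bytes (中)
console.log(toHexTokens("\u{1F600}")); // [ 'F0', '9F', '98', '80' ]  4 bytes (😀)
```

The escapes are used instead of literal characters so the example does not depend on how the source file itself is saved.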

Why has the “Little-endian” option been removed?

Because it was incorrect. UTF-8 has no endianness – it is defined as a sequence of single bytes in a fixed order (see RFC 3629). Endianness only matters when code units are wider than one byte, as in UTF-16 (16-bit units) or UTF-32 (32-bit units); UTF-8’s code unit is a single byte, so there is nothing to reorder.
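To illustrate the point, here is a small sketch that serialises the same character as UTF-8 and as UTF-16 in both byte orders. The helper names (hexOf, utf8Bytes, utf16Bytes) are made up for this example; only TextEncoder and DataView are standard APIs.

```javascript
// Sketch only: helper names are illustrative.
const hexOf = (b) => b.toString(16).toUpperCase().padStart(2, "0");

// UTF-8: one defined byte order, nothing to choose.
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text), hexOf).join(" ");
}

// UTF-16: each 16-bit code unit can be written big- or little-endian.
function utf16Bytes(text, littleEndian) {
  const view = new DataView(new ArrayBuffer(text.length * 2));
  for (let i = 0; i < text.length; i++) {
    view.setUint16(i * 2, text.charCodeAt(i), littleEndian);
  }
  return Array.from(new Uint8Array(view.buffer), hexOf).join(" ");
}

const euro = "\u20AC"; // €
console.log(utf8Bytes(euro));         // E2 82 AC
console.log(utf16Bytes(euro, false)); // 20 AC  (big-endian)
console.log(utf16Bytes(euro, true));  // AC 20  (little-endian)
```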

What’s the difference between the 5 output formats?

Continuous: no separator (48656C6C6F) – compact for embedding. Space: human-readable (48 65 6C 6C 6F). Colon: MAC-address style (48:65:6C:6C:6F). 0x-prefixed: language-literal style (0x48 0x65 0x6C 0x6C 0x6F). C-array: ready to paste ({0x48, 0x65, 0x6C, 0x6C, 0x6F}).
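The five formats differ only in how the same hex tokens are joined. A rough sketch follows, assuming uppercase output; the function name formatHex and the format keys are hypothetical, chosen for this example rather than taken from the tool:

```javascript
// Sketch only: formatHex and its format keys are illustrative names.
function formatHex(text, format) {
  const tokens = Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  );
  switch (format) {
    case "continuous":
      return tokens.join("");
    case "space":
      return tokens.join(" ");
    case "colon":
      return tokens.join(":");
    case "0x":
      return tokens.map((t) => "0x" + t).join(" ");
    case "c-array":
      return "{" + tokens.map((t) => "0x" + t).join(", ") + "}";
    default:
      throw new Error("unknown format: " + format);
  }
}

console.log(formatHex("Hello", "continuous")); // 48656C6C6F
console.log(formatHex("Hello", "space"));      // 48 65 6C 6C 6F
console.log(formatHex("Hello", "colon"));      // 48:65:6C:6C:6F
console.log(formatHex("Hello", "0x"));         // 0x48 0x65 0x6C 0x6C 0x6F
console.log(formatHex("Hello", "c-array"));    // {0x48, 0x65, 0x6C, 0x6C, 0x6F}
```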

How does the reverse decoder handle different formats?

It strips the known delimiters – braces, commas, 0x prefixes, colons, spaces, newlines. What remains must be an even number of hex digits, which it groups into bytes and feeds to a fatal-mode TextDecoder.
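The decoding path can be sketched roughly as follows. This is an assumption about how such a parser might be written, not the tool's actual source; hexToText is a hypothetical name. The 0x prefixes are removed first because their leading "0" would otherwise be mistaken for a hex digit.

```javascript
// Sketch only: hexToText is an illustrative helper, not the tool's actual code.
function hexToText(input) {
  const digits = input
    .replace(/0x/gi, "")       // drop 0x prefixes before anything else
    .replace(/[{}\s,:]/g, ""); // drop braces, whitespace, commas, colons

  if (!/^[0-9a-fA-F]*$/.test(digits)) {
    throw new Error("input contains non-hex characters");
  }
  if (digits.length % 2 !== 0) {
    throw new Error("odd number of hex digits");
  }

  // Two hex digits per byte.
  const bytes = new Uint8Array(digits.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(digits.slice(i * 2, i * 2 + 2), 16);
  }

  // fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

console.log(hexToText("{0x48, 0x65, 0x6C, 0x6C, 0x6F}")); // Hello
console.log(hexToText("48:65:6c:6c:6f"));                 // Hello
console.log(hexToText("48656C6C6F"));                     // Hello
```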

Why does emoji work here when some other tools fail?

This tool uses TextEncoder to extract the UTF-8 byte sequence first. Simple implementations that loop over charCodeAt see UTF-16 code units (with surrogate pairs for emoji) instead of UTF-8 bytes, producing wrong output.
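The difference is easy to reproduce. In the sketch below (function names are illustrative), the charCodeAt loop reports the two UTF-16 surrogates of an emoji, while TextEncoder reports its four UTF-8 bytes:

```javascript
// Sketch only: naiveHex and utf8Hex are illustrative names.
const toHex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");

// Naive approach: iterates UTF-16 code units, not UTF-8 bytes.
function naiveHex(text) {
  const out = [];
  for (let i = 0; i < text.length; i++) {
    out.push(toHex(text.charCodeAt(i), 4));
  }
  return out.join(" ");
}

// UTF-8 approach: encode first, then print each byte.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) => toHex(b, 2)).join(" ");
}

const grin = "\u{1F600}"; // 😀
console.log(naiveHex(grin)); // D83D DE00    (surrogate pair)
console.log(utf8Hex(grin));  // F0 9F 98 80  (UTF-8 bytes)
```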

What does “group by N” do?

Joins consecutive bytes into N-byte clusters before applying the separator. For example, “Hello” space-separated with group-by-2: 4865 6C6C 6F (4 chars + 4 chars + 2 chars). Useful for visualising 16-bit words or 32-bit dwords in protocol dumps.
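Grouping can be sketched as chunking the token list before joining. The function name groupHex and its parameters are assumptions made for this example:

```javascript
// Sketch only: groupHex is an illustrative helper, not the tool's actual code.
function groupHex(text, groupSize, separator = " ") {
  if (!Number.isInteger(groupSize) || groupSize < 1) {
    throw new Error("groupSize must be a positive integer");
  }
  const tokens = Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  );
  const groups = [];
  for (let i = 0; i < tokens.length; i += groupSize) {
    // The final group may be shorter when the byte count is not a multiple of N.
    groups.push(tokens.slice(i, i + groupSize).join(""));
  }
  return groups.join(separator);
}

console.log(groupHex("Hello", 1)); // 48 65 6C 6C 6F
console.log(groupHex("Hello", 2)); // 4865 6C6C 6F
console.log(groupHex("Hello", 4)); // 48656C6C 6F
```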

Are invalid UTF-8 bytes silently fixed?

No. The decoder uses fatal mode – invalid sequences (lone continuations, truncated leaders, encoded surrogates, overlong forms) throw an explicit error rather than producing U+FFFD replacement characters.
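The contrast between the default and fatal modes of the standard TextDecoder can be seen in a short sketch. The helper isValidUtf8 is a hypothetical name introduced for this example:

```javascript
// Sketch only: isValidUtf8 is an illustrative helper.
function isValidUtf8(byteValues) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array(byteValues));
    return true;
  } catch (e) {
    return false; // fatal mode throws a TypeError on malformed input
  }
}

// Default (non-fatal) mode silently substitutes U+FFFD for the bad byte.
const lenient = new TextDecoder("utf-8").decode(new Uint8Array([0x48, 0x80, 0x69]));
console.log(lenient === "H\uFFFDi"); // true

console.log(isValidUtf8([0x80]));             // false: lone continuation byte
console.log(isValidUtf8([0xE4, 0xB8]));       // false: truncated 3-byte sequence
console.log(isValidUtf8([0xED, 0xA0, 0x80])); // false: encoded surrogate U+D800
console.log(isValidUtf8([0xC0, 0x80]));       // false: overlong encoding of U+0000
console.log(isValidUtf8([0xE4, 0xB8, 0xAD])); // true:  valid 3-byte sequence
```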

Is text uploaded anywhere?

No. TextEncoder / TextDecoder run in the browser.

What’s the input cap?

200,000 characters. Hex output is at least 2× the byte count: two hex digits per byte, plus any separators, prefixes, or braces the chosen format adds.

How does this compare to the UTF-8 to Bytes tool?

The Bytes tool exposes hex AND decimal AND binary plus optional prefixes. This Hex tool is hex-specific and adds 5 output formats (continuous / space / colon / 0x / C-array) for direct paste into different contexts.