Convert Unicode to UTF-8

In short

Convert Unicode text to UTF-8 bytes in hex, decimal, or binary, and decode bytes back to text. Includes an optional BOM, optional byte prefixes, and a per-character byte breakdown. Free, offline, client-side, secure.

  • Runs in your browser
  • Nothing uploaded
  • Free, no sign-up

Convert Unicode text to UTF-8 bytes with a per-character breakdown showing how each character uses 1 to 4 bytes. Choose hex, decimal, or binary output; toggle the UTF-8 BOM (EF BB BF); swap to decode bytes back to text with strict UTF-8 validation.

How to Use Convert Unicode to UTF-8

  1. Paste Unicode text - ASCII, accented Latin, CJK, or emoji all work.
  2. Pick output format: hex (most common), decimal, or 8-bit binary.
  3. Set a hex prefix if you need C-style escapes (\x41) or hex-literal style (0x41) output.
  4. Toggle the UTF-8 BOM if your downstream consumer needs it (most don't - it's optional and discouraged for new content).
  5. Read the grid - each character shows its byte width (1-4) so you can see why 🌍 is 4 bytes and A is 1.
  6. Swap to decode UTF-8 bytes back to text. The decoder is strict - invalid sequences error out instead of producing replacement characters.
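
The encode steps above can be sketched in a few lines. This is a minimal sketch, not the tool's actual source: the helper names encodeText and byteBreakdown are hypothetical, and only the standard TextEncoder API is assumed.

```javascript
// Hypothetical helpers for illustration; the tool's real code is not shown on this page.
const encoder = new TextEncoder();

// Encode text to UTF-8, optionally prepending the BOM (EF BB BF).
function encodeText(text, { bom = false } = {}) {
  const body = encoder.encode(text);
  if (!bom) return body;
  const out = new Uint8Array(body.length + 3);
  out.set([0xef, 0xbb, 0xbf], 0);
  out.set(body, 3);
  return out;
}

// One row per character: the character, its byte width, and its UTF-8 bytes.
function byteBreakdown(text) {
  // Array.from on a string walks codepoints, so an emoji stays one character
  // instead of splitting into two UTF-16 code units.
  return Array.from(text, (ch) => {
    const bytes = encoder.encode(ch);
    return { char: ch, width: bytes.length, bytes: Array.from(bytes) };
  });
}

const rows = byteBreakdown("A\u00e9\u{1F30D}"); // "A", "é", "🌍"
// rows.map((r) => r.width) → [1, 2, 4]
```

Note that TextEncoder never throws: an unpaired surrogate in the input is encoded as U+FFFD (EF BF BD), so a stricter tool would have to check for that separately.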

Frequently Asked Questions

How does UTF-8 work?

UTF-8 is a variable-width encoding. The first byte’s high bits signal the width, and every continuation byte has the form 10xxxxxx:

  • U+0000-U+007F: 1 byte, 0xxxxxxx (ASCII-compatible)
  • U+0080-U+07FF: 2 bytes, first byte 110xxxxx
  • U+0800-U+FFFF: 3 bytes, first byte 1110xxxx
  • U+10000-U+10FFFF: 4 bytes, first byte 11110xxx
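
To make the bit layout concrete, here is a hand-rolled encoder for a single codepoint. It is a sketch for illustration only (the tool itself relies on TextEncoder), and encodeCodePoint is a hypothetical name.

```javascript
// Illustrative only: packs one codepoint into UTF-8 bytes by hand.
function encodeCodePoint(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("codepoint out of range");
  }
  if (cp >= 0xd800 && cp <= 0xdfff) {
    throw new RangeError("surrogates cannot be encoded in UTF-8");
  }
  if (cp <= 0x7f) return [cp]; // 0xxxxxxx
  if (cp <= 0x7ff) {
    // 110xxxxx 10xxxxxx
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp <= 0xffff) {
    // 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// encodeCodePoint(0x41)    → [0x41]
// encodeCodePoint(0x20ac)  → [0xe2, 0x82, 0xac]        (€)
// encodeCodePoint(0x1f30d) → [0xf0, 0x9f, 0x8c, 0x8d]  (🌍)
```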

Why is 🌍 four bytes but A is one?

UTF-8 picks the byte count from the size of the codepoint. A (U+0041) fits in 7 bits → 1 byte. 🌍 (U+1F30D) needs 17 bits, more than the 16 payload bits a 3-byte sequence carries → 4 bytes. The grid shows each character’s exact width.
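
The bit-count reasoning can be checked directly. A small sketch, assuming a valid codepoint as input; utf8WidthFromBits is a hypothetical name, and the thresholds are the payload capacities of 1, 2, 3, and 4-byte sequences (7, 11, 16, and 21 bits).

```javascript
// Number of UTF-8 bytes for a codepoint, derived from how many bits it needs.
// Assumes cp is a valid codepoint (0 to 0x10FFFF, not a surrogate).
function utf8WidthFromBits(cp) {
  const bits = 32 - Math.clz32(cp); // significant bits in cp (0 for U+0000)
  if (bits <= 7) return 1;  // 0xxxxxxx carries 7 bits
  if (bits <= 11) return 2; // 110xxxxx + one continuation byte: 5 + 6 bits
  if (bits <= 16) return 3; // 1110xxxx + two continuation bytes: 4 + 6 + 6 bits
  return 4;                 // 11110xxx + three continuation bytes: 3 + 6 + 6 + 6 bits
}

// utf8WidthFromBits(0x41)    → 1  (7 bits)
// utf8WidthFromBits(0x1f30d) → 4  (17 bits)
```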

What’s the UTF-8 BOM?

Three magic bytes EF BB BF (U+FEFF encoded in UTF-8) that mark a file as UTF-8. Microsoft tools often add it; Unix tools mostly don’t. The Unicode Standard neither requires nor recommends it for UTF-8, because UTF-8 has no byte-order ambiguity to resolve. If your consumer treats it as content (rare bug), strip it.
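
If a consumer does choke on the BOM, detecting and stripping it takes only a few lines. A minimal sketch, assuming the input is a Uint8Array; hasBom and stripBom are hypothetical names.

```javascript
// True if the buffer starts with the UTF-8 BOM (EF BB BF).
function hasBom(bytes) {
  return (
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf
  );
}

// Return a view of the same buffer without the leading BOM, if there is one.
function stripBom(bytes) {
  return hasBom(bytes) ? bytes.subarray(3) : bytes;
}

// The BOM is just U+FEFF run through the normal UTF-8 encoder:
// new TextEncoder().encode("\uFEFF") → Uint8Array [0xef, 0xbb, 0xbf]
```

subarray returns a view rather than a copy, so stripping is cheap even on large buffers.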

Why does your decoder reject some byte sequences?

The tool uses TextDecoder in fatal mode – invalid UTF-8 throws an error rather than substituting U+FFFD replacement characters. Common rejections: truncated sequences (4-byte char missing its tail), overlong encodings, lone continuation bytes, and bytes encoding surrogates U+D800-U+DFFF (illegal in UTF-8).
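
The rejections listed above are easy to reproduce with the standard API. This is a sketch of strict decoding, not the tool's own code; decodeStrict and tryDecode are hypothetical names.

```javascript
// Strict decode: with fatal: true, malformed UTF-8 throws a TypeError
// instead of being replaced by U+FFFD.
function decodeStrict(bytes) {
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

// Wrap the throw so the result can be inspected without try/catch at the call site.
function tryDecode(byteValues) {
  try {
    return { ok: true, text: decodeStrict(Uint8Array.from(byteValues)) };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}

const samples = {
  valid: [0xf0, 0x9f, 0x8c, 0x8d],  // 🌍
  truncated: [0xf0, 0x9f, 0x8c],    // 4-byte character missing its last byte
  overlong: [0xc0, 0x80],           // U+0000 padded into 2 bytes
  loneContinuation: [0x80],         // continuation byte with no lead byte
  surrogate: [0xed, 0xa0, 0x80],    // would decode to U+D800
};

for (const [name, bytes] of Object.entries(samples)) {
  console.log(name, tryDecode(bytes).ok);
}
```

Only the first sample decodes; the other four are rejected. By default TextDecoder also drops a leading BOM from the decoded text; pass ignoreBOM: true to keep it as U+FEFF.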

Hex, decimal, or binary – which should I use?

Hex (e.g., F0 9F 8C 8D) is the de-facto standard in docs, debuggers, and network captures – compact and aligned to byte boundaries. Decimal (e.g., 240 159 140 141) matches byte[] literals in some languages. Binary (e.g., 11110000…) makes the bit-level UTF-8 encoding pattern visible.
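
The three formats are just different renderings of the same byte values. A small sketch, with formatBytes as a hypothetical helper name:

```javascript
// Render bytes as space-separated hex, decimal, or 8-bit binary.
function formatBytes(bytes, format = "hex") {
  const renderers = {
    hex: (b) => b.toString(16).toUpperCase().padStart(2, "0"),
    decimal: (b) => String(b),
    binary: (b) => b.toString(2).padStart(8, "0"),
  };
  const render = renderers[format];
  if (!render) throw new Error(`unknown format: ${format}`);
  // Array.from (not Uint8Array.map) so the mapped values can be strings.
  return Array.from(bytes, (b) => render(b)).join(" ");
}

const globe = new TextEncoder().encode("\u{1F30D}"); // 🌍
// formatBytes(globe, "hex")     → "F0 9F 8C 8D"
// formatBytes(globe, "decimal") → "240 159 140 141"
// formatBytes(globe, "binary")  → "11110000 10011111 10001100 10001101"
```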

What about the 0x and \x prefixes?

0x matches hex literals in most languages (JS, Python, C, Go). \x matches string and byte escapes in C, C++, Python, and Rust. Pick “none” for plain space-separated bytes you’d paste into a network analyzer.
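
Prefixes only change how bytes are written, so they can be added on output and ignored on input. A sketch under assumptions: formatHex and parseHex are hypothetical names, and the separator parameter is an addition for illustration (the tool's exact spacing rules are not documented here).

```javascript
// Write bytes as hex with an optional prefix ("", "0x", or "\\x").
function formatHex(bytes, prefix = "", separator = " ") {
  return Array.from(
    bytes,
    (b) => prefix + b.toString(16).toUpperCase().padStart(2, "0")
  ).join(separator);
}

// Read hex bytes back, accepting 0x / \x prefixes and space or comma separators.
function parseHex(input) {
  const tokens = input
    .replace(/0x|\\x/gi, " ") // turn prefixes into separators
    .split(/[\s,]+/)
    .filter(Boolean);
  return Uint8Array.from(tokens, (t) => {
    if (!/^[0-9a-f]{1,2}$/i.test(t)) throw new SyntaxError(`not a hex byte: ${t}`);
    return parseInt(t, 16);
  });
}

const globeBytes = [0xf0, 0x9f, 0x8c, 0x8d];
// formatHex(globeBytes, "0x")      → "0xF0 0x9F 0x8C 0x8D"
// formatHex(globeBytes, "\\x", "") → "\xF0\x9F\x8C\x8D"
// parseHex("0xF0 0x9F 0x8C 0x8D") → Uint8Array [240, 159, 140, 141]
```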

Is my text uploaded?

No. Everything runs in your browser via TextEncoder / TextDecoder.

Does it work offline?

Yes. The whole tool weighs about 18 KB, so once the page has loaded it runs without any network connection – every conversion happens locally in JavaScript on your device.

What’s the input cap?

200,000 characters per encode. Decode is similarly capped. The tool runs on the main thread (no Web Worker), so the cap protects against accidental tab-freezes on huge inputs.

UTF-8 vs UTF-16 vs UTF-32 – when to use which?

UTF-8 dominates the web, files, and APIs because it’s ASCII-backward-compatible and space-efficient for Latin text. UTF-16 is the code-unit format of JavaScript strings and is used by Windows and Java APIs. UTF-32 is mostly used internally where fixed-width indexing by codepoint matters. For storage or transmission, UTF-8 is almost always the right choice today.
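
To see the size trade-off for a given string, compare the encoded lengths directly. A rough sketch: encodedSizes is a hypothetical helper, BOMs are not counted, and the input is assumed to contain no unpaired surrogates.

```javascript
// Byte counts for the same text in the three encodings (no BOM).
function encodedSizes(text) {
  return {
    utf8: new TextEncoder().encode(text).length,
    utf16: text.length * 2,             // string length counts UTF-16 code units
    utf32: Array.from(text).length * 4, // Array.from counts codepoints
  };
}

// encodedSizes("hello")              → { utf8: 5, utf16: 10, utf32: 20 }
// encodedSizes("\u65e5\u672c\u8a9e") → { utf8: 9, utf16: 6,  utf32: 12 }  (日本語)
// encodedSizes("\u{1F30D}")          → { utf8: 4, utf16: 4,  utf32: 4 }   (🌍)
```

The CJK row shows the one common case where UTF-16 is smaller than UTF-8; for ASCII-heavy text such as markup or JSON, UTF-8 is half the size.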
