Convert Unicode to UTF-8
Convert Unicode text to UTF-8 bytes (hex, decimal, or binary) and back, with an optional BOM, a byte prefix, and a per-character byte breakdown. Free, offline, client-side, secure.
- Runs in your browser
- Nothing uploaded
- Free, no sign-up
Convert Unicode text to UTF-8 bytes with a per-character breakdown showing how each character uses 1 to 4 bytes. Choose hex, decimal, or binary output; toggle the UTF-8 BOM (EF BB BF); swap to decode bytes back to text with strict UTF-8 validation.
How to Use the Unicode to UTF-8 Converter
- Paste Unicode text: ASCII, accented Latin, CJK, and emoji all work.
- Pick an output format: hex (most common), decimal, or 8-bit binary.
- Set a hex prefix if you need C-style string escapes (\x41) or 0x-style literals (0x41).
- Toggle the UTF-8 BOM if your downstream consumer needs it (most don't; it's optional and discouraged for new content).
- Read the grid: each character shows its byte width (1-4), so you can see why 🌍 is 4 bytes and A is 1.
- Swap to decode UTF-8 bytes back to text. The decoder is strict: invalid sequences raise an error instead of producing replacement characters.
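The decode step can be sketched as follows, assuming the byte input is hex text separated by spaces or commas, with optional 0x or \x prefixes. The helpers parseHexBytes and decodeHexUtf8 are hypothetical, not the tool's actual code; the strictness comes from TextDecoder's real fatal option.

```typescript
// Hypothetical parser: hex bytes separated by spaces or commas,
// each optionally prefixed with 0x or \x (e.g. "F0 9F", "0xF0, 0x9F", "\xF0\x9F").
function parseHexBytes(input: string): Uint8Array {
  const tokens = input
    .replace(/\\x|0x/gi, " ") // turn prefixes into separators
    .split(/[\s,]+/)
    .filter((t) => t.length > 0);
  return Uint8Array.from(tokens, (t) => {
    if (!/^[0-9a-f]{2}$/i.test(t)) throw new Error(`Not a hex byte: "${t}"`);
    return parseInt(t, 16);
  });
}

// fatal: true makes malformed UTF-8 throw a TypeError instead of yielding U+FFFD.
function decodeHexUtf8(input: string): string {
  return new TextDecoder("utf-8", { fatal: true }).decode(parseHexBytes(input));
}
```

For example, decodeHexUtf8("F0 9F 8C 8D") returns "🌍", while decodeHexUtf8("C3") throws because the 2-byte sequence is truncated.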
Frequently Asked Questions
How does UTF-8 work?
Variable-width encoding: code points U+0000-U+007F use 1 byte (ASCII-compatible), U+0080-U+07FF use 2 bytes, U+0800-U+FFFF use 3 bytes (except the surrogates U+D800-U+DFFF, which are never encoded), and U+10000-U+10FFFF use 4 bytes. The first byte’s high bits signal the width (0xxxxxxx / 110xxxxx / 1110xxxx / 11110xxx), and every continuation byte starts with 10xxxxxx, carrying 6 payload bits.
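To make the bit patterns concrete, here is a minimal hand-rolled encoder sketch. The names encodeCodePoint and encodeUtf8 are hypothetical, not the tool's code (the tool itself relies on TextEncoder). Unlike TextEncoder, which substitutes U+FFFD for a lone surrogate, this sketch throws on surrogates and out-of-range values.

```typescript
// Encode one code point by hand using the UTF-8 bit patterns:
// 0xxxxxxx | 110xxxxx 10xxxxxx | 1110xxxx 10xxxxxx 10xxxxxx | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
function encodeCodePoint(cp: number): number[] {
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError(`Not a Unicode scalar value: U+${cp.toString(16).toUpperCase()}`);
  }
  if (cp <= 0x7f) return [cp]; // 1 byte: plain ASCII
  if (cp <= 0x7ff) {
    // 2 bytes: 5 + 6 payload bits
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp <= 0xffff) {
    // 3 bytes: 4 + 6 + 6 payload bits
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 4 bytes: 3 + 6 + 6 + 6 payload bits
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// for...of iterates by code point, so an emoji's surrogate pair arrives as one value.
function encodeUtf8(text: string): number[] {
  const out: number[] = [];
  for (const ch of text) out.push(...encodeCodePoint(ch.codePointAt(0)!));
  return out;
}
```

encodeUtf8("A€🌍") produces 41 E2 82 AC F0 9F 8C 8D, the same bytes as new TextEncoder().encode("A€🌍").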
Why is 🌍 four bytes but A is one?
UTF-8 length depends on each code point’s value. A (U+0041) fits in 7 bits → 1 byte. 🌍 (U+1F30D) needs 17 bits, more than the 16 payload bits a 3-byte sequence can carry → 4 bytes. The grid shows each character’s exact width.
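A per-character breakdown like the grid can be approximated with TextEncoder. This sketch (byteBreakdown and CharRow are hypothetical names) reports one row per code point, which is an assumption: multi-code-point emoji such as skin-tone or ZWJ sequences would produce several rows, and grouping them into user-perceived characters would need something like Intl.Segmenter.

```typescript
interface CharRow {
  char: string;
  codePoint: string; // "U+XXXX" label
  bits: number; // significant bits in the code point
  bytes: number; // UTF-8 bytes for this code point
}

function byteBreakdown(text: string): CharRow[] {
  const encoder = new TextEncoder();
  const rows: CharRow[] = [];
  // for...of walks the string by code point, so 🌍 (a surrogate pair) is one row.
  for (const char of text) {
    const cp = char.codePointAt(0)!;
    rows.push({
      char,
      codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      bits: cp.toString(2).length,
      bytes: encoder.encode(char).length,
    });
  }
  return rows;
}
```

byteBreakdown("A🌍") returns two rows: A (U+0041, 7 bits, 1 byte) and 🌍 (U+1F30D, 17 bits, 4 bytes).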
What’s the UTF-8 BOM?
Three magic bytes EF BB BF (U+FEFF encoded in UTF-8) that mark a file as UTF-8. Microsoft tools often add it; Unix tools mostly don’t. The Unicode standard discourages it for UTF-8 because UTF-8 has no byte-order ambiguity to resolve. If your consumer treats it as content (rare bug), strip it.
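Adding or stripping the BOM is a byte-level check on the first three bytes. A minimal sketch with hypothetical helper names (hasBom, stripBom, addBom). Note that TextDecoder already skips a leading BOM by default; passing ignoreBOM: true keeps it in the output as U+FEFF.

```typescript
const UTF8_BOM = [0xef, 0xbb, 0xbf];

function hasBom(bytes: Uint8Array): boolean {
  return bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
}

// Returns a view without the BOM (no copy), or the input unchanged.
function stripBom(bytes: Uint8Array): Uint8Array {
  return hasBom(bytes) ? bytes.subarray(3) : bytes;
}

// Prepends EF BB BF unless it is already there.
function addBom(bytes: Uint8Array): Uint8Array {
  if (hasBom(bytes)) return bytes;
  const out = new Uint8Array(bytes.length + 3);
  out.set(UTF8_BOM, 0);
  out.set(bytes, 3);
  return out;
}
```

With withBom = addBom(new TextEncoder().encode("hi")), new TextDecoder().decode(withBom) gives "hi", while new TextDecoder("utf-8", { ignoreBOM: true }).decode(withBom) gives "\uFEFFhi".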
Why does your decoder reject some byte sequences?
The tool uses TextDecoder in fatal mode – invalid UTF-8 throws an error rather than substituting U+FFFD replacement characters. Common rejections: truncated sequences (4-byte char missing its tail), overlong encodings, lone continuation bytes, and bytes encoding surrogates U+D800-U+DFFF (illegal in UTF-8).
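Each rejection category can be reproduced with TextDecoder in fatal mode. The sample byte sequences below are illustrative choices; decodeStrict is a hypothetical wrapper that returns null instead of throwing.

```typescript
function decodeStrict(bytes: number[]): string | null {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array(bytes));
  } catch {
    return null; // fatal mode throws a TypeError on malformed UTF-8
  }
}

const samples = {
  valid: [0xf0, 0x9f, 0x8c, 0x8d], // 🌍
  truncated: [0xf0, 0x9f, 0x8c], // 4-byte lead, only 2 continuation bytes
  overlong: [0xc0, 0xaf], // "/" padded out to 2 bytes
  loneContinuation: [0x80], // continuation byte with no lead byte
  surrogate: [0xed, 0xa0, 0x80], // would encode U+D800
};

for (const [name, bytes] of Object.entries(samples)) {
  console.log(name, decodeStrict(bytes) ?? "rejected");
}
```

Only the valid sample decodes; the other four are rejected. The default (non-fatal) decoder would instead return U+FFFD replacement characters for them.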
Hex, decimal, or binary – which should I use?
Hex (e.g., F0 9F 8C 8D) is the de-facto standard in docs, debuggers, and network captures: compact, with exactly two digits per byte. Decimal (e.g., 240 159 140 141) matches unsigned byte-array literals such as Python’s bytes([240, 159, 140, 141]). Binary (e.g., 11110000…) makes the bit-level UTF-8 encoding pattern visible.
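The three output formats differ only in how each byte is rendered. A minimal sketch (formatBytes and ByteFormat are hypothetical names), assuming uppercase hex and space-separated output:

```typescript
type ByteFormat = "hex" | "decimal" | "binary";

const formatters: Record<ByteFormat, (b: number) => string> = {
  hex: (b) => b.toString(16).toUpperCase().padStart(2, "0"), // always 2 digits
  decimal: (b) => String(b), // 0-255
  binary: (b) => b.toString(2).padStart(8, "0"), // always 8 bits
};

function formatBytes(bytes: Uint8Array, format: ByteFormat): string {
  return Array.from(bytes, formatters[format]).join(" ");
}
```

For 🌍 this yields F0 9F 8C 8D, 240 159 140 141, and 11110000 10011111 10001100 10001101; the binary form shows the 11110 lead pattern and the three 10 continuation markers.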
What about the 0x and \x prefixes?
0x matches hex literals in most languages (JS, Python, C, Go). \x matches C/C++/Python/Rust string-byte escapes. Pick “none” for plain space-separated bytes you’d paste into a network analyzer.
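Applying a prefix is string formatting on top of the hex digits. In this sketch the separators are assumptions, not necessarily what the tool emits: 0x bytes are comma-separated (ready for an array literal), \x bytes are concatenated (ready for a string literal), and unprefixed bytes are space-separated. withPrefix and Prefix are hypothetical names.

```typescript
type Prefix = "0x" | "\\x" | "none";

function withPrefix(bytes: Uint8Array, prefix: Prefix): string {
  const hex = Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0"));
  if (prefix === "0x") return hex.map((h) => "0x" + h).join(", "); // e.g. {0xC3, 0xA9}
  if (prefix === "\\x") return hex.map((h) => "\\x" + h).join(""); // e.g. "\xC3\xA9"
  return hex.join(" "); // plain bytes for a network analyzer
}
```

For "é" (C3 A9) the three modes produce 0xC3, 0xA9 / \xC3\xA9 / C3 A9.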
Is my text uploaded?
No. Everything runs in your browser via TextEncoder / TextDecoder.
Does it work offline?
Yes. Once the page has loaded, the tool (about 18 KB in total) runs without any network connection: every conversion happens locally in JavaScript on your device.
What’s the input cap?
200,000 characters per encode. Decode is similarly capped. The tool runs on the main thread (no Web Worker), so the cap protects against accidental tab-freezes on huge inputs.
UTF-8 vs UTF-16 vs UTF-32 – when to use which?
UTF-8 dominates the web, files, and APIs because it’s backward-compatible with ASCII and space-efficient for Latin text. UTF-16 is the code-unit format of JavaScript strings and is used by Windows and Java APIs. UTF-32 is mostly used internally where fixed-width indexing (one 4-byte unit per code point) matters. For storage or transmission, UTF-8 is almost always the right choice today.
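A quick way to compare the three encodings is to measure the same text in each. This sketch (encodedSizes is a hypothetical name) uses TextEncoder for UTF-8 and derives UTF-16 and UTF-32 sizes from JavaScript's code-unit and code-point counts, assuming well-formed input (no lone surrogates) and no BOM.

```typescript
function encodedSizes(text: string): { utf8: number; utf16: number; utf32: number } {
  const utf8 = new TextEncoder().encode(text).length;
  const utf16 = text.length * 2; // string length counts UTF-16 code units, 2 bytes each
  const utf32 = Array.from(text).length * 4; // Array.from splits by code point
  return { utf8, utf16, utf32 };
}
```

"hello" measures 5 / 10 / 20 bytes (UTF-8 / UTF-16 / UTF-32), "日本語" measures 9 / 6 / 12, and "🌍" measures 4 / 4 / 4, so CJK-heavy text is the main case where UTF-16 comes out smaller.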
Related Tools
Center Unicode Text →
Center Unicode text within a fixed width, with real grapheme counting for emoji and…
Check Spoofed Unicode Text →
Detect Unicode confusables and homoglyphs from Cyrillic, Greek, Armenian, and Hebrew that imitate Latin…
Chunkify Unicode Text →
Split Unicode text into equal chunks with grapheme, code-point, or UTF-16 modes. Keeps emoji…
ASCII to Unicode Converter →
ASCII to Unicode & Decode decimal, hex, octal, or U+XXXX values to Unicode characters…
Convert Code Points to Unicode →
Convert Code Points to Unicode (U+XXXX, hex, decimal) to characters - handles emoji, CJK,…
Convert Unicode to ASCII →
Convert Unicode to ASCII with transliteration (é → e, ñ → n), replace, or…
Convert Unicode to Base64 →
Encode Unicode text to Base64 (and decode) with standard, URL-safe, MIME variants. UTF-8 proper.…
Convert Unicode to Binary →
Convert Unicode to binary in 3 modes (UTF-8, codepoint, UTF-16). Per-character breakdown. Free, offline,…
Convert Unicode to Bytes →
Convert Unicode to UTF-8 bytes in hex, decimal, or binary. Per-byte grid, reverse direction.…
Convert Unicode to Code Points →
Convert Unicode to code points (U+XXXX, HTML/CSS/JS escapes) and back. Per-character breakdown. Free, offline,…
Convert Unicode to Data URL →
Convert Unicode to data URLs with base64 or URL-encoding, 12 MIME types, charset toggle.…
Convert Unicode to Decimal →
Convert Unicode text to decimal code point values.