Convert UTF-16 to UTF-8
Convert UTF-16 code units to UTF-8 text and bytes. 3 formats, BE/LE, BOM, surrogate validation, bidirectional. Free, client-side, instant, secure.
- Runs in your browser
- Nothing uploaded
- Free, no sign-up
Convert UTF-16 code units (decimal byte pairs, hex, or binary) into decoded text plus the resulting UTF-8 byte sequence, in big-endian or little-endian byte order. Surrogate pairs are validated explicitly: lone surrogates raise errors rather than silently becoming replacement characters.
How to Use Convert UTF-16 to UTF-8
- Paste your UTF-16 code units. Hex (e.g., 0048 0069) is the most common; decimal byte pairs and 16-bit binary are also supported.
- Pick the byte order: Big-endian (most network protocols) or Little-endian (Windows, x86 memory dumps).
- If your input starts with a BOM (FEFF or FFFE), it's stripped automatically and noted in stats.
- The output panel shows decoded text. Below it, the UTF-8 bytes (hex) panel shows the same text serialized as UTF-8 so you can verify the conversion.
- The grid breaks down each character: codepoint, whether it consumed 1 or 2 UTF-16 units (surrogate pair), and the UTF-8 bytes produced.
- Swap to reverse the direction: type text and get back UTF-16 code units in your chosen format and endianness.
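The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the function name `decode_hex_units` is hypothetical, each hex token is assumed to be exactly four digits read as two bytes in stream order, and a leading FEFF or FFFE is assumed to be dropped without changing the selected byte order.

```python
def decode_hex_units(text, byte_order="big"):
    """Decode whitespace-separated 4-digit hex tokens.

    Returns (decoded_text, utf8_hex, bom_stripped). Hypothetical helper:
    tokens are taken as two bytes in stream order, so a little-endian "H"
    is written 4800 rather than 0048.
    """
    raw = b"".join(bytes.fromhex(tok) for tok in text.split())

    # Assumption: either BOM form is stripped; the chosen byte order is kept.
    bom_stripped = raw[:2] in (b"\xfe\xff", b"\xff\xfe")
    if bom_stripped:
        raw = raw[2:]

    codec = "utf-16-be" if byte_order == "big" else "utf-16-le"
    # Strict decoding: Python raises UnicodeDecodeError on a lone surrogate.
    decoded = raw.decode(codec)

    # Re-serialize the same text as UTF-8 and show it as spaced hex bytes.
    utf8_hex = decoded.encode("utf-8").hex(" ").upper()
    return decoded, utf8_hex, bom_stripped


print(decode_hex_units("FEFF 0048 0069"))
print(decode_hex_units("4800 6900", byte_order="little"))
```

Leaning on Python's built-in codecs keeps the sketch short; a decoder that needs to report problems in terms of code unit positions would walk the units by hand instead.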
Frequently Asked Questions
How does the decimal byte-pair format work?
Two consecutive integers (0-255) form one UTF-16 code unit. The byte order setting controls which of the two is the high byte. In Big-endian the first byte is high: 0 72 = (0<<8)|72 = U+0048 = “H”. In Little-endian the second byte is high: 72 0 = (0<<8)|72 = U+0048 = “H”. A common confusion: many sources write 72 0 101 0 as “byte pairs for ‘He’”, but that is little-endian. With Big-endian selected, you need 0 72 0 101.
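The arithmetic above, as a short Python sketch. The function name `units_from_byte_pairs` and its `byte_order` parameter are illustrative assumptions, not part of the tool.

```python
def units_from_byte_pairs(values, byte_order="big"):
    """Combine consecutive byte values (0-255) into 16-bit code units."""
    if len(values) % 2 != 0:
        raise ValueError("byte-pair input needs an even number of values")
    units = []
    for i in range(0, len(values), 2):
        first, second = values[i], values[i + 1]
        if not (0 <= first <= 255 and 0 <= second <= 255):
            raise ValueError(f"value out of byte range in pair starting at index {i}")
        # Big-endian: first byte is high. Little-endian: second byte is high.
        high, low = (first, second) if byte_order == "big" else (second, first)
        units.append((high << 8) | low)
    return units


print([hex(u) for u in units_from_byte_pairs([0, 72, 0, 101], "big")])
print([hex(u) for u in units_from_byte_pairs([72, 0, 101, 0], "little")])
```

Feeding the little-endian sequence 72 0 with Big-endian selected gives 0x4800 instead of 0x0048, which is the mix-up described above.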
How are surrogate pairs validated?
High surrogates (U+D800-U+DBFF) must be followed by low surrogates (U+DC00-U+DFFF), and a low surrogate must not appear without a preceding high surrogate. The decoder combines a valid pair via 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00). Unlike many converters, this tool raises an explicit error on lone surrogates rather than silently emitting U+FFFD, so broken inputs are visible rather than papered over.
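A minimal Python sketch of this kind of strict decoding. The function name `decode_units` and the error wording are assumptions for illustration; only the ranges and the combining formula come from the answer above.

```python
def decode_units(units):
    """Turn a list of 16-bit code units into a list of code points.

    Raises ValueError naming the offending code unit index on a lone surrogate.
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate needs a low surrogate right after it.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError(f"lone high surrogate {unit:04X} at code unit {i}")
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no high surrogate before it.
            raise ValueError(f"lone low surrogate {unit:04X} at code unit {i}")
        else:
            code_points.append(unit)
            i += 1
    return code_points


print([f"U+{cp:04X}" for cp in decode_units([0x0048, 0xD83C, 0xDF0D])])
```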
What’s a BOM?
Byte Order Mark: the codepoint U+FEFF. In UTF-16 it serializes as FE FF (BE) or FF FE (LE), letting readers detect the byte order. The decoder auto-strips a leading FEFF/FFFE if present (stats will say “BOM stripped”). On reverse, toggle the BOM checkbox to prepend it.
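As a rough Python illustration of both directions, the sketch below detects a BOM on raw bytes and prepends one when encoding. The names `split_bom` and `with_bom` are hypothetical.

```python
def split_bom(raw):
    """Return (byte_order, payload); byte_order is None when there is no BOM."""
    if raw[:2] == b"\xfe\xff":
        return "big", raw[2:]
    if raw[:2] == b"\xff\xfe":
        return "little", raw[2:]
    return None, raw


def with_bom(text, byte_order="big"):
    """Encode text as UTF-16 in the given order with U+FEFF prepended."""
    codec = "utf-16-be" if byte_order == "big" else "utf-16-le"
    return ("\ufeff" + text).encode(codec)


print(with_bom("H", "big").hex(" "))
print(with_bom("H", "little").hex(" "))
print(split_bom(with_bom("Hi", "little")))
```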
Is BE → LE just byte-swapping?
Yes, per 16-bit unit, but what that looks like depends on the input format. For hex, the two bytes of each unit swap: BE 0048 → LE 4800. For decimal byte pairs, the order of the two bytes in each pair flips. For binary, the high and low 8-bit halves swap. The decoder applies the right transform per format automatically.
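The per-unit swap for each of the three formats might look like this in Python. All function names here are illustrative assumptions, and inputs are assumed to be well-formed 16-bit values.

```python
def swap_unit(unit):
    """Swap the high and low bytes of one 16-bit value."""
    return ((unit & 0xFF) << 8) | (unit >> 8)


def swap_hex(token):
    """4-digit hex token, e.g. "0048" -> "4800"."""
    return format(swap_unit(int(token, 16)), "04X")


def swap_binary(token):
    """16-bit binary token: the two 8-bit halves trade places."""
    return format(swap_unit(int(token, 2)), "016b")


def swap_byte_pairs(values):
    """Decimal byte pairs: flip the order inside each pair."""
    if len(values) % 2 != 0:
        raise ValueError("byte-pair input needs an even number of values")
    swapped = []
    for i in range(0, len(values), 2):
        swapped += [values[i + 1], values[i]]
    return swapped


print(swap_hex("0048"))
print(swap_binary("0000000001001000"))
print(swap_byte_pairs([0, 72, 0, 101]))
```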
What if my source uses 4-byte UTF-16 (non-BMP) sequences?
Non-BMP characters (emoji, rare scripts) use surrogate pairs: two 16-bit code units (4 bytes) in UTF-16. Example: 🌍 (U+1F30D) is D83C DF0D. The grid marks these as “pair (2)” so you can see which characters were decoded from a surrogate pair.
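For the reverse direction, splitting a code point into its UTF-16 code units can be sketched as follows. The function name `to_utf16_units` is a hypothetical helper for illustration.

```python
def to_utf16_units(code_point):
    """Return the UTF-16 code units (one or two) for a Unicode code point."""
    if not (0 <= code_point <= 0x10FFFF) or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point < 0x10000:
        return [code_point]  # BMP: a single code unit
    offset = code_point - 0x10000  # 20 bits, split 10/10
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


print([f"{u:04X}" for u in to_utf16_units(0x1F30D)])
```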
How does UTF-8 byte count compare?
UTF-8 uses 1 byte for ASCII (U+0000-U+007F), 2 for Latin/Cyrillic/Greek/Hebrew/Arabic (U+0080-U+07FF), 3 for most CJK and the rest of BMP (U+0800-U+FFFF), 4 for non-BMP (U+10000-U+10FFFF). The stats line shows the precise count.
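The ranges above translate directly into a small lookup. This Python sketch uses a hypothetical function name, `utf8_length`, and cross-checks it against Python's own encoder.

```python
def utf8_length(code_point):
    """Number of UTF-8 bytes needed for one Unicode code point."""
    if not (0 <= code_point <= 0x10FFFF) or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4


for ch in "Hé中🌍":
    # Compare the table lookup with the length of the real UTF-8 encoding.
    print(f"U+{ord(ch):04X}", utf8_length(ord(ch)), len(ch.encode("utf-8")))
```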
Why not just use TextDecoder?
The browser’s TextDecoder('utf-16be') works for raw byte buffers, but by default it silently substitutes U+FFFD for invalid surrogates. Its fatal option makes it throw instead, but the error does not say where the problem is. We decode manually so we can report the exact code unit position that caused the failure, which is useful when debugging real-world UTF-16 streams.
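The same trade-off can be seen outside the browser. The sketch below uses Python's built-in codec as a stand-in, not TextDecoder itself: lenient decoding hides a lone surrogate behind U+FFFD, while strict decoding exposes a byte offset that converts to a code unit index. The helper name `first_bad_unit` is an assumption for illustration.

```python
# "H", a lone high surrogate (D83D), then "i", as big-endian bytes.
data = bytes.fromhex("0048 D83D 0069")

# Lenient decoding: the bad unit quietly becomes U+FFFD.
lenient = data.decode("utf-16-be", errors="replace")


def first_bad_unit(raw, codec="utf-16-be"):
    """Return the index of the first undecodable code unit, or None."""
    try:
        raw.decode(codec)  # strict by default
    except UnicodeDecodeError as err:
        return err.start // 2  # byte offset -> 16-bit unit index
    return None


print(repr(lenient))
print(first_bad_unit(data))
```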
Is my data sent anywhere?
No. Parsing, decoding, and UTF-8 encoding happen entirely in your browser. No network requests.
What’s the input cap?
200,000 characters. The cap keeps the UI responsive.
UTF-16 vs UTF-8 – when to use which?
UTF-8 dominates web, files, APIs, and modern protocols. UTF-16 is the string representation behind JavaScript strings, Windows wide-character APIs (originally UCS-2), and Java’s char. If you’re storing or transmitting text, use UTF-8 almost always. If you’re poking at JS string code units (.charCodeAt) or Windows wide-char APIs, you’re touching UTF-16.