Convert UTF-16 to UTF-8

In short

Convert UTF-16 code units to UTF-8 text and bytes. 3 formats, BE/LE, BOM, surrogate validation, bidirectional. Free, client-side, instant, secure.

Runs in your browser
Nothing uploaded
Free, no sign-up

Convert UTF-16 code units (decimal byte pairs, hex, or binary) into decoded text plus the resulting UTF-8 byte sequence. Big-endian or little-endian. Surrogate pairs validated explicitly - lone surrogates raise errors rather than silently becoming replacement characters.

🛡 100% Private: no server uploads, ever

⚡ Instant: runs in your browser

💧 No Watermarks: clean output, always

🆓 Free Forever: no accounts, no limits

How to Use Convert UTF-16 to UTF-8

Paste your UTF-16 code units. Hex (e.g., 0048 0069) is the most common; decimal byte pairs and 16-bit binary are also supported.
Pick the byte order: Big-endian (most network protocols) or Little-endian (Windows, x86 memory dumps).
If your input starts with a BOM (FEFF or FFFE), it's stripped automatically and noted in stats.
The output panel shows decoded text. Below it, the UTF-8 bytes (hex) panel shows the same text serialized as UTF-8 so you can verify the conversion.
The grid breaks down each character: codepoint, whether it consumed 1 or 2 UTF-16 units (surrogate pair), and the UTF-8 bytes produced.
Swap to reverse - type text and get back UTF-16 code units in your chosen format and endianness.
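
The forward steps above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual code: the helper names parseHexUnits and unitsToUtf8Hex are hypothetical, the input is assumed to be space-separated 4-digit hex in big-endian notation, and the built-in TextEncoder is used for the UTF-8 step. Unlike the tool, this sketch does no surrogate validation (TextEncoder replaces lone surrogates with U+FFFD).

```javascript
// Hypothetical helper: parse space-separated 4-digit hex tokens into 16-bit code units.
function parseHexUnits(input) {
  const tokens = input.trim().split(/\s+/).filter(Boolean);
  return tokens.map((tok, i) => {
    if (!/^[0-9a-fA-F]{4}$/.test(tok)) {
      throw new Error(`Token ${i} ("${tok}") is not a 4-digit hex code unit`);
    }
    return parseInt(tok, 16);
  });
}

// Hypothetical helper: build a JS string from the code units, then serialize it as UTF-8.
// Spreading is fine for short inputs; very large arrays would need chunking.
function unitsToUtf8Hex(units) {
  const text = String.fromCharCode(...units);
  const bytes = new TextEncoder().encode(text);
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0").toUpperCase());
  return { text, utf8Hex: hex.join(" ") };
}

const demo = unitsToUtf8Hex(parseHexUnits("0048 0069"));
console.log(demo.text, "|", demo.utf8Hex); // Hi | 48 69
```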

Frequently Asked Questions

How does the decimal byte-pair format work?

Two consecutive integers (0-255) form one UTF-16 code unit. The byte order setting controls which byte of the pair is the high byte. In Big-endian the first byte is high: 0 72 = (0<<8)|72 = U+0048 = “H”. In Little-endian the second byte is high: 72 0 = (0<<8)|72 = U+0048 = “H”. A common confusion: many sources write 72 0 101 0 as the “byte pairs for ‘He’”, but that is actually little-endian. With Big-endian selected, you need 0 72 0 101.
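
As a rough sketch of that pairing rule (the function name bytePairsToUnits and its littleEndian flag are assumptions for illustration, not the tool's API), each pair is combined as (high << 8) | low, with the byte order deciding which of the two bytes counts as high:

```javascript
// Hypothetical helper: combine decimal byte pairs into 16-bit code units.
function bytePairsToUnits(input, littleEndian) {
  const bytes = input.trim().split(/[\s,]+/).filter(Boolean).map(Number);
  if (bytes.length % 2 !== 0) {
    throw new Error("Odd number of bytes: every code unit needs two");
  }
  const units = [];
  for (let i = 0; i < bytes.length; i += 2) {
    const first = bytes[i];
    const second = bytes[i + 1];
    for (const b of [first, second]) {
      if (!Number.isInteger(b) || b < 0 || b > 255) {
        throw new Error(`Invalid byte value in pair starting at index ${i}`);
      }
    }
    // Big-endian: first byte is high. Little-endian: second byte is high.
    const high = littleEndian ? second : first;
    const low = littleEndian ? first : second;
    units.push((high << 8) | low);
  }
  return units;
}

console.log(bytePairsToUnits("0 72 0 101", false)); // [ 72, 101 ]
console.log(bytePairsToUnits("72 0 101 0", true));  // [ 72, 101 ]
```

Feeding the little-endian bytes 72 0 to the big-endian branch yields 0x4800 instead of 0x0048, which is the mix-up described above.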

How are surrogate pairs validated?

A high surrogate (U+D800-U+DBFF) must be immediately followed by a low surrogate (U+DC00-U+DFFF), and a low surrogate must not appear on its own. The decoder combines a valid pair via 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00). Unlike many converters, this tool raises an explicit error on lone surrogates rather than silently emitting U+FFFD, so broken input is visible, not papered over.
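
A strict decoder along these lines might look like the following sketch. The function name decodeUnits and the error wording are assumptions; only the surrogate ranges and the combining formula come from the description above.

```javascript
// Hypothetical strict decoder: 16-bit code units in, Unicode code points out.
// Throws on any lone surrogate and names the offending unit index.
function decodeUnits(units) {
  const codePoints = [];
  for (let i = 0; i < units.length; i++) {
    const unit = units[i];
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: the next unit must be a low surrogate.
      const next = units[i + 1];
      if (next === undefined || next < 0xdc00 || next > 0xdfff) {
        throw new Error(`Lone high surrogate at unit ${i}`);
      }
      codePoints.push(0x10000 + ((unit - 0xd800) << 10) + (next - 0xdc00));
      i++; // the pair consumed two units
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      throw new Error(`Lone low surrogate at unit ${i}`);
    } else {
      codePoints.push(unit);
    }
  }
  return codePoints;
}

console.log(decodeUnits([0x0048, 0xd83c, 0xdf0d]).map((cp) => cp.toString(16)));
// [ '48', '1f30d' ]
```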

What’s a BOM?

A Byte Order Mark is the codepoint U+FEFF placed at the very start of a stream. In UTF-16 it serializes as FE FF (BE) or FF FE (LE), which lets readers detect the byte order. The decoder auto-strips a leading FEFF/FFFE if present (stats will say “BOM stripped”). On reverse, toggle the BOM checkbox to prepend it.
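
BOM sniffing on a raw byte array can be sketched as follows. The function name detectBom and the shape of its return value are hypothetical; the tool itself may track this differently.

```javascript
// Hypothetical helper: detect and strip a leading UTF-16 BOM from a byte array.
function detectBom(bytes) {
  if (bytes.length >= 2) {
    if (bytes[0] === 0xfe && bytes[1] === 0xff) {
      return { order: "BE", stripped: true, rest: bytes.slice(2) };
    }
    if (bytes[0] === 0xff && bytes[1] === 0xfe) {
      return { order: "LE", stripped: true, rest: bytes.slice(2) };
    }
  }
  // No BOM: the caller has to supply the byte order.
  return { order: null, stripped: false, rest: bytes };
}

console.log(detectBom([0xff, 0xfe, 0x48, 0x00]));
// { order: 'LE', stripped: true, rest: [ 72, 0 ] }
```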

Why is BE → LE not just byte-swapping?

At the code-unit level it is byte-swapping; what differs is how the swap looks in each input format. For hex, the two bytes of each 16-bit unit swap: BE 0048 → LE 4800. For decimal byte-pairs, the order of the two bytes in each pair flips. For binary, the high and low 8-bit halves swap. The decoder applies the right transform per format automatically.
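
For a single 16-bit unit, the swap is one line of bit arithmetic. The names swap16 and hex4 below are illustrative only, and the input is assumed to already be in the range 0 to 0xFFFF.

```javascript
// Swap the high and low bytes of one 16-bit code unit (assumes 0 <= unit <= 0xFFFF).
function swap16(unit) {
  return ((unit & 0xff) << 8) | (unit >> 8);
}

// Hypothetical formatter: render a code unit as 4 uppercase hex digits.
function hex4(unit) {
  return unit.toString(16).padStart(4, "0").toUpperCase();
}

console.log(hex4(0x0048), "->", hex4(swap16(0x0048))); // 0048 -> 4800
```

Applying swap16 twice returns the original value, so the same function converts in both directions.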

What if my source uses 4-byte UTF-16 (non-BMP) sequences?

Non-BMP characters (emoji, rare scripts) use surrogate pairs: TWO 16-bit code units in UTF-16, four bytes in total. Example: 🌍 (U+1F30D) is D83C DF0D. The grid shows these as “pair (2)” so you can see at a glance which characters are non-BMP.
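
Going the other way, a codepoint above U+FFFF can be split into its two code units as in this sketch (toUtf16Units is a hypothetical name, not something the tool exposes):

```javascript
// Hypothetical helper: encode one Unicode code point as 1 or 2 UTF-16 code units.
function toUtf16Units(codePoint) {
  const isSurrogate = codePoint >= 0xd800 && codePoint <= 0xdfff;
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff || isSurrogate) {
    throw new RangeError("Not a valid Unicode scalar value");
  }
  if (codePoint < 0x10000) {
    return [codePoint]; // BMP: a single code unit
  }
  const offset = codePoint - 0x10000; // 20 bits to split across the pair
  return [0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff)];
}

console.log(toUtf16Units(0x1f30d).map((u) => u.toString(16).toUpperCase()));
// [ 'D83C', 'DF0D' ]
```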

How does UTF-8 byte count compare?

UTF-8 uses 1 byte for ASCII (U+0000-U+007F), 2 for Latin/Cyrillic/Greek/Hebrew/Arabic (U+0080-U+07FF), 3 for most CJK and the rest of BMP (U+0800-U+FFFF), 4 for non-BMP (U+10000-U+10FFFF). The stats line shows the precise count.
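
Those ranges translate directly into a small lookup. The sketch below uses a hypothetical utf8Length function and cross-checks it against the built-in TextEncoder; it assumes the input is a valid, non-surrogate codepoint.

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for one code point.
// Assumes a valid Unicode scalar value (0..0x10FFFF, not a surrogate).
function utf8Length(codePoint) {
  if (codePoint <= 0x7f) return 1;   // ASCII
  if (codePoint <= 0x7ff) return 2;  // U+0080-U+07FF
  if (codePoint <= 0xffff) return 3; // rest of the BMP
  return 4;                          // U+10000-U+10FFFF
}

// Cross-check against the platform encoder for one sample per range.
for (const cp of [0x48, 0xe9, 0x4e2d, 0x1f30d]) {
  const actual = new TextEncoder().encode(String.fromCodePoint(cp)).length;
  console.log(`U+${cp.toString(16).toUpperCase()}: ${utf8Length(cp)} (encoder: ${actual})`);
}
```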

Why not just use TextDecoder?

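The browser’s TextDecoder('utf-16be') works for raw byte buffers, but by default it silently substitutes U+FFFD for invalid surrogates. With the fatal option it throws a TypeError instead, but the error is not required to say where in the input the problem is. We decode manually so we can report exactly which code unit position caused the failure, which is useful when debugging real-world UTF-16 streams.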

Is my data sent anywhere?

No. Parsing, decoding, and UTF-8 encoding happen entirely in your browser. No network requests.

What’s the input cap?

200,000 characters of input. The cap keeps the UI responsive.

UTF-16 vs UTF-8 – when to use which?

UTF-8 dominates web, files, APIs, and modern protocols. UTF-16 is the string model exposed by JavaScript, Windows wide-char APIs (UCS-2 originally), and Java char. If you’re storing or transmitting text, the answer is almost always UTF-8. If you’re poking at JS string code units (.charCodeAt) or Windows wide-char APIs, you’re touching UTF-16.
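
You can see both encodings side by side from any JavaScript console. This sketch uses only standard built-ins (length, charCodeAt, codePointAt, TextEncoder); the wrapper function describeString is a made-up name for illustration.

```javascript
// Hypothetical helper: report how one string looks in UTF-16 units and UTF-8 bytes.
function describeString(text) {
  return {
    utf16Units: text.length,             // JS length counts UTF-16 code units
    firstUnit: text.charCodeAt(0),       // raw 16-bit unit (may be a surrogate)
    firstCodePoint: text.codePointAt(0), // full code point, pairs combined
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

console.log(describeString("\u{1F30D}"));
// { utf16Units: 2, firstUnit: 55356, firstCodePoint: 127757, utf8Bytes: 4 }
```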

Embed this tool

Add this free tool to your website. Copy and paste the code:

<iframe src="https://alltoolsverse.com/tools/convert-utf16-to-utf8/?embed=1" width="100%" height="760" loading="lazy" style="max-width:900px;border:1px solid #e2e8f0;border-radius:12px" title="Convert UTF-16 to UTF-8"></iframe>
<p>Free tool: <a href="https://alltoolsverse.com/tools/convert-utf16-to-utf8/">Convert UTF-16 to UTF-8</a> by All Tools Verse</p>

Related Tools

Convert UTF-8 to UTF-16 →

Binary to UTF-8 Decoder →

Convert Arbitrary Base to UTF-8 →

Base64 to UTF-8 Decoder →

Convert Bytes to UTF-8 →

Code Points to UTF-8 Converter Free →

Convert Data URI to UTF-8 →

Convert Decimal to UTF-8 →

Convert Hexadecimal to UTF-8 →

Convert HTML Entities to UTF-8 →

Convert Octal to UTF-8 →

Convert UTF-32 to UTF-8 →
