
Convert UTF-16 to UTF-8

In short

Convert UTF-16 code units to UTF-8 text and bytes. Three input formats (hex, decimal byte pairs, binary), big- or little-endian, BOM stripping, surrogate validation, and a reverse mode. Free, client-side, instant, secure.

  • Runs in your browser
  • Nothing uploaded
  • Free, no sign-up

Convert UTF-16 code units (hex, decimal byte pairs, or 16-bit binary) into decoded text plus the resulting UTF-8 byte sequence, in big-endian or little-endian order. Surrogate pairs are validated explicitly: a lone surrogate raises an error rather than silently becoming a replacement character.


How to Use Convert UTF-16 to UTF-8

  1. Paste your UTF-16 code units. Hex (e.g., 0048 0069) is the most common; decimal byte pairs and 16-bit binary are also supported.
  2. Pick the byte order: Big-endian (most network protocols) or Little-endian (Windows, x86 memory dumps).
  3. If your input starts with a BOM (FEFF or FFFE), it's stripped automatically and noted in stats.
  4. The output panel shows decoded text. Below it, the UTF-8 bytes (hex) panel shows the same text serialized as UTF-8 so you can verify the conversion.
  5. The grid breaks down each character: codepoint, whether it consumed 1 or 2 UTF-16 units (surrogate pair), and the UTF-8 bytes produced.
  6. Use Swap to reverse the direction: type text and get back UTF-16 code units in your chosen format and endianness.
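
The steps above can also be reproduced outside the browser. The following is a minimal Python sketch, not the tool's own code: the function name `hex_utf16_to_utf8` is hypothetical, it relies on Python's built-in strict UTF-16 codecs instead of a hand-written decoder, and it assumes that only a BOM matching the selected byte order is stripped.

```python
def hex_utf16_to_utf8(hex_units, byte_order="big"):
    """Decode hex UTF-16 code units; return (text, utf8_hex, bom_stripped)."""
    if byte_order not in ("big", "little"):
        raise ValueError("byte_order must be 'big' or 'little'")
    # bytes.fromhex skips spaces, so "0048 0069" and "00480069" both work.
    raw = bytes.fromhex(hex_units)
    if len(raw) % 2:
        raise ValueError("UTF-16 input must contain whole 16-bit units")
    # U+FEFF serializes as FE FF in big-endian and FF FE in little-endian.
    bom = b"\xfe\xff" if byte_order == "big" else b"\xff\xfe"
    bom_stripped = raw.startswith(bom)
    if bom_stripped:
        raw = raw[2:]
    codec = "utf-16-be" if byte_order == "big" else "utf-16-le"
    # Strict decoding: a lone surrogate raises UnicodeDecodeError.
    text = raw.decode(codec)
    # Serialize the decoded text as UTF-8 and show it as spaced hex bytes.
    utf8_hex = text.encode("utf-8").hex(" ")
    return text, utf8_hex, bom_stripped


print(hex_utf16_to_utf8("FEFF 0048 0069"))                  # ('Hi', '48 69', True)
print(hex_utf16_to_utf8("4800 6900", byte_order="little"))  # ('Hi', '48 69', False)
```

Comparing the returned UTF-8 hex against the tool's UTF-8 bytes panel is a quick way to cross-check a conversion.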

Frequently Asked Questions

How does the decimal byte-pair format work?

Two consecutive integers (0-255) form one UTF-16 code unit, and the byte order setting controls which of the two is the high byte. In Big-endian the first byte is high: 0 72 = (0<<8)|72 = U+0048 = “H”. In Little-endian the second byte is high: 72 0 = (0<<8)|72 = U+0048 = “H”. A common confusion: many sources write 72 0 101 0 as “byte pairs for ‘He’”, but that is actually little-endian. With Big-endian selected, you need 0 72 0 101.
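
As an illustration of the pairing rule, here is a small Python sketch. The helper name `byte_pairs_to_units` and its whitespace-separated input format are assumptions made for this example, not the tool's internals.

```python
def byte_pairs_to_units(decimals, byte_order="big"):
    """Combine whitespace-separated decimal bytes into 16-bit code units."""
    values = [int(token) for token in decimals.split()]
    if len(values) % 2:
        raise ValueError("need an even number of bytes")
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("each byte must be in the range 0-255")
    units = []
    for first, second in zip(values[0::2], values[1::2]):
        # Big-endian: the first byte is the high byte. Little-endian: the second is.
        high, low = (first, second) if byte_order == "big" else (second, first)
        units.append((high << 8) | low)
    return units


print(byte_pairs_to_units("0 72 0 101", "big"))     # [72, 101]  -> "He"
print(byte_pairs_to_units("72 0 101 0", "little"))  # [72, 101]  -> "He"
print(byte_pairs_to_units("72 0 101 0", "big"))     # [18432, 25856], i.e. 0x4800 0x6500
```

The last call shows what happens when little-endian pairs are read as big-endian: the units come out as U+4800 and U+6500 instead of “He”.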

How are surrogate pairs validated?

A high surrogate (U+D800-U+DBFF) must be followed immediately by a low surrogate (U+DC00-U+DFFF), and a low surrogate is only valid in that position. The decoder combines each pair via 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00). Unlike many converters, this tool raises an explicit error on lone surrogates rather than silently emitting U+FFFD, so broken input is visible, not papered over.
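
The validation rule and the combining formula can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions: the function name `decode_units` and the error wording are hypothetical, and the tool's actual messages may differ.

```python
def decode_units(units):
    """Turn a list of UTF-16 code units into code points, rejecting lone surrogates."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate is only valid when a low surrogate follows it.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
                i += 2
                continue
            raise ValueError(f"lone high surrogate {unit:04X} at unit index {i}")
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate that was not consumed as part of a pair is an error.
            raise ValueError(f"lone low surrogate {unit:04X} at unit index {i}")
        code_points.append(unit)
        i += 1
    return code_points


print([hex(cp) for cp in decode_units([0x0048, 0xD83C, 0xDF0D])])  # ['0x48', '0x1f30d']
```

Raising with the unit index, rather than substituting U+FFFD, is what makes the faulty position easy to find in a long input.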

What’s a BOM?

The Byte Order Mark is the codepoint U+FEFF placed at the start of a stream. In UTF-16 it serializes as FE FF (BE) or FF FE (LE), which lets readers detect the byte order. The decoder auto-strips a leading FEFF/FFFE if present (stats will say “BOM stripped”). On reverse, toggle the BOM checkbox to prepend it.
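
To make the two serializations concrete, here is a short Python sketch of BOM detection and prepending. The helper names `split_bom` and `with_bom` are hypothetical and only mirror the behaviour described above.

```python
def split_bom(raw):
    """Return (byte_order, payload); byte_order is None when no BOM leads the data."""
    if raw.startswith(b"\xfe\xff"):
        return "big", raw[2:]
    if raw.startswith(b"\xff\xfe"):
        return "little", raw[2:]
    return None, raw


def with_bom(text, byte_order="big"):
    """Encode text as UTF-16 with U+FEFF prepended, in the requested byte order."""
    codec = "utf-16-be" if byte_order == "big" else "utf-16-le"
    return ("\ufeff" + text).encode(codec)


print(with_bom("Hi", "big").hex(" "))     # fe ff 00 48 00 69
print(with_bom("Hi", "little").hex(" "))  # ff fe 48 00 69 00
print(split_bom(with_bom("Hi", "little")))
```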

Why is BE → LE not just byte-swapping?

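At the byte level it is byte-swapping, applied within each 16-bit unit; what changes is how the swap looks in each input format. For hex, the two bytes of each unit trade places: BE 0048 → LE 4800. For decimal byte pairs, the order of the two numbers in each pair flips: 0 72 → 72 0. For binary, the high and low 8-bit halves of each 16-bit group swap. The decoder applies the right transform per format automatically.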
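
A sketch of the three per-format swaps in Python, assuming 4-digit hex units, flat lists of decimal bytes, and 16-character binary strings. All three function names are hypothetical.

```python
def swap_hex_unit(unit):
    """'0048' -> '4800': the two bytes (two hex digits each) trade places."""
    if len(unit) != 4:
        raise ValueError("expected a 4-digit hex unit")
    return unit[2:] + unit[:2]


def swap_decimal_pairs(values):
    """[0, 72, 0, 101] -> [72, 0, 101, 0]: each pair of bytes is reversed."""
    if len(values) % 2:
        raise ValueError("expected an even number of bytes")
    swapped = []
    for i in range(0, len(values), 2):
        swapped.extend([values[i + 1], values[i]])
    return swapped


def swap_binary_unit(bits):
    """'0000000001001000' -> '0100100000000000': the 8-bit halves swap."""
    if len(bits) != 16:
        raise ValueError("expected a 16-bit binary unit")
    return bits[8:] + bits[:8]


print(swap_hex_unit("0048"))                 # 4800
print(swap_decimal_pairs([0, 72, 0, 101]))   # [72, 0, 101, 0]
print(swap_binary_unit("0000000001001000"))  # 0100100000000000
```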

What if my source uses 4-byte UTF-16 (non-BMP) sequences?

Non-BMP characters (emoji, rare scripts) take 4 bytes in UTF-16 because each one is encoded as a surrogate pair: two 16-bit code units. Example: 🌍 (U+1F30D) is D83C DF0D. The grid marks these as “pair (2)” so you can see which characters came from surrogate pairs.
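
The 🌍 example can be checked by running the split in the encoding direction. This is a minimal Python sketch; `to_utf16_units` is a hypothetical helper, and the final line cross-checks it against Python's built-in codec.

```python
def to_utf16_units(code_point):
    """Return the one or two UTF-16 code units that encode a code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        return [code_point]  # BMP: a single unit
    offset = code_point - 0x10000  # 20 bits, split 10 / 10
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


units = to_utf16_units(0x1F30D)
print(" ".join(f"{u:04X}" for u in units))     # D83C DF0D
print("\U0001F30D".encode("utf-16-be").hex())  # d83cdf0d
```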

How does UTF-8 byte count compare?

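UTF-8 uses 1 byte for ASCII (U+0000-U+007F), 2 bytes for U+0080-U+07FF (accented Latin, Greek, Cyrillic, Hebrew, Arabic), 3 bytes for the rest of the BMP, including most CJK (U+0800-U+FFFF), and 4 bytes for non-BMP characters (U+10000-U+10FFFF). UTF-16, by comparison, uses 2 bytes for every BMP character and 4 bytes for every non-BMP character. The stats line shows the precise count.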
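
The ranges above translate directly into a lookup. In this Python sketch the function name `utf8_length` is hypothetical; the loop compares its answer with the length Python's own encoder produces.

```python
def utf8_length(code_point):
    """Number of UTF-8 bytes needed for one Unicode scalar value."""
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0:
        raise ValueError("code point out of range")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    if code_point <= 0x10FFFF:
        return 4
    raise ValueError("code point out of range")


for ch in ["A", "é", "中", "\U0001F30D"]:
    # Each line shows the character, the computed length, and the encoder's length.
    print(ch, utf8_length(ord(ch)), len(ch.encode("utf-8")))
```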

Why not just use TextDecoder?

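The browser’s TextDecoder('utf-16be') works for raw byte buffers but, by default, silently substitutes U+FFFD for invalid surrogates. Passing { fatal: true } makes it throw a TypeError instead, but the error does not say where the problem is. We decode manually so we can report the exact code unit position that caused the failure, which is useful when debugging real-world UTF-16 streams.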
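
The same trade-off between lenient and position-reporting decoding exists outside the browser. The Python sketch below is an analogy only, not the tool's JavaScript: it contrasts the built-in codec's `errors="replace"` mode with strict mode, whose `UnicodeDecodeError` carries a byte offset. The helper name `first_bad_unit` is hypothetical.

```python
data = bytes.fromhex("0048 D83C 0069")  # "H", a lone high surrogate, "i"

# Lenient mode: the bad unit is replaced with U+FFFD and decoding carries on.
lenient = data.decode("utf-16-be", errors="replace")
print(lenient.encode("unicode_escape").decode("ascii"))


def first_bad_unit(raw, codec="utf-16-be"):
    """Return the index of the first undecodable 16-bit unit, or None."""
    try:
        raw.decode(codec)  # strict is the default error handler
    except UnicodeDecodeError as exc:
        return exc.start // 2  # exc.start is a byte offset; two bytes per unit
    return None


print(first_bad_unit(data))
```

The lenient result hides where the damage was, while the strict path pinpoints the offending unit, which is the information a debugging tool needs.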

Is my data sent anywhere?

No. Parsing, decoding, and UTF-8 encoding happen entirely in your browser. No network requests.

What’s the input cap?

200,000 characters. The cap keeps the UI responsive.

UTF-16 vs UTF-8 – when to use which?

UTF-8 dominates the web, files, APIs, and modern protocols. UTF-16 is the string model exposed by JavaScript, Windows wide-char APIs (originally UCS-2), and Java’s char. If you’re storing or transmitting text, use UTF-8 almost always. If you’re poking at JS string code units (.charCodeAt) or Windows wide-char APIs, you’re touching UTF-16.

Keep going

Related Tools

All Utf8 tools →

Convert UTF-8 to UTF-16

Convert UTF-8 text to UTF-16 code units (hex/decimal/binary, BE/LE, BOM). Bidirectional, surrogate validation. Free,…

Binary to UTF-8 Decoder

Binary to UTF-8 Text Decoder handles emoji, CJK, accents, strips BOM, counts replacement chars.…

Convert Arbitrary Base to UTF-8

Decode numeric tokens in any base (2-36) as UTF-8 bytes - multi-byte emoji and…

Base64 to UTF-8 Decoder

Decode Base64 to UTF-8 text - handles emoji, CJK, BOM-stripping, URL-safe variants. Free, client-side,…

Convert Bytes to UTF-8

Decode decimal/hex/binary byte values to UTF-8 text - emoji, CJK,…

Code Points to UTF-8 Converter Free

Free online Unicode code points to UTF-8 converter. Shows actual UTF-8 byte sequences per…

Convert Data URI to UTF-8

online Data URI to UTF-8 decoder with byte-breakdown panel for emoji and CJK. Client-side,…

Convert Decimal to UTF-8

online decimal to UTF-8 text decoder. Byte-mode (raw UTF-8 bytes) and codepoint-mode. Client-side, instant,…

Convert Hexadecimal to UTF-8

Decode hex to UTF-8 text with byte-structural breakdown. Handles ASCII, Latin, CJK, emoji. Batch…

Convert HTML Entities to UTF-8

Decode HTML entities to UTF-8 with per-character byte breakdown. Named, decimal, hex. Free, offline,…

Convert Octal to UTF-8

Decode octal byte sequences to UTF-8 text, encode UTF-8 to octal. C-escape support, multi-byte.…

Convert UTF-32 to UTF-8

Convert UTF-32 code points to UTF-8 text and bytes. 3 formats, BE/LE, BOM, strict…
