Convert UTF-8 to UTF-16

In short

Convert UTF-8 text to UTF-16 code units (hex/decimal/binary, BE/LE, BOM). Bidirectional, surrogate validation. Free, client-side, instant, secure.

  • Runs in your browser
  • Nothing uploaded
  • Free, no sign-up

Convert UTF-8 text to UTF-16 code units in three formats (hex, decimal byte pairs, 16-bit binary), with a choice of endianness (BE/LE) and an optional BOM (U+FEFF). Surrogate pairs are emitted for non-BMP characters and validated explicitly when decoding: a lone surrogate raises an error instead of being silently replaced.

How to Use Convert UTF-8 to UTF-16

  1. Paste UTF-8 text. JS strings are already stored as UTF-16 internally, so this tool reads the existing code units rather than transcoding.
  2. Pick a format. Hex shows each code unit as four hex digits (0048 for H). Decimal byte pairs emits 2 bytes per unit. Binary shows the 16 bits explicitly.
  3. Choose endianness. BE puts the high byte first (0048); LE puts the low byte first (4800). A BOM (U+FEFF) marks the byte order in serialized streams.
  4. For non-BMP characters (U+10000 to U+10FFFF), UTF-16 uses surrogate pairs: two 16-bit code units. 🌍 (U+1F30D) → D83C DF0D. The grid flags pairs.
  5. Swap to decode. The format is auto-detected. Lone surrogates throw an error that reports their position; valid pairs combine into codepoints.
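
The encoding steps above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the tool's actual source: the names `utf16Bytes`, `hexByte`, and `toHex` are hypothetical, and the output is grouped per byte rather than per code unit.

```typescript
type Endianness = "BE" | "LE";

// Format one byte as two uppercase hex digits.
function hexByte(b: number): string {
  return (b < 16 ? "0" : "") + b.toString(16).toUpperCase();
}

// Serialize a string's UTF-16 code units to bytes, optionally prefixed with a BOM.
function utf16Bytes(text: string, order: Endianness, withBom: boolean): number[] {
  const units: number[] = withBom ? [0xfeff] : [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i)); // charCodeAt returns one 16-bit code unit
  }
  const bytes: number[] = [];
  for (let i = 0; i < units.length; i++) {
    const hi = units[i] >> 8;
    const lo = units[i] & 0xff;
    if (order === "BE") {
      bytes.push(hi, lo); // high byte first
    } else {
      bytes.push(lo, hi); // low byte first
    }
  }
  return bytes;
}

function toHex(bytes: number[]): string {
  return bytes.map(hexByte).join(" ");
}

console.log(toHex(utf16Bytes("H", "BE", false))); // 00 48
console.log(toHex(utf16Bytes("H", "LE", true)));  // FF FE 48 00
```

Because the BOM is pushed through the same byte-order branch as every other code unit, it comes out as FE FF or FF FE without any special casing.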

Frequently Asked Questions

What’s the relationship between UTF-8 and UTF-16?

Both encode the same Unicode codepoints, just differently. UTF-8 uses 1-4 bytes per codepoint with variable-width sequences. UTF-16 uses 2 or 4 bytes (one or two 16-bit code units, surrogate pair for non-BMP). They roundtrip losslessly through the codepoint layer.

Why does emoji produce a “pair”?

Non-BMP characters (U+10000+) don’t fit in one 16-bit code unit, so UTF-16 splits them into a high surrogate (U+D800-U+DBFF) + low surrogate (U+DC00-U+DFFF). The encoding: high = 0xD800 + ((cp - 0x10000) >> 10); low = 0xDC00 + ((cp - 0x10000) & 0x3FF). The grid labels these as “pair (2)” so you see them.
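
As a worked example, the split and its inverse can be written directly from the formula. The function names `toSurrogatePair` and `fromSurrogatePair` are illustrative assumptions, not part of the tool.

```typescript
// Split a non-BMP code point into [high, low] surrogate code units.
function toSurrogatePair(cp: number): [number, number] {
  if (cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("code point is outside U+10000..U+10FFFF");
  }
  const offset = cp - 0x10000;        // 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

// Recombine a valid surrogate pair into its code point.
function fromSurrogatePair(high: number, low: number): number {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

const pair = toSurrogatePair(0x1f30d);
console.log(pair[0].toString(16), pair[1].toString(16)); // d83c df0d
```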

How does the reverse validate surrogates?

Every high surrogate must be immediately followed by a low surrogate; every low surrogate must follow a high surrogate. Violations throw explicit errors with position – unlike some converters that silently substitute U+FFFD, this tool surfaces broken streams so you can fix them.
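
A strict decoder along these lines might look like the following sketch. It assumes the input is already an array of 16-bit code units; `decodeUtf16Strict` and its error messages are hypothetical and may differ from what the tool reports.

```typescript
// Decode UTF-16 code units to code points, throwing on any unpaired surrogate.
function decodeUtf16Strict(units: number[]): number[] {
  const codePoints: number[] = [];
  for (let i = 0; i < units.length; i++) {
    const u = units[i];
    if (u >= 0xd800 && u <= 0xdbff) {
      // High surrogate: the very next unit must be a low surrogate.
      const next = i + 1 < units.length ? units[i + 1] : -1;
      if (next < 0xdc00 || next > 0xdfff) {
        throw new Error(`Lone high surrogate at index ${i}`);
      }
      codePoints.push(0x10000 + ((u - 0xd800) << 10) + (next - 0xdc00));
      i++; // consume the low surrogate as well
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      // Low surrogate that was not consumed as part of a pair.
      throw new Error(`Lone low surrogate at index ${i}`);
    } else {
      codePoints.push(u); // ordinary BMP code unit
    }
  }
  return codePoints;
}

console.log(decodeUtf16Strict([0x0048, 0xd83c, 0xdf0d])); // [ 72, 127757 ]
```

Throwing instead of substituting U+FFFD means the caller learns exactly where the stream is broken, at the cost of not producing partial output.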

What’s the UTF-16 BOM?

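The BOM is the code point U+FEFF prepended to the stream. It serializes as FE FF in BE or FF FE in LE, so a reader can detect the byte order from the first two bytes. It is distinct from the UTF-32 BOM, which is four bytes (00 00 FE FF in BE, FF FE 00 00 in LE). The decoder strips a detected BOM automatically.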
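
BOM detection on a byte stream can be sketched as below. `detectUtf16Bom` is a hypothetical helper; it only looks at the first two bytes and does not try to tell a UTF-16 LE BOM apart from a UTF-32 LE one, which starts with the same FF FE.

```typescript
type ByteOrder = "BE" | "LE" | null;

// Inspect the first two bytes; return the detected order and the bytes after the BOM.
function detectUtf16Bom(bytes: number[]): { order: ByteOrder; payload: number[] } {
  if (bytes.length >= 2) {
    if (bytes[0] === 0xfe && bytes[1] === 0xff) {
      return { order: "BE", payload: bytes.slice(2) };
    }
    if (bytes[0] === 0xff && bytes[1] === 0xfe) {
      return { order: "LE", payload: bytes.slice(2) };
    }
  }
  // No BOM found: the caller has to pick a byte order some other way.
  return { order: null, payload: bytes };
}

console.log(detectUtf16Bom([0xff, 0xfe, 0x48, 0x00]).order); // LE
```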

Why is JavaScript’s internal string UTF-16?

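Historical: JavaScript was designed in the mid-1990s, when Unicode was in practice a 16-bit character set. Surrogates were defined in Unicode 2.0 (1996), but no characters were assigned beyond the BMP until Unicode 3.1 (2001). JS strings were specified as sequences of 16-bit code units (effectively “UCS-2”), and surrogate pairs were later interpreted on top of those units as UTF-16 to maintain backwards compatibility. str.length returns code units, NOT codepoints – so "🌍".length is 2.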
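
The difference between code units and codepoints is easy to see in code. The helper `countCodePoints` below is an illustrative sketch that walks the code units by hand; each unpaired surrogate is counted as one codepoint.

```typescript
// Count code points by treating each valid high+low surrogate pair as one.
function countCodePoints(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u >= 0xd800 && u <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low surrogate of a valid pair
      }
    }
    count++;
  }
  return count;
}

const globe = "\uD83C\uDF0D"; // 🌍 written as its two code units
console.log(globe.length);           // 2
console.log(countCodePoints(globe)); // 1
```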

How big is the UTF-16 output compared to UTF-8?

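For ASCII: UTF-16 is 2× UTF-8 (every ASCII char becomes 2 bytes vs 1). For characters in U+0080-U+07FF (accented Latin, Greek, Cyrillic, Hebrew, Arabic): even, 2 bytes per char in both, so mixed Latin text that is mostly ASCII still comes out close to 2× larger in UTF-16. For CJK: UTF-16 is ~33% smaller (2 bytes per char vs 3 in UTF-8). For emoji outside the BMP: same (4 bytes per codepoint in both, just different splits).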
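
The per-character byte counts can be checked with a short sketch. `encodedSizes` is a hypothetical helper that derives both sizes from the code units without encoding anything; it assumes an unpaired surrogate would be replaced by U+FFFD (3 bytes) in UTF-8, as common encoders do.

```typescript
// Compute the UTF-8 and UTF-16 byte lengths of a string (BOM not included).
function encodedSizes(text: string): { utf8: number; utf16: number } {
  let utf8 = 0;
  let utf16 = 0;
  for (let i = 0; i < text.length; i++) {
    const u = text.charCodeAt(i);
    const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    const isPair = u >= 0xd800 && u <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
    if (isPair) {
      utf8 += 4;  // non-BMP code point: 4 bytes in UTF-8
      utf16 += 4; // two 16-bit code units
      i++;
    } else {
      utf16 += 2;
      utf8 += u < 0x80 ? 1 : u < 0x800 ? 2 : 3;
    }
  }
  return { utf8, utf16 };
}

console.log(encodedSizes("Hello"));        // { utf8: 5, utf16: 10 }
console.log(encodedSizes("\u4E2D\u6587")); // { utf8: 6, utf16: 4 }
```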

Does this support BMP-only mode?

No – that would be UCS-2, the fixed-width predecessor of UTF-16, which can only represent BMP codepoints. UTF-16 always supports surrogate pairs. If your downstream consumer is strict UCS-2, codepoints beyond U+FFFF will fail there, not in this tool.

Is text uploaded?

No. The conversion runs entirely in your browser – nothing is sent to a server, logged, or stored, and the tool keeps working offline once the page has loaded.

Is there an input cap?

200,000 characters per run, roughly 30,000 words of English prose. The cap keeps the browser responsive; for bigger jobs, split the text and run it in parts.

How does this compare to the UTF-16 to UTF-8 tool?

Inverse direction. That tool parses UTF-16 code units and produces text + UTF-8 bytes. This tool starts from UTF-8 text and produces UTF-16 code units. Both validate surrogates strictly.
