Convert Unicode to Binary

In short

Convert Unicode to binary in 3 modes (UTF-8, codepoint, UTF-16). Per-character breakdown. Free, offline, client-side, instant, secure.

Runs in your browser
Nothing uploaded
Free, no sign-up

Convert Unicode text to binary in three modes: UTF-8 bytes (the modern web standard), Unicode codepoints (one binary per character), or UTF-16 code units (matches JavaScript's internal string representation). Per-character grid shows all three side by side.

🛡 100% Private: No server uploads, ever
⚡ Instant: Runs in your browser
💧 No Watermarks: Clean output, always
🆓 Free Forever: No accounts, no limits

How to Use Convert Unicode to Binary

Paste your text. ASCII, accented Latin, CJK, emoji, mathematical symbols - anything Unicode. Multiple lines are processed as a batch (one input per line → TSV output).
Pick an encoding mode. UTF-8 bytes (default) - what the web sends; each byte is 8 bits. A → 01000001, 😀 → 4 bytes 11110000 10011111 10011000 10000000. Unicode codepoint - each character's codepoint number in binary. A (U+0041) → 1000001, 😀 (U+1F600) → 17 bits 11111011000000000. UTF-16 code units - what JavaScript stores; BMP chars are 1 unit, non-BMP chars are surrogate pairs (2 units of 16 bits each).
Set codepoint padding (visible only in codepoint mode). 8 bits for ASCII-only data. 16 bits for BMP (most languages). 21 bits for full Unicode range (current standard caps at U+10FFFF). 32 bits for fixed-width storage. No padding = minimum binary length per character (varies).
Choose a separator. Space (default) - most readable. Newline - one binary per line for tall display. Comma - CSV-compatible. None - single continuous bit stream useful for piping into other tools.
Read the per-character grid. Each row shows the character, its codepoint, UTF-8 hex bytes, and the binary output in your selected mode. Switch modes and watch the binary column change while everything else stays constant.
Swap for reverse. ⇄ flips to Binary → Unicode. The decoder respects the selected mode: UTF-8 mode expects multiples of 8 bits and runs through strict UTF-8 decoding; UTF-16 mode expects 16-bit units; codepoint mode expects whitespace-separated codepoint binaries. Errors are specific (wrong bit count, invalid character, codepoint out of range).
Stats summarize. Character count, UTF-8 byte count, total bit count, mode, padding (when applicable). Useful for comparing encoding overhead - UTF-8 wins for ASCII-heavy text (1 byte per character vs 2 in UTF-16), the two tie for text made of 2-byte characters such as Greek or Cyrillic, UTF-16 wins for pure CJK (2 bytes per character vs 3 in UTF-8), and non-BMP characters such as 😀 take 4 bytes in both.
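The byte counts behind the stats can be checked with a few lines of JavaScript. This is a minimal sketch, not the tool's own code: `byteCounts` is a hypothetical helper name, and it assumes a runtime that provides `TextEncoder` (any modern browser, or Node.js).

```javascript
// Hypothetical helper: encoded size of a string in UTF-8 and in UTF-16.
function byteCounts(text) {
  const utf8 = new TextEncoder().encode(text).length; // one entry per UTF-8 byte
  const utf16 = text.length * 2; // .length counts UTF-16 code units, 2 bytes each
  return { utf8, utf16 };
}

console.log(byteCounts("hello"));  // { utf8: 5, utf16: 10 }  ASCII: UTF-8 is smaller
console.log(byteCounts("Привет")); // { utf8: 12, utf16: 12 } Cyrillic: a tie
console.log(byteCounts("日本語"));  // { utf8: 9, utf16: 6 }   CJK: UTF-16 is smaller
console.log(byteCounts("😀"));     // { utf8: 4, utf16: 4 }   non-BMP: a tie
```

Multiply either count by 8 to get the total bit count for that encoding.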

Frequently Asked Questions

Which mode should I use?

Depends on what you’re doing. UTF-8 matches what the web uses, what most file formats store, and what most networking protocols transmit – pick this for general data interchange. Codepoint matches conceptual Unicode “characters” and is useful for teaching how Unicode works. UTF-16 matches JavaScript’s internal string representation, useful when debugging surrogate pair issues or when working with .NET / Java which also use UTF-16 internally.

Why is the UTF-8 binary for 😀 four bytes long?

Because 😀 (U+1F600) is outside the Basic Multilingual Plane (codepoint > U+FFFF). UTF-8 encodes it as 4 bytes following the standard pattern: lead byte 11110xxx + three continuation bytes 10xxxxxx. Total: F0 9F 98 80 hex or 11110000 10011111 10011000 10000000 binary – 32 bits to represent one character. That’s why emoji-heavy text feels “expensive” in bytes.
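To see where those four bytes come from, the 4-byte pattern can be applied by hand. The sketch below is for illustration only; `utf8FourByteBits` is a hypothetical name, and in practice `TextEncoder` does this work for you.

```javascript
// Encode one code point from the 4-byte UTF-8 range (U+10000..U+10FFFF) by hand.
function utf8FourByteBits(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  const bytes = [
    0xf0 | (codePoint >> 18),          // lead byte 11110xxx: top 3 bits
    0x80 | ((codePoint >> 12) & 0x3f), // continuation 10xxxxxx: next 6 bits
    0x80 | ((codePoint >> 6) & 0x3f),  // continuation 10xxxxxx: next 6 bits
    0x80 | (codePoint & 0x3f),         // continuation 10xxxxxx: low 6 bits
  ];
  return bytes.map((b) => b.toString(2).padStart(8, "0")).join(" ");
}

console.log(utf8FourByteBits(0x1f600));
// → 11110000 10011111 10011000 10000000
```

The 21 payload bits (3 + 6 + 6 + 6) are exactly enough to reach U+10FFFF; the other 11 bits are marker bits.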

What’s the codepoint padding for?

Codepoints have variable bit-width – U+0041 needs 7 bits, U+10FFFF needs 21 bits. Padding to a fixed width gives uniform output: 8 bits for ASCII-only data, 16 bits for everything in the BMP (the Basic Multilingual Plane, which covers most living languages), 21 bits for the full Unicode range, 32 bits for fixed-width storage. Without padding, the binary length varies from character to character, so the output can only be split back into characters if a separator is used.
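Padding is just left-filling with zeros. Here is a small sketch of the idea; `codePointBits` is a hypothetical helper, and throwing when a codepoint does not fit the chosen width is an assumption of this example, not documented behavior of the tool.

```javascript
// Hypothetical helper: binary for each code point, left-padded to `width` bits.
// width = 0 means no padding (minimum length per character).
function codePointBits(text, width = 0) {
  // Array.from iterates by code point, so a surrogate pair stays one character.
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    const bits = cp.toString(2);
    if (width > 0 && bits.length > width) {
      // Assumption: refuse to truncate rather than silently drop bits.
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      throw new RangeError(`U+${hex} needs ${bits.length} bits, more than ${width}`);
    }
    return bits.padStart(width, "0");
  });
}

console.log(codePointBits("A😀"));   // [ '1000001', '11111011000000000' ]
console.log(codePointBits("A", 8));  // [ '01000001' ]
console.log(codePointBits("A😀", 21).map((b) => b.length)); // [ 21, 21 ]
```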

What happens with surrogate pairs in UTF-16 mode?

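Non-BMP characters (codepoint > U+FFFF, like emoji) need a surrogate pair in UTF-16 – two 16-bit code units. The standard algorithm subtracts 0x10000 from the codepoint, then adds the top 10 bits of the result to 0xD800 (high surrogate) and the bottom 10 bits to 0xDC00 (low surrogate). So 😀 (U+1F600) becomes 1101100000111101 1101111000000000 (the high surrogate 0xD83D followed by the low surrogate 0xDE00). Decoding reverses this: 16-bit units in the surrogate range get paired up to recover the original codepoint.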
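The pairing arithmetic is short enough to write out. The following is a sketch of the standard UTF-16 surrogate calculation; the function names are made up for this example, and JavaScript already does the same thing internally when it stores a string.

```javascript
// Split a non-BMP code point into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("only U+10000..U+10FFFF need a surrogate pair");
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

// Recombine a surrogate pair into the original code point.
function fromSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16), low.toString(16)); // → d83d de00
console.log(high.toString(2), low.toString(2));
// → 1101100000111101 1101111000000000
console.log(fromSurrogatePair(high, low).toString(16)); // → 1f600
```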

Can I decode binary back to text?

Yes – click Swap. The decoder respects the selected mode and requires the appropriate bit alignment: UTF-8 needs multiples of 8, UTF-16 needs multiples of 16, codepoint mode needs whitespace-separated groups. Specific errors point at the problem: "UTF-8 mode requires a multiple of 8 bits (got 13)", "Codepoint 1234567 exceeds U+10FFFF maximum", etc.
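A decoder along these lines could look like the sketch below, shown for UTF-8 mode and codepoint mode. The function names are hypothetical and the tool's real implementation may differ; only the two error messages are borrowed from the examples above. `TextDecoder` with `fatal: true` is a standard API that throws a `TypeError` on malformed UTF-8.

```javascript
// Hypothetical UTF-8 mode decoder: bit string in, text out.
function decodeUtf8Bits(input) {
  const bits = input.replace(/[\s,]+/g, ""); // drop space, newline, comma separators
  const bad = bits.match(/[^01]/);
  if (bad) throw new Error(`Invalid character "${bad[0]}"`);
  if (bits.length % 8 !== 0) {
    throw new Error(`UTF-8 mode requires a multiple of 8 bits (got ${bits.length})`);
  }
  const bytes = new Uint8Array(bits.length / 8);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(bits.slice(i * 8, i * 8 + 8), 2);
  }
  // fatal: true gives strict decoding instead of silent U+FFFD replacement.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

// Hypothetical codepoint mode decoder: whitespace-separated binary groups.
function decodeCodePointBits(input) {
  return input.trim().split(/\s+/).map((group) => {
    if (!/^[01]+$/.test(group)) throw new Error(`Invalid binary group "${group}"`);
    const cp = parseInt(group, 2);
    if (cp > 0x10ffff) throw new Error(`Codepoint ${cp} exceeds U+10FFFF maximum`);
    return String.fromCodePoint(cp);
  }).join("");
}

console.log(decodeUtf8Bits("11110000 10011111 10011000 10000000")); // → 😀
console.log(decodeCodePointBits("1000001 11111011000000000"));      // → A😀
```

Checking the bit count before building bytes keeps the error message specific; malformed byte sequences are left to `TextDecoder` to reject.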

What’s the difference vs the “Convert String to Binary” tool?

That sibling tool only emits UTF-8 binary. This one supports three encoding modes (UTF-8, codepoint, UTF-16) with a per-character grid showing all three side by side. Use the sibling for quick UTF-8 conversions; use this one for educational comparisons or when you specifically need codepoint or UTF-16 representations.

Does it handle invisible characters?

Yes. The per-character grid substitutes visible glyphs for whitespace (space → ·, newline → ↵, tab → →) so rows aren’t blank. The actual encoded bits still represent the original characters (00100000, 00001010, 00001001 respectively). Useful when debugging mysterious whitespace differences.
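The substitution only affects what is displayed, never what is encoded. A rough sketch of that separation follows; `VISIBLE` and `gridRows` are hypothetical names, and the row shape is an assumption made for this example.

```javascript
// Display-only stand-ins for whitespace, as listed above.
const VISIBLE = new Map([[" ", "·"], ["\n", "↵"], ["\t", "→"]]);

// Hypothetical grid builder: one row per character.
function gridRows(text) {
  const encoder = new TextEncoder();
  return Array.from(text, (ch) => ({
    shown: VISIBLE.get(ch) ?? ch, // glyph for the grid cell
    // The bits come from the original character, not from the stand-in glyph.
    bits: Array.from(encoder.encode(ch), (b) => b.toString(2).padStart(8, "0")).join(" "),
  }));
}

console.log(gridRows("a b\n"));
// shown: a, ·, b, ↵
// bits:  01100001, 00100000, 01100010, 00001010
```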

What’s the input size limit?

200,000 characters. The per-character grid caps at 256 displayed rows with a “… N more” note, but the conversion processes the full input – only the grid display is truncated. Copy and Download give you the complete binary output.

Is my text uploaded anywhere?

No. All encoding (UTF-8 via TextEncoder, UTF-16 via charCodeAt, codepoint via codePointAt), binary formatting, and TSV emission run in your browser. Open DevTools → Network and confirm zero requests fire – even when you Convert or Download. Safe for sensitive text, code samples, or proprietary content.
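For the curious, the three browser APIs named above are enough to reproduce all three modes. The snippet below is a minimal sketch under that assumption; `toBinary` and `toBits` are hypothetical names, not the tool's actual source, and codepoint mode is shown without padding.

```javascript
// Binary string for a number, left-padded to `width` bits (0 = no padding).
const toBits = (n, width) => n.toString(2).padStart(width, "0");

// Hypothetical converter covering the three modes.
function toBinary(text, mode = "utf8", separator = " ") {
  let groups;
  if (mode === "utf8") {
    // TextEncoder always produces UTF-8 bytes.
    groups = Array.from(new TextEncoder().encode(text), (b) => toBits(b, 8));
  } else if (mode === "utf16") {
    // charCodeAt returns one 16-bit code unit per index.
    groups = [];
    for (let i = 0; i < text.length; i++) groups.push(toBits(text.charCodeAt(i), 16));
  } else if (mode === "codepoint") {
    // Array.from iterates by code point; codePointAt reads the full value.
    groups = Array.from(text, (ch) => toBits(ch.codePointAt(0), 0));
  } else {
    throw new Error(`Unknown mode "${mode}"`);
  }
  return groups.join(separator);
}

console.log(toBinary("A😀", "utf8"));
// → 01000001 11110000 10011111 10011000 10000000
console.log(toBinary("A😀", "utf16"));
// → 0000000001000001 1101100000111101 1101111000000000
console.log(toBinary("A😀", "codepoint"));
// → 1000001 11111011000000000
```

None of these calls touch the network, which is why the conversion can run entirely client-side.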

Does it work offline?

Yes. Total bundle is about 18 KB. Load once, disconnect, keep using. Pure JavaScript with no remote dependencies. Useful for teaching Unicode encoding in classrooms without internet, or debugging encoding issues on airgapped dev machines.

Keep going

Related Tools

All Unicode tools →

Embed this tool

Add this free tool to your website. Copy and paste the code:

<iframe src="https://alltoolsverse.com/tools/convert-unicode-to-binary/?embed=1" width="100%" height="760" loading="lazy" style="max-width:900px;border:1px solid #e2e8f0;border-radius:12px" title="Convert Unicode to Binary"></iframe>
<p>Free tool: <a href="https://alltoolsverse.com/tools/convert-unicode-to-binary/">Convert Unicode to Binary</a> by All Tools Verse</p>

Related Tools

Center Unicode Text →

Check Spoofed Unicode Text →

Chunkify Unicode Text →

ASCII to Unicode Converter →

Convert Code Points to Unicode →

Convert Unicode to ASCII →

Convert Unicode to Base64 →

Convert Unicode to Bytes →

Convert Unicode to Code Points →

Convert Unicode to Data URL →

Convert Unicode to Decimal →

Convert Unicode to Hex →
