Convert Unicode to Binary
Convert Unicode to binary in 3 modes (UTF-8, codepoint, UTF-16). Per-character breakdown. Free, offline, client-side, instant, secure.
- Runs in your browser
- Nothing uploaded
- Free, no sign-up
Convert Unicode text to binary in three modes: UTF-8 bytes (the modern web standard), Unicode codepoints (one binary number per character), or UTF-16 code units (matches JavaScript's internal string representation). A per-character grid shows all three side by side.
How to Use Convert Unicode to Binary
- Paste your text. ASCII, accented Latin, CJK, emoji, mathematical symbols - anything Unicode. Multiple lines are processed as a batch (one input per line → TSV output).
- Pick an encoding mode. UTF-8 bytes (default) - what the web sends; each byte is 8 bits: A → 01000001, 😀 → 4 bytes (11110000 10011111 10011000 10000000). Unicode codepoint - each character's codepoint number in binary: A (U+0041) → 1000001, 😀 (U+1F600) → 17 bits (11111011000000000). UTF-16 code units - what JavaScript stores; BMP characters take 1 unit, non-BMP characters become surrogate pairs (2 units of 16 bits each).
- Set codepoint padding (visible only in codepoint mode). 8 bits for ASCII-only data. 16 bits for the BMP (most languages). 21 bits for the full Unicode range (the current standard caps at U+10FFFF). 32 bits for fixed-width storage. No padding = minimum binary length per character (varies).
- Choose a separator. Space (default) - most readable. Newline - one binary value per line for tall display. Comma - CSV-compatible. None - a single continuous bit stream, useful for piping into other tools.
- Read the per-character grid. Each row shows the character, its codepoint, UTF-8 hex bytes, and the binary output in your selected mode. Switch modes and watch the binary column change while everything else stays constant.
- Swap for reverse. ⇄ flips to Binary → Unicode. The decoder respects the selected mode: UTF-8 mode expects a multiple of 8 bits and runs the bytes through strict UTF-8 decoding; UTF-16 mode expects 16-bit units; codepoint mode expects whitespace-separated binary codepoints. Errors are specific (wrong bit count, invalid character, codepoint out of range).
- Read the stats. Character count, UTF-8 byte count, total bit count, mode, and padding (when applicable). Useful for comparing encoding overhead: UTF-8 is smaller than UTF-16 for ASCII-heavy text and roughly ties for Greek or Cyrillic, where each letter takes 2 bytes in both. It is larger for CJK, where most characters take 3 bytes in UTF-8 versus 2 in UTF-16. Emoji and other non-BMP characters cost 4 bytes in both.
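The overhead comparison above can be checked with a small TypeScript sketch. This is not the tool's source, and `utf8Len`, `utf8Bytes` and `utf16Bytes` are hypothetical names. It counts UTF-8 bytes from codepoint ranges and UTF-16 bytes from the string's code-unit length:

```typescript
// Hypothetical helpers comparing UTF-8 and UTF-16 sizes; not the tool's source.

// Bytes UTF-8 uses for one codepoint, by range (1 to 4).
function utf8Len(cp: number): number {
  if (cp <= 0x7f) return 1;
  if (cp <= 0x7ff) return 2;
  if (cp <= 0xffff) return 3;
  return 4;
}

// Array.from splits a string into codepoints, so an emoji counts once.
function utf8Bytes(text: string): number {
  return Array.from(text).reduce((sum, ch) => sum + utf8Len(ch.codePointAt(0)!), 0);
}

// A JS string's .length counts UTF-16 code units, 2 bytes each.
function utf16Bytes(text: string): number {
  return text.length * 2;
}

for (const sample of ["Hello, world", "Привет", "你好世界", "😀😀"]) {
  console.log(sample, utf8Bytes(sample), utf16Bytes(sample));
}
```

The samples give the following UTF-8 vs UTF-16 byte counts:
- ASCII: 12 vs 24
- Cyrillic: 12 vs 12
- CJK: 12 vs 8
- Emoji: 8 vs 8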
Frequently Asked Questions
Which mode should I use?
Depends on what you’re doing. UTF-8 matches what the web uses, what most file formats store, and what most networking protocols transmit – pick this for general data interchange. Codepoint matches conceptual Unicode “characters” and is useful for teaching how Unicode works. UTF-16 matches JavaScript’s internal string representation, useful when debugging surrogate pair issues or when working with .NET / Java which also use UTF-16 internally.
Why is the UTF-8 binary for 😀 four bytes long?
Because 😀 (U+1F600) is outside the Basic Multilingual Plane (codepoint > U+FFFF). UTF-8 encodes it as 4 bytes following the standard pattern: lead byte 11110xxx + three continuation bytes 10xxxxxx. Total: F0 9F 98 80 hex or 11110000 10011111 10011000 10000000 binary – 32 bits to represent one character. That’s why emoji-heavy text feels “expensive” in bytes.
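The byte pattern can be reproduced by hand. Below is a minimal TypeScript sketch of the standard UTF-8 bit layout. The names `encodeUtf8` and `toBits` are illustrative. The tool itself reportedly uses TextEncoder, not code like this.

```typescript
// Hand-written UTF-8 encoder showing the bit layout; illustrative only.
function encodeUtf8(cp: number): number[] {
  // Reject values outside Unicode and the surrogate range, which UTF-8 cannot encode.
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError(`not a Unicode scalar value: ${cp}`);
  }
  if (cp <= 0x7f) return [cp]; // 0xxxxxxx
  if (cp <= 0x7ff) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 110xxxxx 10xxxxxx
  }
  if (cp <= 0xffff) {
    // 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Render each byte as 8 bits, space-separated.
function toBits(bytes: number[]): string {
  return bytes.map((b) => b.toString(2).padStart(8, "0")).join(" ");
}

console.log(toBits(encodeUtf8(0x1f600))); // 11110000 10011111 10011000 10000000
```

The lead byte's count of leading 1 bits announces the sequence length. Every continuation byte starts with 10, which is why a 21-bit codepoint needs 32 bits on the wire.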
What’s the codepoint padding for?
Codepoints have variable bit-width – U+0041 needs 7 bits, U+10FFFF needs 21 bits. Padding to a fixed width gives uniform output: 8 bits for ASCII-only data, 16 bits for everything in BMP (Basic Multilingual Plane covering most living languages), 21 bits for the full Unicode range, 32 bits for fixed-width storage. Without padding, the binary lengths vary character-to-character, which can be hard to parse.
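Here is a minimal TypeScript sketch of codepoint mode with padding, assuming the widths listed above. `codepointBits` and the `Padding` type are hypothetical. How the tool handles a codepoint wider than the chosen padding isn't documented here, so this sketch throws an error rather than truncating.

```typescript
// Hypothetical codepoint-mode formatter; widths follow the padding options.
type Padding = 0 | 8 | 16 | 21 | 32; // 0 = no padding (minimum length)

function codepointBits(text: string, padding: Padding, sep: string = " "): string {
  return Array.from(text)
    .map((ch) => {
      const cp = ch.codePointAt(0)!;
      const bits = cp.toString(2);
      // Assumption: refuse a width too narrow for the codepoint instead of
      // silently truncating it (the tool's behavior is not documented).
      if (padding > 0 && bits.length > padding) {
        const hex = cp.toString(16).toUpperCase().padStart(4, "0");
        throw new RangeError(`U+${hex} needs ${bits.length} bits, more than ${padding}`);
      }
      return bits.padStart(padding, "0"); // padStart(0) leaves bits unchanged
    })
    .join(sep);
}

console.log(codepointBits("A😀", 0)); // 1000001 11111011000000000
console.log(codepointBits("A😀", 21));
```

With 21-bit padding every group has the same length, so a reader can split the output without relying on the separator.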
What happens with surrogate pairs in UTF-16 mode?
Non-BMP characters (codepoint > U+FFFF, like emoji) need a surrogate pair in UTF-16 – two 16-bit code units. So 😀 becomes 11011000 00111101 11011110 00000000 (the high surrogate 0xD83D followed by low surrogate 0xDE00). Decoding reverses this: 16-bit units in the surrogate range get paired up to recover the original codepoint.
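The surrogate arithmetic is short enough to show in full. This TypeScript sketch uses illustrative names (`toSurrogates`, `fromSurrogates`, `utf16Bits`). `utf16Bits` reads code units with `charCodeAt` and prints each unit as one 16-bit group, rather than splitting it into bytes as in the answer above.

```typescript
// Illustrative UTF-16 surrogate-pair helpers for codepoints U+10000..U+10FFFF.
function toSurrogates(cp: number): [number, number] {
  const v = cp - 0x10000; // 20-bit offset into the supplementary planes
  return [0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff)]; // high 10 bits, low 10 bits
}

function fromSurrogates(high: number, low: number): number {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

// UTF-16 mode: one 16-bit group per code unit, read with charCodeAt.
function utf16Bits(text: string): string {
  const units: string[] = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i).toString(2).padStart(16, "0"));
  }
  return units.join(" ");
}

const [high, low] = toSurrogates(0x1f600);
console.log(high.toString(16), low.toString(16)); // d83d de00
console.log(fromSurrogates(high, low).toString(16)); // 1f600
console.log(utf16Bits("😀")); // 1101100000111101 1101111000000000
```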
Can I decode binary back to text?
Yes – click Swap. The decoder respects the selected mode and requires the appropriate bit alignment: UTF-8 needs multiples of 8, UTF-16 needs multiples of 16, codepoint mode needs whitespace-separated groups. Specific errors point at the problem: "UTF-8 mode requires a multiple of 8 bits (got 13)", "Codepoint 1234567 exceeds U+10FFFF maximum", etc.
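Here is a minimal TypeScript sketch of the checks for two of the modes. Its error strings are modeled on the examples above. The tool's exact wording, function names and parsing rules may differ. For UTF-8 mode the sketch stops at whole bytes. A strict decoder, such as the browser's `TextDecoder` with `{ fatal: true }`, would then reject malformed sequences.

```typescript
// Hypothetical decoders for codepoint mode and UTF-8 mode; error wording is
// modeled on the FAQ examples and may not match the tool exactly.
const MAX_CODEPOINT = 0x10ffff;

// Codepoint mode: whitespace-separated binary groups, one per character.
function decodeCodepoints(input: string): string {
  const groups = input.split(/\s+/).filter((g) => g.length > 0);
  return groups
    .map((g) => {
      if (!/^[01]+$/.test(g)) throw new Error(`Invalid character in "${g}"`);
      const cp = parseInt(g, 2);
      if (cp > MAX_CODEPOINT) {
        throw new Error(`Codepoint ${cp} exceeds U+10FFFF maximum`);
      }
      return String.fromCodePoint(cp);
    })
    .join("");
}

// UTF-8 mode: drop separators, require whole bytes, and return the byte values.
function utf8BytesFromBits(input: string): number[] {
  const bits = input.replace(/[\s,]/g, "");
  if (!/^[01]*$/.test(bits)) throw new Error("Invalid character: only 0 and 1 are allowed");
  if (bits.length % 8 !== 0) {
    throw new Error(`UTF-8 mode requires a multiple of 8 bits (got ${bits.length})`);
  }
  const bytes: number[] = [];
  for (let i = 0; i < bits.length; i += 8) {
    bytes.push(parseInt(bits.slice(i, i + 8), 2));
  }
  return bytes;
}

console.log(decodeCodepoints("1000001 11111011000000000")); // A😀
```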
What’s the difference vs the “Convert String to Binary” tool?
That sibling tool only emits UTF-8 binary. This one supports three encoding modes (UTF-8, codepoint, UTF-16) with a per-character grid showing all three side by side. Use the sibling for quick UTF-8 conversions, use this for educational comparisons or when you specifically need codepoint or UTF-16 representations.
Does it handle invisible characters?
Yes. The per-character grid substitutes visible glyphs for whitespace (space → ·, newline → ↵, tab → →) so rows aren’t blank. The actual encoded bits still represent the original characters (00100000, 00001010, 00001001 respectively). Useful when debugging mysterious whitespace differences.
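The substitution affects only the grid's labels, not the bits. This TypeScript sketch assumes a simple lookup table for the three glyphs named above; `DISPLAY_GLYPHS` and `gridRow` are hypothetical names. Padding to 8 bits works here only because these are ASCII characters, whose codepoint and UTF-8 byte are the same.

```typescript
// Hypothetical grid-row builder: substitutes a visible label for whitespace
// while the bits still encode the original character.
const DISPLAY_GLYPHS: Record<string, string> = {
  " ": "·",
  "\n": "↵",
  "\t": "→",
};

interface GridRow {
  label: string; // what the grid shows
  bits: string; // what the output actually encodes
}

function gridRow(ch: string): GridRow {
  const cp = ch.codePointAt(0)!;
  return {
    label: DISPLAY_GLYPHS[ch] ?? ch,
    // 8-bit padding is enough here only because these examples are ASCII.
    bits: cp.toString(2).padStart(8, "0"),
  };
}

for (const ch of [" ", "\n", "\t"]) {
  console.log(gridRow(ch));
}
```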
What’s the input size limit?
200,000 characters. The per-character grid caps at 256 displayed rows with a “… N more” note, but the conversion processes the full input – only the grid display is truncated. Copy and Download give you the complete binary output.
Is my text uploaded anywhere?
No. All encoding (UTF-8 via TextEncoder, UTF-16 via charCodeAt, codepoint via codePointAt), binary formatting, and TSV emission run in your browser. Open DevTools → Network and confirm zero requests fire – even when you Convert or Download. Safe for sensitive text, code samples, or proprietary content.
Does it work offline?
Yes. Total bundle is about 18 KB. Load once, disconnect, keep using. Pure JavaScript with no remote dependencies. Useful for teaching Unicode encoding in classrooms without internet, or debugging encoding issues on airgapped dev machines.
Related Tools
Center Unicode Text →
Center Unicode text within a fixed width, with real grapheme counting for emoji and…
Check Spoofed Unicode Text →
Detect Unicode confusables and homoglyphs from Cyrillic, Greek, Armenian, and Hebrew that imitate Latin…
Chunkify Unicode Text →
Split Unicode text into equal chunks with grapheme, code-point, or UTF-16 modes. Keeps emoji…
ASCII to Unicode Converter →
ASCII to Unicode & Decode decimal, hex, octal, or U+XXXX values to Unicode characters…
Convert Code Points to Unicode →
Convert Code Points to Unicode (U+XXXX, hex, decimal) to characters - handles emoji, CJK,…
Convert Unicode to ASCII →
Convert Unicode to ASCII with transliteration (é → e, ñ → n), replace, or…
Convert Unicode to Base64 →
Encode Unicode text to Base64 (and decode) with standard, URL-safe, MIME variants. UTF-8 proper.…
Convert Unicode to Bytes →
Convert Unicode to UTF-8 bytes in hex, decimal, or binary. Per-byte grid, reverse direction.…
Convert Unicode to Code Points →
Convert Unicode to code points (U+XXXX, HTML/CSS/JS escapes) and back. Per-character breakdown. Free, offline,…
Convert Unicode to Data URL →
Convert Unicode to data URLs with base64 or URL-encoding, 12 MIME types, charset toggle.…
Convert Unicode to Decimal →
Convert Unicode text to decimal code point values.
Convert Unicode to Hex →
Convert Unicode to hex codepoints with prefix/padding/case options (and back). Per-character breakdown. Free, offline,…