Convert UTF-8 to Binary
Convert UTF-8 text to 8-bit binary per byte with grouping, separator, prefix-bit highlighting. Bidirectional. Free, client-side, instant, secure.
Convert UTF-8 text to 8-bit binary per byte. Toggle prefix-bit highlighting to see UTF-8's variable-width encoding pattern (0xxxxxxx for ASCII, 110xxxxx / 1110xxxx / 11110xxx for multi-byte leaders, 10xxxxxx for continuations). Multi-byte characters and emoji round-trip correctly.
Per-character breakdown
How to Use Convert UTF-8 to Binary
- Paste UTF-8 text. The tool extracts the byte sequence via
TextEncoder- emoji and multi-byte characters work correctly. - Each byte becomes 8 bits, padded with leading zeros.
A(0x41) โ01000001. - Pick separator (space default) and grouping. Byte mode gives one token per byte. Char mode concatenates the bytes belonging to each codepoint (so a 4-byte emoji becomes one 32-bit token) - useful for visualizing per-character cost.
- Toggle prefix-bit highlighting in the grid to see the UTF-8 encoding pattern. ASCII bytes start with
0; multi-byte leaders start with110,1110, or11110; continuation bytes start with10. - Swap direction to decode binary tokens back to UTF-8 text. Whitespace-tolerant; throws on invalid bit groupings or invalid UTF-8.
Frequently Asked Questions
How does UTF-8 encode multi-byte characters?
By bit-pattern in the leading byte: 0xxxxxxx = 1-byte ASCII (7 bits of data); 110xxxxx = 2-byte sequence (5 + 6 = 11 bits); 1110xxxx = 3-byte sequence (4 + 6 + 6 = 16 bits); 11110xxx = 4-byte sequence (3 + 6 + 6 + 6 = 21 bits). Continuation bytes always start with 10xxxxxx. The grid colour-codes these so you can see them.
Why is ๐ four bytes (32 bits) but A is just one?
UTF-8 is variable-width. A (codepoint U+0041) fits in 7 bits, so it uses the 1-byte form. ๐ (U+1F30D) needs 17 bits of data, which only fits in the 4-byte form. Look at the grid: emoji bytes start with 11110... for the leader and 10... for the three continuations.
What does “Group by char” do?
Default Byte mode produces one 8-bit token per UTF-8 byte. Char mode concatenates the bytes belonging to each codepoint into a single longer token, so you can see at a glance how many bits a character costs: A โ 8 bits; รฉ โ 16 bits; ๐ โ 32 bits as one long binary string.
How does decoding validate the input?
Three checks: every character must be 0 or 1 (anything else throws); total bit count must be a multiple of 8 (incomplete byte at end throws); the assembled bytes must form valid UTF-8 (fatal-mode TextDecoder throws on lone continuations, truncated leaders, overlong sequences, encoded surrogates).
Why fatal mode instead of replacement chars?
Silent U+FFFD substitution hides bugs. If you assembled the binary by hand and got the grouping wrong, you want an explicit error pointing at “not valid UTF-8” rather than a slightly-different output that looks plausibly correct.
What if my binary uses a different separator?
The decoder splits on any whitespace, commas, or semicolons – paste binary in any common format and it should work. Just make sure each token is exactly 8 bits (with leading zeros for small values like 00100000 for space).
Can I omit leading zeros?
Not for decoding. The decoder requires 8-bit-aligned tokens. For encoding, the output is always 8-bit padded for the same reason – so it round-trips through the decoder unambiguously.
Is the input cap enforced?
Yes, 200,000 characters of input text. Binary output grows 8ร the byte count, so even modest text produces a lot of bits – the cap protects against tab-freezes.
What’s the difference vs the Unicode โ Binary tool?
The UnicodeโBinary tool offers 3 modes including UTF-8, codepoint-as-fixed-bits, and UTF-16. This tool is UTF-8-specific and adds prefix-bit highlighting for visualizing the variable-width encoding pattern.
Is anything uploaded?
No. Everything runs in your browser via TextEncoder/TextDecoder.