Convert UTF-8 to Binary

In short

Convert UTF-8 text to 8-bit binary per byte with grouping, separator, prefix-bit highlighting. Bidirectional. Free, client-side, instant, secure.

Runs in your browser
Nothing uploaded
Free, no sign-up

Convert UTF-8 text to 8-bit binary per byte. Toggle prefix-bit highlighting to see UTF-8's variable-width encoding pattern (0xxxxxxx for ASCII, 110xxxxx / 1110xxxx / 11110xxx for multi-byte leaders, 10xxxxxx for continuations). Multi-byte characters and emoji round-trip correctly.

UTF-8 text input

Separator

Group by

Highlight UTF-8 prefix bits in grid

Binary output

Per-character breakdown

Type to begin.

🛡

100% PrivateNo server uploads, ever

⚡

InstantRuns in your browser

💧

No WatermarksClean output, always

🆓

Free ForeverNo accounts, no limits

How to Use Convert UTF-8 to Binary

Convert UTF-8 to Binary · free online tool · All Tools Verse — Convert UTF-8 to Binary. Free online tool that runs in your browser.

Paste UTF-8 text. The tool extracts the byte sequence via TextEncoder - emoji and multi-byte characters work correctly.
Each byte becomes 8 bits, padded with leading zeros. A (0x41) → 01000001.
Pick separator (space default) and grouping. Byte mode gives one token per byte. Char mode concatenates the bytes belonging to each codepoint (so a 4-byte emoji becomes one 32-bit token) - useful for visualizing per-character cost.
Toggle prefix-bit highlighting in the grid to see the UTF-8 encoding pattern. ASCII bytes start with 0; multi-byte leaders start with 110, 1110, or 11110; continuation bytes start with 10.
Swap direction to decode binary tokens back to UTF-8 text. Whitespace-tolerant; throws on invalid bit groupings or invalid UTF-8.

Frequently Asked Questions

How does UTF-8 encode multi-byte characters?

By bit-pattern in the leading byte: 0xxxxxxx = 1-byte ASCII (7 bits of data); 110xxxxx = 2-byte sequence (5 + 6 = 11 bits); 1110xxxx = 3-byte sequence (4 + 6 + 6 = 16 bits); 11110xxx = 4-byte sequence (3 + 6 + 6 + 6 = 21 bits). Continuation bytes always start with 10xxxxxx. The grid colour-codes these so you can see them.

Why is 🌍 four bytes (32 bits) but A is just one?

UTF-8 is variable-width. A (codepoint U+0041) fits in 7 bits, so it uses the 1-byte form. 🌍 (U+1F30D) needs 17 bits of data, which only fits in the 4-byte form. Look at the grid: emoji bytes start with 11110... for the leader and 10... for the three continuations.

What does “Group by char” do?

Default Byte mode produces one 8-bit token per UTF-8 byte. Char mode concatenates the bytes belonging to each codepoint into a single longer token, so you can see at a glance how many bits a character costs: A → 8 bits; é → 16 bits; 🌍 → 32 bits as one long binary string.

How does decoding validate the input?

Three checks: every character must be 0 or 1 (anything else throws); total bit count must be a multiple of 8 (incomplete byte at end throws); the assembled bytes must form valid UTF-8 (fatal-mode TextDecoder throws on lone continuations, truncated leaders, overlong sequences, encoded surrogates).

Why fatal mode instead of replacement chars?

Silent U+FFFD substitution hides bugs. If you assembled the binary by hand and got the grouping wrong, you want an explicit error pointing at “not valid UTF-8” rather than a slightly-different output that looks plausibly correct.

What if my binary uses a different separator?

The decoder splits on any whitespace, commas, or semicolons – paste binary in any common format and it should work. Just make sure each token is exactly 8 bits (with leading zeros for small values like 00100000 for space).

Can I omit leading zeros?

Not for decoding. The decoder requires 8-bit-aligned tokens. For encoding, the output is always 8-bit padded for the same reason – so it round-trips through the decoder unambiguously.

Is the input cap enforced?

Yes, 200,000 characters of input text. Binary output grows 8× the byte count, so even modest text produces a lot of bits – the cap protects against tab-freezes.

What’s the difference vs the Unicode → Binary tool?

The Unicode→Binary tool offers 3 modes including UTF-8, codepoint-as-fixed-bits, and UTF-16. This tool is UTF-8-specific and adds prefix-bit highlighting for visualizing the variable-width encoding pattern.

Is anything uploaded?

No. Everything runs in your browser via TextEncoder/TextDecoder.

Keep going

Related Tools

All Utf8 tools →

Embed this tool

Add this free tool to your website. Copy and paste the code:

<iframe src="https://alltoolsverse.com/tools/convert-utf8-to-binary/?embed=1" width="100%" height="760" loading="lazy" style="max-width:900px;border:1px solid #e2e8f0;border-radius:12px" title="Convert UTF-8 to Binary"></iframe>
<p>Free tool: <a href="https://alltoolsverse.com/tools/convert-utf8-to-binary/">Convert UTF-8 to Binary</a> by All Tools Verse</p>

Per-character breakdown

Related Tools

Binary to UTF-8 Decoder →

Convert Arbitrary Base to UTF-8 →

Base64 to UTF-8 Decoder →

Convert Bytes to UTF-8 →

Code Points to UTF-8 Converter Free →

Convert Data URI to UTF-8 →

Convert Decimal to UTF-8 →

Convert Hexadecimal to UTF-8 →

Convert HTML Entities to UTF-8 →

Convert Octal to UTF-8 →

Convert UTF-16 to UTF-8 →

Convert UTF-32 to UTF-8 →

Embed this tool