Convert UTF-8 to Bytes
Convert UTF-8 text to bytes in decimal, hex, or binary with prefix and separator options. Bidirectional, emoji-safe. Free, client-side, instant, secure.
- Runs in your browser
- Nothing uploaded
- Free, no sign-up
Convert UTF-8 text to bytes in 3 formats (decimal / hex / binary) with an optional prefix (0x / \x / %), 4 separators, and per-character or per-byte grouping. Works in both directions, with strict UTF-8 validation when decoding.
How to Use Convert UTF-8 to Bytes
- Paste UTF-8 text. The tool runs TextEncoder to extract the raw byte sequence.
- Pick a format: hex (most common, compact), decimal (matches byte arrays in some languages), or binary (8-bit pattern).
- For hex output, pick a prefix style: none (just F0), 0x for language literals, \x for C/Python escapes, or % for URL encoding.
- Choose a separator and grouping. Byte puts each byte in its own token; Char concatenates the bytes belonging to each codepoint into one token (useful for seeing per-character cost).
- Swap direction to decode tokens back to text. The format is auto-detected: binary if a token is 8 characters of 0/1, hex if it contains hex letters or is 2 characters long, otherwise decimal.
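The encoding steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's code: the tool uses the browser's TextEncoder, while this sketch uses Python's str.encode("utf-8"), which yields the same bytes. The function name encode_bytes and its fmt values are hypothetical.

```python
def encode_bytes(text: str, fmt: str = "hex", sep: str = " ") -> str:
    """Render the UTF-8 bytes of `text` as hex, decimal or binary tokens."""
    data = text.encode("utf-8")  # raw byte sequence, like TextEncoder.encode
    if fmt == "hex":
        tokens = [f"{b:02X}" for b in data]   # two uppercase hex digits
    elif fmt == "dec":
        tokens = [str(b) for b in data]       # values 0-255
    elif fmt == "bin":
        tokens = [f"{b:08b}" for b in data]   # zero-padded 8-bit pattern
    else:
        raise ValueError(f"unknown format: {fmt!r}")
    return sep.join(tokens)


sample = "A\u00e9"  # "Aé"
print(encode_bytes(sample))             # 41 C3 A9
print(encode_bytes(sample, fmt="dec"))  # 65 195 169
print(encode_bytes(sample, fmt="bin"))  # 01000001 11000011 10101001
```

The format only changes how each byte is spelled; the byte sequence itself is identical in all three outputs.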
Frequently Asked Questions
How does UTF-8 encode multi-byte characters?
Codepoints U+0000-U+007F use 1 byte (ASCII), U+0080-U+07FF use 2 bytes, U+0800-U+FFFF use 3 bytes, U+10000-U+10FFFF use 4 bytes. The first byte’s high bits signal the width: 0xxxxxxx / 110xxxxx / 1110xxxx / 11110xxx; continuation bytes start with 10xxxxxx.
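The ranges and lead-byte patterns above translate directly into code. A hypothetical pair of Python helpers (utf8_width, lead_byte_width), cross-checked against Python's built-in UTF-8 encoder at the first and last code point of each range:

```python
def utf8_width(cp: int) -> int:
    """Bytes needed for a code point, using the four ranges above."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4


def lead_byte_width(b: int) -> int:
    """Width signalled by a byte's high bits; 0 means continuation (or invalid).

    Only the bit pattern is checked: bytes such as 0xC0 or 0xF5 match a
    lead pattern but never appear in valid UTF-8.
    """
    if (b & 0x80) == 0x00:   # 0xxxxxxx
        return 1
    if (b & 0xE0) == 0xC0:   # 110xxxxx
        return 2
    if (b & 0xF0) == 0xE0:   # 1110xxxx
        return 3
    if (b & 0xF8) == 0xF0:   # 11110xxx
        return 4
    return 0                 # 10xxxxxx continuation, or 11111xxx (never valid)


for cp in (0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == utf8_width(cp) == lead_byte_width(encoded[0])
    assert all((b & 0xC0) == 0x80 for b in encoded[1:])  # all 10xxxxxx
```

Because the width is readable from the first byte alone, a decoder can always tell where each character ends without look-ahead.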
Why is 🌍 four bytes but A is one?
UTF-8 is variable-width. A (U+0041) fits in 7 bits, so it takes 1 byte. 🌍 (U+1F30D) needs 17 bits, more than the 16 payload bits a 3-byte sequence can carry, so it takes 4 bytes. Stats show the per-width distribution.
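To see where the four bytes come from, the payload bits of U+1F30D can be packed by hand. A minimal Python sketch (encode_4byte is a hypothetical name) that spreads the 21-bit payload across the 3 + 6 + 6 + 6 free bits of a 4-byte sequence:

```python
def encode_4byte(cp: int) -> bytes:
    """Pack a code point in U+10000..U+10FFFF into a 4-byte UTF-8 sequence."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("4-byte form covers U+10000..U+10FFFF only")
    return bytes([
        0xF0 | (cp >> 18),           # 11110xxx: top 3 of 21 payload bits
        0x80 | ((cp >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((cp >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (cp & 0x3F),          # 10xxxxxx: low 6 bits
    ])


globe = ord("\U0001F30D")                          # 🌍
print(ord("A").bit_length(), globe.bit_length())   # 7 17
print(encode_4byte(globe).hex(" ").upper())        # F0 9F 8C 8D
```

Python's own "\U0001F30D".encode("utf-8") returns the same four bytes.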
Which prefix should I use?
0x matches most language hex literals (JS, Python, C, Go). \x matches the byte escapes used in C/C++ strings and Python/Rust byte strings. % matches URL percent-encoding. No prefix gives plain space-separated bytes, as used by network analyzers.
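Each prefix only changes how the same bytes are written. A hypothetical hex_tokens helper in Python shows the four styles for é; the last line checks the % form against the standard library's urllib.parse.quote, which percent-encodes text as UTF-8 by default:

```python
from urllib.parse import quote


def hex_tokens(text: str, prefix: str = "") -> list[str]:
    """Uppercase hex token for each UTF-8 byte, each with the given prefix."""
    return [f"{prefix}{b:02X}" for b in text.encode("utf-8")]


e_acute = "\u00e9"  # é
print(" ".join(hex_tokens(e_acute)))          # C3 A9
print(", ".join(hex_tokens(e_acute, "0x")))   # 0xC3, 0xA9
print("".join(hex_tokens(e_acute, "\\x")))    # \xC3\xA9
print("".join(hex_tokens(e_acute, "%")))      # %C3%A9
print(quote(e_acute))                         # %C3%A9
```

The \x output can be pasted into a Python bytes literal such as b"\xC3\xA9", and the 0x form into an array literal such as [0xC3, 0xA9].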
What does “Group by char” do?
Default Byte mode produces one token per UTF-8 byte. Char mode concatenates the bytes belonging to each codepoint into a single token, so the output naturally shows per-character cost: A → 41; é → C3A9; 🌍 → F09F8C8D.
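The two grouping modes differ only in where token boundaries fall. A minimal Python sketch (group_tokens is a hypothetical name); iterating over a Python str yields code points, which matches the per-codepoint grouping described above:

```python
def group_tokens(text: str, by_char: bool = False) -> list[str]:
    """Hex tokens: one per UTF-8 byte, or one per code point if by_char."""
    if by_char:
        # Concatenate all bytes of each code point into a single token.
        return [ch.encode("utf-8").hex().upper() for ch in text]
    return [f"{b:02X}" for b in text.encode("utf-8")]


sample = "A\u00e9\U0001F30D"  # A, é, 🌍
print(group_tokens(sample))                # ['41', 'C3', 'A9', 'F0', '9F', '8C', '8D']
print(group_tokens(sample, by_char=True))  # ['41', 'C3A9', 'F09F8C8D']
```

In this sketch grouping is per code point, not per user-perceived character, so a flag emoji (two regional-indicator code points) would produce two tokens.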
How does decoding work?
The decoder auto-detects the format from the first token (binary if it is 8 characters of 0/1, hex if it contains hex letters or is 2 characters long, otherwise decimal), strips any prefix, parses each token as a byte value 0-255, and runs TextDecoder('utf-8', {fatal: true}), so invalid UTF-8 throws an error rather than being silently replaced with U+FFFD.
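The decoding steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's source: tokens are assumed to be separated by whitespace or commas, prefixes are stripped before detection (so 0x41 is not mistaken for decimal), and detect_format / decode_tokens are hypothetical names. Python's bytes.decode("utf-8", errors="strict") plays the role of TextDecoder with fatal: true.

```python
import re
from typing import Optional

PREFIX = re.compile(r"^(?:0x|\\x|%)", re.IGNORECASE)  # 0x, \x or % at token start


def detect_format(token: str) -> str:
    """Classify a prefix-free token using the first-token rules above."""
    if re.fullmatch(r"[01]{8}", token):
        return "bin"
    if re.search(r"[A-Fa-f]", token) or len(token) == 2:
        return "hex"
    return "dec"


def decode_tokens(text: str, fmt: Optional[str] = None) -> str:
    """Parse byte tokens and decode them as strict UTF-8."""
    tokens = [PREFIX.sub("", t) for t in re.split(r"[\s,]+", text.strip()) if t]
    if not tokens:
        return ""
    fmt = fmt or detect_format(tokens[0])
    base = {"bin": 2, "hex": 16, "dec": 10}[fmt]
    values = [int(t, base) for t in tokens]
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("token outside byte range 0-255")
    # errors="strict" raises UnicodeDecodeError instead of inserting U+FFFD.
    return bytes(values).decode("utf-8", errors="strict")


print(decode_tokens("0xC3 0xA9"))              # é
print(decode_tokens("01000001 01000010"))      # AB
print(decode_tokens("65 195 169", fmt="dec"))  # Aé
```

Under the first-token rule, a decimal input that starts with a two-digit value (such as 65) is classified as hex, which is why the sketch also accepts an explicit fmt override.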
Does ASCII get 1 byte per character?
Yes. UTF-8 is byte-for-byte identical to ASCII for codepoints U+0000-U+007F. This was a deliberate design goal of UTF-8 (created by Ken Thompson and Rob Pike) and is a major reason UTF-8 became the web's standard encoding.
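A quick Python check of the ASCII-compatibility claim, comparing UTF-8 output with the byte value of every 7-bit code point and with Python's ASCII codec:

```python
# Every code point U+0000..U+007F encodes to one byte equal to its own value.
assert all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(0x80))

text = "Hello, World!"
assert text.encode("utf-8") == text.encode("ascii")
print(len(text), len(text.encode("utf-8")))        # 13 13

# U+0080 is the first code point outside ASCII, and it needs 2 bytes.
print(chr(0x80).encode("utf-8").hex(" ").upper())  # C2 80
```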
Can I use this for file sizes?
Yes. The byte count is exactly the storage size of the text in a plain UTF-8 file. A file on disk may also start with a byte-order mark (EF BB BF, 3 bytes) or carry format-specific headers; this tool measures only the text content.
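A Python sketch confirming that the UTF-8 byte count equals the size of a file holding only that text, and that a BOM adds 3 bytes. Note that Python's len() on a str counts code points, which differs from the byte count for non-ASCII text:

```python
import os
import tempfile

text = "na\u00efve caf\u00e9 \U0001F30D"   # "naïve café 🌍"
body = text.encode("utf-8")

# Write raw bytes in binary mode so no newline or encoding translation occurs.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(body)
    path = f.name
size_on_disk = os.path.getsize(path)
os.remove(path)

# "utf-8-sig" is Python's codec that prepends the EF BB BF byte-order mark.
with_bom = text.encode("utf-8-sig")

print(len(text), len(body), size_on_disk, len(with_bom))  # 12 17 17 20
print(with_bom[:3].hex(" ").upper())                      # EF BB BF
```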
Is text uploaded?
No. TextEncoder / TextDecoder run in the browser.
What’s the input cap?
200,000 characters. The cap keeps the UI responsive.
How does this compare to the UTF-8 to Binary tool?
The Binary tool is dedicated to 8-bit binary output with UTF-8 prefix-bit highlighting in its per-character grid. This Bytes tool exposes all three formats (decimal/hex/binary) plus prefix options for output-format flexibility.