Convert Unicode to Bytes
Convert Unicode to UTF-8 bytes in hex, decimal, or binary. Per-byte grid, reverse direction. Free, offline, client-side, instant, secure.
- Runs in your browser
- Nothing uploaded
- Free, no sign-up
Convert Unicode text to raw UTF-8 bytes in three representations (hex, decimal, binary) with optional 0x / \x / 0b prefixes for direct paste into C, Python, or Rust source. Per-byte grid maps each byte back to its source character. Reverse direction auto-detects the format.
Per-byte breakdown
How to Use Convert Unicode to Bytes
- Paste your text. Anything Unicode - ASCII, accented Latin, CJK, emoji, mathematical symbols. The encoder uses
TextEncoder.encode()to produce proper UTF-8 bytes (variable length: 1-4 bytes per character). - Pick the byte format. Hex (most common for debugging -
48 65 6C 6C 6F), decimal (good for human readability -72 101 108 108 111), or binary (educational, raw bit view -01001000 01100101). Switch formats and the same bytes are presented differently. - Choose a separator. Space (most readable, default), comma (CSV-compatible), newline (one byte per line for tall display), or none (continuous stream for hex pasting into a single value).
- Add a prefix per byte.
0xfor C/JavaScript/Python hex literals (0x48 0x65).xfor C/Python string literals (x48x65- combine with no separator).0bfor binary literals. None for raw bytes. Useful for pasting straight into source code as an array. - Toggle uppercase hex. Default uppercase (
4F) matches Windows hex editors and macOS Hex Fiend. Lowercase (4f) matches Unix tools (hexdump,xxd) and most C/Rust source code conventions. The hex values are mathematically identical. - Read the per-byte grid. One row per UTF-8 byte showing index, the character that byte belongs to (first byte gets the char glyph, continuation bytes get
↳), hex, decimal, and binary. Multi-byte characters appear as 2-4 consecutive rows. Useful for understanding exactly how UTF-8 encodes a specific character. - Swap to decode. ⇄ flips to Bytes → Unicode. Auto-detects format: hex (chars A-F or 2-digit length), decimal (digits only), or binary (only 0/1 and multiples of 8). Strips common prefixes automatically (
0x,x,0b). Strict UTF-8 decoder catches invalid byte sequences.
Frequently Asked Questions
How does UTF-8 byte count vary with content?
UTF-8 is variable-length: ASCII (codepoints 0-127) = 1 byte each, Latin Extended / Greek / Cyrillic = 2 bytes, most CJK and other BMP characters = 3 bytes, supplementary planes (emoji, rare ideographs) = 4 bytes. So “Hello” is 5 bytes, but “Héllo” is 6 (the é is 2 bytes), and “🌍 Hello” is 11 bytes (4 for emoji + 1 space + 5 for ASCII).
What’s the difference between the three formats?
Same bytes, different presentations. Hex (base 16) is the most compact 2-char-per-byte format – universal in debugging tools and hex editors. Decimal (base 10) is easier for humans to read but takes 3 chars per byte. Binary (base 2) shows raw bits – best for teaching how UTF-8 encodes characters (lead byte patterns, continuation bytes), but verbose at 8 chars per byte.
What’s the difference between 0x and x prefixes?
Both indicate hex but in different contexts. 0x is the C/JavaScript/Python/Rust prefix for hex numeric literals: int x = 0x48;. x is the C/Python prefix for hex escapes inside string literals: "x48x65x6C". Use x with no separator to get a paste-able string literal: x48x65x6Cx6Cx6F = “Hello” in C/Python.
How does the reverse direction detect the format?
By scanning the first byte token: binary if it’s 8 characters of 0s and 1s, decimal if it’s all digits and short (≤3), hex if it contains A-F or is exactly 2 characters. Prefixes (0x, x, 0b) are stripped before detection. Mixed-format input is not supported – each conversion treats the entire input as one format.
What does the “↳” symbol mean in the grid?
It marks continuation bytes of a multi-byte UTF-8 character. The first byte gets the actual character glyph; subsequent bytes (in 2-4 byte sequences) get ↳ to visually show they’re part of the same character. Useful for understanding how an emoji like 😀 maps to 4 specific bytes: the first row shows “😀” and the next three show “↳”.
Why is my output the same in different prefix settings sometimes?
Because not all prefixes apply to all formats. 0x only prefixes hex. x only prefixes hex (it’s an escape syntax). 0b only prefixes binary. If you select x with decimal format, the prefix is silently skipped because there’s no x escape for decimal in any programming language. The dropdown could be smarter about hiding incompatible options, but cross-checking gives you flexibility.
What’s the input size limit?
200,000 characters. The per-byte grid caps at 256 displayed rows (since emoji-heavy text expands quickly – 50 emoji is already 200 bytes), but the conversion processes the full input. Copy and Download give you the complete output.
What’s the difference vs the “Convert String to Hex” tool?
That sibling tool focuses on hex output specifically with a per-character grid. This tool offers three byte representations (hex/decimal/binary) and a per-byte grid (one row per byte, not per character). Use the sibling for clean hex output, use this when you need decimal or binary, or when you want to see exactly how bytes line up with their source characters.
Is my text uploaded anywhere?
No. All UTF-8 encoding, byte formatting, and decoding run in your browser using TextEncoder and TextDecoder. Open DevTools → Network and confirm zero requests fire – even when you Convert or Download. Safe for tokens, credentials, or any sensitive text.
Does it work offline?
Yes. Total bundle is about 18 KB. Load once, disconnect, keep using. Pure JavaScript using browser standard library – no remote dependencies. Useful for debugging encoding issues on airgapped systems or in offline environments.
Related Tools
Center Unicode Text →
Center Unicode text within a fixed width, with real grapheme counting for emoji and…
Check Spoofed Unicode Text →
Detect Unicode confusables and homoglyphs from Cyrillic, Greek, Armenian, and Hebrew that imitate Latin…
Chunkify Unicode Text →
Split Unicode text into equal chunks with grapheme, code-point, or UTF-16 modes. Keeps emoji…
ASCII to Unicode Converter →
ASCII to Unicode & Decode decimal, hex, octal, or U+XXXX values to Unicode characters…
Convert Code Points to Unicode →
Convert Code Points to Unicode (U+XXXX, hex, decimal) to characters - handles emoji, CJK,…
Convert Unicode to ASCII →
Convert Unicode to ASCII with transliteration (é → e, ñ → n), replace, or…
Convert Unicode to Base64 →
Encode Unicode text to Base64 (and decode) with standard, URL-safe, MIME variants. UTF-8 proper.…
Convert Unicode to Binary →
Convert Unicode to binary in 3 modes (UTF-8, codepoint, UTF-16). Per-character breakdown. Free, offline,…
Convert Unicode to Code Points →
Convert Unicode to code points (U+XXXX, HTML/CSS/JS escapes) and back. Per-character breakdown. Free, offline,…
Convert Unicode to Data URL →
Convert Unicode to data URLs with base64 or URL-encoding, 12 MIME types, charset toggle.…
Convert Unicode to Decimal →
Convert Unicode text to decimal code point values.
Convert Unicode to Hex →
Convert Unicode to hex codepoints with prefix/padding/case options (and back). Per-character breakdown. Free, offline,…