
Convert UTF-8 to Arbitrary Base

In short

Convert UTF-8 text bytes into any base from 2 to 36 (binary, octal, hex, or a custom radix), in either direction, with options for prefix, padding, and letter case. Free, client-side, instant.

  • Runs in your browser
  • Nothing uploaded
  • Free, no sign-up

Encode each UTF-8 byte of your text in any base from 2 to 36. Use binary (base 2), octal (base 8), hex (base 16), or any radix in between - useful for protocol debugging, education, and unusual encoding schemes. Swap to decode tokens back to text.


How to Use Convert UTF-8 to Arbitrary Base

  1. Paste UTF-8 text. The tool runs TextEncoder to get the raw byte sequence.
  2. Pick a base from 2 to 36. Padding ensures every byte uses the same width (8 chars for base 2, 2 for base 16, 2 for base 36, etc.), so reverse decoding has no ambiguity.
  3. Choose case (uppercase letters by default for bases > 10, e.g. A-F in hex) and prefix (none, or common literals: 0b for 2, 0o for 8, 0x for 16).
  4. The per-character grid shows each input character's codepoint, UTF-8 bytes in hex, and the same bytes in your chosen base - useful for verifying multi-byte characters.
  5. Swap direction to decode: paste base tokens, get UTF-8 text back. The decoder uses TextDecoder('utf-8', {fatal: true}) so corrupt sequences throw rather than silently producing replacement characters.
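
The encoding steps above could be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function name encodeToBase and its option names (pad, upper, prefix) are hypothetical, and it assumes tokens are joined with single spaces.

```javascript
// Hypothetical helper for illustration; not the tool's actual source.
function encodeToBase(text, base, { pad = true, upper = true, prefix = "" } = {}) {
  if (!Number.isInteger(base) || base < 2 || base > 36) {
    throw new RangeError("base must be an integer from 2 to 36");
  }
  // Width of the largest byte value (255) in this base, e.g. 8 for base 2.
  const width = (255).toString(base).length;
  // Step 1: get the raw UTF-8 bytes as a Uint8Array.
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => {
    let token = b.toString(base);               // lowercase digits by default
    if (pad) token = token.padStart(width, "0"); // step 2: fixed width
    if (upper) token = token.toUpperCase();      // step 3: case
    return prefix + token;                       // step 3: optional prefix
  }).join(" ");
}

console.log(encodeToBase("Hi", 2));                   // 01001000 01101001
console.log(encodeToBase("é", 16, { prefix: "0x" })); // 0xC3 0xA9
```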

Frequently Asked Questions

What does “arbitrary base” mean here?

Each individual UTF-8 byte (value 0-255) gets written in your chosen number base. Base 16 produces hex bytes (00-FF). Base 2 produces 8-bit binary. Base 36 packs each byte into 2 chars using digits 0-9 and letters A-Z. The tool is NOT changing the underlying bytes – only how they’re written.

Why is the base limited to 2-36?

JavaScript’s Number.prototype.toString(radix) supports 2-36 because that’s the alphabet of 0-9 (10 digits) + A-Z (26 letters). Above 36 you’d need to define your own alphabet (Base58, Base64, Base85 all use distinct character sets – those are separate tools).
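
A quick way to see the limit for yourself is the sketch below. The helper name radixSupported is made up for this example; the only built-in it relies on is Number.prototype.toString, which throws a RangeError for a radix outside 2-36 and emits lowercase letters.

```javascript
// Hypothetical helper: reports whether toString accepts the given radix.
function radixSupported(radix) {
  try {
    (255).toString(radix);
    return true;
  } catch (err) {
    if (err instanceof RangeError) return false;
    throw err; // anything else is unexpected
  }
}

console.log((255).toString(36)); // 73
console.log((255).toString(16)); // ff (lowercase by default)
console.log(radixSupported(36)); // true
console.log(radixSupported(37)); // false
```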

How does padding work?

A byte’s max value (255) needs different widths in different bases: 8 digits in binary (255 = 11111111), 3 in octal (377), 2 in hex (FF), 2 in base 36 (73). Padding fills shorter byte representations with leading zeros so reverse decoding can split fixed-width tokens unambiguously. With padding off, byte 5 in base 2 is just 101, so token boundaries are lost if tokens are run together: 1011 could be 101 followed by 1, or 10 followed by 11. Unpadded output therefore needs explicit separators to decode.
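
The width rule can be illustrated with a short sketch. The names byteWidth and splitFixedWidth are hypothetical, and the splitter assumes padded input with no separators or prefixes.

```javascript
// Digits needed to write the largest byte value (255) in a given base.
function byteWidth(base) {
  return (255).toString(base).length;
}

// Split a padded, unseparated digit string into fixed-width tokens.
function splitFixedWidth(digits, base) {
  const width = byteWidth(base);
  if (digits.length % width !== 0) {
    throw new Error(`length ${digits.length} is not a multiple of ${width}`);
  }
  const tokens = [];
  for (let i = 0; i < digits.length; i += width) {
    tokens.push(digits.slice(i, i + width));
  }
  return tokens;
}

console.log([2, 8, 16, 36].map(byteWidth));           // [ 8, 3, 2, 2 ]
console.log(splitFixedWidth("0100100001101001", 2));  // [ '01001000', '01101001' ]
```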

What about multi-byte UTF-8 characters?

Encoded byte-by-byte exactly as UTF-8 produces them. 🌍 (U+1F30D) is 4 UTF-8 bytes F0 9F 8C 8D in hex; in base 2 that’s four 8-bit tokens; in base 36 it’s 6O 4F 3W 3X. The per-character grid shows the multi-byte expansion.
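
A per-character breakdown of this kind might be computed along these lines. This is a sketch under assumptions: the function name breakdown and the row field names are invented for the example and are not taken from the tool.

```javascript
// Hypothetical per-character breakdown; field names are illustrative only.
function breakdown(text, base) {
  const encoder = new TextEncoder();
  const width = (255).toString(base).length;
  const rows = [];
  for (const ch of text) { // for...of walks code points, not UTF-16 units
    const bytes = Array.from(encoder.encode(ch));
    rows.push({
      char: ch,
      codepoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
      hex: bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" "),
      inBase: bytes.map((b) => b.toString(base).toUpperCase().padStart(width, "0")).join(" "),
    });
  }
  return rows;
}

const [row] = breakdown("🌍", 36);
console.log(row.codepoint); // U+1F30D
console.log(row.hex);       // F0 9F 8C 8D
console.log(row.inBase);    // 6O 4F 3W 3X
```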

Does the reverse decoder validate UTF-8?

Yes – strictly. The tool runs the parsed bytes through TextDecoder('utf-8', {fatal: true}), so invalid sequences (truncated multi-byte chars, overlong encodings, bytes encoding surrogate codepoints) throw an explicit error rather than substituting U+FFFD replacement characters. If decoding fails, the message usually means you have the wrong base or corrupted input.
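
The strict behaviour can be reproduced with the standard TextDecoder API. In the sketch below, decodeStrict is a hypothetical wrapper name; with fatal: true, the decoder throws a TypeError on malformed input instead of emitting U+FFFD.

```javascript
// Hypothetical wrapper around the standard TextDecoder in fatal mode.
function decodeStrict(byteValues) {
  const decoder = new TextDecoder("utf-8", { fatal: true });
  return decoder.decode(new Uint8Array(byteValues));
}

console.log(decodeStrict([0xf0, 0x9f, 0x8c, 0x8d])); // 🌍

const invalid = [
  [0xf0, 0x9f, 0x8c], // truncated 4-byte sequence
  [0xc0, 0x80],       // overlong encoding of U+0000
  [0xed, 0xa0, 0x80], // encodes surrogate U+D800
];
for (const bytes of invalid) {
  try {
    decodeStrict(bytes);
  } catch (err) {
    console.log(err instanceof TypeError); // true
  }
}
```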

Why might decoding produce “byte value exceeds 255”?

Each token must be a single byte (0-255). If your token parses to a larger value in the chosen base – say FFF in hex = 4095, well above 255 – that’s a clue the input was meant for a different base, or wasn’t padded properly, so neighbouring tokens got merged.
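
A range check like this could look roughly as follows. The function name parseToken and the error wording are assumptions for illustration, and the sketch expects any prefix (0b, 0o, 0x) to have been stripped already. Digits are validated by hand because parseInt silently stops at the first invalid character.

```javascript
// Hypothetical token parser; assumes prefixes were removed beforehand.
function parseToken(token, base) {
  const digits = "0123456789abcdefghijklmnopqrstuvwxyz".slice(0, base);
  const lower = token.toLowerCase();
  // parseInt("12", 2) would return 1, so reject bad digits explicitly.
  if (lower.length === 0 || [...lower].some((c) => !digits.includes(c))) {
    throw new Error(`"${token}" is not a valid base-${base} token`);
  }
  const value = parseInt(lower, base);
  if (value > 255) {
    throw new RangeError(`byte value exceeds 255: "${token}" = ${value}`);
  }
  return value;
}

console.log(parseToken("FF", 16)); // 255
console.log(parseToken("73", 36)); // 255
try {
  parseToken("FFF", 16);
} catch (err) {
  console.log(err.message); // byte value exceeds 255: "FFF" = 4095
}
```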

What’s a good use case for non-standard bases like 7 or 23?

Mostly education (showing how base conversion generalises) or constraint games (encoding data in a system that only allows certain characters). Note that base 36 is not more compact than hex in this per-byte scheme: 255 still needs 2 characters (73), so padded output is 2 chars per byte in both. Base 36 simply draws on 20 more symbols (G-Z) than hex.

Is base 64 supported?

No – Base64 isn’t a mathematical radix; it’s a specific encoding spec (3 input bytes → 4 output chars from a 64-char alphabet including + / or URL-safe - _). For Base64, use the dedicated converter in this suite.

Is my text uploaded?

No. TextEncoder and TextDecoder run entirely in the browser. About 18 KB of code.

What’s the input cap?

200,000 characters. Output grows as the base shrinks (base 2 produces 8 output characters per byte, versus 2 for hex), so this cap protects against tab freezes on huge inputs.
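
To get a feel for how output size scales, here is a rough estimate. It assumes padded tokens joined by single spaces and no prefix; estimateOutputLength is a hypothetical name, not part of the tool.

```javascript
// Rough output-size estimate: padded tokens, single-space separators, no prefix.
function estimateOutputLength(text, base) {
  const byteCount = new TextEncoder().encode(text).length;
  if (byteCount === 0) return 0;
  const width = (255).toString(base).length;
  return byteCount * width + (byteCount - 1); // digits plus separators
}

const big = "a".repeat(200000); // 200,000 ASCII characters = 200,000 bytes
console.log(estimateOutputLength(big, 2));  // 1799999
console.log(estimateOutputLength(big, 16)); // 599999
```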

Keep going

Related Tools

All Utf8 tools →

Convert Arbitrary Base to UTF-8

Decode numeric tokens in any base (2-36) as UTF-8 bytes - multi-byte emoji and…

Binary to UTF-8 Decoder

Binary to UTF-8 Text Decoder handles emoji, CJK, accents, strips BOM, counts replacement chars.…

Base64 to UTF-8 Decoder

Decode Base64 to UTF-8 text - handles emoji, CJK, BOM-stripping, URL-safe variants. Free, client-side,…

Convert Bytes to UTF-8

Decode decimal/hex/binary byte values to UTF-8 text - emoji, CJK,…

Code Points to UTF-8 Converter Free

Free online Unicode code points to UTF-8 converter. Shows actual UTF-8 byte sequences per…

Convert Data URI to UTF-8

Online Data URI to UTF-8 decoder with byte-breakdown panel for emoji and CJK. Client-side,…

Convert Decimal to UTF-8

Online decimal to UTF-8 text decoder. Byte-mode (raw UTF-8 bytes) and codepoint-mode. Client-side, instant,…

Convert Hexadecimal to UTF-8

Decode hex to UTF-8 text with byte-structural breakdown. Handles ASCII, Latin, CJK, emoji. Batch…

Convert HTML Entities to UTF-8

Decode HTML entities to UTF-8 with per-character byte breakdown. Named, decimal, hex. Free, offline,…

Convert Octal to UTF-8

Decode octal byte sequences to UTF-8 text, encode UTF-8 to octal. C-escape support, multi-byte.…

Convert UTF-16 to UTF-8

Convert UTF-16 code units to UTF-8 text and bytes. 3 formats, BE/LE, BOM, surrogate…

Convert UTF-32 to UTF-8

Convert UTF-32 code points to UTF-8 text and bytes. 3 formats, BE/LE, BOM, strict…
