Code Points to UTF-8 Converter Free
Free online Unicode code points to UTF-8 converter. Shows actual UTF-8 byte sequences per character. Client-side, instant, secure – no uploads.
UTF-8 decoders handle the jobs where you have a text-encoded representation of Unicode bytes and want the actual readable text back: a Base64 string that holds Greek or Chinese, a space-separated hex dump from xxd that contains emoji, a list of decimal code points copied from a Wikipedia article. Five browser-based decoders, each UTF-8 aware and each correctly handling multi-byte sequences for emoji, CJK characters, accented letters, and combining marks. For wider Unicode inspection work (normalization, spoofing detection, escaping), see the Unicode category.
What you can do with UTF-8 decoders
Decode a Base64 blob that wraps Unicode text (Greek, CJK, emoji, accents) into readable UTF-8: Base64 to UTF-8.
Convert a stream of 8-bit binary into UTF-8 text, auto-detecting whether each byte is a lead or continuation byte: Binary to UTF-8.
Decode decimal, hex, or binary byte values from a hex dump, an Authy export, or a packet capture into readable UTF-8: Bytes to UTF-8.
Render a list of Unicode code points (for example U+1F600 U+00E9 U+4E2D) into text plus the actual UTF-8 byte sequences that each character produces on disk: Code Points to UTF-8.
Decode a token string in any base from 2 to 36 as UTF-8 bytes: Arbitrary Base to UTF-8.
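The last conversion in that list is the easiest to sketch. Here is a minimal Python illustration of the code-points-to-UTF-8 idea, using the same sample code points as above; everything in it is standard-library behavior, not the tool's actual implementation:

```python
# Code point notation -> character -> UTF-8 byte sequence.
code_points = ["U+1F600", "U+00E9", "U+4E2D"]

for cp in code_points:
    char = chr(int(cp.removeprefix("U+"), 16))  # parse hex after "U+"
    utf8 = char.encode("utf-8")                 # the bytes written to disk
    print(cp, char, utf8.hex(" "))
# U+1F600 😀 f0 9f 98 80
# U+00E9 é c3 a9
# U+4E2D 中 e4 b8 ad
```

Note that one code point can produce one to four bytes; the tool shows exactly this per-character breakdown.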
Reach for UTF-8 decoders when the input is an encoded representation of bytes (Base64, binary, decimal numbers) and the output should be human-readable Unicode text. If the input is ASCII-only and UTF-8 awareness is not needed, the ASCII tools are simpler and faster. For Unicode normalization, homoglyph detection, or escape-sequence work on text you already have as readable UTF-8, the Unicode category is the right place.
The UTF-8 toolkit
Tool
What it does
When to use
Base64 to UTF-8
Decodes Base64 (standard or URL-safe variant) and interprets the result as UTF-8 text. Strips the UTF-8 BOM; handles emoji, CJK, Greek, Hebrew, and combining marks.
Reading a JWT payload that contains non-English names, decoding an email subject line that was Base64-encoded per RFC 2047, or inspecting a webhook signature that wraps Unicode.
Binary to UTF-8
Splits a binary string into bytes and decodes as UTF-8. Counts replacement characters when continuation bytes are broken. Auto-strips BOM.
Decoding a binary-encoded message from a network capture, inspecting the actual bit pattern of a Unicode payload, or teaching how multi-byte UTF-8 sequences work.
Bytes to UTF-8
Accepts decimal, hex, or binary byte values separated by any delimiter. Auto-detects base. Decodes as UTF-8.
Pasting the hex output of xxd, the decimal byte list from a Python bytes() literal, or the Wireshark packet dump column straight into readable text.
Code Points to UTF-8
Takes a list of Unicode code points and returns both the rendered characters and the UTF-8 byte sequence (1 to 4 bytes per character).
Seeing why U+1F600 needs 4 bytes on disk while U+00E9 needs 2, or generating a test vector for a UTF-8 encoder you are writing.
Arbitrary Base to UTF-8
Parses numeric tokens in any base from 2 to 36 as UTF-8 bytes. Useful when a non-standard encoding was used upstream.
Decoding output from a custom encoder that emitted base-16, base-36, or some other radix that is not a standard variant.
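The Base64-to-UTF-8 path from the table can be illustrated with Python's standard library; the sample text is arbitrary, and the browser tool performs the equivalent decoding client-side:

```python
import base64

# Round trip: Unicode text -> Base64 blob -> back to readable UTF-8.
# base64.urlsafe_b64decode handles the URL-safe alphabet the same way.
blob = base64.b64encode("Γειά 中文 😀".encode("utf-8")).decode("ascii")
text = base64.b64decode(blob).decode("utf-8")
print(blob)   # the Base64 wrapper you would paste into the tool
print(text)   # Γειά 中文 😀
```

The key point: Base64 wraps the UTF-8 *bytes*, so a decoder that treats the result as Latin-1 instead of UTF-8 will mangle every non-ASCII character.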
How to choose the right UTF-8 decoder
If the input is a Base64 string (possibly URL-safe), use Base64 to UTF-8. The decoder auto-detects both standard and URL-safe variants.
If the input is a string of 0s and 1s from a binary dump, go to Binary to UTF-8. It handles both clean 8-bit byte splits and mixed-length chunks.
If the input is a list of byte numbers (decimal, hex, or binary) separated by commas, spaces, or newlines, use Bytes to UTF-8. The base auto-detection handles the common formats.
If the input is code points like U+1F600 or hex numbers representing characters (not bytes), use Code Points to UTF-8. The difference matters: U+1F600 is one code point but four UTF-8 bytes.
If the input uses an unusual base (not binary, decimal, hex, or Base64), the Arbitrary Base to UTF-8 tool accepts any radix from 2 to 36.
Rule of thumb: start by identifying what each unit in your input represents (a bit, a byte, a code point, or a Base64 character). That maps one-to-one to which decoder fits.
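As a concrete instance of that rule of thumb, here is a simplified Python sketch of the byte-list case. It assumes the tokens are hex (the actual tool auto-detects decimal, hex, or binary) and accepts any mix of space, comma, or newline delimiters:

```python
import re

# Hex byte tokens with mixed delimiters, as pasted from a dump.
dump = "e4 b8 ad, f0 9f 98 80"
tokens = re.split(r"[\s,]+", dump.strip())
data = bytes(int(t, 16) for t in tokens)   # each unit is a *byte*
print(data.decode("utf-8"))                # 中😀
```

Three bytes became one CJK character and four bytes became one emoji: the units were bytes, not code points, which is why this input belongs in Bytes to UTF-8 rather than Code Points to UTF-8.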
Frequently asked questions
Q: What is the difference between UTF-8 and Unicode?
Unicode is the catalog of characters and their code points (numbers). UTF-8 is one of several ways to encode those code points as bytes on disk or over a network. U+00E9 (the letter é) is the same Unicode code point everywhere; in UTF-8 it encodes as two bytes (0xC3 0xA9); in UTF-16 it encodes as two bytes in a different pattern. These tools decode UTF-8 byte sequences back to the code points they represent.
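The distinction is easy to see in a couple of lines of Python, using the same example character:

```python
ch = "\u00e9"                            # é, one Unicode code point
print(ch.encode("utf-8").hex(" "))       # c3 a9   (UTF-8: two bytes)
print(ch.encode("utf-16-be").hex(" "))   # 00 e9   (UTF-16: two different bytes)
```

Same code point, same character, two different byte patterns depending on the encoding, which is exactly why "Unicode" and "UTF-8" are not interchangeable terms.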
Q: Why do emoji need 4 UTF-8 bytes?
Emoji sit in high Unicode planes above U+FFFF. Any code point from U+10000 to U+10FFFF encodes as 4 UTF-8 bytes. Code points up to U+007F need 1 byte (ASCII range), up to U+07FF need 2, up to U+FFFF need 3, and above that, 4. Code Points to UTF-8 shows this breakdown explicitly.
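That range breakdown can be checked directly; the sample code points below are picked to sit at or near each boundary:

```python
# UTF-8 byte length per code-point range.
for cp in (0x41, 0x7F, 0x00E9, 0x07FF, 0x4E2D, 0xFFFD, 0x1F600, 0x10FFFF):
    print(f"U+{cp:04X} -> {len(chr(cp).encode('utf-8'))} bytes")
# U+0041 -> 1 bytes
# U+007F -> 1 bytes
# U+00E9 -> 2 bytes
# U+07FF -> 2 bytes
# U+4E2D -> 3 bytes
# U+FFFD -> 3 bytes
# U+1F600 -> 4 bytes
# U+10FFFF -> 4 bytes
```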
Q: What is a BOM and why do the tools strip it?
A BOM (Byte Order Mark, 0xEF 0xBB 0xBF in UTF-8) is an optional 3-byte prefix that identifies a stream as UTF-8. Some tools emit it (Windows Notepad, some exporters); many systems treat the BOM as part of the text, which causes a stray character at the start. The decoders strip it automatically so the output is clean.
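Python demonstrates the leak and the fix in one snippet: the plain utf-8 codec keeps the BOM as a U+FEFF character, while the utf-8-sig codec strips it (and is a no-op when no BOM is present):

```python
data = b"\xef\xbb\xbfhello"        # UTF-8 BOM followed by "hello"
plain = data.decode("utf-8")       # "\ufeffhello" – BOM leaks into the text
clean = data.decode("utf-8-sig")   # "hello" – BOM stripped
print(len(plain), len(clean))      # 6 5
```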
Q: What happens if the input is corrupted or not valid UTF-8?
The decoders replace invalid byte sequences with the Unicode replacement character (U+FFFD, shown as �). They count how many were replaced so you can tell how damaged the input is. A few replacements usually mean one corrupted byte; hundreds usually mean the input was not UTF-8 at all (maybe Latin-1 or Windows-1252).
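The same replace-and-count behavior is available in Python's decoder, which makes the damage easy to quantify; the sample bytes below mix a valid two-byte sequence with two stray continuation bytes:

```python
# Valid UTF-8 for "café", then two lone continuation bytes (0x80).
data = b"caf\xc3\xa9 \x80\x80 ok"
text = data.decode("utf-8", errors="replace")
print(text)                   # café �� ok
print(text.count("\ufffd"))   # 2 replacements
```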
Q: Are my inputs sent to a server?
No. All five decoders process everything in the browser. The input string never leaves the page. Safe for production tokens, private messages, or internal logs. The browser network tab shows zero requests during decoding.
Related categories
UTF-8 decoders sit alongside a few neighboring toolkits. The Unicode category handles the next step after decoding: normalization, homoglyph detection, escaping, centering, chunking. The binary and encoding category covers Base N conversions that do not end in UTF-8 text (for example Base32 to hex, or binary to octal). For ASCII-only work where UTF-8 awareness is overkill, the ASCII category has faster single-byte tools.