Convert HTML Entities to UTF-8
Decode HTML entities to UTF-8 with per-character byte breakdown. Named, decimal, hex. Free, offline, client-side, instant, secure.
Decode named (&copy;), decimal (&#169;), and hex (&#xA9;) entities to full Unicode. Per-character breakdown shows 1/2/3/4-byte UTF-8 classification and flags supplementary-plane emoji.
How to Use Convert HTML Entities to UTF-8
- Paste text with entities - named (&copy;, &mdash;), decimal (&#169;), or hex (&#xA9;). The tool hands the input to a detached textarea, so you get the browser's full HTML5 entity table (2,200+ names) without shipping any dictionary.
- Read the decoded output - full Unicode is preserved: emoji, CJK, mathematical symbols, combining marks. Every code point renders natively. No stripping, no placeholders.
- Study the breakdown - each code point gets a row: the character, a 1B/2B/3B/4B byte-length tag, the U+XXXX code point, and the raw UTF-8 hex bytes (e.g., € = U+20AC = E2 82 AC). Supplementary-plane characters (emoji, rare CJK) are labeled.
- Check the stats - total entities found, the N/D/H (named/decimal/hex) type split, character count, total UTF-8 byte count, and the BMP vs supplementary-plane split. A quick way to see whether your input has emoji that will inflate a UTF-8 payload.
- Copy or download - Copy writes the decoded text to the clipboard; Download saves decoded-utf8.txt. Ctrl+Enter (⌘+Enter) triggers a recompute. A 200 ms debounce on input keeps typing smooth.
- Trust the safety - decoding reads a detached <textarea>'s .value. That's the same decoder every browser ships, but it never runs your input as HTML. <script>alert(1)</script> stays as text.
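The stats described in the steps above can be approximated with a few lines of JavaScript. This is a minimal sketch, not the tool's actual source: the function names countEntities and summarize are hypothetical, and the regular expressions count anything that looks like an entity, including names the browser would not recognize.

```javascript
// Hypothetical helper: count entity-shaped sequences in the raw input by type.
// Assumption: a simple regex match is good enough for the N/D/H split.
function countEntities(input) {
  const count = (re) => (input.match(re) || []).length;
  return {
    named: count(/&[a-z][a-z0-9]*;/gi),   // e.g. &copy;
    decimal: count(/&#[0-9]+;/g),         // e.g. &#169;
    hex: count(/&#x[0-9a-f]+;/gi),        // e.g. &#xA9;
  };
}

// Hypothetical helper: summarize already-decoded text.
function summarize(decoded) {
  let characters = 0;
  let bmp = 0;
  let supplementary = 0;
  for (const ch of decoded) {             // for...of walks code points, not UTF-16 units
    characters += 1;
    if (ch.codePointAt(0) > 0xffff) supplementary += 1;
    else bmp += 1;
  }
  // TextEncoder always encodes to UTF-8, so the array length is the byte count.
  const utf8Bytes = new TextEncoder().encode(decoded).length;
  return { characters, utf8Bytes, bmp, supplementary };
}
```

For the decoded string "a©€😀", summarize would report 4 characters and 10 UTF-8 bytes (1 + 2 + 3 + 4), with 3 BMP characters and 1 supplementary-plane character.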
Frequently Asked Questions
What’s the difference between this and the ASCII-category tool?
Both decode entities. The ASCII version has an option to replace non-ASCII decodes with [U+XXXX] placeholders (for strict ASCII pipelines). This UTF-8 version keeps the full Unicode output and adds per-character UTF-8 byte analysis – which character took 1, 2, 3, or 4 bytes.
Why would I care about UTF-8 byte length?
UTF-8 is variable-width: ASCII is 1 byte, Latin extended (©, é) is 2, most Asian scripts and symbols (€, 中) are 3, emoji and supplementary-plane characters are 4. If you’re sizing a database field, a tweet, or a fixed-width protocol, the character count lies – the byte count tells the truth.
Does it support emoji and supplementary-plane characters?
Yes. &#x1F600; → 😀 (U+1F600, 4-byte UTF-8 F0 9F 98 80). JavaScript stores supplementary-plane characters as surrogate pairs internally (two UTF-16 code units), but our breakdown iterates by code point, so the emoji shows as one row tagged 4B, not two rows.
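The difference between UTF-16 units and code points is easy to see in a console. The sketch below is illustrative only; breakdownRows is a hypothetical name, and the row shape is an assumption rather than the tool's real data structure.

```javascript
const emoji = "\u{1F600}";               // 😀, a supplementary-plane character

console.log(emoji.length);               // 2 (UTF-16 code units, i.e. a surrogate pair)
console.log([...emoji].length);          // 1 (spread iterates by code point)

// Hypothetical helper: one row per code point, never one per surrogate.
function breakdownRows(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return {
      char: ch,
      codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      supplementary: cp > 0xffff,
    };
  });
}
```

Array.from, the spread operator, and for...of all use the string iterator, which yields whole code points; indexing with text[i] or looping up to text.length would split the emoji into two halves.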
How are the UTF-8 bytes computed?
From the Unicode code point using the standard encoding rule: 0-0x7F → 1 byte (0xxxxxxx), 0x80-0x7FF → 2 bytes (110xxxxx 10xxxxxx), 0x800-0xFFFF → 3 bytes (1110xxxx 10xxxxxx 10xxxxxx), 0x10000-0x10FFFF → 4 bytes (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx). No fancy libraries – just bit-shifting and masking.
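As a rough illustration of that rule, here is a minimal sketch of a code-point-to-bytes encoder. The names utf8Bytes and toHex are hypothetical, and rejecting surrogate values (0xD800-0xDFFF) is an assumption of this sketch; the tool may treat them differently.

```javascript
// Hypothetical helper: encode one Unicode code point as an array of UTF-8 bytes.
function utf8Bytes(cp) {
  // Assumption: out-of-range values and lone surrogates are rejected.
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("not a Unicode scalar value: " + cp);
  }
  if (cp <= 0x7f) return [cp];                                   // 0xxxxxxx
  if (cp <= 0x7ff) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];               // 110xxxxx 10xxxxxx
  }
  if (cp <= 0xffff) {
    return [
      0xe0 | (cp >> 12),                                         // 1110xxxx
      0x80 | ((cp >> 6) & 0x3f),                                 // 10xxxxxx
      0x80 | (cp & 0x3f),                                        // 10xxxxxx
    ];
  }
  return [
    0xf0 | (cp >> 18),                                           // 11110xxx
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Format bytes as space-separated uppercase hex, e.g. "E2 82 AC".
const toHex = (bytes) =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(toHex(utf8Bytes(0x20ac)));   // E2 82 AC
console.log(toHex(utf8Bytes(0x1f600)));  // F0 9F 98 80
```

Each continuation byte carries 6 bits of the code point, which is why the shifts step by 6 and the mask is 0x3F.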
Is the decoding XSS-safe?
Yes. We write the raw input to a detached <textarea>’s innerHTML, then read its .value. A textarea treats its contents as plain text with character references, never as markup – it just entity-decodes. <script> tags survive the round-trip as literal text, no execution.
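The technique fits in a few lines. This is a sketch of the general detached-textarea pattern, not the tool's actual code; the function name decodeEntities and the optional doc parameter (added here so the function can be exercised outside a browser) are assumptions.

```javascript
// Hypothetical helper: decode entities with the browser's own parser.
// `doc` defaults to the page's document; passing it in is only a testing convenience.
function decodeEntities(input, doc = globalThis.document) {
  const textarea = doc.createElement("textarea"); // never attached to the page
  textarea.innerHTML = input;                     // contents are parsed as text plus entities
  return textarea.value;                          // decoded text; tags remain literal
}
```

In a browser, decodeEntities("&lt;b&gt; &copy; <script>alert(1)</script>") should return the string "<b> © <script>alert(1)</script>" as inert text, because nothing inside a textarea becomes a DOM element.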
What about unknown or malformed entities?
&bogus; survives verbatim in the output. The browser’s decoder passes unknown sequences through unchanged, so you never lose data – you just see the same text you typed.
Can it decode double-encoded HTML?
One pass per decode. &amp;copy; becomes &copy;. Paste that back in and decode again to get ©. Deliberate – auto-looping would break inputs where &amp; is the intentional end state.
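Single-pass behavior can be demonstrated without a browser using a deliberately tiny stand-in decoder. Everything here is an assumption made for illustration: decodeOnce and the five-entry NAMED table are hypothetical, and the sketch skips the special cases a real HTML parser applies to numeric references.

```javascript
// Hypothetical, heavily simplified entity table; the browser knows 2,200+ names.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', copy: "\u00A9" };

// Decode each entity exactly once. String.prototype.replace never rescans
// its own output, so "&amp;copy;" yields "&copy;", not "©".
function decodeOnce(input) {
  return input.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const cp = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range values untouched (a simplification of this sketch).
      return cp > 0 && cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
    }
    // Unknown names pass through verbatim.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

console.log(decodeOnce("&amp;copy;"));              // &copy;
console.log(decodeOnce(decodeOnce("&amp;copy;")));  // ©
console.log(decodeOnce("&bogus;"));                 // &bogus;
```

Looping until the output stops changing would turn an intentional "&amp;" in the first-pass result into a bare "&", which is the data loss the one-pass design avoids.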
Is my input sent to a server?
No. Zero network requests. The browser’s built-in entity decoder plus a few JavaScript functions. Open DevTools → Network and watch nothing fire after the page loads. Safe for scraped pages, customer records, internal tooling.
Does it work offline?
Yes. The whole tool is under 20 KB of HTML+CSS+JS. Once loaded, disconnect Wi-Fi and keep decoding. Bookmark and use on air-gapped boxes.
How large an input can it handle?
Typical inputs decode in under 50 ms. 100 KB of entity-heavy HTML decodes in roughly 25 ms. The breakdown panel caps at 60 rows to keep DOM work fast; the output and stats always reflect the full decode.