Convert Code Points to Unicode Online Free Tool
Convert code points (U+XXXX, hex, decimal) to Unicode characters – handles emoji, CJK, and supplementary planes. Free, client-side, instant, offline, secure.
Unicode tools handle the 154,998 characters Unicode 16.0 defines: converting between UTF-8, UTF-16, and UTF-32, normalizing combining marks, escaping non-ASCII for JSON, and catching homoglyph spoofing where Cyrillic а masquerades as Latin a. A forum moderator screening 200 new usernames an hour can paste each handle into a spoof checker and reject Pаypal, Amаzon, and Gооgle lookalikes in seconds. This category collects 54 free browser-based tools covering every common code-point conversion, every NFC/NFD normalization, and every zalgo cleanup. No sign-up, no uploads.
What you can do with Unicode Tools
Convert between UTF-8, UTF-16, UTF-32, and raw code points: convert-unicode-to-utf8, convert-unicode-to-utf16, and convert-unicode-to-code-points cover the byte-level encodings most bugs involve.
Normalize composed vs. decomposed forms: normalize-unicode-text reconciles é (U+00E9) with é (e + U+0301 combining acute), the fix for most “equal strings comparing unequal” failures.
Detect and neutralize spoofed strings: check-spoofed-unicode-text flags Cyrillic and Greek lookalikes, and unspoof-unicode-text rewrites them to pure Latin.
Reach for Unicode tools when the work is character-level: code points, combining marks, grapheme clusters. If the task is reshaping bytes (Base64, hex), Encoding Tools is closer. For case conversion, diffs, or slugs, Text Tools picks up from there.
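As a rough illustration of what a code-point conversion involves, the sketch below parses `U+XXXX`, `0x` hex, and decimal tokens and maps each to its character. The function names `code_points_to_text` and `text_to_code_points`, and the accepted token formats, are assumptions made for this example, not the tools' actual implementation.

```python
def code_points_to_text(tokens: str) -> str:
    """Turn whitespace-separated code points into a string.

    Accepted forms (an assumption for this sketch): U+1F600, 0x1F600, 128512.
    """
    chars = []
    for token in tokens.split():
        upper = token.upper()
        if upper.startswith("U+") or upper.startswith("0X"):
            value = int(upper[2:], 16)
        else:
            value = int(token, 10)
        # chr() covers all planes, so supplementary characters such as
        # emoji need no surrogate handling in Python.
        chars.append(chr(value))
    return "".join(chars)


def text_to_code_points(text: str) -> str:
    """Inverse direction: render each code point as U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(text_to_code_points(code_points_to_text("U+0063 0x61 102 U+00E9")))
# U+0063 U+0061 U+0066 U+00E9
```

Python strings are sequences of code points, which keeps this short; a JavaScript version would need `String.fromCodePoint` rather than `String.fromCharCode` to stay emoji-safe.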
The Unicode toolkit
| Tool | What it does | When to use |
| --- | --- | --- |
| Unicode to UTF-8 | Encodes text into its UTF-8 byte sequence. | Debugging a latin1-column MySQL row storing café as cafÃ©. |
| Unicode to UTF-16 | Converts text to UTF-16 code units (JavaScript’s internal string format). | Counting surrogate pairs when JS length returns 2 for one emoji. |
| Unicode to Hex | Shows each character’s hex code point (U+XXXX). | Reporting a bug with the exact 💯 vs 💯 + VS-16 sequence. |
| Escape Unicode | Replaces non-ASCII characters with \uXXXX escapes. | Embedding a German string in a JSON config that must stay ASCII-safe. |
| Unicode Escape Decoder | Reverses \uXXXX sequences into rendered characters. | Reading a log4j stack trace that JSON-escaped every non-ASCII input. |
| Normalize Unicode Text | Applies NFC, NFD, NFKC, or NFKD normalization. | Fixing a macOS filename stored decomposed but queried composed. |
| Count Unicode Characters | Counts bytes, code points, and graphemes separately. | Measuring a tweet where 👨‍👩‍👧‍👦 is 1 grapheme but 7 code points. |
| Extract Unicode Graphemes | Splits a string into user-perceived characters. | Slicing emoji + ZWJ sequences without breaking a family emoji. |
| Check Spoofed Unicode Text | Flags Cyrillic, Greek, and confusable lookalikes in Latin context. | Screening 200 new forum usernames an hour for Pаypal-style impersonation. |
| Unspoof Unicode Text | Rewrites confusable scripts to their Latin equivalents. | Sanitizing a support-ticket subject before it hits the router regex. |
| Remove Combining Characters | Strips diacritics while keeping the base letter. | Slugifying naïve café into naive-cafe for a URL path. |
| Generate Unicode Text | Outputs stylized text using mathematical alphanumeric blocks. | Crafting a Twitter bio in 𝓯𝓪𝓷𝓬𝔂 cursive script. |
| Zalgo Text Generator | Stacks combining marks above and below letters. | Thumbnail text for a horror-game YouTube channel. |
| Remove Zalgo from Unicode | Strips stacked combining marks and recovers plain text. | Cleaning a scraped comment where a troll pasted glitch text. |
| URL Encode Unicode | Percent-encodes non-ASCII for safe URL use. | A query string with Arabic or CJK terms that must survive old HTTP servers. |
Encoders & byte-level converters cover the transforms most often needed when chasing runtime errors. convert-unicode-to-utf8 and convert-unicode-to-utf16 translate strings to on-the-wire bytes; convert-unicode-to-hex surfaces each character’s U+XXXX code point.
Normalizers & grapheme tools kill “equal strings don’t match” bugs. normalize-unicode-text reconciles NFC vs. NFD. extract-unicode-graphemes and count-unicode-characters split text the way users read it.
Security & cleanup tools catch what moderation misses. check-spoofed-unicode-text flags Cyrillic homoglyphs; remove-combining-characters strips accents for slugs; remove-zalgo-from-unicode cleans glitch text.
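A minimal sketch of the accent-stripping step behind slug generation, using Python's standard `unicodedata` module. The helper names `strip_marks` and `slugify` are hypothetical, and this version removes every nonspacing mark, so it flattens zalgo text and legitimate accents alike; a real zalgo cleaner would likely be more selective.

```python
import re
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    return unicodedata.normalize("NFC", "".join(kept))


def slugify(text: str) -> str:
    """Lowercase, strip accents, and join ASCII alphanumeric runs with hyphens."""
    ascii_text = strip_marks(text).lower()
    return "-".join(re.findall(r"[a-z0-9]+", ascii_text))


print(slugify("na\u00efve caf\u00e9"))  # naive-cafe
```

Decomposing first matters: a precomposed ï has no separate mark to remove until NFD splits it into i plus U+0308.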
How to choose the right Unicode tool
Two strings look identical but comparison fails → normalize both with normalize-unicode-text.
A single emoji reports length === 2 in JavaScript → count with extract-unicode-graphemes.
A new username looks suspicious → check it with check-spoofed-unicode-text.
A log line is full of \u00e9-style escapes → decode with unicode-escape-decoder.
A slug has ñ or ü → strip the marks with remove-combining-characters.
The trade-off that trips people up is normalization form. NFC (composed) is what most databases, URLs, and search indexes expect: é as the single code point U+00E9. NFD (decomposed) is what macOS HFS+ and some iOS copy-paste flows produce: e + U+0301. Pick NFC for web I/O and storage, and NFD only when a Mac-native pipeline demands it.
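The NFC/NFD difference is easy to see with Python's standard `unicodedata.normalize`. In this short sketch the helper `nfc_equal` is a name invented for the example.

```python
import unicodedata

composed = "\u00e9"      # é as one code point (NFC form)
decomposed = "e\u0301"   # e + combining acute accent (NFD form)

# The two render identically but differ code point by code point.
print(composed == decomposed)  # False


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(nfc_equal(composed, decomposed))                       # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Normalizing at the boundary (on input, before storage or comparison) tends to be simpler than normalizing at every comparison site.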
Frequently asked questions
Q: What’s the difference between a character, a code point, and a grapheme?
A code point is a single Unicode number like U+1F600 (😀). A grapheme is what a human reads as one character: 👨‍👩‍👧‍👦 is 7 code points (four emoji joined by three zero-width joiners) but one grapheme. “Character” is ambiguous: Python’s len() counts code points, while JavaScript’s length counts UTF-16 code units. For user-facing counts, use extract-unicode-graphemes.
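To make the code point vs. grapheme distinction concrete, here is a deliberately simplified cluster splitter. It is an approximation written for this example, not the UAX #29 algorithm and not the tool's implementation: flags (regional-indicator pairs), Hangul jamo, and several other rules are left out, and the name `rough_graphemes` is hypothetical.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner


def rough_graphemes(text: str) -> list[str]:
    """Approximate grapheme clusters; NOT a full UAX #29 implementation."""
    clusters: list[str] = []
    for ch in text:
        extends = (
            unicodedata.category(ch).startswith("M")  # combining marks, VS-16
            or ch == ZWJ
            or "\U0001F3FB" <= ch <= "\U0001F3FF"     # emoji skin-tone modifiers
        )
        follows_zwj = bool(clusters) and clusters[-1].endswith(ZWJ)
        if clusters and (extends or follows_zwj):
            clusters[-1] += ch   # glue onto the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"
print(len(family))                   # 7 code points
print(len(rough_graphemes(family)))  # 1 cluster
```

For production counting, a library that tracks the current Unicode segmentation rules is a safer choice than a hand-rolled loop like this one.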
Q: Why do two strings that look identical sometimes compare unequal?
Different normalization forms. A Mac keyboard may produce é as e + U+0301 combining acute (NFD), while copy-paste from a web page gives the single code point U+00E9 (NFC). Both look the same; neither matches the other in byte comparison. Run both through normalize-unicode-text with NFC and the check passes.
Q: How do I detect a Cyrillic or Greek lookalike impersonating a Latin string?
Run check-spoofed-unicode-text. It flags any character from a script that doesn’t match the surrounding majority, such as Cyrillic а (U+0430) inside a Latin word or Greek ο (U+03BF) swapped for o. Common targets include Pаypal, Amаzon, and Gооgle in phishing links and typo-squatted usernames.
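The "doesn't match the surrounding majority" idea can be sketched as follows. Python's standard library exposes no Script property, so this example guesses the script from the first word of each character's Unicode name; that heuristic, and the names `script_of` and `foreign_letters`, are assumptions for illustration rather than how the tool works.

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Guess a letter's script from the first word of its Unicode name."""
    if not ch.isalpha():
        return "OTHER"
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def foreign_letters(word: str) -> list[tuple[int, str, str]]:
    """Return (index, char, script) for letters outside the majority script."""
    scripts = [script_of(ch) for ch in word]
    letters = [s for s in scripts if s != "OTHER"]
    if not letters:
        return []
    # On a tie, Counter keeps the script that appeared first.
    majority = Counter(letters).most_common(1)[0][0]
    return [
        (i, ch, s)
        for i, (ch, s) in enumerate(zip(word, scripts))
        if s not in ("OTHER", majority)
    ]


print(foreign_letters("P\u0430ypal"))  # [(1, 'а', 'CYRILLIC')]
print(foreign_letters("Paypal"))       # []
```

A majority vote cannot catch a word written entirely in lookalike letters; comparing against a confusables table is the usual complement.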
Q: What does \uXXXX mean in JSON and Python strings?
It’s a Unicode escape: \u00e9 represents U+00E9 (é). Python’s json.dumps produces these by default, because ensure_ascii defaults to True. unicode-escape-decoder reverses them; escape-unicode produces them when a config file must stay pure ASCII.
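A short round trip with Python's standard `json` module shows both directions. The sample string is arbitrary.

```python
import json

text = "caf\u00e9 \U0001F600"  # café 😀

# ensure_ascii=True is the json.dumps default: non-ASCII becomes \uXXXX,
# and characters beyond U+FFFF become a surrogate pair of escapes.
escaped = json.dumps(text, ensure_ascii=True)
print(escaped)  # "caf\u00e9 \ud83d\ude00"

# json.loads reverses the escapes, recombining surrogate pairs.
decoded = json.loads(escaped)
print(decoded == text)  # True

# ensure_ascii=False leaves the characters as they are.
unescaped = json.dumps(text, ensure_ascii=False)
```

The emoji needs two escapes because \uXXXX can only express 16 bits, so code points above U+FFFF are written as a UTF-16 surrogate pair.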
Q: How do emoji break character counters?
A single emoji can span multiple code points and UTF-16 units. 😀 is one code point but two UTF-16 units, so JavaScript’s "😀".length returns 2. A family emoji 👨‍👩‍👧‍👦 is 7 code points. For tweet limits and SMS segments, use count-unicode-characters in grapheme mode.
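The mismatch between counters can be reproduced in a few lines. This sketch counts one string three ways; `count_units` is a hypothetical helper, and the UTF-16 figure is computed from the encoded length to mimic what JavaScript's `length` reports.

```python
def count_units(text: str) -> dict[str, int]:
    """Count the same string three ways."""
    return {
        "code_points": len(text),                           # Python str length
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # two bytes per unit
        "utf8_bytes": len(text.encode("utf-8")),
    }


grin = "\U0001F600"  # 😀
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

print(count_units(grin))    # {'code_points': 1, 'utf16_units': 2, 'utf8_bytes': 4}
print(count_units(family))  # {'code_points': 7, 'utf16_units': 11, 'utf8_bytes': 25}
```

None of these three numbers is the grapheme count, which is 1 for both strings.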
Q: UTF-8 vs. UTF-16 vs. UTF-32, which should I use?
UTF-8 uses 1-4 bytes per code point and dominates the web and Linux: over 98% of public web pages served in 2025 used it. UTF-16 is JavaScript’s internal encoding and the native format of Windows APIs. UTF-32 uses a fixed 4 bytes per code point and is mainly used internally. Store and transmit in UTF-8 via convert-unicode-to-utf8; reach for convert-unicode-to-utf16 only when debugging a JS surrogate-pair bug.
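To compare the three encodings byte for byte, the sketch below dumps the same character in each. The helper name `encodings` is made up for the example, and the big-endian, no-BOM codecs are chosen only to keep the output easy to read.

```python
def encodings(ch: str) -> dict[str, str]:
    """Hex bytes of one string in UTF-8, UTF-16, and UTF-32 (big-endian, no BOM)."""
    return {
        name: ch.encode(codec).hex(" ")
        for name, codec in [
            ("UTF-8", "utf-8"),
            ("UTF-16", "utf-16-be"),
            ("UTF-32", "utf-32-be"),
        ]
    }


print(encodings("A"))
# {'UTF-8': '41', 'UTF-16': '00 41', 'UTF-32': '00 00 00 41'}
print(encodings("\u00e9"))      # é
# {'UTF-8': 'c3 a9', 'UTF-16': '00 e9', 'UTF-32': '00 00 00 e9'}
print(encodings("\U0001F600"))  # 😀
# {'UTF-8': 'f0 9f 98 80', 'UTF-16': 'd8 3d de 00', 'UTF-32': '00 01 f6 00'}
```

The emoji row shows the surrogate pair (d8 3d, de 00) that makes UTF-16 variable-width in practice.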
Q: How do I safely put Arabic, Chinese, or emoji in a URL?
Percent-encode them with url-encode-unicode. شاي becomes %D8%B4%D8%A7%D9%8A: each UTF-8 byte written as a percent-encoded triplet. Browsers show the readable form but send the encoded version, so old proxies and CDN rules stay happy.
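The same transformation is available in Python's standard library through `urllib.parse.quote`, which encodes to UTF-8 by default. In this sketch the `example.com` URL is a placeholder.

```python
from urllib.parse import quote, unquote

term = "\u0634\u0627\u064a"  # شاي

encoded = quote(term)  # UTF-8 bytes, each written as %XX
print(encoded)         # %D8%B4%D8%A7%D9%8A

print(unquote(encoded) == term)  # True

# Building a query string: safe="" also escapes "/" inside the value.
url = "https://example.com/search?q=" + quote(term, safe="")
```

Three Arabic letters become six bytes and therefore six triplets, since each letter takes two bytes in UTF-8.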
Related categories
Unicode work overlaps with three neighbors. Encoding Tools handles the byte-level envelope (Base64, hex, URL escaping) once the Unicode layer is settled. Text Tools takes over for case conversion, diffs, and slug generation. Security Tools covers hashing and password work when spoof detection is part of account security.
ASCII to Unicode: decode decimal, hex, octal, or U+XXXX values to Unicode characters – emoji-safe via fromCodePoint. Free, client-side, instant, offline.
Split Unicode text into equal chunks with grapheme, code-point, or UTF-16 modes. Keeps emoji and ZWJ sequences intact. Free, client-side.
Detect Unicode confusables and homoglyphs from Cyrillic, Greek, Armenian, and Hebrew that imitate Latin letters. Free, client-side, instant.
Center Unicode text within a fixed width, with real grapheme counting for emoji and CJK width for monospace. Free, offline, client-side.
Add combining characters (diacritical marks) above, below, or through any text. Free, offline, client-side, instant and secure.