Convert UTF-8 to UTF-32
Convert UTF-8 text to UTF-32 code points (hex/decimal/binary, BE/LE, BOM). Bidirectional, validated. Free, client-side, instant, secure.
- Runs in your browser
- Nothing uploaded
- Free, no sign-up
Convert UTF-8 text to UTF-32 code points in three formats (8-digit hex, 4-byte decimal, 32-bit binary), with selectable byte order (BE/LE) and an optional BOM (U+0000FEFF). UTF-32 is fixed-width: every codepoint is exactly one 32-bit unit, with no surrogate pairs, which is the key difference from UTF-16.
How to Use Convert UTF-8 to UTF-32
- Paste UTF-8 text. Each codepoint becomes exactly one 32-bit unit - no surrogate pairs.
- Pick a format: hex (8-digit zero-padded), decimal (4 bytes per codepoint), or binary (32 bits).
- Choose endianness. BE serializes the high byte first (matching network order); LE is common in Windows/x86 memory dumps.
- Toggle the BOM (U+0000FEFF) for explicit byte-order tagging in streams.
- Swap to decode. The input format is auto-detected. Values above U+10FFFF or in the surrogate range (U+D800-U+DFFF) raise an error that reports the position of the offending value.
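The encoding steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual code: the function name `utf8_to_utf32`, its parameters, and the exact token layout (for example, space-separated byte values in decimal mode, and byte-swapped hex digits in LE mode) are assumptions made for the example.

```python
import struct
from typing import List

def utf8_to_utf32(data: bytes, fmt: str = "hex",
                  big_endian: bool = True, bom: bool = False) -> List[str]:
    """Hypothetical helper: one output token per code point."""
    text = data.decode("utf-8")          # strict decode: malformed UTF-8 raises
    code_points = [ord(ch) for ch in text]
    if bom:
        code_points.insert(0, 0xFEFF)    # byte-order mark goes first

    byte_order = ">I" if big_endian else "<I"
    tokens = []
    for cp in code_points:
        raw = struct.pack(byte_order, cp)  # exactly 4 bytes per code point
        if fmt == "hex":
            tokens.append(raw.hex().upper())                  # 8 hex digits
        elif fmt == "decimal":
            tokens.append(" ".join(str(b) for b in raw))      # 4 byte values (assumed layout)
        elif fmt == "binary":
            tokens.append("".join(f"{b:08b}" for b in raw))   # 32 bits
        else:
            raise ValueError(f"unknown format: {fmt!r}")
    return tokens
```

Under these assumptions, `utf8_to_utf32("A🌍".encode("utf-8"))` yields `["00000041", "0001F30D"]`, and passing `big_endian=False` yields `["41000000", "0DF30100"]`.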
Frequently Asked Questions
What’s UTF-32?
A fixed-width Unicode encoding: every codepoint is exactly one 32-bit unit (4 bytes). Sometimes called UCS-4. Compared with UTF-8 and UTF-16, it trades space (4 bytes per ASCII character versus 1 in UTF-8 or 2 in UTF-16) for simplicity (O(1) access to the n-th codepoint, no surrogate pairs).
How does it differ from UTF-16?
UTF-16 uses 2 or 4 bytes (surrogate pair for non-BMP); UTF-32 always uses 4 bytes. So 🌍 (U+1F30D) is ONE UTF-32 unit (0001F30D) but TWO UTF-16 units (D83C DF0D surrogate pair). The grid shows each character as a single UTF-32 entry regardless of complexity.
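The surrogate-pair arithmetic behind that example is short enough to check by hand. The sketch below uses a hypothetical helper, `utf16_units`, to split a code point into UTF-16 code units using the standard formula; the UTF-32 unit is simply the code point itself.

```python
from typing import List

def utf16_units(cp: int) -> List[int]:
    """Hypothetical helper: UTF-16 code units for one code point."""
    if cp < 0x10000:
        return [cp]                       # BMP: a single 16-bit unit
    offset = cp - 0x10000                 # 20-bit offset into the supplementary planes
    high = 0xD800 + (offset >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]

globe = 0x1F30D
utf16 = [f"{u:04X}" for u in utf16_units(globe)]   # two units
utf32 = f"{globe:08X}"                             # one unit
```

Here `utf16` is `["D83C", "DF0D"]` and `utf32` is `"0001F30D"`, matching the values quoted above.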
What’s the codepoint range?
U+0000 to U+10FFFF: 1,114,112 possible codepoints. A 32-bit unit has spare bits (it can technically hold 2^32 distinct values), but Unicode caps the range at U+10FFFF because that is the highest codepoint a UTF-16 surrogate pair can reach. Values above U+10FFFF are rejected.
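The figures above follow from the size of the two surrogate ranges. A brief sketch of the arithmetic (variable names are illustrative only):

```python
HIGH_SURROGATES = range(0xD800, 0xDC00)   # 1,024 high surrogates
LOW_SURROGATES = range(0xDC00, 0xE000)    # 1,024 low surrogates

# Every high/low combination addresses one supplementary code point.
pairs = len(HIGH_SURROGATES) * len(LOW_SURROGATES)

# Supplementary code points start right after the BMP, at U+10000.
max_code_point = 0x10000 + pairs - 1
total_code_points = max_code_point + 1

# 32-bit values that are not valid Unicode code points at all.
unused_32bit_values = 2**32 - total_code_points
```

This gives `pairs == 1_048_576`, `max_code_point == 0x10FFFF`, and `total_code_points == 1_114_112`.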
What’s the UTF-32 BOM?
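The code point U+0000FEFF, prepended to the stream. It serializes as 00 00 FE FF in BE or FF FE 00 00 in LE. Both are 4 bytes, unlike the 2-byte UTF-16 BOMs (FE FF and FF FE), so a reader can tell UTF-32 streams from UTF-16, provided it tests the 4-byte patterns first: the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM. The decoder strips the BOM automatically when it detects one.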
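A minimal BOM sniffer might look like the following. The function name `detect_bom` and its return shape are assumptions for this sketch; the BOM constants come from Python's standard `codecs` module. The order of the checks matters, because FF FE 00 00 (UTF-32 LE) starts with FF FE (UTF-16 LE).

```python
import codecs
from typing import Optional, Tuple

def detect_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Hypothetical helper: return (encoding name, BOM length in bytes)."""
    # 4-byte UTF-32 marks must be tested before the 2-byte UTF-16 marks.
    if data.startswith(codecs.BOM_UTF32_BE):   # 00 00 FE FF
        return ("utf-32-be", 4)
    if data.startswith(codecs.BOM_UTF32_LE):   # FF FE 00 00
        return ("utf-32-le", 4)
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return ("utf-16-be", 2)
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return ("utf-16-le", 2)
    return (None, 0)                           # no BOM found
```

Note that a UTF-16 LE stream whose first character after the BOM is U+0000 is byte-for-byte indistinguishable from a UTF-32 LE BOM; this sketch resolves that rare case in favor of UTF-32.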
Why are surrogates rejected on decode?
U+D800-U+DFFF are reserved for UTF-16 encoding only. They don’t represent characters; encountering one in a UTF-32 stream means corruption. This tool throws explicit position-specific errors rather than silently substituting U+FFFD.
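Strict validation of this kind can be sketched as follows. This is an assumed illustration, not the tool's implementation: the function name `decode_utf32_hex`, the 0-based position numbering, and the error wording are all hypothetical, and only the hex input format is handled.

```python
from typing import List

def decode_utf32_hex(tokens: List[str]) -> str:
    """Hypothetical helper: decode hex UTF-32 units, failing loudly on bad values."""
    chars = []
    for pos, token in enumerate(tokens):      # pos is 0-based (an assumption)
        cp = int(token, 16)
        if pos == 0 and cp == 0xFEFF:
            continue                          # strip a leading BOM
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate U+{cp:04X} at position {pos}")
        if cp > 0x10FFFF:
            raise ValueError(f"value 0x{cp:08X} above U+10FFFF at position {pos}")
        chars.append(chr(cp))
    return "".join(chars)
```

Raising instead of substituting U+FFFD means corrupt input cannot pass through unnoticed, and the caller learns exactly which unit to inspect.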
Why is UTF-32 rarely used for storage?
Pure space inefficiency. ASCII text is 4× the size in UTF-32 vs UTF-8. For storage, UTF-8 dominates (compact for Latin, ASCII-backward-compatible). UTF-32 is mainly used as an internal representation in some text-processing libraries and older Python “wide” builds (pre-3.3) where O(1) codepoint indexing matters.
How big is UTF-32 output vs UTF-8?
ASCII: UTF-32 is 4× the size of UTF-8 (4 bytes vs 1). Accented Latin and Cyrillic: 2× (4 bytes vs 2). CJK: 33% larger (4 bytes vs 3). Emoji: equal per codepoint (4 bytes in both).
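These ratios are easy to confirm with Python's built-in codecs. The helper name `byte_sizes` is hypothetical; `utf-32-be` is used because, unlike plain `utf-32`, it does not prepend a BOM, so only the character's own bytes are counted.

```python
from typing import Tuple

def byte_sizes(text: str) -> Tuple[int, int]:
    """Hypothetical helper: (UTF-8 size, UTF-32 size) in bytes, without BOM."""
    return (len(text.encode("utf-8")), len(text.encode("utf-32-be")))

samples = {
    "ASCII": "A",
    "Accented Latin": "\u00e9",   # é
    "Cyrillic": "\u0416",         # Ж
    "CJK": "\u4e2d",              # 中
    "Emoji": "\U0001F30D",        # 🌍
}
sizes = {label: byte_sizes(ch) for label, ch in samples.items()}
```

The resulting pairs are (1, 4), (2, 4), (2, 4), (3, 4), and (4, 4), in the order listed.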
Is text uploaded?
No. The conversion runs entirely in your browser – nothing is sent to a server, logged, or stored, and the tool keeps working offline once the page has loaded.
Input cap?
200,000 characters per run, roughly 30,000 to 40,000 words of English prose. The cap keeps the browser responsive; for bigger jobs, split the text and run it in parts.
How does this compare to the UTF-32 to UTF-8 tool?
It works in the inverse direction. That tool parses UTF-32 code points and produces text plus UTF-8 bytes; this tool starts from UTF-8 text and produces UTF-32 code points. Both strictly reject values above U+10FFFF and surrogate code points.