Convert UTF-32 to UTF-8
Convert UTF-32 code points to UTF-8 text and bytes. 3 formats, BE/LE, BOM, strict surrogate / range validation, bidirectional. Free, client-side, secure.
- Runs in your browser
- Nothing uploaded
- Free, no sign-up
Convert UTF-32 code points (decimal 4-byte groups, 8-digit hex, or 32-bit binary) into decoded text plus the resulting UTF-8 byte sequence. Validation is strict: code points above U+10FFFF and surrogates (U+D800-U+DFFF) raise an error that reports the token position, with no silent replacement-character substitution.
Per-character breakdown
How to Use Convert UTF-32 to UTF-8
- Paste UTF-32 code points. Hex (e.g., 00000048) is most common; decimal 4-byte groups and 32-bit binary are also supported.
- Pick byte order: Big-endian (network order, most files) or Little-endian (Windows / x86 memory dumps).
- If the input starts with the UTF-32 BOM (00 00 FE FF for BE or FF FE 00 00 for LE), it's auto-stripped and noted in stats.
- The output panel shows decoded text. Below it, the UTF-8 bytes panel shows the same text serialized as UTF-8 for end-to-end verification.
- The grid shows each character's codepoint, Unicode plane (ASCII/BMP/SMP/SIP), and UTF-8 byte sequence.
- Swap to reverse - text to UTF-32 code points in your chosen format and endianness.
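The steps above can be sketched in a few lines of Python. This is an illustration only, not the tool's own code: the function name `decode_utf32_hex` is hypothetical, and it assumes each hex token lists the four bytes in stream order (so "H" is 00000048 in BE and 48000000 in LE). Python's built-in UTF-32 codecs are strict, so surrogates and out-of-range values raise an error instead of being replaced.

```python
def decode_utf32_hex(tokens: str, little_endian: bool = False) -> tuple[str, bytes]:
    """Decode whitespace-separated 8-digit hex tokens into (text, utf8_bytes).

    Assumption: each token is the four bytes of one UTF-32 unit in stream order.
    """
    # Join the tokens into one raw byte string.
    raw = b"".join(bytes.fromhex(tok) for tok in tokens.split())

    # Strip a leading BOM that matches the selected byte order.
    bom = b"\xff\xfe\x00\x00" if little_endian else b"\x00\x00\xfe\xff"
    if raw.startswith(bom):
        raw = raw[4:]

    # The explicit-endian codecs do not consume a BOM and raise
    # UnicodeDecodeError on surrogates or values above U+10FFFF.
    text = raw.decode("utf-32-le" if little_endian else "utf-32-be")
    return text, text.encode("utf-8")


text, utf8 = decode_utf32_hex("0000FEFF 00000048 00000069 0001F30D")
print(text, utf8.hex(" "))
```

Passing `little_endian=True` with byte-swapped tokens (for example `FFFE0000 48000000`) should yield the same text.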
Frequently Asked Questions
How does the decimal 4-byte format work?
Each UTF-32 code point is written as four decimal byte values (0-255 each). BE: 0 0 0 72 = (0*16777216)+(0*65536)+(0*256)+72 = U+0048 = “H”. LE: 72 0 0 0 = same code point, low byte first. A common confusion: 72 0 0 0 only decodes to “H” with LE selected. With the default BE it decodes to 0x48000000, which exceeds U+10FFFF and is rejected.
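The arithmetic can be checked with a short Python sketch. The helper name `codepoint_from_decimal_group` is made up for illustration; it only reproduces the place-value sum described above.

```python
def codepoint_from_decimal_group(group: str, little_endian: bool = False) -> int:
    """Combine four decimal byte values (0-255) into one 32-bit value."""
    values = [int(tok) for tok in group.split()]
    if len(values) != 4 or any(not 0 <= v <= 255 for v in values):
        raise ValueError("expected four byte values in the range 0-255")
    if little_endian:
        values.reverse()  # LE stores the low byte first
    b0, b1, b2, b3 = values
    return b0 * 16777216 + b1 * 65536 + b2 * 256 + b3


print(hex(codepoint_from_decimal_group("0 0 0 72")))                       # BE
print(hex(codepoint_from_decimal_group("72 0 0 0", little_endian=True)))   # LE
print(hex(codepoint_from_decimal_group("72 0 0 0")))                       # BE, out of range
```

The third call returns 0x48000000, which is the value a strict decoder would reject for being above U+10FFFF.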
How is this different from the existing UTF-16 → UTF-8 converter?
UTF-32 is fixed-width: each code point is one 32-bit unit. No surrogate pairs. So an emoji like 🌍 (U+1F30D) is a single UTF-32 unit 0001F30D, whereas in UTF-16 it would be two units D83C DF0D. This converter validates that each input value is within U+0000-U+10FFFF and is not a surrogate (U+D800-U+DFFF reserved for UTF-16).
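To make the contrast concrete, here is a small Python sketch of the standard UTF-16 surrogate-pair calculation. The function name `utf16_units` is illustrative; in UTF-32 the same code point is simply stored as one unit.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for a code point (one unit, or a surrogate pair)."""
    if cp < 0x10000:
        return [cp]                      # BMP: one 16-bit unit
    v = cp - 0x10000                     # 20-bit offset into the supplementary range
    high = 0xD800 + (v >> 10)            # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)           # bottom 10 bits -> low surrogate
    return [high, low]


cp = 0x1F30D  # the emoji from the example above
print(f"UTF-32: {cp:08X}")
print("UTF-16:", " ".join(f"{u:04X}" for u in utf16_units(cp)))
```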
What’s the UTF-32 BOM?
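The code point U+FEFF prepended to the stream. Serialization: BE 00 00 FE FF, LE FF FE 00 00. These are four bytes long, unlike the 2-byte UTF-16 BOMs, which lets a reader tell UTF-32 from UTF-16. Note that FF FE 00 00 begins with the UTF-16 LE BOM (FF FE), so a detector should test the 4-byte patterns first. This tool auto-strips the BOM on decode and lets you toggle it on encode.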
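BOM detection is a simple prefix check. The sketch below uses the `codecs.BOM_UTF32_BE` and `codecs.BOM_UTF32_LE` constants from Python's standard library; the function name `split_utf32_bom` is hypothetical and this is not the tool's actual implementation.

```python
import codecs
from typing import Optional


def split_utf32_bom(data: bytes) -> tuple[Optional[str], bytes]:
    """Return (byte_order, remaining_bytes); byte_order is None if no BOM is present."""
    if data.startswith(codecs.BOM_UTF32_BE):   # 00 00 FE FF
        return "BE", data[4:]
    if data.startswith(codecs.BOM_UTF32_LE):   # FF FE 00 00
        return "LE", data[4:]
    return None, data


order, body = split_utf32_bom(bytes.fromhex("FFFE0000" "48000000"))
print(order, body.hex(" "))
```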
Why does this throw errors instead of using U+FFFD?
Silent U+FFFD substitution hides bugs. A real-world stream containing a value > U+10FFFF or a surrogate is corrupted at the source – the converter telling you “position N is invalid” is more useful than producing apparently-valid text that’s actually different from the input intent. The browser’s TextDecoder can’t do UTF-32 at all, so this tool implements the validation explicitly.
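A minimal sketch of what position-reporting validation could look like, in Python. The function name `validate_code_points` and the error wording are assumptions for illustration, not the tool's real messages.

```python
def validate_code_points(values: list[int]) -> str:
    """Turn 32-bit values into text, raising ValueError with the token position on bad input."""
    for pos, v in enumerate(values):
        if v > 0x10FFFF:
            raise ValueError(f"position {pos}: 0x{v:08X} exceeds U+10FFFF")
        if 0xD800 <= v <= 0xDFFF:
            raise ValueError(f"position {pos}: U+{v:04X} is a surrogate")
    # Every value is a valid Unicode scalar value, so chr() is safe here.
    return "".join(chr(v) for v in values)


try:
    validate_code_points([0x48, 0xD83C, 0x69])
except ValueError as err:
    print(err)
```

The alternative, substituting U+FFFD for the bad token, would return three characters that look like valid text and give no hint of which token was wrong.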
Why is UTF-32 even used?
Mostly for internal representations where O(1) random access to code points matters more than memory: some text-processing libraries, the “wide” builds of Python before 3.3, and niche academic / linguistic software. For storage and transmission, UTF-32 is rare because it wastes space: ASCII text takes four times as many bytes as it does in UTF-8.
How does UTF-8 byte count compare?
UTF-8 uses 1 byte for ASCII (U+0000-U+007F), 2 for Latin/Cyrillic/Greek/Arabic/Hebrew (U+0080-U+07FF), 3 for most CJK and the rest of the BMP (U+0800-U+FFFF), and 4 for non-BMP code points including emoji (U+10000-U+10FFFF). UTF-32 always uses 4 bytes per code point, so UTF-8 is never larger than UTF-32 for the same text, and for ASCII-heavy content it is up to four times smaller.
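The ranges translate directly into a length function. The Python sketch below uses hypothetical helper names (`utf8_length`, `encoded_sizes`) and compares the two encodings for a sample string.

```python
def utf8_length(cp: int) -> int:
    """Number of UTF-8 bytes needed for one code point."""
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4


def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf32_bytes) for the text, without a BOM."""
    utf8 = sum(utf8_length(ord(ch)) for ch in text)
    utf32 = 4 * len(text)  # fixed width: 4 bytes per code point
    return utf8, utf32


print(encoded_sizes("Hi \U0001F30D"))  # three ASCII characters plus one emoji
```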
What’s the codepoint range?
U+0000-U+10FFFF (1,114,112 code points). UTF-32 has spare bits: a 32-bit unit can hold values up to 2^32-1 ≈ 4.3 billion, but Unicode caps the range at U+10FFFF because that is the highest code point UTF-16 surrogate pairs can reach. Values above it are rejected so that every accepted code point round-trips through UTF-16.
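Both numbers can be derived in a couple of lines of Python. This is just the arithmetic behind the figures: 17 planes of 65,536 code points each, and the largest value a high/low surrogate pair (10 bits each) can express.

```python
# 17 planes (0 through 16), each holding 0x10000 code points.
total_code_points = 17 * 0x10000

# A surrogate pair carries 10 + 10 bits, offset by 0x10000.
max_via_surrogates = 0x10000 + (0x3FF << 10) + 0x3FF

print(f"{total_code_points:,}")
print(f"U+{max_via_surrogates:X}")
```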
What are the Unicode planes (ASCII/BMP/SMP/SIP)?
The grid labels each character by plane. ASCII: U+0000-U+007F (Basic Latin; strictly a subset of the BMP rather than a plane of its own, labeled separately for convenience). BMP: U+0000-U+FFFF (Basic Multilingual Plane, most modern scripts). SMP: U+10000-U+1FFFF (Supplementary Multilingual Plane: emoji, ancient scripts, music). SIP: U+20000-U+2FFFF (Supplementary Ideographic Plane: rare CJK). Higher planes such as TIP (U+30000-U+3FFFF) and SSP (U+E0000-U+EFFFF) are rarely encountered.
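Since each plane spans 0x10000 code points, the plane number is just the code point shifted right by 16 bits. The Python sketch below is one way to produce such labels; the function name `plane_label` and the fallback label for unlisted planes are assumptions, not the grid's actual output.

```python
PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP", 3: "TIP", 14: "SSP"}


def plane_label(cp: int) -> str:
    """Label a code point the way the grid description suggests."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if cp <= 0x7F:
        return "ASCII"                   # special-cased subset of the BMP
    plane = cp >> 16                     # each plane spans 0x10000 code points
    return PLANE_NAMES.get(plane, f"Plane {plane}")  # fallback label is hypothetical


for cp in (0x48, 0x4E2D, 0x1F30D, 0x20000):
    print(f"U+{cp:04X}: {plane_label(cp)}")
```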
Is my data sent anywhere?
No. Parsing, validation, and UTF-8 encoding all run in your browser.
What’s the input cap?
200,000 characters. Above that, the converter rejects the input rather than freezing the tab.
Related Tools
Convert UTF-8 to UTF-32 →
Convert UTF-8 text to UTF-32 code points (hex/decimal/binary, BE/LE, BOM). Bidirectional, validated. Free, client-side,…
Binary to UTF-8 Decoder →
Binary to UTF-8 Text Decoder handles emoji, CJK, accents, strips BOM, counts replacement chars.…
Convert Arbitrary Base to UTF-8 →
Decode numeric tokens in any base (2-36) as UTF-8 bytes - multi-byte emoji and…
Base64 to UTF-8 Decoder →
Decode Base64 to UTF-8 text - handles emoji, CJK, BOM-stripping, URL-safe variants. Free, client-side,…
Convert Bytes to UTF-8 →
Decode decimal/hex/binary byte values to UTF-8 text - emoji, CJK,…
Code Points to UTF-8 Converter Free →
Free online Unicode code points to UTF-8 converter. Shows actual UTF-8 byte sequences per…
Convert Data URI to UTF-8 →
online Data URI to UTF-8 decoder with byte-breakdown panel for emoji and CJK. Client-side,…
Convert Decimal to UTF-8 →
online decimal to UTF-8 text decoder. Byte-mode (raw UTF-8 bytes) and codepoint-mode. Client-side, instant,…
Convert Hexadecimal to UTF-8 →
Decode hex to UTF-8 text with byte-structural breakdown. Handles ASCII, Latin, CJK, emoji. Batch…
Convert HTML Entities to UTF-8 →
Decode HTML entities to UTF-8 with per-character byte breakdown. Named, decimal, hex. Free, offline,…
Convert Octal to UTF-8 →
Decode octal byte sequences to UTF-8 text, encode UTF-8 to octal. C-escape support, multi-byte.…
Convert UTF-16 to UTF-8 →
Convert UTF-16 code units to UTF-8 text and bytes. 3 formats, BE/LE, BOM, surrogate…