Convert UTF-32 to UTF-8

In short

Convert UTF-32 code points to UTF-8 text and bytes. Supports three input formats, big- and little-endian byte order, BOM handling, strict surrogate and range validation, and conversion in both directions. Free and entirely client-side.

Runs in your browser
Nothing uploaded
Free, no sign-up

Convert UTF-32 code points (decimal 4-byte groups, 8-digit hex, or 32-bit binary) into decoded text plus the resulting UTF-8 byte sequence. Validation is strict: code points above U+10FFFF and surrogates (U+D800-U+DFFF) raise an error that reports the token position. Nothing is silently replaced with U+FFFD.

How to Use Convert UTF-32 to UTF-8

Paste UTF-32 code points. Hex (e.g., 00000048) is most common; decimal 4-byte groups and 32-bit binary are also supported.
Pick byte order: Big-endian (network order, most files) or Little-endian (Windows / x86 memory dumps).
If the input starts with the UTF-32 BOM (00 00 FE FF for big-endian, FF FE 00 00 for little-endian), it is stripped automatically and noted in the stats.
The output panel shows decoded text. Below it, the UTF-8 bytes panel shows the same text serialized as UTF-8 for end-to-end verification.
The grid shows each character's code point, range label (ASCII, BMP, SMP, or SIP), and UTF-8 byte sequence.
Swap direction to convert text back into UTF-32 code points in your chosen format and byte order.
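The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the function name `utf32_hex_to_utf8` is hypothetical, it handles only the hex format, and it assumes each 8-digit token is the raw 4-byte unit in the selected byte order. It leans on Python's built-in `utf-32-be` / `utf-32-le` codecs for decoding.

```python
def utf32_hex_to_utf8(hex_input: str, byte_order: str = "big"):
    """Decode hex-encoded UTF-32 units; return (text, UTF-8 bytes as hex)."""
    # bytes.fromhex ignores whitespace between tokens.
    raw = bytes.fromhex(hex_input)
    if len(raw) % 4 != 0:
        raise ValueError("UTF-32 input must be a whole number of 4-byte units")

    # Assumption: the byte order applies to the bytes exactly as typed.
    codec = "utf-32-be" if byte_order == "big" else "utf-32-le"
    text = raw.decode(codec)

    # Strip a leading BOM (U+FEFF) if one is present.
    if text.startswith("\ufeff"):
        text = text[1:]

    # Serialize the same text as UTF-8 for side-by-side verification.
    utf8_hex = text.encode("utf-8").hex(" ").upper()
    return text, utf8_hex


text, utf8_hex = utf32_hex_to_utf8("00000048 00000069")
print(text, "|", utf8_hex)
```

Calling it with `"0000FEFF 0001F30D"` would drop the BOM and return the single character 🌍 together with its four UTF-8 bytes.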

Frequently Asked Questions

How does the decimal 4-byte format work?

Each UTF-32 code point is written as four decimal bytes (0-255 each). Big-endian: 0 0 0 72 = (0×16777216) + (0×65536) + (0×256) + 72 = 72 = U+0048 = “H”. Little-endian: 72 0 0 0 is the same code point with the low byte first. A common confusion: 72 0 0 0 decodes to “H” only with LE selected. With the default BE it is read as 0x48000000, which exceeds U+10FFFF and is rejected.
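The arithmetic above can be checked with a short Python sketch. The helper name `decimal_group_to_code_point` is made up for illustration. It combines four decimal bytes into one 32-bit value using the chosen byte order and leaves range validation to the caller.

```python
def decimal_group_to_code_point(group: str, byte_order: str = "big") -> int:
    """Combine four decimal bytes such as '0 0 0 72' into one 32-bit value."""
    parts = [int(p) for p in group.split()]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected four decimal bytes in the range 0-255")
    # int.from_bytes applies the positional weights 256**3, 256**2, 256, 1
    # (big-endian) or the reverse (little-endian).
    return int.from_bytes(bytes(parts), byte_order)


print(hex(decimal_group_to_code_point("0 0 0 72", "big")))     # 0x48
print(hex(decimal_group_to_code_point("72 0 0 0", "little")))  # 0x48
print(hex(decimal_group_to_code_point("72 0 0 0", "big")))     # 0x48000000
```

The last call shows the mismatch case: the same four bytes read in the wrong order give a value far above 0x10FFFF.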

How is this different from the existing UTF-16 → UTF-8 converter?

UTF-32 is fixed-width: each code point is one 32-bit unit. No surrogate pairs. So an emoji like 🌍 (U+1F30D) is a single UTF-32 unit 0001F30D, whereas in UTF-16 it would be two units D83C DF0D. This converter validates that each input value is within U+0000-U+10FFFF and is not a surrogate (U+D800-U+DFFF reserved for UTF-16).
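To make the contrast concrete, the sketch below computes the UTF-16 code units for a code point using the standard surrogate-pair formula. The function name `utf16_units` is hypothetical. In UTF-32 the same code point is simply the value itself.

```python
def utf16_units(code_point: int) -> list:
    """Return the UTF-16 code units for one code point."""
    if code_point < 0x10000:
        return [code_point]              # BMP: a single 16-bit unit
    offset = code_point - 0x10000        # 20 bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # low 10 bits -> low surrogate
    return [high, low]


earth = 0x1F30D
print(f"UTF-32: {earth:08X}")                                   # 0001F30D
print("UTF-16:", " ".join(f"{u:04X}" for u in utf16_units(earth)))  # D83C DF0D
```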

What’s the UTF-32 BOM?

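The BOM is the code point U+FEFF (0x0000FEFF as a UTF-32 unit) prepended to the stream. Serialized, it is 00 00 FE FF in big-endian and FF FE 00 00 in little-endian. These 4-byte marks differ from the 2-byte UTF-16 BOMs, so a reader can tell UTF-32 from UTF-16, provided it checks for the 4-byte form first: the UTF-32 LE BOM begins with the same FF FE bytes as the UTF-16 LE BOM. This tool strips the BOM automatically on decode and lets you toggle it on encode.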
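BOM handling on decode might look like the following Python sketch. The names `UTF32_BOMS` and `strip_utf32_bom` are illustrative, not taken from the tool. It looks at the first four bytes only and reports which byte order the BOM implies, if any.

```python
from typing import Optional, Tuple

# The two serializations of U+FEFF as a 32-bit unit.
UTF32_BOMS = {
    b"\x00\x00\xFE\xFF": "big",
    b"\xFF\xFE\x00\x00": "little",
}


def strip_utf32_bom(data: bytes) -> Tuple[bytes, Optional[str]]:
    """Return (payload, detected byte order or None) for a UTF-32 byte stream."""
    order = UTF32_BOMS.get(data[:4])
    if order is None:
        return data, None        # no BOM: keep the data untouched
    return data[4:], order       # drop the 4-byte BOM


payload, order = strip_utf32_bom(bytes.fromhex("0000FEFF 00000048"))
print(order, payload.hex(" "))
```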

Why does this throw errors instead of using U+FFFD?

Silent U+FFFD substitution hides bugs. A real-world stream containing a value above U+10FFFF or a surrogate is corrupted at the source. A converter that reports “position N is invalid” is more useful than one that produces apparently valid text that differs from what the input intended. The browser’s TextDecoder does not support UTF-32 at all, so this tool implements the validation explicitly.
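A strict validator in this spirit could be sketched as below. This is an assumption about the general approach, not the tool's source: `validate_code_points` is a hypothetical name, and the error wording is invented. The point is that the first bad value stops the conversion and names its position.

```python
def validate_code_points(values) -> str:
    """Build text from code points, raising on the first invalid value."""
    chars = []
    for position, value in enumerate(values):
        if value < 0 or value > 0x10FFFF:
            raise ValueError(
                f"position {position}: 0x{value:X} is outside U+0000-U+10FFFF"
            )
        if 0xD800 <= value <= 0xDFFF:
            raise ValueError(
                f"position {position}: U+{value:04X} is a surrogate"
            )
        chars.append(chr(value))
    return "".join(chars)


try:
    validate_code_points([0x48, 0xD800, 0x69])
except ValueError as exc:
    print("rejected:", exc)
```

A lenient decoder would have emitted “H”, U+FFFD, “i” here; the strict version reports position 1 instead.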

Why is UTF-32 even used?

Mostly for internal representations where O(1) random access to characters matters more than memory: some text-processing libraries, the “wide” builds of Python before 3.3, and niche academic or linguistic software. For storage and transmission, UTF-32 is rare because it wastes space: ASCII text takes four times as many bytes.

How does UTF-8 byte count compare?

UTF-8 uses 1 byte for ASCII (U+0000-U+007F), 2 for Latin/Cyrillic/Greek/Arabic/Hebrew (U+0080-U+07FF), 3 for most CJK and the rest of the BMP (U+0800-U+FFFF), and 4 for non-BMP code points including emoji (U+10000-U+10FFFF). UTF-8 therefore never needs more bytes than UTF-32 for the same text, and ASCII-heavy content shrinks to roughly a quarter of its UTF-32 size.
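The length rule above translates directly into code. The Python sketch below uses two illustrative helpers, `utf8_length` and `encoded_sizes`, to compare the byte count of a string in both encodings (UTF-32 counted without a BOM).

```python
def utf8_length(code_point: int) -> int:
    """Number of UTF-8 bytes needed for one valid code point."""
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4


def encoded_sizes(text: str):
    """Return (UTF-8 byte count, UTF-32 byte count) for a string."""
    utf8 = sum(utf8_length(ord(ch)) for ch in text)
    utf32 = 4 * len(text)        # every code point is one 4-byte unit
    return utf8, utf32


print(encoded_sizes("Hello"))    # (5, 20)
```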

What’s the codepoint range?

U+0000-U+10FFFF (1,114,112 code points). UTF-32 has spare bits – it can technically encode up to 2^32-1 ≈ 4.3 billion – but Unicode caps at U+10FFFF because that’s the limit UTF-16 surrogate pairs can reach. Values above are rejected to preserve roundtripability with UTF-16.

What are the Unicode planes (ASCII/BMP/SMP/SIP)?

The grid labels each character by range. ASCII: U+0000-U+007F (Basic Latin; not a plane of its own but the first 128 code points of the BMP). BMP: U+0000-U+FFFF (Basic Multilingual Plane, most modern scripts). SMP: U+10000-U+1FFFF (Supplementary Multilingual Plane: emoji, ancient scripts, music). SIP: U+20000-U+2FFFF (Supplementary Ideographic Plane: rare CJK). Higher planes include TIP (U+30000-U+3FFFF, further rare CJK) and SSP (U+E0000-U+EFFFF, tags and variation selectors).
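Since each plane spans 0x10000 code points, the plane number is just the code point shifted right by 16 bits. The sketch below shows one way the labels could be derived. `plane_label` and the fallback text “Plane N” are assumptions for illustration, not the tool's actual output.

```python
PLANE_NAMES = {0: "BMP", 1: "SMP", 2: "SIP", 3: "TIP", 14: "SSP"}


def plane_label(code_point: int) -> str:
    """Label a valid code point the way the breakdown grid describes."""
    if code_point <= 0x7F:
        return "ASCII"                   # reported separately from the rest of the BMP
    plane = code_point >> 16             # each plane holds 0x10000 code points
    return PLANE_NAMES.get(plane, f"Plane {plane}")


for cp in (0x48, 0x416, 0x1F30D, 0x20000):
    print(f"U+{cp:04X}", plane_label(cp))
```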

Is my data sent anywhere?

No. Parsing, validation, and UTF-8 encoding all run in your browser.

What’s the input cap?

200,000 characters. Above that, the converter rejects rather than freezing the tab.

Keep going

Related Tools

All UTF-8 tools →

Embed this tool

Add this free tool to your website. Copy and paste the code:

<iframe src="https://alltoolsverse.com/tools/convert-utf32-to-utf8/?embed=1" width="100%" height="760" loading="lazy" style="max-width:900px;border:1px solid #e2e8f0;border-radius:12px" title="Convert UTF-32 to UTF-8"></iframe>
<p>Free tool: <a href="https://alltoolsverse.com/tools/convert-utf32-to-utf8/">Convert UTF-32 to UTF-8</a> by All Tools Verse</p>

Related Tools

Convert UTF-8 to UTF-32 →

Binary to UTF-8 Decoder →

Convert Arbitrary Base to UTF-8 →

Base64 to UTF-8 Decoder →

Convert Bytes to UTF-8 →

Code Points to UTF-8 Converter Free →

Convert Data URI to UTF-8 →

Convert Decimal to UTF-8 →

Convert Hexadecimal to UTF-8 →

Convert HTML Entities to UTF-8 →

Convert Octal to UTF-8 →

Convert UTF-16 to UTF-8 →
