Convert UTF-32 to UTF-8

In short

Convert UTF-32 code points to UTF-8 text and bytes. 3 formats, BE/LE, BOM, strict surrogate / range validation, bidirectional. Free, client-side, secure.

  • Runs in your browser
  • Nothing uploaded
  • Free, no sign-up

Convert UTF-32 code points (decimal 4-byte groups, 8-digit hex, or 32-bit binary) into decoded text plus the resulting UTF-8 byte sequence. Validation is strict: code points above U+10FFFF and surrogates (U+D800-U+DFFF) raise errors that report the token position, with no silent replacement-character substitution.

  • 100% Private: no server uploads, ever
  • Instant: runs in your browser
  • No Watermarks: clean output, always
  • Free Forever: no accounts, no limits

How to Use the UTF-32 to UTF-8 Converter

  1. Paste UTF-32 code points. Hex (e.g., 00000048) is most common; decimal 4-byte groups and 32-bit binary are also supported.
  2. Pick byte order: Big-endian (network order, most files) or Little-endian (Windows / x86 memory dumps).
  3. If the input starts with the UTF-32 BOM (00 00 FE FF for big-endian, FF FE 00 00 for little-endian), it's auto-stripped and noted in stats.
  4. The output panel shows decoded text. Below it, the UTF-8 bytes panel shows the same text serialized as UTF-8 for end-to-end verification.
  5. The grid shows each character's codepoint, Unicode plane (ASCII/BMP/SMP/SIP), and UTF-8 byte sequence.
  6. Swap to run the reverse direction: text to UTF-32 code points in your chosen format and endianness.
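
The decode steps above can be sketched in a few lines of Python. This is an illustration only, not the tool's own code: the function name `decode_utf32_hex` is hypothetical, and it assumes each 8-digit hex token lists the four serialized bytes in the order they appear in the stream.

```python
from typing import Tuple

def decode_utf32_hex(tokens: str, byte_order: str = "BE") -> Tuple[str, bytes, bool]:
    """Decode whitespace-separated 8-digit hex tokens.

    Returns (text, utf8_bytes, bom_stripped). Hypothetical helper; it assumes
    each token spells out four serialized bytes in stream order.
    """
    endian = "big" if byte_order == "BE" else "little"
    code_points = []
    for position, token in enumerate(tokens.split()):
        if len(token) != 8:
            raise ValueError(f"token {position}: expected 8 hex digits, got {token!r}")
        raw = bytes.fromhex(token)            # raises ValueError on non-hex input
        cp = int.from_bytes(raw, endian)      # apply the chosen byte order
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"token {position}: 0x{cp:X} is not a valid code point")
        code_points.append(cp)

    # A leading U+FEFF is treated as a BOM and dropped.
    bom_stripped = bool(code_points) and code_points[0] == 0xFEFF
    if bom_stripped:
        code_points = code_points[1:]

    text = "".join(chr(cp) for cp in code_points)
    return text, text.encode("utf-8"), bom_stripped

text, utf8, bom = decode_utf32_hex("0000FEFF 00000048 00000069 0001F30D")
print(text, utf8.hex(" "), bom)
```

Passing `"LE"` as the second argument reads the same tokens low byte first, so `48000000` decodes to "H" in that mode and is rejected in the default big-endian mode.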

Frequently Asked Questions

How does the decimal 4-byte format work?

Each UTF-32 code point is written as four bytes (0-255 each). BE: 0 0 0 72 = (0×16777216)+(0×65536)+(0×256)+72 = 72 decimal = U+0048 = “H”. LE: 72 0 0 0 is the same code point with the low byte first. A common confusion: 72 0 0 0 only decodes to “H” with LE selected. With the default BE it decodes to 0x48000000, which exceeds U+10FFFF and produces an error.
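
The arithmetic above, written out as a small Python sketch. The function name `decimal_group_to_code_point` is made up for illustration and is not part of the tool.

```python
def decimal_group_to_code_point(group: str, byte_order: str = "BE") -> int:
    """Combine four decimal byte values (0-255) into one 32-bit value."""
    parts = [int(p) for p in group.split()]
    if len(parts) != 4 or any(not 0 <= b <= 255 for b in parts):
        raise ValueError(f"expected four values in 0-255, got {group!r}")
    if byte_order == "LE":
        parts.reverse()                  # low byte first becomes high byte first
    b0, b1, b2, b3 = parts
    return b0 * 16777216 + b1 * 65536 + b2 * 256 + b3

print(hex(decimal_group_to_code_point("0 0 0 72", "BE")))   # 0x48
print(hex(decimal_group_to_code_point("72 0 0 0", "LE")))   # 0x48
print(hex(decimal_group_to_code_point("72 0 0 0", "BE")))   # 0x48000000
```

The last call shows why the wrong byte order fails validation: 0x48000000 is far above U+10FFFF.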

How is this different from the existing UTF-16 → UTF-8 converter?

UTF-32 is fixed-width: each code point is one 32-bit unit. No surrogate pairs. So an emoji like 🌍 (U+1F30D) is a single UTF-32 unit 0001F30D, whereas in UTF-16 it would be two units D83C DF0D. This converter validates that each input value is within U+0000-U+10FFFF and is not a surrogate (U+D800-U+DFFF reserved for UTF-16).
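
To make the contrast concrete, here is a short Python sketch of the standard surrogate-pair calculation next to the single UTF-32 unit. The helper name `utf16_units` is illustrative; the final lines cross-check the result against Python's built-in codecs.

```python
from typing import List

def utf16_units(cp: int) -> List[int]:
    """Return the UTF-16 code units for one code point."""
    if cp < 0x10000:
        return [cp]                          # BMP: one 16-bit unit
    offset = cp - 0x10000                    # 20-bit offset into the supplementary range
    high = 0xD800 + (offset >> 10)           # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)          # bottom 10 bits
    return [high, low]

cp = 0x1F30D                                 # 🌍
print([f"{u:04X}" for u in utf16_units(cp)]) # ['D83C', 'DF0D']
print(f"{cp:08X}")                           # 0001F30D, the single UTF-32 unit

# Cross-check against the standard library codecs.
print(chr(cp).encode("utf-16-be").hex())     # d83cdf0d
print(chr(cp).encode("utf-32-be").hex())     # 0001f30d
```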

What’s the UTF-32 BOM?

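The code point U+FEFF, prepended to the stream. Serialization: BE 00 00 FE FF, LE FF FE 00 00. These are four bytes long, unlike the 2-byte UTF-16 BOMs, so a reader can tell UTF-32 from UTF-16, provided it checks the 4-byte patterns first: the UTF-32 LE BOM begins with FF FE, which is also the UTF-16 LE BOM. This tool auto-strips the BOM on decode and lets you toggle it on encode.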
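
A minimal sketch of BOM sniffing on raw bytes, assuming a hypothetical helper named `detect_bom`. The order of the checks matters, because the UTF-32 LE BOM starts with the same two bytes as the UTF-16 LE BOM.

```python
from typing import Optional, Tuple

def detect_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    # Test the 4-byte UTF-32 patterns before the 2-byte UTF-16 ones:
    # FF FE 00 00 would otherwise match the UTF-16 LE BOM (FF FE).
    if data.startswith(b"\x00\x00\xfe\xff"):
        return "UTF-32BE", 4
    if data.startswith(b"\xff\xfe\x00\x00"):
        return "UTF-32LE", 4
    if data.startswith(b"\xfe\xff"):
        return "UTF-16BE", 2
    if data.startswith(b"\xff\xfe"):
        return "UTF-16LE", 2
    return None, 0

print(detect_bom(bytes.fromhex("0000feff00000048")))  # ('UTF-32BE', 4)
print(detect_bom(bytes.fromhex("fffe000048000000")))  # ('UTF-32LE', 4)
print(detect_bom(bytes.fromhex("fffe4800")))          # ('UTF-16LE', 2)
```

One ambiguity remains that no ordering can remove: FF FE 00 00 is also a valid UTF-16 LE stream consisting of a BOM followed by U+0000, so sniffing is a heuristic rather than a proof.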

Why does this throw errors instead of using U+FFFD?

Silent U+FFFD substitution hides bugs. A real-world stream containing a value > U+10FFFF or a surrogate is corrupted at the source – the converter telling you “position N is invalid” is more useful than producing apparently-valid text that’s actually different from the input intent. The browser’s TextDecoder can’t do UTF-32 at all, so this tool implements the validation explicitly.
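
The difference between the two policies can be illustrated with a short Python sketch. Both function names are hypothetical, and the error wording is an assumption rather than the tool's actual message.

```python
from typing import List

def is_valid(cp: int) -> bool:
    """True for code points in range that are not surrogates."""
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF

def decode_strict(code_points: List[int]) -> str:
    """Raise on the first invalid value, reporting where it occurred."""
    for position, cp in enumerate(code_points):
        if not is_valid(cp):
            raise ValueError(f"position {position}: 0x{cp:X} is invalid")
    return "".join(chr(cp) for cp in code_points)

def decode_lenient(code_points: List[int]) -> str:
    """Replace invalid values with U+FFFD, which hides the corruption."""
    return "".join(chr(cp) if is_valid(cp) else "\uFFFD" for cp in code_points)

corrupted = [0x48, 0xD800, 0x69]        # a lone surrogate in the middle
print(repr(decode_lenient(corrupted)))  # looks like text, but data was lost
try:
    decode_strict(corrupted)
except ValueError as err:
    print(err)                          # position 1: 0xD800 is invalid
```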

Why is UTF-32 even used?

Mostly for internal representations where O(1) random access to code points matters more than memory: some text-processing libraries, the internal “wide” build of older Python versions (pre-3.3), and niche academic / linguistic software. For storage and transmission, UTF-32 is rare because it wastes space: ASCII text takes 4× as many bytes.

How does UTF-8 byte count compare?

UTF-8 uses 1 byte for ASCII (U+0000-U+007F), 2 for Latin/Cyrillic/Greek/Arabic/Hebrew (U+0080-U+07FF), 3 for most CJK and the rest of the BMP (U+0800-U+FFFF), and 4 for non-BMP code points including emoji (U+10000-U+10FFFF). UTF-32 always uses 4 bytes per code point, so the same text is never larger in UTF-8 than in UTF-32, and ASCII-heavy content shrinks to as little as a quarter of its UTF-32 size.
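
The ranges above translate directly into a lookup. The sketch below uses a hypothetical `utf8_length` helper and compares its answers with Python's own encoder.

```python
def utf8_length(cp: int) -> int:
    """Number of UTF-8 bytes needed for one valid code point."""
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4                              # U+10000-U+10FFFF

for ch in "H\u00e9\u4e2d\U0001F30D":      # H, é, 中, 🌍
    print(f"U+{ord(ch):04X}", utf8_length(ord(ch)), len(ch.encode("utf-8")))

sample = "Hello"
print(len(sample.encode("utf-8")), len(sample.encode("utf-32-be")))  # 5 20
```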

What’s the codepoint range?

U+0000-U+10FFFF (1,114,112 code points, of which the 2,048 surrogates are not valid in UTF-32). A 32-bit unit has spare bits and could technically hold values up to 2^32-1 ≈ 4.3 billion, but Unicode caps at U+10FFFF because that’s the limit UTF-16 surrogate pairs can reach. Values above that are rejected to preserve round-tripping with UTF-16.
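
The numbers behind this answer can be checked with a few lines of Python. The variable names are illustrative only.

```python
MAX_CODE_POINT = 0x10FFFF

# The largest surrogate pair is DBFF DFFF; decode it the standard way.
highest_via_utf16 = 0x10000 + ((0xDBFF - 0xD800) << 10) + (0xDFFF - 0xDC00)
print(hex(highest_via_utf16))             # 0x10ffff

total = MAX_CODE_POINT + 1                # U+0000 through U+10FFFF
surrogates = 0xDFFF - 0xD800 + 1          # reserved for UTF-16
print(total, surrogates, total - surrogates)  # 1114112 2048 1112064

def in_unicode_range(value: int) -> bool:
    """True if a 32-bit value fits in the Unicode code space."""
    return 0 <= value <= MAX_CODE_POINT

print(in_unicode_range(0x10FFFF), in_unicode_range(0x110000))  # True False
```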

What are the Unicode planes (ASCII/BMP/SMP/SIP)?

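The grid labels each character by plane. ASCII: U+0000-U+007F (Basic Latin; strictly a block inside the BMP rather than a plane of its own, but labeled separately). BMP: U+0000-U+FFFF (Basic Multilingual Plane, most modern scripts). SMP: U+10000-U+1FFFF (Supplementary Multilingual Plane: emoji, ancient scripts, music). SIP: U+20000-U+2FFFF (Supplementary Ideographic Plane: rare CJK). Higher planes are rarely encountered: TIP (U+30000-U+3FFFF, Tertiary Ideographic Plane) and SSP (U+E0000-U+EFFFF, Supplementary Special-purpose Plane).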
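
Because each plane spans 0x10000 code points, the plane number is simply the code point shifted right by 16 bits. The sketch below assumes a hypothetical `plane_label` function; the fallback label for unnamed planes is an invention for this example, not something the tool is known to display.

```python
def plane_label(cp: int) -> str:
    """Label a valid code point the way the ranges above describe."""
    if cp <= 0x7F:
        return "ASCII"                    # a block inside the BMP, labeled separately
    plane = cp >> 16                      # each plane holds 0x10000 code points
    names = {0: "BMP", 1: "SMP", 2: "SIP", 3: "TIP", 14: "SSP"}
    return names.get(plane, f"Plane {plane}")  # fallback label is an assumption

for cp in (0x48, 0x4E2D, 0x1F30D, 0x20000, 0xE0001):
    print(f"U+{cp:04X}", plane_label(cp))
```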

Is my data sent anywhere?

No. Parsing, validation, and UTF-8 encoding all run in your browser.

What’s the input cap?

200,000 characters. Above that, the converter rejects rather than freezing the tab.
