UTF-16 and UTF-32 are two other ways to store the same Unicode characters that UTF-8 stores; they differ only in how many bytes each character takes. UTF-32 uses a fixed four bytes per character, UTF-16 uses two or four, and UTF-8 uses one to four. This guide explains how the three relate and when you would convert between them, and points to free converters for each direction.
Same characters, different storage
Every Unicode character has exactly one code point, a number such as U+0041 for A. The UTF encodings are simply different rules for turning that code point into bytes: the character never changes, only the byte layout does. Our text encoding guide is the main guide for this topic, and the byte side is covered in our UTF-8 to bytes guide.
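To see that the code point stays fixed while the bytes change, here is a small Python sketch. It assumes Python 3.9 or later, and `show_encodings` is a hypothetical helper, not part of any library. It uses the big-endian `utf-16-be` and `utf-32-be` codecs so that Python adds no byte order mark and the bytes read in the same order as the hex code point.

```python
# Print one character's code point and its bytes in UTF-8, UTF-16, and UTF-32.
# Big-endian codecs are used so no byte order mark (BOM) is prepended.

CODECS = ("utf-8", "utf-16-be", "utf-32-be")


def show_encodings(ch: str) -> dict[str, str]:
    """Return the code point of a single character and its bytes as hex."""
    result = {"code point": f"U+{ord(ch):04X}"}
    for codec in CODECS:
        # bytes.hex(" ") puts a space between bytes (Python 3.8+).
        result[codec] = ch.encode(codec).hex(" ")
    return result


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:  # A, é, €, 😀
    print(show_encodings(ch))
```

For A this gives `41`, `00 41` and `00 00 00 41`. For 😀 (U+1F600) it gives `f0 9f 98 80`, `d8 3d de 00` and `00 01 f6 00`. The code point is the same in all three encodings; only the byte layout differs.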
What UTF-16 does
UTF-16 stores every character in the Basic Multilingual Plane (U+0000 to U+FFFF), which covers most common characters, in two bytes. Characters above U+FFFF, including most emoji, take four bytes, written as a surrogate pair of two 16-bit code units. UTF-16 is the string format used by Windows, Java, and JavaScript, which is why string length in those environments can surprise you: an emoji such as 😀 counts as two units. The UTF-8 to UTF-16 converter shows the two-byte layout, and the UTF-16 to UTF-8 converter converts it back.
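The surrogate-pair rule can be written out by hand. This Python sketch splits a code point above U+FFFF into its high and low surrogates. It then counts UTF-16 code units by encoding the text and halving the byte count, which is the number JavaScript's `length` property would report. The helper names are illustrative, not from any library. Python itself counts code points, so its `len` gives 1 for the same emoji.

```python
# Compute a UTF-16 surrogate pair by hand and compare code-unit counts.

def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use a surrogate pair")
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def utf16_units(text: str) -> int:
    """Count UTF-16 code units (2 bytes each), as JavaScript's .length does."""
    return len(text.encode("utf-16-le")) // 2


emoji = "\U0001F600"  # 😀
high, low = to_surrogate_pair(ord(emoji))
print(f"{high:04X} {low:04X}")         # D83D DE00
print(len(emoji), utf16_units(emoji))  # 1 2
```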
What UTF-32 does
UTF-32 spends a fixed four bytes on every character, with no variable length and no surrogate pairs. That makes indexing trivial, since character N (counting from zero) always starts at byte offset 4 × N. The cost is space: plain English text is four times larger than in UTF-8. The UTF-8 to UTF-32 converter and the UTF-32 to UTF-8 converter move between the compact and fixed-width forms.
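Because every UTF-32 character is exactly four bytes, finding character N needs only arithmetic, not scanning. The Python sketch below reads the Nth code point straight out of little-endian UTF-32 bytes; `char_at` is a hypothetical helper. It indexes code points. A symbol built from several code points, such as a letter written with a separate combining accent, still spans several four-byte slots.

```python
# Direct indexing into UTF-32: character n (zero-based) starts at byte 4 * n.

def char_at(utf32_le: bytes, n: int) -> str:
    """Return the n-th code point of UTF-32-LE data without decoding the rest."""
    if n < 0 or 4 * n + 4 > len(utf32_le):
        raise IndexError("character index out of range")
    chunk = utf32_le[4 * n : 4 * n + 4]
    return chr(int.from_bytes(chunk, "little"))


text = "na\u00efve \U0001F600"       # "naïve 😀", 7 code points
data = text.encode("utf-32-le")      # the -le codec adds no BOM
print(len(data))                     # 28 bytes: 7 characters x 4
print(f"U+{ord(char_at(data, 2)):04X}", f"U+{ord(char_at(data, 6)):04X}")  # U+00EF U+1F600

# The same English word is four times larger in UTF-32 than in UTF-8.
print(len("hello".encode("utf-8")), len("hello".encode("utf-32-le")))  # 5 20
```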
Converting between them
Because all three encode the same code points, converting well-formed text between them is lossless: you decode from one encoding to the code points, then re-encode in the other. Nothing about the text changes, only the byte count. This is why a file saved as UTF-8 can be re-saved as UTF-16 and back with no data loss. The one requirement is that the program knows which encoding it is reading; decoding UTF-8 bytes as if they were UTF-16 produces garbled text.
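In Python the intermediate code points are simply a `str`, so converting is a decode followed by an encode. This sketch round-trips a short string through UTF-16 and back; the `transcode` helper is illustrative, not a library function. It also decodes UTF-8 bytes with the wrong codec to show that Python may not raise an error, but the text that comes out is not the original.

```python
# Transcoding always goes through code points: decode, then re-encode.

def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from `source` to a str (code points), then encode to `target`."""
    text = data.decode(source)     # raises UnicodeDecodeError on malformed input
    return text.encode(target)


original = "Gr\u00fc\u00dfe \U0001F600"      # "Grüße 😀"
utf8 = original.encode("utf-8")
utf16 = transcode(utf8, "utf-8", "utf-16-le")
back = transcode(utf16, "utf-16-le", "utf-8")
print(len(utf8), len(utf16), back == utf8)   # 12 16 True

# Reading the UTF-8 bytes with the wrong codec yields different text.
wrong = utf8.decode("utf-16-le", errors="replace")
print(wrong == original)                     # False
```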
Which one to use
UTF-8 wins for storage and the web, because it is compact for English and other mostly-ASCII text and is ASCII-compatible. UTF-16 is common in memory for languages and platforms that adopted it early, such as Java, JavaScript, and Windows. UTF-32 is rare in files but handy in code that needs fixed-width characters for simple indexing. For almost any file or web content, UTF-8 is the right default; you will meet the others mainly when interfacing with systems that already use them.
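The size trade-off depends on the script. This Python sketch measures the same short texts in all three encodings. The `encoded_sizes` helper and the sample strings are illustrative choices. The `-le` codecs are used so that no byte order mark is counted.

```python
# Compare encoded sizes of the same text in the three UTF encodings.

CODECS = ("utf-8", "utf-16-le", "utf-32-le")  # -le: no byte order mark


def encoded_sizes(text: str) -> dict[str, int]:
    """Return the byte length of `text` in each UTF encoding."""
    return {codec: len(text.encode(codec)) for codec in CODECS}


samples = {
    "English": "hello",
    "Chinese": "\u4f60\u597d",    # "你好"
    "Emoji": "\U0001F600",        # "😀"
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

English comes out at 5, 10 and 20 bytes, so it is smallest in UTF-8. The two Chinese characters take 6 bytes in UTF-8 but only 4 in UTF-16. UTF-8's size advantage therefore applies mainly to ASCII-heavy text.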
Frequently asked questions
What is the difference between UTF-8, UTF-16, and UTF-32?
They encode the same characters with different byte counts: UTF-8 uses one to four bytes, UTF-16 uses two or four, and UTF-32 uses a fixed four.
Why is an emoji two units in JavaScript?
Because JavaScript strings are sequences of UTF-16 code units, and an emoji above U+FFFF (outside the Basic Multilingual Plane) is stored as a surrogate pair of two units.
Is converting between UTF encodings lossless?
Yes, for well-formed text. All three represent the same code points, so converting through the code points and back changes only the byte layout, not the text.
Which encoding should I use for files?
UTF-8 in almost all cases, because it is compact, ASCII-compatible, and the standard for the web.
Why does UTF-32 use more space?
Because it spends four bytes on every character, even those UTF-8 stores in one byte, so ASCII text is four times larger than in UTF-8.