UTF-8 stores each Unicode code point as one to four bytes, and converting a UTF-8 string to bytes or code points shows exactly how it is encoded under the hood. The letter A is a single byte, 65, while an emoji such as the grinning face is four bytes and has the code point U+1F600. This guide explains the difference between bytes and code points, how to convert each way, and free tools for the job.
Bytes versus code points
A code point is the number Unicode assigns to a character, written like U+0041 for A or U+1F600 for a smiley. A byte is a unit of storage that holds a value from 0 to 255. UTF-8 is the rule that turns code points into bytes: code points up to U+007F become one byte, those up to U+07FF two bytes, those up to U+FFFF three bytes, and everything above that four bytes. So one character can be one code point but several bytes. Our text encoding guide covers how UTF-8, UTF-16, and UTF-32 differ.
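The size rule above can be sketched in a few lines of Python. The function name utf8_length is hypothetical, chosen only for this illustration, and the sketch assumes the input is a valid code point that is not a surrogate. It checks its answer against Python's built-in encoder.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for one code point.

    Hypothetical helper for illustration; assumes a valid,
    non-surrogate code point in the range U+0000..U+10FFFF.
    """
    if code_point < 0x80:       # U+0000..U+007F
        return 1
    if code_point < 0x800:      # U+0080..U+07FF
        return 2
    if code_point < 0x10000:    # U+0800..U+FFFF
        return 3
    return 4                    # U+10000..U+10FFFF


# A, e with acute accent, euro sign, grinning face
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    cp = ord(ch)
    # Cross-check the rule against the built-in UTF-8 encoder.
    assert utf8_length(cp) == len(ch.encode("utf-8"))
    print(f"U+{cp:04X} takes {utf8_length(cp)} byte(s)")
```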
UTF-8 to bytes
Converting UTF-8 to bytes shows the actual stored sequence. The character A is one byte, 65. A precomposed accented e (U+00E9) is two bytes, 195 and 169. An emoji such as U+1F600 is four. The UTF-8 to bytes converter lists every byte for a string, which is exactly what you need when a file size, a buffer, or a network frame is measured in bytes rather than characters.
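As a rough equivalent of what such a converter does, the following Python sketch lists the bytes of a string using the standard str.encode method. The function name to_utf8_bytes is an assumption made for this example, not the name of any tool mentioned here.

```python
def to_utf8_bytes(text: str) -> list[int]:
    """Return the UTF-8 encoding of text as a list of byte values (0-255)."""
    return list(text.encode("utf-8"))


print(to_utf8_bytes("A"))            # [65]
print(to_utf8_bytes("\u00e9"))       # [195, 169]
print(to_utf8_bytes("\U0001F600"))   # [240, 159, 152, 128]

# The same bytes in hexadecimal, the form most hex dumps use.
print("\U0001F600".encode("utf-8").hex(" "))  # f0 9f 98 80
```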
UTF-8 to code points
Code points identify the characters regardless of how they are stored. The UTF-8 to code points converter gives the U+ number for each character, which is the right view when you care about which characters a string contains rather than its byte length. This distinction is why counting characters and counting bytes can give different answers for the same text.
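A minimal Python sketch of the code point view follows. It relies on the fact that iterating over a Python str yields one code point at a time; the helper name to_code_points is hypothetical.

```python
def to_code_points(text: str) -> list[str]:
    """Return the U+ notation for every code point in text."""
    return [f"U+{ord(ch):04X}" for ch in text]


text = "caf\u00e9 \U0001F600"  # "cafe" with an accented e, a space, a grinning face

print(to_code_points(text))
# ['U+0063', 'U+0061', 'U+0066', 'U+00E9', 'U+0020', 'U+1F600']

print(len(text))                  # 6 code points
print(len(text.encode("utf-8")))  # 10 bytes
```

The two lengths differ because the accented e takes two bytes and the emoji takes four, while each still counts as one code point.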
Bytes back to text
Going the other way, a sequence of bytes is decoded back into characters using the same UTF-8 rules. The bytes to UTF-8 converter reassembles the multi-byte sequences into readable text, which is how a program turns a raw buffer back into a string. If a byte sequence is not valid UTF-8, the decoder either reports an error or substitutes the replacement character U+FFFD, depending on how it is configured, which is a common source of garbled text.
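The sketch below shows both outcomes in Python, using the standard bytes.decode method and its errors parameter. The sample buffer is made up for illustration: it spells "Hi " followed by the four bytes of U+1F600, and the invalid case simply drops the last byte.

```python
raw = bytes([72, 105, 32, 240, 159, 152, 128])  # "Hi " + U+1F600

decoded = raw.decode("utf-8")
print(decoded == "Hi \U0001F600")  # True

# Cutting off the last byte leaves an incomplete four-byte sequence.
truncated = raw[:-1]

strict_failed = False
try:
    truncated.decode("utf-8")  # strict mode is the default
except UnicodeDecodeError as exc:
    strict_failed = True
    print("decode failed:", exc.reason)

# errors="replace" substitutes U+FFFD instead of raising.
lenient = truncated.decode("utf-8", errors="replace")
print("\ufffd" in lenient)  # True
```

Strict decoding is usually the safer default for data you control, since it surfaces corruption early; replacement is more forgiving when displaying untrusted input.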
Why this matters
The byte versus character gap causes real bugs. A database column sized in bytes can overflow on multi-byte input even when the character count looks safe, a substring cut at an arbitrary byte offset can split a character, and a length check that counts bytes can reject valid text. Seeing the bytes and code points of a string makes these issues obvious, and it is essential when debugging encoding problems or working with binary protocols that carry text.
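One way to avoid the split-character bug is to cut at the byte limit and then discard any incomplete trailing sequence. The following Python sketch does that; truncate_utf8 is a hypothetical helper name, and the approach assumes the input string is valid to begin with, so only the tail can be damaged by the cut.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a code point.

    Hypothetical helper for illustration; assumes max_bytes >= 0.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The slice may end mid-sequence; errors="ignore" drops that partial tail.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


word = "caf\u00e9"  # 4 code points, 5 bytes

print(truncate_utf8(word, 5) == word)  # True: it fits
print(truncate_utf8(word, 4))          # caf (the two-byte accented e is dropped whole)
print(len(truncate_utf8(word, 4).encode("utf-8")))  # 3
```

This keeps code points intact but not necessarily what a reader sees as one character: an emoji built from several code points can still be cut between them.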
Frequently asked questions
What is the difference between a byte and a code point?
A code point is the Unicode number for a character, while a byte is a unit of storage. UTF-8 encodes one code point as one to four bytes.
How many bytes is an emoji in UTF-8?
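Most emoji that consist of a single code point are four bytes in UTF-8. Some emoji, such as flags and skin-tone variants, are sequences of several code points and take more bytes, even though they appear as a single character on screen.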
Why do character count and byte count differ?
Multi-byte characters take several bytes each, so a string with accents or emoji has more bytes than characters.
What is a code point written as U+1F600?
It is the Unicode number for a character in hexadecimal, here a grinning face emoji, independent of how many bytes store it.
What happens with invalid UTF-8 bytes?
The decode fails or produces replacement characters, which is a common cause of garbled or broken text.