Extracting a Unicode range pulls out only the characters that fall between two code points, so you can isolate just the emoji, just the Greek letters, or just the digits in a passage of mixed text. Unicode organizes characters into numbered blocks, which makes this filtering possible. This guide explains how Unicode ranges work, shows how to extract one from text, and points to a free tool for the job.
How Unicode blocks work
Unicode assigns characters to contiguous numbered ranges called blocks. Basic Latin sits at the start (U+0000 to U+007F), Greek and Coptic (U+0370 to U+03FF) and Cyrillic (U+0400 to U+04FF) follow, and most emoji live in high ranges past U+1F000. Because related characters share a range, you can select a category just by its code point bounds. The code point system this relies on is covered in our text encoding guide.
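To see where characters sit, it helps to print their code points. The following is a minimal Python sketch; the helper name code_points is hypothetical and chosen only for illustration.

```python
def code_points(text: str) -> list[str]:
    """Return the U+ notation for each character in the text."""
    # ord() gives the numeric code point; Unicode convention pads to at least 4 hex digits.
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points("aω😀"))
# ['U+0061', 'U+03C9', 'U+1F600']
```

The three values land in Basic Latin, Greek and Coptic, and a high emoji block respectively, which is what makes filtering by bounds workable.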
Extract a range from text
To extract a range, you keep only the characters whose code points fall between a low and high bound and drop the rest. The Unicode range extractor takes your text and a range and returns just the matching characters, so a paragraph mixing Latin, Greek, and emoji can be filtered down to any one of them. To find the exact bounds, list the code points as our code points guide shows.
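The filtering step described above can be sketched in a few lines of Python. This is an illustration of the idea, not the tool's actual implementation; the function name extract_range is an assumption, and the bounds are treated as inclusive.

```python
def extract_range(text: str, low: int, high: int) -> str:
    """Keep only the characters whose code point lies in [low, high]."""
    if low > high:
        raise ValueError("low bound must not exceed high bound")
    return "".join(ch for ch in text if low <= ord(ch) <= high)


mixed = "Hello αβγ 😀 123"
# 0x0370 to 0x03FF is the Greek and Coptic block.
print(extract_range(mixed, 0x0370, 0x03FF))  # αβγ
# 0x0030 to 0x0039 covers the ASCII digits 0 to 9.
print(extract_range(mixed, 0x0030, 0x0039))  # 123
```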
Useful ranges to know
A few ranges come up often. Basic Latin (U+0000 to U+007F) covers the everyday English letters, digits, and punctuation. The digits 0 to 9 sit together at U+0030 to U+0039, so isolating numbers takes a tight range. Most emoji cluster in a handful of high blocks, which is why you can strip or keep the bulk of them with one wide range, though a few older symbols sit in lower blocks. Knowing roughly where a script lives lets you build a filter without memorizing every code point.
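One way to avoid memorizing bounds is to keep a small lookup table of named ranges. The sketch below is an assumption about how you might organize this in Python; the table RANGES and the helpers keep and drop are hypothetical names, and the table lists only a few blocks rather than the full Unicode block list.

```python
# Inclusive (low, high) bounds for a few well-known Unicode blocks.
RANGES = {
    "basic_latin": (0x0000, 0x007F),
    "ascii_digits": (0x0030, 0x0039),
    "greek_and_coptic": (0x0370, 0x03FF),
    "cyrillic": (0x0400, 0x04FF),
    "emoticons": (0x1F600, 0x1F64F),
}


def keep(text: str, name: str) -> str:
    """Keep only the characters inside the named range."""
    low, high = RANGES[name]
    return "".join(ch for ch in text if low <= ord(ch) <= high)


def drop(text: str, name: str) -> str:
    """Remove the characters inside the named range."""
    low, high = RANGES[name]
    return "".join(ch for ch in text if not low <= ord(ch) <= high)


print(keep("Order #42 shipped", "ascii_digits"))  # 42
print(drop("Great job 🙂!", "emoticons"))          # Great job !
```

Note that the Emoticons block is only one of several blocks that hold emoji, so a real emoji filter would need more entries than this table shows.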
When you need this
Range extraction is handy for cleaning and analyzing text: stripping emoji before storing a value, pulling only the digits from a messy string, separating mixed-script content, or auditing what character sets a document actually uses. It is a precise alternative to guesswork when you need to keep or remove a whole category of characters at once.
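As an illustration of the auditing use case, the sketch below counts how many characters of a text fall into each of a few blocks. The names BLOCKS, block_of, and audit are hypothetical, and the block table is deliberately partial; anything outside it is reported as "Other".

```python
from collections import Counter

# A partial block table: (name, low, high), bounds inclusive.
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Greek and Coptic", 0x0370, 0x03FF),
    ("Cyrillic", 0x0400, 0x04FF),
    ("Emoticons", 0x1F600, 0x1F64F),
]


def block_of(ch: str) -> str:
    """Return the name of the listed block containing ch, or 'Other'."""
    cp = ord(ch)
    for name, low, high in BLOCKS:
        if low <= cp <= high:
            return name
    return "Other"


def audit(text: str) -> Counter:
    """Count non-whitespace characters per block."""
    return Counter(block_of(ch) for ch in text if not ch.isspace())


print(dict(audit("Привет, world 😀")))
# {'Cyrillic': 6, 'Basic Latin': 6, 'Emoticons': 1}
```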
Related text operations
Range extraction pairs with other Unicode tools. Counting what you kept is covered in our character counting guide, and detecting suspicious mixed scripts is in our spoofed Unicode guide, since an unexpected character from a foreign range is exactly what a homoglyph check looks for.
Frequently asked questions
What is a Unicode range?
A span of code points between a low and a high bound. Unicode groups related characters into named blocks of this kind, such as Basic Latin or Emoticons.
How do I extract only emoji from text?
Keep the characters whose code points fall in the emoji blocks, which a Unicode range extractor does when you give it those bounds.
How do I find a character’s code point?
List the code points of the text, then read the U+ value for the character you care about to set your range bounds.
What can I use range extraction for?
Stripping emoji, pulling out digits, separating mixed scripts, and auditing which character sets a document uses.
Are characters in a script always in one range?
Mostly, though some scripts span more than one block, so a wide or combined range is occasionally needed.
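Greek is one example: the common letters sit in Greek and Coptic (U+0370 to U+03FF), while many accented polytonic forms sit in Greek Extended (U+1F00 to U+1FFF). A combined filter might look like the sketch below, where extract_ranges and GREEK are hypothetical names used for illustration.

```python
# Two inclusive ranges that together cover most Greek text.
GREEK = [(0x0370, 0x03FF), (0x1F00, 0x1FFF)]


def extract_ranges(text: str, ranges: list[tuple[int, int]]) -> str:
    """Keep characters whose code point falls in any of the given ranges."""
    return "".join(
        ch for ch in text
        if any(low <= ord(ch) <= high for low, high in ranges)
    )


# U+1F00 (alpha with psili) is in Greek Extended; the other letters are in Greek and Coptic.
word = "\u1f00\u03c1\u03c7\u03ae (beginning)"
print(extract_ranges(word, GREEK))      # ἀρχή
print(extract_ranges(word, GREEK[:1]))  # ρχή  (first letter lost with one block only)
```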