Chunkify Unicode Text Online Tool
Split Unicode text into equal chunks with grapheme, code-point, or UTF-16 modes. Keeps emoji and ZWJ sequences intact. Free, client-side.
Split text into equal chunks by graphemes, code points, or UTF-16 units. Emoji, ZWJ families, flags, and combining marks stay intact by default.
How to Use Chunkify Unicode Text Online Tool
- Paste or type text into the input box. Anything works - Latin, CJK, Arabic, Hebrew, emoji, ZWJ families, combining marks.
- Set the number of chunks. The tool distributes characters as evenly as possible - if the total doesn't divide evenly, the first few chunks get one extra unit.
- Pick a split mode: Graphemes (default, what human readers see - 👨‍👩‍👧 is one unit), Code points (handles surrogate pairs, may split ZWJ families), or UTF-16 code units (fastest but breaks surrogate pairs into halves).
- Watch the live output - 200 ms after you stop typing, the tool re-chunks and shows each chunk prefixed with `Chunk N:` on its own line.
- Read the stats: total units in the chosen mode, chunk count, the minimum and maximum size, and how many milliseconds it took.
- Press Ctrl+Enter (⌘+Enter on Mac) to force a re-chunk - handy after a paste, so you don't have to wait for the debounce.
- Copy the output or Download a timestamped `.txt` with the chunked text inside.
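The distribution rule described in the steps above can be sketched as follows. This is a minimal illustration, not the tool's actual source: the function names `toUnits` and `chunkify` and the mode strings are assumptions made for the example, and input validation is left out.

```javascript
// Hypothetical helpers for illustration; the tool's real code is not shown on this page.
function toUnits(text, mode) {
  if (mode === "utf16") return text.split("");        // one entry per UTF-16 code unit
  if (mode === "codepoints") return Array.from(text); // string iteration goes by code point
  // Default: extended grapheme clusters via Intl.Segmenter
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

function chunkify(text, count, mode = "graphemes") {
  const units = toUnits(text, mode);
  const base = Math.floor(units.length / count); // minimum chunk size
  const extra = units.length % count;            // the first `extra` chunks get one more unit
  const chunks = [];
  let pos = 0;
  for (let i = 0; i < count; i++) {
    const size = base + (i < extra ? 1 : 0);
    chunks.push(units.slice(pos, pos + size).join(""));
    pos += size;
  }
  return chunks;
}

const chunks = chunkify("abcdefghij", 3);
console.log(chunks.map((c, i) => `Chunk ${i + 1}: ${c}`).join("\n"));
// Chunk 1: abcd
// Chunk 2: efg
// Chunk 3: hij
```

Ten units split three ways gives sizes 4, 3, 3: the remainder of one unit goes to the first chunk. Joining the chunks back together always reproduces the input, whichever mode is chosen.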
Frequently asked questions
Is my text uploaded anywhere?
No. The chunker runs entirely in your browser. Text you paste never touches the network – no fetch, no XHR, no analytics on content.
What’s the difference between graphemes, code points, and UTF-16 units?
Graphemes are what humans see: 👨‍👩‍👧 is one grapheme. Code points are what Unicode defines: the same sequence is five code points. UTF-16 units are what JavaScript stores internally: the same sequence is eight UTF-16 units. Different questions, different answers – the tool lets you pick.
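The three counts can be checked in any modern browser console or recent Node.js. The snippet below is a small sketch; the variable names are chosen for the example, and the family emoji is written with escapes so the code points are visible.

```javascript
// man + ZWJ + woman + ZWJ + girl
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemes = Array.from(segmenter.segment(family)).length; // user-perceived characters
const codePoints = Array.from(family).length;                   // string iteration is per code point
const utf16Units = family.length;                               // .length counts UTF-16 code units

console.log({ graphemes, codePoints, utf16Units });
// { graphemes: 1, codePoints: 5, utf16Units: 8 }
```

Each of the three emoji lies outside the Basic Multilingual Plane and takes two UTF-16 units, while each zero-width joiner takes one, which is where 3 × 2 + 2 = 8 comes from.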
Will emoji stay intact?
In graphemes mode (default), yes – including skin-tone modifiers (👨🏾), ZWJ families (👨‍👩‍👧), and flags (🇬🇷). In code-points mode, emoji made of a single code point stay intact, but skin-tone sequences, ZWJ families, and flags can split. In UTF-16 mode, even a single emoji outside the Basic Multilingual Plane can break into two surrogate halves.
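To see where each mode can cut, the sketch below splits three sample emoji in all three ways. The helper names are hypothetical, and the grinning face (U+1F600) is just an arbitrary example of an emoji outside the Basic Multilingual Plane.

```javascript
const flag = "\u{1F1EC}\u{1F1F7}"; // Greek flag: two regional-indicator code points
const skin = "\u{1F468}\u{1F3FE}"; // man + skin-tone modifier
const face = "\u{1F600}";          // grinning face: a single code point

// Hypothetical helpers, one per split mode
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const byGrapheme = (s) => Array.from(segmenter.segment(s), (x) => x.segment);
const byCodePoint = (s) => Array.from(s);
const byUtf16 = (s) => s.split("");

console.log(byGrapheme(flag).length, byCodePoint(flag).length); // 1 2
console.log(byGrapheme(skin).length, byCodePoint(skin).length); // 1 2
console.log(byCodePoint(face).length, byUtf16(face).length);    // 1 2

// The two UTF-16 halves are lone surrogates, not displayable characters:
console.log(byUtf16(face).map((u) => u.charCodeAt(0).toString(16))); // [ 'd83d', 'de00' ]
```

A chunk boundary that falls between the two halves leaves an unpaired surrogate at the end of one chunk and the start of the next, which most renderers show as a replacement glyph.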
Why is graphemes mode the default?
It’s the only mode that matches a human reader’s intuition. The tool uses Intl.Segmenter with granularity: 'grapheme', which segments text into the extended grapheme clusters defined by Unicode Standard Annex #29, the standard's model of a user-perceived character.
When would I pick code-points or UTF-16 mode?
Use code-points mode when you need exactly one unit per code point and the text has no ZWJ sequences, flags, or combining marks. Use UTF-16 mode when you are interoperating with APIs that count string.length (such as the HTML maxlength attribute, which is measured in UTF-16 code units) and want to match their view.
What happens if I pick more chunks than units?
The tool returns an error – you cannot split N units into more than N chunks. The error names the unit type (graphemes / code points / UTF-16 units) so you can check your assumption.
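A check of this kind might look like the sketch below. The function name, the mode keys, and the exact message wording are assumptions made for illustration; the tool's real error text may differ.

```javascript
// Hypothetical mapping from mode key to the unit name used in messages
const UNIT_NAMES = {
  graphemes: "graphemes",
  codepoints: "code points",
  utf16: "UTF-16 units",
};

// Throws if `unitCount` units cannot be split into `chunkCount` non-empty chunks.
function assertChunkable(unitCount, chunkCount, mode) {
  if (!Number.isInteger(chunkCount) || chunkCount < 1) {
    throw new RangeError("Chunk count must be a positive integer.");
  }
  if (chunkCount > unitCount) {
    throw new RangeError(
      `Cannot split ${unitCount} ${UNIT_NAMES[mode]} into ${chunkCount} chunks.`
    );
  }
}

try {
  assertChunkable(1, 3, "graphemes"); // e.g. a single family emoji in graphemes mode
} catch (err) {
  console.log(err.message); // Cannot split 1 graphemes into 3 chunks.
}
```

Naming the unit matters because the same input can pass in one mode and fail in another: a lone ZWJ family is 1 grapheme but 8 UTF-16 units, so asking for 3 chunks fails only in graphemes mode.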
Does it preserve whitespace and newlines?
Yes. Whitespace is counted like any other unit and never stripped. Newlines inside the input are treated as normal characters and appear inside whatever chunk they fall into.
Can I chunk text in any language?
Yes – all Unicode text works: Latin, CJK (Chinese/Japanese/Korean), Arabic, Hebrew, Devanagari, Greek, Cyrillic, and every other writing system.
Is there a size limit?
No hard limit. The tool handles 10,000-character inputs in well under 30 ms on a modern laptop. Very long inputs in graphemes mode stay fast because Intl.Segmenter is native C++ code in the browser.
Does it work offline?
After the page loads, yes. HTML, CSS, and JS are self-contained – disconnect Wi-Fi and keep chunking.