Chunkify Unicode Text
Split Unicode text into equal chunks with grapheme, code-point, or UTF-16 modes. Keeps emoji and ZWJ sequences intact. Free, client-side.
- Runs in your browser
- Nothing uploaded
- Free, no sign-up
Split text into equal chunks by graphemes, code points, or UTF-16 units - emoji, ZWJ families, flags, and combining marks stay intact by default.
How to Use Chunkify Unicode Text
- Paste or type text into the input box. Anything works - Latin, CJK, Arabic, Hebrew, emoji, ZWJ families, combining marks.
- Set the number of chunks. The tool distributes units as evenly as possible - if the total doesn't divide evenly, the first few chunks get one extra unit.
- Pick a split mode: Graphemes (default, what human readers see - 👨👩👧 is one unit), Code points (keeps surrogate pairs together, but may split ZWJ families, flags, and skin-tone modifiers), or UTF-16 code units (fastest, but can split a surrogate pair into two halves).
- Watch the live output - 200 ms after you stop typing, the tool re-chunks and shows each chunk on its own line, prefixed with `Chunk N:`.
- Read the stats: total units in the chosen mode, chunk count, minimum and maximum chunk size, and how many milliseconds the split took.
- Press Ctrl+Enter (⌘+Enter on Mac) to force a re-chunk - handy after a paste, so you don't have to wait for the debounce.
- Copy the output, or Download a timestamped `.txt` file containing the chunked text.
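The steps above can be sketched in TypeScript. This is a minimal, hypothetical sketch that assumes the three modes map to Intl.Segmenter (graphemes), string iteration (code points), and per-index access (UTF-16 units). The names `SplitMode`, `toUnits`, `chunkify`, and `formatChunks`, and the exact layout of the `Chunk N:` lines, are illustrative, not the tool's actual source.

```typescript
// Hypothetical sketch of the chunking pipeline; not the tool's own code.
type SplitMode = "graphemes" | "codepoints" | "utf16";

// Break text into the units counted by the chosen mode.
function toUnits(text: string, mode: SplitMode): string[] {
  if (mode === "graphemes") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    return Array.from(segmenter.segment(text), (s) => s.segment);
  }
  if (mode === "codepoints") {
    // String iteration walks code points, so surrogate pairs stay together.
    return Array.from(text);
  }
  // One entry per UTF-16 code unit; a surrogate pair becomes two entries.
  const units: string[] = [];
  for (let i = 0; i < text.length; i++) units.push(text.charAt(i));
  return units;
}

interface ChunkResult {
  chunks: string[];
  total: number; // units in the chosen mode
  min: number; // smallest chunk size
  max: number; // largest chunk size
  ms: number; // time spent splitting
}

function chunkify(text: string, n: number, mode: SplitMode = "graphemes"): ChunkResult {
  const start = performance.now();
  const units = toUnits(text, mode);
  if (!Number.isInteger(n) || n < 1 || n > units.length) {
    throw new RangeError(`Cannot split ${units.length} units into ${n} chunks (${mode} mode)`);
  }
  const base = Math.floor(units.length / n);
  const extra = units.length % n; // the first `extra` chunks get one more unit
  const chunks: string[] = [];
  let pos = 0;
  for (let i = 0; i < n; i++) {
    const size = base + (i < extra ? 1 : 0);
    chunks.push(units.slice(pos, pos + size).join(""));
    pos += size;
  }
  return {
    chunks,
    total: units.length,
    min: base,
    max: extra > 0 ? base + 1 : base,
    ms: performance.now() - start,
  };
}

// Assumed output layout: one "Chunk N: ..." line per chunk.
function formatChunks(chunks: string[]): string {
  return chunks.map((c, i) => `Chunk ${i + 1}: ${c}`).join("\n");
}
```

For example, `chunkify("abcdefghij", 3, "codepoints").chunks` gives `["abcd", "efg", "hij"]`: ten units into three chunks leaves one extra unit, and it goes to the first chunk.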
Frequently asked questions
Is my text uploaded anywhere?
No. The chunker runs entirely in your browser. Text you paste never touches the network – no fetch, no XHR, no analytics on content.
What’s the difference between graphemes, code points, and UTF-16 units?
Graphemes are what humans see: 👨👩👧 is one grapheme. Code points are what Unicode defines: the same sequence is five code points. UTF-16 units are what JavaScript stores internally: the same sequence is eight UTF-16 units. Different questions, different answers – the tool lets you pick.
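You can check those three counts yourself with a minimal sketch like this (the helper names are hypothetical). The family emoji is written with escapes so the two invisible zero-width joiners (U+200D) are explicit: three emoji of two UTF-16 units each, plus two joiners of one unit each, gives eight.

```typescript
// Man, ZWJ, woman, ZWJ, girl: the family emoji from the answer above.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

// Graphemes: user-perceived characters, per Intl.Segmenter.
function countGraphemes(text: string): number {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text)).length;
}

// Code points: string iteration yields one entry per code point.
const countCodePoints = (text: string): number => Array.from(text).length;

// UTF-16 units: what string.length reports.
const countUtf16Units = (text: string): number => text.length;

console.log(countGraphemes(family), countCodePoints(family), countUtf16Units(family)); // 1 5 8
```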
Will emoji stay intact?
In graphemes mode (default), yes – including skin-tone modifiers (👨🏾), ZWJ families (👨👩👧), and flags (🇬🇷). In code-points mode, single-code-point emoji such as 😀 stay intact, but skin-tone modifiers, ZWJ families, and flags are made of several code points and can split. In UTF-16 mode, even a single 😀 can break into two halves.
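These failure modes are easy to reproduce. The sketch below (the helper name `hasLoneSurrogate` is hypothetical) shows that a skin-tone emoji and a flag are each two code points, so a code-point chunk boundary can fall between them, and that cutting 😀 at the UTF-16 level leaves two unpaired surrogate halves.

```typescript
// True if the string contains a high or low surrogate without its partner.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = s.charCodeAt(i + 1); // NaN past the end, so the test below fails
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair: skip the low half
        continue;
      }
      return true; // high surrogate with no low partner
    }
    if (c >= 0xdc00 && c <= 0xdfff) return true; // low surrogate with no high partner
  }
  return false;
}

const skinTone = "\u{1F468}\u{1F3FE}"; // 👨🏾 man + medium-dark skin tone modifier
const flag = "\u{1F1EC}\u{1F1F7}"; // 🇬🇷 two regional indicator symbols
const grin = "\u{1F600}"; // 😀 one code point, two UTF-16 units

console.log(Array.from(skinTone).length); // 2
console.log(Array.from(flag).length); // 2
console.log(hasLoneSurrogate(grin)); // false
console.log(hasLoneSurrogate(grin.charAt(0)), hasLoneSurrogate(grin.charAt(1))); // true true
```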
Why is graphemes mode the default?
It’s the only mode that matches a human reader’s intuition. The tool uses Intl.Segmenter with granularity: 'grapheme', which splits text into the extended grapheme clusters that Unicode Standard Annex #29 defines for exactly this use case.
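For reference, this is roughly how Intl.Segmenter is called in grapheme mode; a minimal sketch, not the tool's code, with a hypothetical helper name. Each segment it yields carries `segment` (the cluster text) and `index` (the cluster's starting offset in UTF-16 code units), so the offsets show exactly where a chunk boundary is allowed to fall.

```typescript
// Return the UTF-16 start offset of every grapheme cluster in `text`.
function graphemeBoundaries(text: string): number[] {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.index);
}

// "e" + combining acute accent, the Greek flag (two regional indicators), then "!".
const sample = "e\u0301\u{1F1EC}\u{1F1F7}!";
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
for (const { segment, index } of segmenter.segment(sample)) {
  console.log(index, JSON.stringify(segment));
}
console.log(graphemeBoundaries(sample)); // [ 0, 2, 6 ]
```

The accented "é" starts at 0 and spans two code units; the flag starts at 2 and spans four; "!" starts at 6.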
When would I pick code-points or UTF-16 mode?
Code-points mode when you need every Unicode code point to count as exactly one unit and the text has no ZWJ sequences, flags, or skin-tone modifiers that must stay whole. UTF-16 mode when you are interoperating with code or APIs that measure text with JavaScript’s string.length, which counts UTF-16 units, and want to match their view.
What happens if I pick more chunks than units?
The tool returns an error – you cannot split N units into more than N chunks. The error names the unit type (graphemes / code points / UTF-16 units) so you can check your assumption.
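A minimal sketch of that check, assuming the unit count has already been computed in the chosen mode. The function name, mode keys, and message wording are hypothetical; the tool's real error text may differ.

```typescript
type Mode = "graphemes" | "codepoints" | "utf16";

// Singular and plural unit names, so the message reads naturally.
const UNIT_LABELS: Record<Mode, [string, string]> = {
  graphemes: ["grapheme", "graphemes"],
  codepoints: ["code point", "code points"],
  utf16: ["UTF-16 unit", "UTF-16 units"],
};

// Returns an error message, or null when the chunk count is valid.
function validateChunkCount(unitCount: number, chunks: number, mode: Mode): string | null {
  if (!Number.isInteger(chunks) || chunks < 1) {
    return "Chunk count must be a positive integer.";
  }
  if (chunks > unitCount) {
    const label = UNIT_LABELS[mode][unitCount === 1 ? 0 : 1];
    return `Cannot split ${unitCount} ${label} into ${chunks} chunks.`;
  }
  return null;
}

// The ZWJ family emoji is 1 grapheme but 8 UTF-16 units.
console.log(validateChunkCount(1, 3, "graphemes")); // Cannot split 1 grapheme into 3 chunks.
console.log(validateChunkCount(8, 3, "utf16")); // null
```

The same input can fail in one mode and pass in another, which is why naming the unit type in the message matters.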
Does it preserve whitespace and newlines?
Yes. Whitespace is counted like any other unit and never stripped. Newlines inside the input are treated as normal characters and appear inside whatever chunk they fall into.
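Because every chunk is a contiguous run of units, concatenating the chunks in order always reproduces the input exactly. This sketch (the helper name `graphemes` is hypothetical) checks that round trip on text with leading spaces, a tab, and mixed line endings. One detail worth knowing: under UAX #29 a CR LF pair forms a single grapheme, so in graphemes mode a Windows line break counts as one unit.

```typescript
// Split text into grapheme clusters with Intl.Segmenter.
function graphemes(text: string): string[] {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const input = "  indented\r\nline two\twith a tab\n";
const units = graphemes(input);

// Chunks are slices of `units`, so joining all units is the same as joining all chunks.
console.log(units.join("") === input); // true

// CR LF stays together as one grapheme.
console.log(graphemes("a\r\nb").length); // 3
```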
Can I chunk text in any language?
Yes – all Unicode text works: Latin, CJK (Chinese/Japanese/Korean), Arabic, Hebrew, Devanagari, Greek, Cyrillic, and every other writing system.
Is there a size limit?
No hard limit. The tool handles 10,000-character inputs in well under 30 ms on a modern laptop. Very long inputs in graphemes mode stay fast because Intl.Segmenter is native C++ code in the browser.
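To check the timing on your own machine, a rough measurement sketch like this runs in any modern JavaScript runtime once compiled. The function name and sample text are assumptions, and results vary with engine, hardware, and warm-up, so treat a single run as indicative only.

```typescript
// Time a grapheme split of `text`; returns the grapheme count and elapsed milliseconds.
function timeGraphemeSplit(text: string): { count: number; ms: number } {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const t0 = performance.now();
  const count = Array.from(segmenter.segment(text)).length;
  return { count, ms: performance.now() - t0 };
}

// 2,500 repeats of "abc" + 😀 give 10,000 graphemes (12,500 UTF-16 units).
const sample = "abc\u{1F600}".repeat(2500);
const { count, ms } = timeGraphemeSplit(sample);
console.log(`${count} graphemes in ${ms.toFixed(2)} ms`);
```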
Does it work offline?
After the page loads, yes. HTML, CSS, and JS are self-contained – disconnect Wi-Fi and keep chunking.