Messy lists are a universal tax: names pasted from a spreadsheet with stray spaces, emails with duplicates, items in random case and random order, separators that disagree. Cleaning them by hand is error-prone precisely because the errors are invisible: a trailing space looks like nothing and breaks everything. This pillar guide maps the whole cleanup toolkit, every tool free and in-browser, starting with the workhorse of list hygiene, the deduplicator.
The cleanup pipeline: an order that works
Most messy lists clean up with the same four passes, in an order where each step helps the next:
- Normalize the pieces: trim whitespace, fix case with the case converter, drop empty lines with the empty-item remover. Matching and sorting both depend on this step having happened.
- Deduplicate, now that “Apple”, “apple”, and “ apple” actually look identical to the tool; the full method is in the duplicates guide.
- Order, alphabetically or otherwise, covered in the sorting guide.
- Verify with counts: the item counter before and after tells you exactly how many duplicates and empties were removed, which is the difference between believing the cleanup worked and knowing it did.
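The four passes above can be sketched in a few lines of Python. This is only an illustration of the order, not how the tools themselves are implemented; the function name `clean_list` and the choice to lowercase every item are assumptions made for the example.

```python
def clean_list(text: str) -> tuple[list[str], dict[str, int]]:
    """Run the four passes on newline-separated text (illustrative sketch)."""
    raw = text.split("\n")

    # Pass 1: normalize. Trim each item, lowercase it, then drop empties.
    normalized = [item.strip().lower() for item in raw]
    non_empty = [item for item in normalized if item]

    # Pass 2: deduplicate, keeping the first occurrence of each item.
    unique = list(dict.fromkeys(non_empty))

    # Pass 3: order alphabetically.
    ordered = sorted(unique)

    # Pass 4: verify with counts taken before and after.
    counts = {
        "input": len(raw),
        "empties_removed": len(raw) - len(non_empty),
        "duplicates_removed": len(non_empty) - len(unique),
        "output": len(ordered),
    }
    return ordered, counts


items, counts = clean_list("Apple\napple\n apple\n\nbanana\nCherry \n")
print(items)   # ['apple', 'banana', 'cherry']
print(counts)  # {'input': 7, 'empties_removed': 2, 'duplicates_removed': 2, 'output': 3}
```

Swapping passes 1 and 2 in this sketch would leave all three spellings of "apple" in the output, which is why the normalize pass comes first.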
The invisible enemies
- Trailing and leading spaces, the classic: invisible on screen, fatal to matching. Every serious list tool offers trim-before-compare for this reason.
- Case differences: two spellings of an address that differ only in capitalization are one address and two list items until case is normalized, a one-click job explained in the case conversion guide.
- Unicode lookalikes: the same accented character can be stored two ways (one composed character, or a letter plus a combining accent), identical to the eye and different to a computer; a composed “café” can fail to match a decomposed “café”. Tools with Unicode normalization, like the deduplicator, fold these together; the Unicode character counter makes such differences visible when a string “looks right” but behaves wrong.
- Mixed separators: half the list newline-separated, half comma-separated, usually the residue of two copy-pastes. The separators section below is the cure.
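The first three enemies can all be neutralized by comparing items on a cleaned-up key instead of on the raw text. The sketch below uses Python's standard `unicodedata` module; the helper names `match_key` and `dedupe` are hypothetical, and the exact normalization the deduplicator applies is not stated here, so treat NFC plus case folding as an assumption.

```python
import unicodedata


def match_key(item: str) -> str:
    """Comparison key: trim, case-fold, then compose to NFC (assumed recipe)."""
    return unicodedata.normalize("NFC", item.strip().casefold())


def dedupe(items: list[str]) -> list[str]:
    """Keep the first item for each key, trimmed but otherwise as typed."""
    seen = set()
    kept = []
    for item in items:
        key = match_key(item)
        if key not in seen:
            seen.add(key)
            kept.append(item.strip())
    return kept


composed = "caf\u00e9"      # é stored as one code point
decomposed = "cafe\u0301"   # e followed by a combining acute accent

print(composed == decomposed)                        # False
print(len(composed), len(decomposed))                # 4 5
print(match_key(composed) == match_key(decomposed))  # True
print(dedupe(["Apple", "apple ", " APPLE"]))         # ['Apple']
```

Comparing on the key while keeping the original spelling means the cleanup does not silently rewrite items that were already fine.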
The toolbox by job
- Order and arrangement: alphabetize, shuffle for random order, chunk to split into fixed-size groups.
- Duplicates and uniqueness: deduplicate, extract unique items with its singleton and duplicate modes, consecutive-repeat remover.
- Filtering: filter by text or by pattern, plus compare two lists for what is shared and what is missing.
- Decoration and structure: prefixes, suffixes, bullets, and columns.
- Counting and text stats: item counts and the character counter for length limits, with what counts as a word settled in the counting guide.
Every one of these runs in the browser on pasted text: nothing leaves your machine, which matters more than usual when the list is emails or customer names.
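For readers who end up scripting these jobs instead, a few of them are short enough to sketch in Python. The function names are hypothetical, and the reading of the "singleton" and "duplicate" modes (items that appear exactly once, and items that appear more than once) is an assumption about what those modes mean.

```python
from collections import Counter
from itertools import groupby


def chunk(items, size):
    """Split a list into fixed-size groups; the last group may be shorter."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def singletons(items):
    """Items that appear exactly once, in original order."""
    counts = Counter(items)
    return [x for x in items if counts[x] == 1]


def duplicated(items):
    """Items that appear more than once, each listed a single time."""
    counts = Counter(items)
    return [x for x in dict.fromkeys(items) if counts[x] > 1]


def drop_consecutive_repeats(items):
    """Collapse runs of identical neighbors; non-adjacent repeats survive."""
    return [key for key, _ in groupby(items)]


def compare(a, b):
    """Report what two lists share and what each one is missing."""
    set_a, set_b = set(a), set(b)
    return {
        "shared": [x for x in dict.fromkeys(a) if x in set_b],
        "only_in_a": [x for x in dict.fromkeys(a) if x not in set_b],
        "only_in_b": [x for x in dict.fromkeys(b) if x not in set_a],
    }


items = ["a", "a", "b", "c", "a", "d"]
print(chunk(items, 2))                  # [['a', 'a'], ['b', 'c'], ['a', 'd']]
print(singletons(items))                # ['b', 'c', 'd']
print(duplicated(items))                # ['a']
print(drop_consecutive_repeats(items))  # ['a', 'b', 'c', 'a', 'd']
print(compare(["a", "b", "c"], ["b", "c", "d"]))
```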
Separators: the list’s file format
A list is items plus a separator convention (newlines, commas, tabs, pipes, or semicolons), and most “broken” lists are just lists read with the wrong convention. Two habits cover nearly everything. First, normalize early: convert whatever arrived into one-item-per-line with the separator changer, because newline-separated is the format every other tool reads unambiguously. Second, convert back only at the end, when the destination demands commas or tabs. The round trip costs two clicks and removes the entire class of half-comma-half-newline confusion; spreadsheet-shaped data with multiple columns is a different animal, handled by the CSV family starting at CSV to list.
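The round trip can be sketched as two small Python functions. The names `to_lines` and `from_lines` and the default separator set are assumptions for illustration; the sketch also assumes no item legitimately contains one of the separators, which is exactly the case where CSV-aware tooling is needed instead.

```python
def to_lines(text, separators=(",", ";", "|", "\t")):
    """Normalize early: turn every known separator into a newline."""
    for sep in separators:
        text = text.replace(sep, "\n")
    # Trim each piece and drop the empties left by doubled separators.
    return [part.strip() for part in text.splitlines() if part.strip()]


def from_lines(items, separator=", "):
    """Convert back at the end, using whatever the destination demands."""
    return separator.join(items)


mixed = "alice, bob\ncarol;dave|erin\tfrank\n"
items = to_lines(mixed)
print(items)                    # ['alice', 'bob', 'carol', 'dave', 'erin', 'frank']
print(from_lines(items))        # alice, bob, carol, dave, erin, frank
print(from_lines(items, "\t"))  # tab-separated, ready to paste into a sheet row
```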
Frequently asked questions
Why does my list have empty items after pasting?
Blank lines, often a final trailing newline, count as items to naive processing. Dropping empties is the standard first pass, and the before/after item count shows exactly how many there were.
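A minimal Python illustration of the effect, assuming the naive processing is a plain split on newlines:

```python
text = "alpha\nbeta\n\ngamma\n"

# A plain split treats the blank line and the trailing newline as items.
naive = text.split("\n")
print(naive)  # ['alpha', 'beta', '', 'gamma', '']

# Dropping empties (including whitespace-only lines) is the first pass.
cleaned = [item for item in naive if item.strip()]
print(len(naive), len(cleaned))  # 5 3
```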
Can these tools handle very large lists?
Tens of thousands of lines are routine in a browser; memory is the only real limit. Beyond a few hundred thousand items, you are doing data processing and a spreadsheet or script becomes the honest tool.
Do the tools change my original text?
No, everything works on the pasted copy and produces output you copy back out. The original, wherever it lives, is untouched, which also means you can experiment freely.
What order should I clean in if I only do one pass?
Trim, then dedupe, then sort: the single most valuable sequence. Trimming first makes the dedupe honest, and sorting last means no later step can disturb the order.