Deduplicate List

Remove duplicates with case-sensitivity, Unicode normalization, whitespace trim, skip-empty, and show-duplicates modes. Free, offline, client-side, instant, secure.

Remove duplicate items from any list with 3 matching options: trim whitespace (so "apple" and " apple" match), Unicode normalize (so café precomposed matches café decomposed), and skip empty items. Plus a "Show duplicates" mode that lists which items appeared more than once with their counts.

How to Use Deduplicate List

  1. Paste your list and pick the separator (newline default; also comma, semicolon, tab, pipe, space).
  2. Keep occurrence: First (default) keeps the earliest of each duplicate. Last keeps the latest.
  3. Case-sensitive: off by default - "Apple" and "apple" match. Turn on to treat them differently.
  4. Preserve original order: on by default. Off sorts alphabetically.
  5. Trim whitespace: NEW. Ignores leading/trailing spaces during compare - "apple" and " apple" match.
  6. Unicode normalize (NFC): NEW. Converts decomposed forms (e + ◌́) to composed (é) before compare. Essential for pasted text from PDFs.
  7. Skip empty items: NEW. Omits zero-length items (caused by consecutive separators or trailing newlines).
  8. Show duplicates instead: NEW. Output changes to a frequency report of items appearing 2+ times.
  9. Press Ctrl+Enter to recalculate. The live preview also updates on its own after a 200 ms delay.
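
For readers who want to see how the options above fit together, here is a minimal sketch in TypeScript. It is not the tool's actual source: the names DedupeOptions, compareKey, and dedupeList are hypothetical, the defaults for trim, NFC, and skip-empty are assumed to be off, and the "sort alphabetically" branch uses a plain code-unit sort because the page does not state the exact collation.

```typescript
// Hypothetical option names; the tool's real internals are not shown on this page.
interface DedupeOptions {
  separator: string;        // "\n", ",", ";", "\t", "|" or " "
  keep: "first" | "last";   // which occurrence of a duplicate survives
  caseSensitive: boolean;
  preserveOrder: boolean;
  trim: boolean;
  normalizeNfc: boolean;
  skipEmpty: boolean;
}

const defaultOptions: DedupeOptions = {
  separator: "\n",
  keep: "first",
  caseSensitive: false,   // documented default: "Apple" and "apple" match
  preserveOrder: true,    // documented default
  trim: false,            // assumed default
  normalizeNfc: false,    // assumed default
  skipEmpty: false,       // assumed default
};

// Build the key used for comparison; the original item is kept for output.
function compareKey(item: string, o: DedupeOptions): string {
  let key = item;
  if (o.trim) key = key.trim();
  if (o.normalizeNfc) key = key.normalize("NFC");
  if (!o.caseSensitive) key = key.toLowerCase();
  return key;
}

function dedupeList(input: string, overrides: Partial<DedupeOptions> = {}): string[] {
  const o: DedupeOptions = { ...defaultOptions, ...overrides };
  let items = input.split(o.separator);
  if (o.skipEmpty) {
    items = items.filter((item) => compareKey(item, o) !== "");
  }

  // For "last", scan from the end so the latest occurrence is the one kept.
  const scan = o.keep === "last" ? items.slice().reverse() : items;
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const item of scan) {
    const key = compareKey(item, o);
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(item);
    }
  }
  if (o.keep === "last") kept.reverse();

  // Plain code-unit sort when order is not preserved (assumption).
  return o.preserveOrder ? kept : kept.slice().sort();
}
```

For example, dedupeList("a,,b,,a", { separator: ",", skipEmpty: true }) gives ["a", "b"], and dedupeList("apple\n apple", { trim: true }) keeps only the first "apple".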

Frequently Asked Questions

What’s new in this version?

Three new comparison options plus a duplicate-finder mode. Trim whitespace: the original treated "apple" and " apple" (leading space) as different – a common source of “why aren’t these matching?” frustration. Now toggleable. Unicode normalize (NFC): café can be stored as 4 code points (precomposed, U+00E9) OR 5 code points (e + U+0301 combining acute). The original treated these as different; now the toggle normalizes both forms before compare. Skip empty items: consecutive separators (a,,b) or trailing newlines produced empty items that counted as duplicates of each other. Now skippable. Show duplicates mode: instead of the deduplicated list, output the frequency map of items appearing 2+ times, sorted by count descending. Plus structural fixes: CSS bloat removed (mobile breakpoint had identical rule duplications), brand indigo replacing mint green accent, 3-type toast CSS classes (was inline style.backgroundColor swap), Ctrl+Enter document-level, FAQ structure with dtf-faq-item wrappers, structured stats card.

First occurrence vs Last occurrence – when to use which?

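First (default): keep the earliest appearance. For input [A, B, A, C] the output is [A, B, C]. Best when input order represents priority and earlier = more important (e.g., the first email a user gave you is their canonical one). Last: keep the latest appearance of each duplicate instead, which suits lists where a later entry supersedes an earlier one.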
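
To make the difference concrete, here is a small sketch using a Map keyed by the lowercased item. The function name keepOccurrence is hypothetical, and one detail is an assumption: in Last mode this sketch places each survivor at the position of its latest occurrence, which the page does not spell out.

```typescript
// Sketch only: keepOccurrence is a hypothetical helper, not the tool's code.
function keepOccurrence(items: string[], keep: "first" | "last"): string[] {
  const byKey = new Map<string, string>();
  for (const item of items) {
    const key = item.toLowerCase(); // case-insensitive compare, as in the tool's default
    if (keep === "first") {
      // Ignore later duplicates; the earliest form stays.
      if (!byKey.has(key)) byKey.set(key, item);
    } else {
      // Delete and re-insert so the entry moves to its latest position (assumption).
      byKey.delete(key);
      byKey.set(key, item);
    }
  }
  return Array.from(byKey.values());
}

const firstKept = keepOccurrence(["A", "B", "A", "C"], "first"); // ["A", "B", "C"]
const lastKept = keepOccurrence(["A", "B", "A", "C"], "last");   // ["B", "A", "C"]
```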

Why would I need Unicode normalize?

Same-looking text can have different bytes. PDFs and some macOS apps use decomposed forms (e + ◌́ = 2 code points for é); most web forms and modern editors use precomposed (é = 1 code point). If you paste from a PDF and from your CRM into the same list, the same name might appear as two distinct items. The NFC normalize option fixes this by composing both forms identically before compare. Same problem hits Korean (Hangul jamo vs syllables), Vietnamese, Devanagari, and many others.
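
A quick way to see the problem for yourself, sketched in TypeScript with the standard String.prototype.normalize method. The helper name equalAfterNfc is made up for this example.

```typescript
// "café" in two encodings that render identically.
const precomposed = "caf\u00e9";  // é as one code point (U+00E9)
const decomposed = "cafe\u0301";  // e followed by combining acute (U+0301)

// Hangul: one precomposed syllable vs. three conjoining jamo.
const hangulSyllable = "\ud55c";            // U+D55C
const hangulJamo = "\u1112\u1161\u11ab";    // U+1112 + U+1161 + U+11AB

// Hypothetical helper: compare two strings after composing both to NFC.
function equalAfterNfc(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

const rawEqual = precomposed === decomposed;             // false: different code points
const nfcEqual = equalAfterNfc(precomposed, decomposed); // true
const hangulEqual = equalAfterNfc(hangulSyllable, hangulJamo); // true
```

Without normalization the two spellings of café have lengths 4 and 5 and compare as different strings, which is exactly why they show up as separate list items.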

Case-sensitivity gotcha for non-Latin scripts?

The case-insensitive toggle uses JavaScript’s .toLowerCase(), which follows Unicode default case mapping. Works for Latin, Greek, and Cyrillic. Doesn’t apply to Chinese / Japanese / Korean / Arabic / Hebrew / Devanagari because those scripts don’t have letter case. For Turkish-specific dotted/dotless I (İ vs I), JavaScript’s default behavior doesn’t match Turkish locale rules – but that’s a rare edge case.
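
The behavior described above can be checked directly with .toLowerCase(). This is an illustrative sketch; the variable names are made up for the example.

```typescript
// Scripts with letter case are folded by the default Unicode mapping.
const latinLower = "APPLE".toLowerCase(); // "apple"
const greekLower = "ΑΒΓ".toLowerCase();   // "αβγ"

// Scripts without letter case pass through unchanged.
const cjkLower = "中文".toLowerCase();     // "中文"

// Turkish edge: the default mapping is not the Turkish one.
// Capital dotted İ (U+0130) becomes "i" + combining dot above (U+0307),
// so it does not match a plain "i".
const dottedLower = "\u0130".toLowerCase();
// Capital I becomes "i", whereas Turkish rules expect dotless "ı" (U+0131).
const plainLower = "I".toLowerCase();

const dottedMatchesPlainI = dottedLower === "i"; // false
```

In practice this means "İstanbul" and "istanbul" stay separate items even with case-sensitivity off.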

What’s “Show duplicates” mode?

Instead of producing a deduplicated list, the output becomes a count of items that appeared 2+ times. Format: item × count, one per line, sorted by count descending. Useful for: finding the most-duplicated entries in a CSV column, auditing user-submitted forms for repeat entries, debugging “why does my deduplicator have so many results?” issues. If no duplicates exist, output says “(no duplicates found)”.

Are empty items always counted?

By default, yes. a,,b with comma separator produces 3 items: "a", "" (empty), "b". The empty one counts. With multiple empties (a,,,b → 4 items including 2 empties), the dedup reduces them to 1. Turn on Skip empty items to remove them entirely from both input and output. Whitespace-only items (e.g., " ") are NOT counted as empty unless Trim whitespace is also on.
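
The counts above follow from how a plain split treats consecutive separators. A small sketch, where splitItems is a hypothetical helper rather than the tool's own function:

```typescript
// Split on a separator; optionally drop empty items.
// Whitespace-only items are dropped only when trim is also enabled.
function splitItems(
  input: string,
  separator: string,
  skipEmpty: boolean,
  trim: boolean
): string[] {
  const items = input.split(separator);
  if (!skipEmpty) return items;
  return items.filter((item) => (trim ? item.trim() : item).length > 0);
}

const threeItems = splitItems("a,,b", ",", false, false);  // ["a", "", "b"]
const fourItems = splitItems("a,,,b", ",", false, false);  // ["a", "", "", "b"]
const skipped = splitItems("a,,,b", ",", true, false);     // ["a", "b"]
const spaceKept = splitItems("a, ,b", ",", true, false);   // ["a", " ", "b"]
const spaceGone = splitItems("a, ,b", ",", true, true);    // ["a", "b"]
```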

What about CSV columns?

This tool treats each separator-bounded value as a single item. For CSV deduplication ON SPECIFIC COLUMNS (e.g., “remove rows where the email column is duplicated”), you’d need to: (a) split CSV by rows first; (b) extract the email column; (c) feed those values here; (d) cross-reference back. The sibling tool “Delete CSV Columns” plus a spreadsheet tool is typically faster for full-row dedup with column-based comparison. For single-column lists pasted from a spreadsheet, this tool works directly.
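
If you would rather script steps (a) to (d) yourself, the idea can be sketched as follows. This is a rough illustration, not a feature of this tool: dedupeRowsByColumn is a hypothetical name, and the naive split on commas does not handle quoted fields that contain commas or newlines, so use a real CSV parser for anything beyond simple data.

```typescript
// Keep the first row for each distinct value in one column.
// Assumes simple CSV: no quoted commas, no embedded newlines.
function dedupeRowsByColumn(csv: string, columnIndex: number): string[] {
  const rows = csv.split("\n").filter((row) => row.length > 0); // (a) split into rows
  const seen = new Set<string>();
  const keptRows: string[] = [];
  for (const row of rows) {
    const cells = row.split(",");
    // (b) extract the column; a missing cell is treated as empty
    const value = (cells[columnIndex] || "").trim().toLowerCase();
    // (c) dedupe on the value, (d) keep the whole row it came from
    if (!seen.has(value)) {
      seen.add(value);
      keptRows.push(row);
    }
  }
  return keptRows;
}

const sampleCsv = [
  "name,email",
  "Ann,ann@example.com",
  "Annie,ANN@example.com",
  "Bob,bob@example.com",
].join("\n");

const uniqueRows = dedupeRowsByColumn(sampleCsv, 1);
```

Here the "Annie" row is dropped because its email matches Ann's once case is ignored; the header row survives because its value "email" is unique.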

What’s the performance limit?

Set-based O(n) algorithm. Tested at 100,000 items in ~80-150ms on a typical desktop. Beyond ~500k items, browser DOM updates (rendering the output textarea) become the bottleneck. For genuinely huge datasets (millions of items), use a streaming tool or Python script.

Why does Show Duplicates use lowercase keys but display the first-found case?

Because comparisons happen on the lowercased/trimmed/normalized key (matches the “what counts as a duplicate” logic), but humans want to see the original-form item. We pick the FIRST occurrence’s display form. So if input is "Apple\napple\nAPPLE" case-insensitive, the output shows Apple × 3 (first form, total count).
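
The key-versus-display split can be sketched like this. It is an approximation of the described behavior, not the tool's code: duplicateReport is a hypothetical name, and the sketch always trims, normalizes, and lowercases the key, whereas the real tool applies only the options you have switched on.

```typescript
// Count items by comparison key, but remember the first-seen original form.
function duplicateReport(items: string[]): string {
  const counts = new Map<string, { display: string; count: number }>();
  for (const item of items) {
    const key = item.trim().normalize("NFC").toLowerCase();
    const entry = counts.get(key);
    if (entry) {
      entry.count += 1;                              // same key: bump the count
    } else {
      counts.set(key, { display: item, count: 1 });  // first occurrence sets the display form
    }
  }

  // Only items that appeared 2+ times make it into the report.
  const repeated = Array.from(counts.values()).filter((e) => e.count >= 2);
  if (repeated.length === 0) return "(no duplicates found)";

  repeated.sort((a, b) => b.count - a.count);        // most frequent first
  return repeated.map((e) => `${e.display} \u00d7 ${e.count}`).join("\n");
}

const report = duplicateReport("Apple\napple\nAPPLE".split("\n"));
```

With the input from the answer above, report is the single line "Apple × 3".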

Is my data secure?

Yes. All processing happens in your browser. Your list never leaves your device. The download is generated in-memory and offered locally. Nothing is logged or sent anywhere.