Email Extractor
Extract emails from text - regex match, dedupe (case-insensitive), separator options. Stats. Free, offline, client-side, instant, secure.
Paste text on the left, get email addresses on the right. Regex-based - not full RFC 5322 (see FAQ). Live, case-insensitive dedup, choose your output separator, get top-domain stats.
How to Use Email Extractor
- Paste text into the left textarea - anything: a webpage, a contact list, a log file.
- Pick the output separator (newline / comma / semicolon / space) so the result drops into your target tool cleanly.
- Toggle case sensitivity for the dedup pass. Off (default) treats addresses that differ only in letter case as the same - useful for mail-list cleaning.
- Toggle sort if you want the output alphabetised instead of first-seen order.
- Read the stats line: raw matches, unique emails, duplicates removed (with percent), and top 3 domains by frequency.
- Copy or download the result. Ctrl/Cmd + Enter re-runs.
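The separator and sort options above can be sketched roughly as follows. This is an illustration, not the tool's actual source: the names `SEPARATORS` and `formatOutput`, the exact separator strings (for example a bare comma rather than comma plus space), and the case-insensitive sort order are all assumptions.

```javascript
// Assumed mapping from the separator option to the string inserted between emails.
const SEPARATORS = { newline: '\n', comma: ',', semicolon: ';', space: ' ' };

// Compare two strings alphabetically, ignoring letter case (assumed sort order).
function compareIgnoringCase(a, b) {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  return x < y ? -1 : x > y ? 1 : 0;
}

// Join an already-deduplicated list; sort a copy so the caller's
// first-seen order is left untouched.
function formatOutput(emails, { separator = 'newline', sort = false } = {}) {
  const list = sort ? [...emails].sort(compareIgnoringCase) : emails;
  return list.join(SEPARATORS[separator]);
}

console.log(formatOutput(['bob@example.com', 'ann@example.com'], { separator: 'semicolon', sort: true }));
// → 'ann@example.com;bob@example.com'
```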
Frequently Asked Questions
What’s the exact pattern?
It is: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,24}. Translation: a local part of common ASCII characters, an @ sign, an ASCII domain of letters, digits, dots and hyphens, a literal dot, then a final TLD of 2-24 ASCII letters. The cap on TLD length keeps the match from gobbling a long trailing run of letters.
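A minimal sketch of how this pattern behaves in JavaScript, assuming the dot before the TLD is escaped and the global flag is set. `EMAIL_PATTERN` and `extractEmails` are illustrative names, not taken from the tool's source.

```javascript
// Pattern from the answer above; the escaped dot and the `g` flag are assumptions.
const EMAIL_PATTERN = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,24}/g;

// Return every raw match in first-seen order (no dedup at this stage).
function extractEmails(text) {
  return text.match(EMAIL_PATTERN) ?? [];
}

const sample =
  'Contact alice@example.com, <a href="mailto:Bob@Example.org">Bob</a>, or foo@münchen.de.';
console.log(extractEmails(sample));
// → ['alice@example.com', 'Bob@Example.org']
```

The address inside the mailto: attribute is picked up because the pattern ignores surrounding markup, while the non-ASCII domain is skipped because every character class is ASCII-only.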
Does it support international (IDN) domains?
No. The regex is ASCII-only. Internationalised domains like foo@münchen.de or foo@日本.jp won’t be matched. The original FAQ claimed support for “international domains” – that wasn’t accurate. If you need IDN, run the input through Punycode normalisation first (so münchen.de becomes xn--mnchen-3ya.de) and then re-extract.
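One way to do the Punycode pre-pass in the browser is to lean on the built-in `URL` parser, which converts internationalised hostnames to their ASCII form. The sketch below is an assumption about how such a pre-pass could look, not part of the tool: `IDN_CANDIDATE` and `punycodeDomains` are hypothetical names, and the looser Unicode-aware pattern is used only to find candidates for conversion.

```javascript
// Address-like tokens whose domain may contain non-ASCII letters or digits.
// This pre-pass pattern is an assumption; it is not the tool's extraction regex.
const IDN_CANDIDATE = /([a-zA-Z0-9._%+-]+)@([\p{L}\p{N}.-]+\.\p{L}{2,})/gu;

function punycodeDomains(text) {
  return text.replace(IDN_CANDIDATE, (match, local, domain) => {
    try {
      // The URL parser returns the ASCII (xn--) form of an IDN hostname.
      // Side effect: it also lowercases the domain.
      const asciiDomain = new URL(`http://${domain}`).hostname;
      return `${local}@${asciiDomain}`;
    } catch {
      return match; // leave anything the URL parser rejects untouched
    }
  });
}

console.log(punycodeDomains('Write to foo@münchen.de'));
// → 'Write to foo@xn--mnchen-3ya.de'
```

The converted text can then be pasted into the extractor, whose ASCII-only pattern will match the xn-- form.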
How does deduplication work?
The first occurrence of each email is kept; subsequent occurrences are skipped. Comparison is case-insensitive by default (domains are always case-insensitive, and local parts are case-insensitive in practice, even though the RFC permits case-sensitive local parts). Toggle “case sensitive” if you really want to keep addresses that differ only in letter case as separate entries.
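A minimal sketch of first-occurrence dedup with an optional case-sensitive mode. The function name `dedupeEmails` is hypothetical, and keeping the spelling of the first occurrence (rather than lowercasing the output) is an assumption about the tool's behaviour.

```javascript
// Keep the first occurrence of each address; later duplicates are skipped.
// `caseSensitive` mirrors the toggle described above (default: off).
function dedupeEmails(emails, caseSensitive = false) {
  const seen = new Set();
  const unique = [];
  for (const email of emails) {
    const key = caseSensitive ? email : email.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(email); // assumed: keep the first occurrence's spelling
    }
  }
  return unique;
}

console.log(dedupeEmails(['Ann@Example.com', 'ann@example.com', 'bob@example.com']));
// → ['Ann@Example.com', 'bob@example.com']
```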
What does the “top domains” stat show?
The 3 most frequent domains in the deduped output, with their counts. Useful for sniffing list quality – if you see gmail.com (847) dominating your “customer leads” list, that’s a signal you’ve scraped a lot of personal-mail addresses.
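The stats line could be computed along these lines. This is a sketch under assumptions: `emailStats` is a hypothetical name, the duplicate percentage is taken relative to the raw match count, and ties between domains are broken by first-seen order.

```javascript
// rawMatches: every regex match; uniqueEmails: the list after dedup.
function emailStats(rawMatches, uniqueEmails, topN = 3) {
  const duplicates = rawMatches.length - uniqueEmails.length;
  // Assumed definition: share of raw matches that were removed as duplicates.
  const percent = rawMatches.length === 0 ? 0 : (duplicates / rawMatches.length) * 100;

  // Count domains over the deduped output, ignoring letter case.
  const counts = new Map();
  for (const email of uniqueEmails) {
    const domain = email.slice(email.lastIndexOf('@') + 1).toLowerCase();
    counts.set(domain, (counts.get(domain) ?? 0) + 1);
  }

  // Array.prototype.sort is stable, so tied domains keep first-seen order.
  const topDomains = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN);

  return { raw: rawMatches.length, unique: uniqueEmails.length, duplicates, percent, topDomains };
}

const raw = ['a@x.com', 'A@x.com', 'b@y.org', 'c@x.com'];
const unique = ['a@x.com', 'b@y.org', 'c@x.com'];
console.log(emailStats(raw, unique));
// → { raw: 4, unique: 3, duplicates: 1, percent: 25, topDomains: [['x.com', 2], ['y.org', 1]] }
```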
Will it pick up emails inside URLs or HTML attributes?
Sometimes. The regex is content-agnostic – if your input contains an href="mailto:" attribute, the address after mailto: will match. If a URL carries an address in its query string, that matches too. The tool deliberately does NOT try to strip surrounding markup; that lets it work on any text source. If you need only “real” emails, filter the output by domain or hand-review.
How fast is it on big input?
Linear in input length. 100 kB of text runs in well under 50 ms on a modern machine. The 100 ms input debounce keeps typing smooth even on very large pastes. For multi-megabyte inputs, the regex match still finishes in well under a second.
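For readers unfamiliar with debouncing, here is a minimal sketch of the idea: each new keystroke cancels the previously scheduled run, so extraction happens only once input has been quiet for the wait period. The `debounce` helper and its injectable `timers` argument are illustrative assumptions, not the tool's code; the 100 ms default is the figure quoted above.

```javascript
// Return a wrapper that runs `fn` only after `waitMs` of silence.
// `timers` is injectable so the behaviour can be exercised without real delays.
function debounce(fn, waitMs = 100, timers = {
  set: (cb, ms) => setTimeout(cb, ms),
  clear: (id) => clearTimeout(id),
}) {
  let pending = null;
  return (...args) => {
    if (pending !== null) timers.clear(pending); // drop the previously scheduled run
    pending = timers.set(() => {
      pending = null;
      fn(...args); // runs with the arguments of the most recent call
    }, waitMs);
  };
}

// Hypothetical wiring in the page (element and handler names are assumptions):
// inputEl.addEventListener('input', debounce(() => runExtraction(inputEl.value)));
```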
Is anything sent to a server?
No. The page loads three static files (HTML, CSS, JS) and then runs entirely in your browser. You can disconnect from the internet after the page loads. No analytics, no tracking, no cookies.
Is this tool free?
Yes – free, unlimited, no signup, no watermark. Use the output in any context. Attribution is appreciated but not required.