JavaScript

Regular Expressions

16 Questions

There are two ways to create a regular expression in JavaScript. A regex literal uses forward slashes like /pattern/flags and is compiled at load time. The RegExp constructor takes a string pattern and is useful when you need to build patterns dynamically from variables.

// Literal — preferred for static patterns
const re1 = /hello/i;

// Constructor — for dynamic patterns
const word = 'hello';
const re2 = new RegExp(word, 'i');

// With flags
/abc/g;   // global — find all matches
/abc/i;   // case-insensitive
/abc/m;   // multiline (^ and $ match line boundaries)
/abc/s;   // dotAll (. matches newlines)
/abc/u;   // unicode

// Dynamic pattern with special chars — must escape
const userInput = 'price: $10';
// Use an escape function before creating RegExp from user input
const dynamicRe = new RegExp(escapeRegex(userInput));

Regex literals are preferred for static patterns because they are validated at parse time and have better performance. Use the RegExp constructor only when the pattern needs to be built dynamically, and remember to double-escape backslashes in constructor strings since they go through string parsing first.

Why it matters: Choosing between regex literal and constructor is not just style — it affects parse-time validation and performance. Using the wrong one causes hard-to-find bugs when backslashes aren't properly escaped in dynamic patterns.

Real applications: Static validation patterns (email, phone) use regex literals; search features where the user types a search term use the RegExp constructor; URL routers that generate patterns from string templates use the constructor with escaping.

Common mistakes: Forgetting to double-escape backslashes in constructor strings (new RegExp('\d+') compiles to the pattern d+; write new RegExp('\\d+') to get the equivalent of /\d+/), not escaping user input before passing it to the RegExp constructor (a regex injection vulnerability), and creating new RegExp instances inside loops instead of caching the compiled pattern.
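
The backslash pitfall can be seen by inspecting the source of each compiled pattern. This is a minimal sketch: wholeWord is a hypothetical helper, and it assumes its argument is a trusted word with no special regex characters.

```javascript
// A string literal consumes one level of backslashes before RegExp sees it.
const wrong = new RegExp('\d+');  // the string is "d+": one or more letters "d"
const right = new RegExp('\\d+'); // the string is "\d+": one or more digits

console.log(wrong.source);      // "d+"
console.log(right.source);      // "\d+"
console.log(wrong.test('123')); // false
console.log(right.test('123')); // true

// Hypothetical helper: build a whole-word matcher from a trusted word.
function wholeWord(word, flags = 'i') {
  return new RegExp('\\b' + word + '\\b', flags);
}

console.log(wholeWord('hello').test('Say HELLO there')); // true
console.log(wholeWord('hell').test('hello'));            // false
```

For input that is not trusted, the word would need to be escaped first, as the common mistakes above point out.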

These are the three main methods for working with regex matches. test() returns a simple boolean indicating if a match exists. exec() returns a detailed match array including captured groups and the match index. match() is a String method that behaves differently depending on the g flag.

const re = /(\d{4})-(\d{2})/;
const str = '2024-03 and 2025-06';

re.test(str);        // true
re.exec(str);        // ["2024-03", "2024", "03", index: 0]

str.match(re);       // ["2024-03", "2024", "03"] (first match)
str.match(/\d{4}-\d{2}/g); // ["2024-03", "2025-06"] (all matches)

// matchAll: iterator of all detailed matches
for (const m of str.matchAll(/(\d{4})-(\d{2})/g)) {
  console.log(m[1], m[2]); // "2024","03" then "2025","06"
}

// search(): returns index of first match
str.search(/\d{4}/);  // 0
str.search(/xyz/);    // -1 (not found)

// split() with regex
'one, two,  three'.split(/,\s*/); // ["one", "two", "three"]

Use test() when you only need a yes/no answer, exec() when you need detailed match info with groups, and matchAll() (ES2020) when you need all matches with full detail. Note that exec() with the g flag is stateful — it advances lastIndex on each call.

Why it matters: Using the wrong method leads to either incomplete results or subtle statefulness bugs. matchAll() returning an iterator instead of an array is a common source of confusion in modern code.

Real applications: Form validation (test()), extracting named capture groups from dates/URLs (exec()), finding all tag names in HTML strings (matchAll()), highlighting all occurrences in a text editor, and data extraction from structured log lines.

Common mistakes: Calling test() with a /g flag regex repeatedly and getting alternating true/false (lastIndex bug), not spreading matchAll() into an array before iterating twice, and using match() without the g flag when expecting all matches (only returns the first).
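
Because matchAll() hands back a one-shot iterator, a common approach is to spread it into an array once and reuse that array. The sketch below assumes the same sample string used above; the variable names are illustrative only.

```javascript
const text = '2024-03 and 2025-06';
const pattern = /(\d{4})-(\d{2})/g;

// Spread the iterator once so the results can be traversed many times.
const matches = [...text.matchAll(pattern)];

const years = matches.map((m) => m[1]);
const offsets = matches.map((m) => m.index);

console.log(years);   // ["2024", "2025"]
console.log(offsets); // [0, 12]

// match() with the g flag drops the groups and returns only the full matches.
console.log(text.match(pattern)); // ["2024-03", "2025-06"]
```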

A practical email regex checks for valid characters before the @ symbol, a domain name after it, and a top-level domain (TLD) at the end. No single regex can fully validate email addresses per the RFC 5322 specification, but a reasonable pattern covers the vast majority of real-world email formats.

// Practical email validation
const emailRe = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

emailRe.test('user@example.com');    // true
emailRe.test('a.b+tag@sub.co.uk');   // true
emailRe.test('missing@.com');         // false
emailRe.test('@no-local.com');        // false

// For production, prefer built-in validation
// <input type="email"> handles most cases
// Or use a well-tested library for strict validation

// Other common validations
const phoneRe = /^\+?[1-9]\d{1,14}$/;         // E.164 format
const urlRe = /^https?:\/\/[^\s/$.?#].[^\s]*$/; // basic URL
const hexColorRe = /^#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})$/;

For production applications, rely on the HTML5 input type="email" validation or a well-tested validation library rather than writing your own regex. The true RFC-compliant email regex is extremely complex and impractical for most use cases.

Why it matters: Regex-based email validation is one of the most commonly attempted and commonly broken tasks in web development. Knowing when to use regex and when to delegate to the browser or a library is a mark of practical engineering judgment.

Real applications: Basic syntax validation for user-facing forms, extracting emails from unstructured text (log files, documents), filtering email lists, and building simple parsers for structured string formats like phone numbers and postal codes.

Common mistakes: Writing overly strict email regex that rejects valid addresses (e.g., user+tag@domain.co.uk), writing too-permissive patterns that accept invalid strings, validating format instead of deliverability (a regex can't tell if an email exists), and not anchoring with ^ and $ (matches a substring instead of the whole value).
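
A small wrapper around the practical pattern might look like the sketch below. The name isPlausibleEmail is hypothetical, and trimming whitespace before testing is an assumption about how the form input should be treated. It checks format only, not deliverability.

```javascript
const EMAIL_RE = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: format check only; it cannot tell whether the mailbox exists.
function isPlausibleEmail(input) {
  if (typeof input !== 'string') return false;
  return EMAIL_RE.test(input.trim());
}

console.log(isPlausibleEmail('user@example.com'));       // true
console.log(isPlausibleEmail('  a.b+tag@sub.co.uk  '));  // true
console.log(isPlausibleEmail('missing@.com'));           // false
console.log(isPlausibleEmail('user@example.com extra')); // false (anchors reject the tail)
```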

Character classes define a set of characters to match at a single position. Built-in classes include \d (digit), \w (word character), \s (whitespace), and custom sets like [abc]. Quantifiers specify how many times a pattern should repeat: * (0 or more), + (1 or more), ? (0 or 1), and {n,m} (range).

// Character classes
/\d/.test('9');         // true (digit)
/\w+/.test('hello');    // true (word characters)
/\s/.test(' ');         // true (whitespace)
/[aeiou]/.test('e');    // true (vowel)
/[^0-9]/.test('a');     // true (NOT a digit)

// Quantifiers
/a{3}/.test('aaa');     // true (exactly 3)
/a{2,4}/.test('aaa');   // true (2 to 4)
/colou?r/.test('color'); // true (u is optional)
/\d+/.exec('abc123');   // ["123"] (one or more digits)

// Greedy vs Lazy quantifiers
'<b>bold</b>'.match(/<.*>/);   // ["<b>bold</b>"] greedy
'<b>bold</b>'.match(/<.*?>/);  // ["<b>"] lazy (minimal match)

By default, quantifiers are greedy: they match as much as possible. Adding ? after a quantifier makes it lazy (non-greedy), matching as little as possible. The negated character class [^...] matches any character NOT in the set, which is often more precise than a lazy quantifier.

Why it matters: Greedy vs lazy matching is one of the most common sources of regex bugs. Understanding the difference is essential for extracting data from HTML or structured text where you need to match the minimum possible content between delimiters.

Real applications: Extracting HTML tag content with <tag>(.*?)</tag> (lazy) instead of greedy, scraping structured text, matching JSON-like patterns, and building template engines that process {{...}} markers.

Common mistakes: Using greedy .* and accidentally consuming more than intended (matches across multiple tags), forgetting that ? after a quantifier makes it lazy — not optional, and using lazy quantifiers when a negated character class would be cleaner and more efficient.
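
The three approaches can be compared side by side on a string with two tags. This is an illustrative sketch; the sample markup is made up for the example.

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the last </b> in the string.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Lazy: .*? stops at the first </b> it can reach.
const lazy = html.match(/<b>.*?<\/b>/)[0];

// Negated class: [^<]* can never cross a tag boundary in the first place.
const negated = html.match(/<b>[^<]*<\/b>/)[0];

console.log(greedy);  // "<b>one</b> and <b>two</b>"
console.log(lazy);    // "<b>one</b>"
console.log(negated); // "<b>one</b>"

// Collect the contents of every tag.
const contents = [...html.matchAll(/<b>([^<]*)<\/b>/g)].map((m) => m[1]);
console.log(contents); // ["one", "two"]
```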

Capturing groups use parentheses (pattern) to capture the matched text, which can then be accessed via back-references or the match result array. Non-capturing groups use (?:pattern) to group patterns for alternation or quantification without capturing, which keeps the result array smaller when you do not need the matched text.

// Capturing groups
const match = /(\w+)@(\w+)\.(\w+)/.exec('user@site.com');
// match[1] = "user", match[2] = "site", match[3] = "com"

// Non-capturing group: groups but doesn't capture
const urlMatch = /(?:https?):\/\/(\w+)/.exec('https://example');
// urlMatch[1] = "example" (only one capture)

// Back-reference
/(\w+) \1/.test('hello hello'); // true (\1 = first captured group)

// Alternation in group
/(?:cat|dog)s/.test('cats'); // true
/(?:cat|dog)s/.test('dogs'); // true

// Practical: extract parts of a URL
const urlRe = /^(https?):\/\/([^/]+)(\/.*)?$/;
const parts = urlRe.exec('https://example.com/path');
// parts[1] = "https", parts[2] = "example.com", parts[3] = "/path"

Use non-capturing groups when you only need grouping for logical structure (like alternation) but do not need to reference the matched text. This keeps the match result array cleaner and slightly improves regex engine performance in complex patterns.

Why it matters: Understanding capturing vs non-capturing groups is essential once you build multi-group patterns. Non-capturing groups keep match arrays from filling with unwanted captures, making the results easier to process.

Real applications: Grouping alternation without polluting capture groups ((?:jpg|png|gif)), building complex validation patterns with logical groupings, and writing named-capture-group patterns where unnamed captures would interfere with group numbering.

Common mistakes: Using capturing groups when non-capturing would suffice (bloats the match array), confusing (?:...) (non-capturing) with (?=...) (lookahead), and not knowing that non-capturing groups still apply quantifiers — (?:ab)+ matches "ababab".
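
The effect on the result array is easy to see by running the same input through both kinds of group. A minimal sketch with made-up file names:

```javascript
const capturing = /^(.+)\.(jpg|png|gif)$/;
const nonCapturing = /^(.+)\.(?:jpg|png|gif)$/;

const withExt = capturing.exec('photo.png');       // ["photo.png", "photo", "png"]
const withoutExt = nonCapturing.exec('photo.png'); // ["photo.png", "photo"]

console.log(withExt.length, withoutExt.length); // 3 2

// A quantifier after a non-capturing group applies to the whole group.
console.log(/^(?:ab)+$/.test('ababab')); // true
console.log(/^ab+$/.test('ababab'));     // false (+ applies to "b" only)
```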

Lookahead and lookbehind assertions are zero-width patterns that check what comes before or after a position without consuming any characters. Positive lookahead (?=...) asserts what follows, negative lookahead (?!...) asserts what does NOT follow. Lookbehind uses (?<=...) and (?<!...) for what precedes.

// Positive lookahead: followed by
'100px'.match(/\d+(?=px)/); // ["100"]

// Negative lookahead: NOT followed by
'100em'.match(/\d+(?!px)/); // ["100"]

// Positive lookbehind: preceded by
'$50'.match(/(?<=\$)\d+/); // ["50"]

// Negative lookbehind: NOT preceded by
'€50'.match(/(?<!\$)\d+/); // ["50"]

// Password: at least one digit and one uppercase
/(?=.*\d)(?=.*[A-Z]).{8,}/.test('Pass1234'); // true

// Add commas to numbers using lookahead
'1234567'.replace(/\B(?=(\d{3})+(?!\d))/g, ','); // "1,234,567"

Lookaheads and lookbehinds are zero-width: they assert a condition at a position but do not consume characters or advance the match position. This makes them a good fit for password validation (multiple conditions at the same position) and number formatting. Lookbehind support was added in ES2018.

Why it matters: Lookaheads and lookbehinds enable "contextual matching", that is, matching a pattern only when it is followed or preceded by something specific. This is used in countless production validation patterns and text processing tasks.

Real applications: Password strength validation (must contain uppercase, digit, special char), number formatting with thousands separators, context-sensitive replacements (only replace a word when not preceded by another specific word), and parsing tokens with context constraints.

Common mistakes: Using a lookbehind in environments that don't support ES2018 (Safari < 16.4), confusing positive lookahead (?=...) with non-capturing group (?:...), and forgetting that lookaheads and lookbehinds don't consume characters — the main pattern still needs to match the desired text.
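
The thousands-separator pattern can be wrapped in a function, and the same sketch shows a subtle trap with negative lookahead: the engine may backtrack to a shorter match that satisfies the assertion. The function name addThousandsSeparators is hypothetical.

```javascript
// Insert a comma at every position followed by a multiple of three digits.
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(addThousandsSeparators('1234567')); // "1,234,567"
console.log(addThousandsSeparators('999'));     // "999"

// Trap: \d+ backtracks from "100" to "10", which is not followed by "px".
console.log('100px'.match(/\d+(?!px)/)[0]); // "10"

// Also forbidding a following digit closes that loophole.
console.log('100px'.match(/\d+(?!\d|px)/));    // null
console.log('100em'.match(/\d+(?!\d|px)/)[0]); // "100"
```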

The String.replace() method combined with regex is powerful for text transformations. You can reference capture groups in the replacement string using $1, $2, etc., or pass a callback function for dynamic replacements. Use the g flag to replace all occurrences, not just the first.

// Simple replace
'hello world'.replace(/world/, 'JS'); // "hello JS"

// Global replace
'aabba'.replace(/a/g, 'x'); // "xxbbx"

// Using capture groups
'2024-03-15'.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1'); // "03/15/2024"

// Callback function
'hello'.replace(/./g, (char, i) => {
  return i % 2 === 0 ? char.toUpperCase() : char;
}); // "HeLlO"

// replaceAll (ES2021): no g flag needed for string patterns
'aabba'.replaceAll('a', 'x'); // "xxbbx"

// Named group references in replacement
'2024-03-15'.replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  '$<month>/$<day>/$<year>'
); // "03/15/2024"

The callback function receives the full match, each captured group, the match offset, and the original string as parameters. Use replaceAll() (ES2021) as a cleaner alternative to replace() with the g flag for simple string replacements.

Why it matters: The replace callback unlocks powerful transformation capabilities: not just substitution but computation on matched text. It's how template engines, code formatters, and text sanitizers work internally.

Real applications: Converting camelCase to kebab-case, escaping HTML entities in user input, transforming markdown syntax to HTML, replacing placeholders in templates with computed values, and normalizing date/phone formats across multiple formats.

Common mistakes: Using replace() without the g flag and only replacing the first match, returning undefined from the replace callback (inserts "undefined" as the replacement string), passing replaceAll() a regex without the g flag (it throws a TypeError; string patterns need no flag), and referencing capture groups in the callback that the pattern never defines.
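
Two of the applications listed above can be sketched with a replace callback. Both escapeHtml and fillTemplate are hypothetical helpers, and the {{name}} placeholder syntax is an assumption made for the example.

```javascript
const HTML_ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Replace each special character by looking it up in a table.
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// Replace {{key}} markers; unknown keys are left untouched.
function fillTemplate(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (whole, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : whole
  );
}

console.log(escapeHtml('<a href="x">T&C</a>'));
// "&lt;a href=&quot;x&quot;&gt;T&amp;C&lt;/a&gt;"
console.log(fillTemplate('Hi {{name}}, {{missing}}', { name: 'Ada' }));
// "Hi Ada, {{missing}}"
```

Returning the whole match for unknown keys avoids the "undefined" pitfall mentioned above.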

Regex flags modify how the pattern engine operates. The most common are g (global: find all matches), i (case-insensitive), m (multiline: anchors match line boundaries), and s (dotAll: dot matches newlines). Newer flags include u (unicode), d (indices), and v (unicodeSets).

// g: find all matches, not just the first
'abab'.match(/a/g); // ["a", "a"]

// i: case-insensitive
/hello/i.test('Hello'); // true

// m: ^ and $ match line boundaries
'line1\nline2'.match(/^line/gm); // ["line", "line"]

// s: dot matches newline
/a.b/s.test('a\nb'); // true (without s: false)

// u: proper unicode support
/\u{1F600}/u.test('😀'); // true

// d: match indices
/a(b)/.exec('ab'); // no indices
/a(b)/d.exec('ab'); // includes .indices property

// Combining flags
/pattern/gims; // global, case-insensitive, multiline, dotAll

The u (unicode) flag enables proper handling of Unicode characters beyond the Basic Multilingual Plane (like emoji). The d (hasIndices) flag adds start and end positions for each captured group in the match result. Always use the u flag when working with international text.

Why it matters: Regex without the u flag operates on UTF-16 code units, causing incorrect behavior with characters outside the BMP, such as emoji and the rarer CJK ideographs. This is a common internationalization bug.

Real applications: Validating usernames that allow international characters, processing multilingual text, matching emoji in social media content moderation, and building international character-aware word boundaries for text search.

Common mistakes: Not using the u flag with Unicode patterns (an emoji then counts as two characters), expecting . to match newlines without the s flag (it does not by default; s enables dotAll mode), and combining the u and v flags (they are mutually exclusive and throw a SyntaxError; ES2024 v supersedes u).
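
The d, m, and s flags are the easiest to confuse, so a short sketch may help. It assumes a runtime that supports the d flag; the key=value sample string is made up.

```javascript
// d: each match gains an .indices array of [start, end) pairs.
const pair = /(?<key>\w+)=(?<value>\w+)/d.exec('mode=dark');
console.log(pair.indices[0]);           // [0, 9]
console.log(pair.indices.groups.key);   // [0, 4]
console.log(pair.indices.groups.value); // [5, 9]

const lines = 'first\nsecond';

// m changes what ^ and $ mean.
console.log(/^second$/.test(lines));  // false
console.log(/^second$/m.test(lines)); // true

// s changes what . means.
console.log(/first.second/.test(lines));  // false
console.log(/first.second/s.test(lines)); // true
```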

Named capturing groups use the syntax (?<name>pattern) to assign meaningful names to captured groups, making regex results much more readable than numeric indices. Access named captures via match.groups.name in code or $<name> in replacement strings.

const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = dateRe.exec('2024-03-15');
console.log(match.groups.year); // "2024"
console.log(match.groups.month); // "03"
console.log(match.groups.day); // "15"

// Named groups in replace
'2024-03-15'.replace(dateRe, '$<day>/$<month>/$<year>'); // "15/03/2024"

// Destructuring named groups
const { groups: { year, month, day } } = dateRe.exec('2024-03-15');
console.log(year, month, day); // "2024" "03" "15"

// Named back-reference
/(?<word>\w+) \k<word>/.test('hello hello'); // true
// \k<word> references the named group

Named groups are especially useful for complex patterns where numbered references like $1 and $2 become confusing. They make the regex self-documenting and the code that processes matches easier to understand. Named groups can be combined with destructuring for clean variable extraction.

Why it matters: Named capturing groups transform regex results from cryptic arrays to self-documenting objects. When your pattern has more than 2-3 groups, named groups are essential for maintainability, especially when patterns change and group numbers shift.

Real applications: Parsing date/time strings (year, month, day), extracting URL components (protocol, host, path, query), parsing log line formats, and building template substitution engines with named placeholder replacement.

Common mistakes: Using positional groups (e.g., $1, $2) in complex patterns (breaks when groups are added/reordered), not knowing you can destructure .groups directly, and forgetting that named back-references use \k<name> syntax within the pattern itself.
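
Log parsing, one of the applications listed above, might be sketched as follows. The "date LEVEL message" line format and the name parseLogLine are assumptions made for the example.

```javascript
// Assumed line format: "YYYY-MM-DD LEVEL free-form message"
const LOG_RE = /^(?<date>\d{4}-\d{2}-\d{2}) (?<level>[A-Z]+) (?<message>.*)$/;

// Hypothetical helper: returns a plain object, or null when the line does not fit.
function parseLogLine(line) {
  const match = LOG_RE.exec(line);
  return match ? { ...match.groups } : null;
}

console.log(parseLogLine('2024-03-15 ERROR Disk full'));
// { date: "2024-03-15", level: "ERROR", message: "Disk full" }
console.log(parseLogLine('not a log line')); // null
```

Adding another named field later does not shift any existing property names, which is the maintainability benefit described above.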

The String.split() method accepts a regex pattern to split strings on complex delimiters. This is commonly used for tokenizing text, parsing CSV data, or splitting on multiple delimiter types simultaneously.

// Split on multiple delimiters
'one, two; three four'.split(/[,;\s]+/); // ["one", "two", "three", "four"]

// Split and keep the delimiter (capturing group)
'hello123world456end'.split(/(\d+)/); // ["hello", "123", "world", "456", "end"]

// Tokenize simple expressions
'3 + 5 * 2 - 1'.split(/\s*([+\-*/])\s*/); // ["3", "+", "5", "*", "2", "-", "1"]

// Parse CSV line (handles quoted values)
const csvLine = 'name,"city, state",age';
const csvRe = /,(?=(?:[^"]*"[^"]*")*[^"]*$)/;
csvLine.split(csvRe); // ["name", '"city, state"', "age"]

// Limit splits
'a-b-c-d'.split(/-/, 2); // ["a", "b"]

When the regex contains a capturing group, the captured delimiters are included in the result array. This is useful when you need to preserve the separators, like when tokenizing mathematical expressions. Use the second argument of split() to limit the number of result pieces; anything past the limit is discarded.

Why it matters: Regex-based splitting handles complex real-world data that can't be separated by a simple fixed string. CSV parsing, log tokenization, and expression parsing all require this capability.

Real applications: Parsing CSV with quoted fields, tokenizing arithmetic expressions for evaluation, splitting markdown text while preserving code blocks, processing multi-delimiter data exports (tab/comma/semicolon), and building simple lexers for custom configuration languages.

Common mistakes: Using split(/./) instead of split('.') or split(/\./) when splitting on a literal dot (an unescaped dot in a regex matches any character), not accounting for capturing groups including delimiters in the result array, and using an unlimited split when only the first N parts are needed.
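
When delimiters sit next to each other, a capturing split produces empty strings between them. A tokenizer sketch that filters those out is shown below; tokenize is a hypothetical name, and parentheses are added to the operator set for illustration.

```javascript
// Split on operators and parentheses, keeping them as tokens.
function tokenize(expression) {
  return expression
    .split(/\s*([+\-*/()])\s*/)
    .filter((token) => token !== ''); // adjacent delimiters leave empty pieces
}

console.log(tokenize('3 + (5 * 2) - 1'));
// ["3", "+", "(", "5", "*", "2", ")", "-", "1"]

// A regex dot versus a string dot.
console.log('a.b.c'.split('.'));  // ["a", "b", "c"]
console.log('a.b.c'.split(/\./)); // ["a", "b", "c"]
console.log('a.b.c'.split(/./));  // ["", "", "", "", "", ""]
```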

The lastIndex property tracks where the regex engine will start its next search when using the g (global) or y (sticky) flag. After each successful match, lastIndex is updated to the position after the match. This makes regex with the g flag stateful, which can cause unexpected behavior if reused.

const re = /\d+/g;

// exec advances lastIndex on each call
re.exec('abc 123 def 456'); // ["123"], lastIndex = 7
re.exec('abc 123 def 456'); // ["456"], lastIndex = 15
re.exec('abc 123 def 456'); // null, lastIndex = 0

// Common bug: reusing regex with g flag
const re2 = /hello/g;
re2.test('hello world'); // true, lastIndex = 5
re2.test('hello world'); // false! starts from index 5

// Fix: reset lastIndex
re2.lastIndex = 0;
re2.test('hello world'); // true again

// Sticky flag (y): must match at exactly lastIndex
const sticky = /\d+/y;
sticky.lastIndex = 4;
sticky.exec('abc 123'); // ["123"] (match at index 4)
sticky.exec('abc 123'); // null (no match at index 7)

The y (sticky) flag is stricter than g: it requires the match to occur exactly at lastIndex, not just anywhere after it. Always reset lastIndex to 0 before reusing a global regex, or create a new regex instance each time. Using matchAll() or match() avoids this issue entirely.

Why it matters: The stateful lastIndex bug with reused global regex is one of the most surprising JavaScript gotchas. It causes tests to pass and fail alternately with no apparent reason, and is notoriously hard to debug.

Real applications: Any code that stores regex patterns as module-level constants and reuses them with .test() or .exec() is vulnerable. This affects validation modules, search utilities, and any performance-optimized code that avoids creating new regex instances.

Common mistakes: Declaring const re = /pattern/g at module scope and calling re.test() in a loop (alternating results), not knowing the sticky /y flag has the same issue, and forgetting to reset lastIndex between test calls in unit tests that reuse the same regex instance.
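
The alternating-result bug and two possible fixes can be reproduced in a few lines. The names sharedGlobal and hasDigit are illustrative only.

```javascript
const inputs = ['a1', 'b2', 'c3'];

// Bug: one shared global regex carries lastIndex from call to call.
const sharedGlobal = /\d/g;
const buggy = inputs.map((s) => sharedGlobal.test(s));
console.log(buggy); // [true, false, true]

// Fix 1: drop the g flag when only a yes/no answer is needed.
const noGlobal = /\d/;
const fixedNoFlag = inputs.map((s) => noGlobal.test(s));
console.log(fixedNoFlag); // [true, true, true]

// Fix 2: keep the flag but reset lastIndex before each use.
function hasDigit(s) {
  sharedGlobal.lastIndex = 0;
  return sharedGlobal.test(s);
}
const fixedReset = inputs.map((s) => hasDigit(s));
console.log(fixedReset); // [true, true, true]
```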

Special characters in regex like . * + ? ^ $ { } ( ) | [ ] \ have special meanings and must be escaped with a backslash to match them literally. When building patterns dynamically from user input, you must escape these characters to prevent regex injection or syntax errors.

// Special chars need escaping to match literally
/1\.5/.test('1.5'); // true (escaped dot)
/1.5/.test('1X5'); // true (unescaped dot matches any char)
/\$10/.test('$10'); // true (escaped dollar sign)
/\(hello\)/.test('(hello)'); // true

// Escape function for dynamic patterns
function escapeRegex(str) {
  // Prefixes each special regex char with a backslash ($& is the matched char)
  const specials = /[.*+?^${}()|[\]\\]/g;
  return str.replace(specials, '\\$&');
}

// Usage with user input
const userSearch = 'price: $10.00 (USD)';
const escaped = escapeRegex(userSearch);
const re = new RegExp(escaped);
re.test('The price: $10.00 (USD) is final'); // true

// Without escaping, special chars cause errors or wrong matches
// new RegExp('$10.00'); // never matches: $ asserts end of input before "10.00"
// new RegExp('10.00');  // also matches "10X00"

Always use an escape function when building regex patterns from user input or dynamic strings. Without escaping, characters like . will match any character, $ will match end-of-string, and unmatched parentheses will throw a SyntaxError.

Why it matters: Building regex from user input without escaping allows regex injection: the user controls the pattern, which can cause incorrect matching and lets an attacker submit a pattern that triggers ReDoS (Regular Expression Denial of Service). This is an OWASP-recognized attack vector for Node.js servers.

Real applications: Search-as-you-type features that highlight matches in text, building dynamic find/replace tools, constructing regex from config files or database values, and any user-facing feature that translates user text into a search pattern.

Common mistakes: Forgetting to escape user input before new RegExp(userInput) (security vulnerability + potential crash), not knowing that String.prototype.escapeRegex doesn't exist (must implement or import), and escaping the string but forgetting the g flag when replacing all occurrences.
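
A search feature built on an escape function might look like the sketch below. countOccurrences is a hypothetical helper, and treating an empty search term as zero matches is an assumption.

```javascript
// Prefix every regex metacharacter with a backslash.
function escapeRegex(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: case-insensitive count of a literal search term.
function countOccurrences(haystack, needle) {
  if (needle === '') return 0; // an empty pattern would match at every position
  const re = new RegExp(escapeRegex(needle), 'gi');
  return [...haystack.matchAll(re)].length;
}

console.log(escapeRegex('$10.00 (USD)'));             // "\$10\.00 \(USD\)"
console.log(countOccurrences('a.b a.b axb', 'a.b'));  // 2 ("axb" is not counted)
console.log(countOccurrences('C++ and c++', 'c++'));  // 2
```

Without the escape step, new RegExp('c++') would throw a SyntaxError.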

Regular expressions are widely used for validating user input like phone numbers, passwords, credit cards, and dates. Combine anchors (^ and $) to ensure the entire string matches, and use lookaheads to enforce multiple conditions simultaneously without consuming characters.

// Password validation: multiple conditions
const passwordRe = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%]).{8,}$/;
passwordRe.test('Str0ng!Pass'); // true
passwordRe.test('weakpass'); // false

// Phone number (international format)
const phoneRe = /^\+?[1-9]\d{1,14}$/;
phoneRe.test('+919876543210'); // true

// Date format (YYYY-MM-DD)
const dateRe = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
dateRe.test('2024-03-15'); // true
dateRe.test('2024-13-01'); // false

// IP address
const ipRe = /^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/;
ipRe.test('192.168.1.1'); // true
ipRe.test('256.1.1.1'); // false

// Username (alphanumeric, 3-16 chars)
const usernameRe = /^[a-zA-Z0-9_]{3,16}$/;

Always use ^ (start) and $ (end) anchors for validation to ensure the entire string matches the pattern, not just a substring. For complex validations, consider combining simple regex checks with JavaScript logic for better readability and maintainability.

Why it matters: Without anchors, a validation regex will happily pass strings that contain the pattern anywhere, which is a critical security flaw. /\d{5}/ passes "abc12345xyz" as a valid ZIP code. Anchors are the most overlooked regex best practice.

Real applications: ZIP/postal code validation, password strength checking, username format enforcement, credit card number format validation, and any input field with strict format requirements.

Common mistakes: Forgetting ^/$ anchors (validates substring instead of full input), using one large complex regex instead of separate readable checks (hard to maintain and explain errors to users), and not considering edge cases like empty strings, international characters, or whitespace at start/end.
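
Splitting one large password pattern into separate checks makes it possible to tell the user exactly what is missing. This is a sketch under the same rules as the passwordRe pattern above; PASSWORD_RULES and passwordProblems are hypothetical names.

```javascript
// Each rule is a small, readable check with its own message.
const PASSWORD_RULES = [
  { ok: (p) => p.length >= 8, message: 'at least 8 characters' },
  { ok: (p) => /[a-z]/.test(p), message: 'a lowercase letter' },
  { ok: (p) => /[A-Z]/.test(p), message: 'an uppercase letter' },
  { ok: (p) => /\d/.test(p), message: 'a digit' },
  { ok: (p) => /[!@#$%]/.test(p), message: 'a special character (!@#$%)' },
];

// Returns the messages of every rule the password fails.
function passwordProblems(password) {
  return PASSWORD_RULES.filter((rule) => !rule.ok(password)).map((rule) => rule.message);
}

console.log(passwordProblems('Str0ng!Pass')); // []
console.log(passwordProblems('weakpass'));
// ["an uppercase letter", "a digit", "a special character (!@#$%)"]
```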

While JavaScript does not natively support atomic groups or possessive quantifiers, understanding them helps explain catastrophic backtracking, a performance issue where the regex engine tries exponentially many combinations. You can simulate atomic behavior using lookaheads in some cases.

// Catastrophic backtracking example
// This regex is very slow on non-matching input:
// /^(a+)+$/.test('aaaaaaaaaaaaaaaaX')
// The engine tries every possible way to split the a's

// Safe alternative: avoid nested quantifiers
/^a+$/.test('aaaaaaaaaaaaaaaaX'); // fast, single quantifier

// Other problematic patterns:
// /(a|aa)+$/ on "aaaaaaaX"
// /(a*)*$/ on "aaaaaaaX"

// Prevention strategies:
// 1. Avoid nested quantifiers: (a+)+ → a+
// 2. Use specific character classes: ".*" → "[^"]*"
// 3. Make patterns more specific
// 4. Use possessive quantifiers in other languages: a++

// Atomic group simulation with lookahead
// (?=(pattern))\1 captures in the lookahead, then matches the capture
/(?=(a+))\1b/.test('aaab'); // true; behaves like an atomic group

Catastrophic backtracking most commonly occurs with nested quantifiers like (a+)+ or with overlapping alternatives under a quantifier. To avoid it, prefer specific character classes over ., avoid nesting quantifiers unnecessarily, and test your regex with both matching and non-matching inputs to verify performance.

Why it matters: ReDoS (Regex Denial of Service) is a real OWASP-listed vulnerability. A single malicious input string can block a Node.js server's event loop for a very long time if catastrophic backtracking is triggered. Backtracking-heavy patterns have also caused major outages without any attacker involved: the July 2019 Cloudflare outage was traced to such a regex in a firewall rule.

Real applications: Any web server that validates user input against regex (URL validation, email validation, markdown parsing) is a potential target. Security-reviewed production systems use tools like safe-regex or vuln-regex-detector to audit patterns.

Common mistakes: Writing email validation regex with exponential backtracking potential, not testing regex performance against adversarial inputs (e.g., 50 'a' characters followed by a non-matching character), and deploying patterns with nested quantifiers like (a+)+ in user-facing endpoints.
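
The sketch below checks that the safe rewrite accepts the same inputs as the risky pattern, times both on a short near-miss, and shows the lookahead trick behaving atomically. timeTestMs is a hypothetical helper; the input is kept short on purpose because, in a backtracking engine, each extra "a" roughly doubles the risky pattern's work. Timings vary by machine, so none are shown.

```javascript
const risky = /^(a+)+$/; // nested quantifier
const safe = /^a+$/;     // same language, single quantifier

// Both patterns accept and reject exactly the same inputs.
const samples = ['a', 'aaaa', '', 'aaab', 'b'];
const sameResults = samples.every((s) => risky.test(s) === safe.test(s));
console.log(sameResults); // true

// Hypothetical helper: wall-clock time of a single test() call.
function timeTestMs(re, input) {
  const start = Date.now();
  re.test(input);
  return Date.now() - start;
}

const nearMiss = 'a'.repeat(18) + 'X';
console.log('risky ms:', timeTestMs(risky, nearMiss));
console.log('safe ms:', timeTestMs(safe, nearMiss));

// Atomic simulation: the lookahead captures greedily and never gives characters back.
console.log(/^a+a$/.test('aaa'));          // true: a+ backtracks one character
console.log(/^(?=(a+))\1a$/.test('aaa'));  // false: the captured run is fixed
```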

The u (unicode) flag enables proper handling of Unicode characters including emoji, rare CJK ideographs, and other symbols that need two UTF-16 code units. Without the u flag, JavaScript treats strings as sequences of 16-bit code units, which can cause incorrect matches for characters outside the Basic Multilingual Plane.

// Without u flag: broken Unicode handling
/^.$/.test('😀'); // false (emoji is 2 code units)
/^..$/.test('😀'); // true (treated as 2 chars)

// With u flag: correct Unicode handling
/^.$/u.test('😀'); // true (emoji is 1 character)

// Unicode property escapes (requires u flag)
/\p{Letter}/u.test('ñ'); // true (any letter)
/\p{Number}/u.test('①'); // true (any number)
/\p{Emoji}/u.test('🎉'); // true
/\p{Script=Han}/u.test('中'); // true (Chinese characters)

// \w and \b stay ASCII-only even with the u flag
/^\w+$/u.test('ñ'); // false
/^[\p{Letter}\p{Number}_]+$/u.test('ñ'); // true (Unicode-aware word)

// v flag (ES2024): extends unicode support
/[\p{Letter}&&[^\p{Script=Latin}]]/v; // set intersection
/[\p{Emoji}--[😀]]/v; // set subtraction

Always use the u flag when working with international text or emoji. The v (unicodeSets) flag (ES2024) adds set operations like intersection (&&) and subtraction (--) within character classes, enabling more precise Unicode matching patterns.

Why it matters: Unicode-aware regex is a prerequisite for any application serving international users. Without /u, the dot and character classes see an emoji as two separate code units, and without property escapes such as \p{Letter}, classes like \w miss letters from non-Latin scripts.

Real applications: Username validation that allows international characters, emoji filtering in content moderation, searching across multilingual content, building regex for non-Latin text processing (Arabic, CJK, Cyrillic), and emoji-aware string splitting.

Common mistakes: Not using /u when matching characters above U+FFFF (emoji are matched as two code units without it), using . to match any character including newlines without the /s flag, and confusing the ES2024 /v flag (unicodeSets — for advanced Unicode) with the older /u flag.
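
Since \w only covers ASCII, a Unicode-aware word matcher has to be built from property escapes. The sketch below is one possible version; unicodeWords is a hypothetical name, and including \p{M} (combining marks) is an assumption so that decomposed accents stay attached to their letters.

```javascript
// Letters, combining marks, numbers, and underscore in any script.
const UNICODE_WORD = /[\p{L}\p{M}\p{N}_]+/gu;

function unicodeWords(text) {
  return text.match(UNICODE_WORD) ?? [];
}

console.log(unicodeWords('Zürich 東京 2024')); // ["Zürich", "東京", "2024"]
console.log('Zürich'.match(/\w+/g));           // ["Z", "rich"] (ASCII-only \w)

// Code units versus code points.
console.log('😀'.length);      // 2
console.log([...'😀'].length); // 1
```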

Web developers frequently use regex for data extraction, text transformation, and input sanitization. Here are some of the most commonly used patterns for everyday web development tasks like parsing URLs, cleaning user input, and extracting data from strings.

// Extract all URLs from text
const text = 'Docs: https://example.com/docs and http://example.org';
const urlRe = /https?:\/\/[^\s<>]+/g;
text.match(urlRe); // ["https://example.com/docs", "http://example.org"]

// Remove HTML tags
const stripTags = /<[^>]*>/g;
'<p>Hello <b>World</b></p>'.replace(stripTags, ''); // "Hello World"

// Trim whitespace (beyond String.trim)
const str = '  too   many   spaces  ';
str.replace(/^\s+|\s+$/g, ''); // trim both ends
str.replace(/\s+/g, ' '); // collapse whitespace

// Extract hashtags
'#hello world #coding'.match(/#\w+/g); // ["#hello", "#coding"]

// camelCase to kebab-case
'camelCaseText'.replace(/([A-Z])/g, '-$1').toLowerCase(); // "camel-case-text"

// Mask sensitive data
'Card: 4532-1234-5678-9012'.replace(/(\d{4}-){3}/, '****-****-****-');
// "Card: ****-****-****-9012"

// Validate hex color
/^#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})$/.test('#FF00FF'); // true

When stripping HTML tags, note that regex is not a proper HTML parser. For complex HTML manipulation, use DOMParser or a library. For simple cleanup of text that is not security-sensitive, regex works well. Always prefer dedicated validation libraries for security-critical input validation.

Why it matters: Regex-based HTML sanitization is a well-known source of XSS vulnerabilities, because attackers craft inputs that bypass naive regex sanitizers. Knowing regex limitations prevents dangerous over-reliance on it for security-sensitive processing.

Real applications: Stripping basic HTML from CMS-generated content before displaying in email, extracting plain text from HTML snippets, building simple markdown-to-HTML converters, and pre-processing log data that may contain HTML characters.

Common mistakes: Using regex to sanitize HTML for XSS prevention (use DOMPurify instead), writing overly-complex regex patterns where a dedicated parser (cheerio, DOMParser) would be safer and more maintainable, and treating "it matches in tests" as proof of correctness for edge-case-heavy HTML content.
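
Two of the patterns above can be wrapped into reusable functions with slightly tighter matching. Both camelToKebab and maskCardNumber are hypothetical helpers; the card format with hyphens between groups of four digits is the one assumed in the example above.

```javascript
// Insert a hyphen only between a lowercase letter or digit and an uppercase letter,
// so a leading capital does not produce a leading hyphen.
function camelToKebab(name) {
  return name.replace(/([a-z0-9])([A-Z])/g, '$1-$2').toLowerCase();
}

// Mask all but the last four digits of every hyphenated 16-digit number.
function maskCardNumber(text) {
  return text.replace(/\b(?:\d{4}-){3}(\d{4})\b/g, '****-****-****-$1');
}

console.log(camelToKebab('camelCaseText')); // "camel-case-text"
console.log(camelToKebab('CamelCase'));     // "camel-case"
console.log(maskCardNumber('Card: 4532-1234-5678-9012')); // "Card: ****-****-****-9012"
```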

Regular Expressions

1. How do you create a regular expression in JavaScript?

2. What is the difference between test, exec, and match?

3. How do you validate an email address with regex?

4. What are character classes and quantifiers in regex?

5. How do capturing groups and non-capturing groups work?

6. What are lookahead and lookbehind assertions?

7. How do you use replace with regex?

8. What do the different regex flags do?

9. What are named capturing groups?

10. How do you use regex for string splitting and tokenizing?

11. What is the lastIndex property and how does it affect regex?

12. How do you escape special characters in regex?

13. How do you use regex for input validation patterns?

14. What are atomic groups and possessive quantifiers?

15. How do you use regex with Unicode text?

16. What are common regex patterns used in web development?