Regex Cheatsheet

Quick reference for regular expression syntax and common patterns.

A regular expression (regex) is a pattern that describes text. You use it to find, validate, or replace parts of a string. Typical uses include checking that a field looks like an email, pulling out dates or IDs from log lines, and search-and-replace in editors or code. Syntax and features vary a bit by language or “flavor,” but the ideas below work in JavaScript, Python, and most modern tools.

Character classes

. any character (except newline in most engines). \d digit, \D non-digit. \w word character (letter, digit, underscore), \W non-word. \s whitespace, \S non-whitespace. Custom: [a-z] lowercase, [0-9] digits, [^x] anything except x.
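
The classes above can be tried out with a minimal Python sketch. The sample string and variable names are made up for illustration.

```python
import re

sample = "Order 66: pay_now at 9am!"  # hypothetical sample text

digits = re.findall(r"\d", sample)           # ['6', '6', '9']
words = re.findall(r"\w+", sample)           # ['Order', '66', 'pay_now', 'at', '9am']
lower_runs = re.findall(r"[a-z]+", sample)   # ['rder', 'pay', 'now', 'at', 'am']
no_spaces = re.sub(r"\s", "", sample)        # 'Order66:pay_nowat9am!'

# A negated class removes everything that is NOT a vowel.
vowels_only = re.sub(r"[^aeiou]", "", "regex class")   # 'eea'
```

Note that [a-z] skips the capital O in "Order" and that the underscore counts as a word character for \w but not for [a-z].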

Anchors and quantifiers

^ start of string (or line with m flag). $ end of string or line. \b word boundary. * zero or more, + one or more, ? zero or one. {3} exactly 3, {2,} two or more, {2,5} two to five. Add ? after a quantifier to make it lazy (e.g. .*?).
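
A short Python sketch of anchors, word boundaries, and counted quantifiers. The test strings are invented for the example.

```python
import re

# ^ and $ together force the whole string to be exactly three digits.
exactly_three = re.compile(r"^\d{3}$")
is_three = bool(exactly_three.search("123"))    # True
is_four = bool(exactly_three.search("1234"))    # False

# \b keeps "cat" from matching inside "concat".
whole_word = re.findall(r"\bcat\b", "cat concat cat.")   # ['cat', 'cat']

# ? makes the preceding "u" optional.
spellings = re.findall(r"colou?r", "color colour")       # ['color', 'colour']

# {2,5} is greedy: seven a's split into a run of five and a run of two.
runs = re.findall(r"a{2,5}", "a aa aaaaaaa")             # ['aa', 'aaaaa', 'aa']
```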

Groups and alternation

(…) capturing group (you get the match in group 1, 2, …). (?:…) non-capturing (no group). | alternation: cat|dog matches “cat” or “dog”.
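
The difference between capturing and non-capturing groups shows up clearly in Python's re.findall, which returns the groups when a pattern has any. A small sketch with made-up strings:

```python
import re

# Capturing groups are numbered from 1 in order of their opening parenthesis.
m = re.search(r"(cat|dog)s? are (\w+)", "My dogs are loud")
animal, adjective = m.group(1), m.group(2)    # 'dog', 'loud'

# With a capturing group, findall returns only the group text.
captured = re.findall(r"(cat|dog)food", "catfood dogfood")        # ['cat', 'dog']

# With a non-capturing group, findall returns the whole match.
uncaptured = re.findall(r"(?:cat|dog)food", "catfood dogfood")    # ['catfood', 'dogfood']
```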

Flags

g all matches (not just first); this one is a JavaScript flag, and Python instead returns every match from findall or finditer. i case insensitive. m multiline (^ and $ match line start/end). s dotall (. matches newline). In JavaScript you add them after the closing slash: /pattern/gi.
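
In Python the i, m, and s flags are passed as constants from the re module rather than as letters after a slash. A minimal sketch, with an invented three-line string:

```python
import re

text = "Alpha\nbeta\nALPHA"  # hypothetical sample

case_insensitive = re.findall(r"alpha", text, re.IGNORECASE)   # ['Alpha', 'ALPHA']
line_starts = re.findall(r"^\w", text, re.MULTILINE)           # ['A', 'b', 'A']

# Without DOTALL the dot cannot cross the newline between the lines.
without_dotall = re.search(r"Alpha.beta", text)                # None
with_dotall = re.search(r"Alpha.beta", text, re.DOTALL)        # a match object

# Combine flags with | (re.I and re.M are the short aliases).
combined = re.findall(r"^alpha$", text, re.I | re.M)           # ['Alpha', 'ALPHA']
```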

In code: JavaScript

// Literal: /pattern/flags. Constructor when pattern is dynamic: new RegExp("pattern", "gi")
const re = /\b\w+\b/g;
"Hello world".match(re);        // ["Hello", "world"] with g; without g, first match only
/\d+/.test("abc123");           // true (at least one match)
"a b c".replace(/\s+/g, "-");   // "a-b-c"
"Name: Jane".replace(/Name:\s*(.+)/, "$1");   // "Jane" ($1 = first capture group)

// All matches with capture groups (ES2020+)
const str = "x=1 y=2";
const pairRe = /(\w+)=(\d+)/g;  // separate name: `re` is already declared above
for (const m of str.matchAll(pairRe)) {
  console.log(m[1], m[2]);      // x 1, then y 2
}

In code: Python

import re
# re.compile optional; use it when you reuse the same pattern
pat = re.compile(r"\b\w+\b")
pat.findall("Hello world")      # ['Hello', 'world']
re.search(r"\d+", "abc123")     # match object or None
re.sub(r"\s+", "-", "a b c")    # 'a-b-c'
re.sub(r"Name:\s*(.+)", r"\1", "Name: Jane")   # 'Jane' (\1 = first group)

# All matches with groups
for m in re.finditer(r"(\w+)=(\d+)", "x=1 y=2"):
    print(m.group(1), m.group(2))   # x 1, then y 2

Examples with real strings

Match a whole word. Pattern \b\w+\b. String: “The quick brown fox”. Matches: “The”, “quick”, “brown”, “fox”. Use with g to get all.

Loose email. Pattern [\w.-]+@[\w.-]+\.\w+. Matches “user@example.com” or “first.last@mail.co.uk”. Not strict (allows invalid addresses) but fine for “find something that looks like an email”.
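
A sketch of the loose email pattern in Python. The surrounding sentence and the deliberately invalid address are made up to show both what it finds and what it wrongly accepts.

```python
import re

LOOSE_EMAIL = re.compile(r"[\w.-]+@[\w.-]+\.\w+")

text = "Contact user@example.com or first.last@mail.co.uk, not user@localhost."
found = LOOSE_EMAIL.findall(text)
# ['user@example.com', 'first.last@mail.co.uk']

# "Not strict": consecutive dots in the domain are invalid but still accepted.
accepts_invalid = LOOSE_EMAIL.fullmatch("a@b..c") is not None   # True
```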

Strip HTML tags. Pattern /<[^>]+>/g. Replace with empty string. String: “<p>Hello</p>” becomes “Hello”. Warning: breaks on > inside attribute values and on malformed HTML.
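
A Python sketch of tag stripping, including one failure case where a > inside an attribute value ends the match too early. Both input strings are invented for illustration.

```python
import re

TAG = re.compile(r"<[^>]+>")

plain = TAG.sub("", "<p>Hello <b>world</b></p>")      # 'Hello world'

# The first '>' sits inside the title attribute, so the tag match stops there
# and the rest of the attribute leaks into the output.
broken = TAG.sub("", '<a title="a > b">link</a>')     # ' b">link'
```

For anything beyond quick cleanup, an HTML parser is the safer tool.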

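Extract first path segment. Pattern \/([^/]+)\/. String: “/api/users/123”. The leftmost match is “/api/”, so group 1 is “api”. To accept only a segment at the very start of the string, and to drop the need for a trailing slash, anchor the pattern: ^\/([^/]+) also gives group 1 “api”.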
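
In Python the slash needs no escaping, and re.match already anchors at the start of the string. A minimal sketch using the same path:

```python
import re

path = "/api/users/123"

# re.match only tries at position 0, so no explicit ^ is needed here.
first_segment = re.match(r"/([^/]+)", path).group(1)   # 'api'

# Every segment: runs of characters that are not a slash.
segments = re.findall(r"[^/]+", path)                  # ['api', 'users', '123']
```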

Optional protocol. Pattern (https?:\/\/)?(.+). String “https://example.com” gives group 2 “example.com”. String “example.com” (no protocol) also matches, again with group 2 “example.com”. Group 2 is therefore the URL without its protocol, whether or not one was present.
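
A Python sketch of the optional-protocol pattern. The helper name split_protocol is hypothetical; note that an optional group that did not take part in the match is reported as None.

```python
import re

URL = re.compile(r"(https?://)?(.+)")

def split_protocol(url):
    """Return (protocol, rest); protocol is None when it is absent."""
    m = URL.fullmatch(url)
    if m is None:          # only happens for an empty string (or one with a newline)
        return None
    return m.group(1), m.group(2)

with_protocol = split_protocol("https://example.com")   # ('https://', 'example.com')
without_protocol = split_protocol("example.com")        # (None, 'example.com')
```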

Lines that start with #. Pattern ^#.* with m flag. In a multi-line string each line starting with # matches (e.g. markdown headings).
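
The m flag is re.MULTILINE in Python. A small sketch with an invented markdown-like string:

```python
import re

doc = "# Title\nintro text\n## Section\nnot # a heading"

# With MULTILINE, ^ matches at the start of every line, not just the string.
headings = re.findall(r"^#.*", doc, re.MULTILINE)   # ['# Title', '## Section']

# Without the flag only a # at the very start of the string can match.
first_only = re.findall(r"^#.*", doc)               # ['# Title']
```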

Integer or decimal number. Pattern -?\d+(?:\.\d+)?. Matches “42”, “-3”, “1.5”, “0.99”. Non-capturing (?:\.\d+)? for the optional decimal part.
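
A Python sketch that pulls numbers out of a sentence and converts them. The sentence is made up; the non-capturing group matters here because re.findall would otherwise return only the decimal part.

```python
import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

text = "42 apples, -3 degrees, 1.5 litres, 0.99 each"
tokens = NUMBER.findall(text)          # ['42', '-3', '1.5', '0.99']
values = [float(t) for t in tokens]    # [42.0, -3.0, 1.5, 0.99]
```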

Phone (digits, spaces, plus, minus). Pattern [\d\s+-]{10,}. Matches “555-123-4567”, “+1 555 123 4567”. At least 10 characters. Tighten as needed (e.g. exact length).
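
One way to tighten the phone pattern is a second step that counts the digits. The sketch below is only an assumption about what "tighten" might mean: the function name and the 10 to 15 digit limits are hypothetical, not a standard.

```python
import re

PHONE_CHARS = re.compile(r"[\d\s+-]{10,}")

def looks_like_phone(text, min_digits=10, max_digits=15):
    """Allowed characters only, then a digit count within assumed limits."""
    if not PHONE_CHARS.fullmatch(text):
        return False
    digit_count = len(re.sub(r"\D", "", text))   # strip every non-digit
    return min_digits <= digit_count <= max_digits

ok_dashes = looks_like_phone("555-123-4567")       # True (10 digits)
ok_plus = looks_like_phone("+1 555 123 4567")      # True (11 digits)
only_dashes = looks_like_phone("----------")       # False (no digits at all)
too_short = looks_like_phone("555-1234")           # False (under 10 characters)
```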

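Capture everything after a label. Pattern Name:\s*(.+). String “Name: John Smith”. Group 1: “John Smith”. Because .+ is greedy it runs to the end of the line, so if one line holds several labels, use a lazy .+? followed by whatever ends the value (a delimiter or a lookahead). A lazy quantifier at the very end of a pattern matches as little as possible, so .*? alone would capture nothing.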
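
A Python sketch of label capture with one and with several labels on a line. The semicolon separator is an assumption made for the example.

```python
import re

single = re.search(r"Name:\s*(.+)", "Name: John Smith").group(1)   # 'John Smith'

line = "Name: John Smith; Name: Jane Doe"   # hypothetical ';'-separated labels

# Greedy .+ swallows everything up to the end of the line.
greedy = re.findall(r"Name:\s*(.+)", line)
# ['John Smith; Name: Jane Doe']

# Lazy .+? stops at the first ';' or at the end of the string.
lazy = re.findall(r"Name:\s*(.+?)(?:;|$)", line)
# ['John Smith', 'Jane Doe']
```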

Split on multiple delimiters. Pattern [\s,;]+. String “a, b; c d” split gives [“a”, “b”, “c”, “d”].
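
In Python the split is done with re.split. A short sketch; the second string is invented to show that delimiters at the edges leave empty strings behind.

```python
import re

parts = re.split(r"[\s,;]+", "a, b; c d")     # ['a', 'b', 'c', 'd']

# Leading or trailing delimiters produce empty strings at the ends.
padded = re.split(r"[\s,;]+", " a, b; ")      # ['', 'a', 'b', '']
cleaned = [p for p in padded if p]            # ['a', 'b']
```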

Match a quoted string. Pattern "([^"]*)" captures the text inside double quotes. String say "hello" gives group 1 “hello”. For single quotes use '([^']*)'. Escaped quotes inside the string need a more careful pattern.
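
A Python sketch of the simple quoted-string pattern, plus one possible "more careful" variant that tolerates backslash-escaped quotes. The second pattern is an illustration, not the only way to handle escapes.

```python
import re

simple = re.findall(r'"([^"]*)"', 'say "hello" and "bye"')   # ['hello', 'bye']

# Inside the quotes, allow either a character that is neither a quote nor a
# backslash, or a backslash followed by any single character.
ESCAPED = re.compile(r'"((?:[^"\\]|\\.)*)"')

source = r'say "she said \"hi\"" now'
inner = ESCAPED.search(source).group(1)   # she said \"hi\"  (backslashes kept)
```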

Replace examples. In JavaScript: str.replace(/foo/g, "bar") replaces all “foo” with “bar”. Use $1, $2 in the replacement to insert captured groups: "Hello World".replace(/(\w+)\s+(\w+)/, "$2, $1") gives “World, Hello”. In other languages the replacement syntax may use \1 or $1 for group 1. Check the docs.

Date like YYYY-MM-DD. Pattern \d{4}-\d{2}-\d{2}. Matches “2024-01-15”. Does not validate (e.g. month 99). For strict validation you need more logic or a second step.
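
The "second step" can be sketched in Python by letting the regex check the shape and datetime.date check the calendar. The function name parse_iso_date is hypothetical.

```python
import re
from datetime import date

DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def parse_iso_date(text):
    """Return a date for a valid YYYY-MM-DD string, else None."""
    m = DATE.fullmatch(text)
    if m is None:
        return None                      # wrong shape
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)    # raises ValueError for month 99, Feb 30, ...
    except ValueError:
        return None

good = parse_iso_date("2024-01-15")      # date(2024, 1, 15)
bad_month = parse_iso_date("2024-99-15") # None
bad_day = parse_iso_date("2024-02-30")   # None
```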

Username (letters, numbers, underscore). Pattern \b[a-zA-Z][a-zA-Z0-9_]*\b. Matches “user_1”, “Admin”. First character must be a letter. Adjust length with {3,20} if needed: \b[a-zA-Z][a-zA-Z0-9_]{2,19}\b.
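
For validating a whole field rather than finding usernames inside text, one option in Python is fullmatch instead of \b. A sketch using the 3 to 20 character variant; the function name is hypothetical.

```python
import re

# One leading letter plus 2 to 19 more characters: 3 to 20 in total.
USERNAME = re.compile(r"[a-zA-Z][a-zA-Z0-9_]{2,19}")

def is_valid_username(name):
    # fullmatch requires the entire string to fit the pattern.
    return USERNAME.fullmatch(name) is not None

checks = {
    "user_1": is_valid_username("user_1"),      # True
    "Admin": is_valid_username("Admin"),        # True
    "1user": is_valid_username("1user"),        # False: starts with a digit
    "ab": is_valid_username("ab"),              # False: too short
}
```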

Escaping. Characters that are special in regex (e.g. .^$*+?()[{\|) must be escaped with a backslash if you want them literally. Inside a character class [...], escape ^-]\ and sometimes others depending on position. The exact list varies by regex flavor (JavaScript, Python, PCRE, etc.). When the pattern is written as a string literal, the language also treats backslash as an escape character, so you sometimes need two: "\\d+" in JS or Python for the pattern \d+. Python raw strings (r"\d+") avoid the doubling.
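
When the literal text comes from a variable, Python's re.escape adds the backslashes for you. A minimal sketch with an invented literal:

```python
import re

literal = "1+1=2 (approx.)"       # contains +, (, ), and . which are all special
haystack = "so 1+1=2 (approx.) holds"

# Used as-is, the special characters change the meaning and nothing matches.
unescaped = re.search(literal, haystack)                # None

# re.escape backslash-escapes the special characters.
escaped = re.search(re.escape(literal), haystack)       # a match object

# A raw string and a doubled backslash describe the same pattern text.
same_pattern = (r"\d+" == "\\d+")                       # True
```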

Watch out. Quantifiers are greedy by default: .* consumes as much as possible. For “first match then stop” use the lazy form .*?. Example: <div>.*?</div> matches the first div and its contents; <div>.*</div> can match from the first <div> to the last </div> in the whole string. Backslashes are the other common trap: in an ordinary string literal the backslash is the string's own escape character, so you often need to double it ("\\d+" for a digit pattern), whereas in a JavaScript regex literal you write /\d+/ directly.
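
The greedy and lazy div patterns side by side, as a Python sketch on a made-up string:

```python
import re

html = "<div>one</div><div>two</div>"

# Greedy: runs from the first <div> to the last </div>.
greedy = re.findall(r"<div>.*</div>", html)
# ['<div>one</div><div>two</div>']

# Lazy: stops at the first </div> each time.
lazy = re.findall(r"<div>.*?</div>", html)
# ['<div>one</div>', '<div>two</div>']
```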