Pattern Matching — LIKE & Wildcards
Find rows that match a pattern rather than an exact value — the % and _ wildcards, ILIKE, SIMILAR TO, performance implications, and every real-world use case
// Part 01
When Exact Match Is Not Enough
The WHERE clause with = finds rows that match an exact value. Perfect when you know exactly what you are looking for — a specific customer_id, a specific order_status, a specific city name. But real business questions are often fuzzier:
None of these can be answered with =. You do not know the full exact value — you only know a pattern. SQL's answer to this is the LIKE operator, combined with two wildcard characters that stand in for unknown parts of a string.
// Part 02
The Two Wildcards — % and _
LIKE uses two special characters called wildcards. Every other character in the pattern is matched literally — only these two have special meaning.
// Part 03
The % Wildcard — Matching Any Sequence
The percent sign % is by far the most-used wildcard. It matches any sequence of zero or more characters. Its position in the pattern controls what it matches.
Starts with — pattern%
Ends with — %pattern
Contains — %pattern%
% at both ends — anywhere in the string
// Part 04
The _ Wildcard — Matching Exactly One Character
The underscore _ matches exactly one character — any character. Use it when you know the length of a field or when you need to match a specific position while allowing variation at another.
Store ID pattern — fixed prefix, variable digit
Fixed-length string matching
Combining % and _ in the same pattern
// Part 05
NOT LIKE — Excluding Pattern Matches
Just as NOT IN excludes a list of values, NOT LIKE excludes rows matching a pattern. All rows where the pattern does not match are returned.
// Part 06
Case Sensitivity — LIKE vs ILIKE
Case sensitivity in LIKE varies by database — this is one of the most common portability issues when switching between MySQL and PostgreSQL.
| Database | LIKE behaviour | Case-insensitive option |
|---|---|---|
| MySQL | Case-insensitive by default (uses collation) | LIKE works — 'amul%' matches 'Amul Butter' |
| PostgreSQL | Case-sensitive - 'amul%' does NOT match 'Amul Butter' | Use ILIKE for case-insensitive: WHERE name ILIKE 'amul%' |
| DuckDB (playground) | Case-sensitive like PostgreSQL | Use ILIKE or LOWER(): WHERE LOWER(name) LIKE 'amul%' |
| SQLite | Case-insensitive for ASCII characters only | Works for a-z/A-Z but not for non-ASCII (ñ, é, etc.) |
| SQL Server | Depends on collation (usually case-insensitive) | Use COLLATE for explicit control |
ILIKE — case-insensitive LIKE in PostgreSQL
Cross-database case-insensitive pattern — LOWER()
For code that must run on both MySQL and PostgreSQL, wrap the column in LOWER() and use a lowercase pattern. This works everywhere but prevents index usage on the column (same trade-off as all function-on-column conditions).
// Part 07
Escaping Wildcards — When % and _ Are Literal
Sometimes you genuinely want to search for a percent sign or an underscore as a literal character — not as a wildcard. For example, finding products with "100%" in their name, or store codes that contain an underscore. You need to escape the wildcard to tell the database to treat it as a literal character.
The ESCAPE clause
// Part 08
LIKE Performance — The Leading Wildcard Problem
LIKE performance depends heavily on where the wildcard appears in the pattern. This is one of the most important performance considerations in SQL — and one of the most frequently ignored.
The three performance cases
Solutions for contains and ends-with searches at scale
If your application frequently searches for text that appears anywhere in a string — product search, customer name search, address search — LIKE '%pattern%' on a large table will not scale. Three production alternatives:
🎯 Pro Tip
In job interviews and system design discussions, always mention the leading wildcard problem when discussing LIKE-based search. Saying "LIKE '%term%' does not use an index and will be slow at scale — for a production search feature we would use PostgreSQL full-text search or Elasticsearch" signals senior-level thinking. Most candidates do not know this distinction.
// Part 09
Multiple LIKE Conditions — Patterns in Combination
LIKE conditions combine with AND, OR, and NOT exactly like any other WHERE condition. This lets you build complex pattern-based filters.
// Part 10
SIMILAR TO — Advanced Pattern Matching
PostgreSQL also supports SIMILAR TO, which is a hybrid between LIKE and regular expressions. It uses a limited regex-like syntax — more powerful than LIKE but less powerful than full regular expressions. It is less commonly used in practice but useful to know.
Regular expressions — when you need real power
For full regular expression matching, MySQL uses REGEXP (or RLIKE), and PostgreSQL uses the ~ operator. These are the most powerful text matching tools in SQL but also the slowest — they always require full table scans.
// Part 11
What This Looks Like at Work
You are a data analyst at Sephora. The customer service team receives a complaint from a customer who cannot remember their account email but knows it starts with their name "meera" and ends with either @gmail.com or @yahoo.com. The compliance team also needs to audit all product names that do not follow the naming convention — they should start with the brand name.
🎯 Pro Tip
LIKE with a dynamic pattern built from another column value — WHERE product_name LIKE brand || '%' — is a powerful technique for data quality audits. The || concatenates the brand value with % to create a starts-with pattern for each row. This works because LIKE evaluates the pattern per row, not once for the whole query. Use this for format validation, naming convention checks, and cross-column consistency audits.
// Part 12
Interview Prep — 5 Questions With Complete Answers
LIKE is a SQL comparison operator used in WHERE clauses to match string values against a pattern rather than an exact value. Instead of finding rows where a column equals a specific string, LIKE finds rows where the column matches a pattern that can include wildcard characters standing in for unknown characters.
LIKE has two wildcard characters. The percent sign % matches any sequence of zero or more characters — it is the most common wildcard and stands in for "anything could be here." LIKE 'Amul%' matches 'Amul Butter', 'Amul Milk', 'Amul Fresh Paneer', and any string starting with 'Amul'. LIKE '%gmail.com' matches any string ending with 'gmail.com'. LIKE '%Fresh%' matches any string containing 'Fresh' anywhere.
The underscore _ matches exactly one character — any character. LIKE 'ST00_' matches 'ST001' through 'ST009' but not 'ST010' (which has two digits after 'ST00'). LIKE '____' matches any string of exactly four characters. LIKE '_ata' matches 'Tata', 'Data', 'Bata' — any four-character string ending in 'ata'. The two wildcards can be combined: LIKE '_a%' matches any string where the second character is 'a', regardless of what comes before or after.
LIKE performs case-sensitive pattern matching in PostgreSQL, DuckDB, and SQLite — the pattern must match the case of the stored value exactly. LIKE 'amul%' does not match 'Amul Butter' in PostgreSQL because 'a' does not equal 'A'. LIKE 'Amul%' matches 'Amul Butter' correctly.
ILIKE is a PostgreSQL-specific operator that performs case-insensitive pattern matching. ILIKE 'amul%' matches 'Amul Butter', 'AMUL MILK', 'amul ghee', and any other variation of case. ILIKE 'AMUL%' also matches all the same values — the pattern and the data are both treated as if they were lowercase before comparison.
MySQL's LIKE is case-insensitive by default (controlled by the collation), so MySQL LIKE behaves like PostgreSQL ILIKE for standard ASCII characters. SQL Server's case sensitivity also depends on the database collation. For cross-database compatible case-insensitive pattern matching, use LOWER() on both the column and pattern: WHERE LOWER(column) LIKE LOWER('pattern%'). This works identically on all databases but prevents index usage on the column. The recommendation: use ILIKE in PostgreSQL for case-insensitive searches, and use LOWER() only when writing SQL that must run on multiple database systems.
LIKE '%pattern%' (with a leading %) is slow because the leading percent wildcard prevents the database from using a B-tree index on the column. A B-tree index organises values in sorted order by their prefix — it can efficiently find all values starting with 'Amul' because those values are contiguous in the index. But for '%Fresh%', the database does not know where in the sorted index the matching values are — they could be anywhere. The database must read every row, retrieve the full value, and check whether it contains 'Fresh'. This is a full table scan: O(n) where n is the number of rows. On a table with 10 million products, this takes seconds.
The production alternatives depend on the scale and the database. For moderate scale (up to a few million rows), PostgreSQL's built-in full-text search using tsvector columns and GIN indexes can make contains-searches fast by pre-computing searchable tokens from text columns. A GIN index on tsvector supports fast text search without reading every row. MySQL supports FULLTEXT indexes with similar capabilities.
For large scale — millions of products, user-facing search with sub-100ms latency requirements — dedicated search engines like Elasticsearch, OpenSearch, or Typesense are the production standard. These systems are built specifically for full-text and fuzzy matching and are used by Amazon, Sephora, Amazon India, and every major e-commerce platform for their product search. They run alongside the SQL database: SQL handles transactions and structured queries, the search engine handles text search. LIKE '%pattern%' is acceptable for small tables and ad-hoc queries, but should never be the foundation of a production search feature on large datasets.
Since % and _ are wildcard characters with special meaning in LIKE patterns, you must escape them when you want to match them as literal characters. SQL provides the ESCAPE clause for this purpose: LIKE 'pattern' ESCAPE 'escape_char'. The escape character is a single character you define — commonly backslash \ or exclamation mark !. Any wildcard that follows the escape character is treated as a literal character instead of a wildcard.
Example: to find product descriptions containing "100%", write WHERE description LIKE '%100\%%' ESCAPE '\'. The first % is a wildcard (match anything before "100"), the \% is a literal percent sign, and the final % is another wildcard (match anything after). Similarly, to find values containing a literal underscore: WHERE code LIKE '%\_order%' ESCAPE '\' — the \_ is a literal underscore.
In PostgreSQL, the backslash already has special meaning in string literals, so you may need to double it: LIKE '%100\\%%' ESCAPE '\\'. Using a non-standard escape character like ! avoids this: WHERE description LIKE '%100!%%' ESCAPE '!'. This is cleaner and avoids backslash confusion. In practice, the need to search for literal % or _ is rare in well-designed databases — values that users search for usually do not contain SQL wildcard characters. When they do appear (in financial data with percentage signs, for example), the ESCAPE clause is the correct solution rather than any workaround.
LIKE is the standard SQL pattern matching operator supported by all relational databases. It uses only two wildcards (% and _), making it simple and fast for straightforward patterns. LIKE with a leading literal (not a wildcard) can use B-tree indexes and is performant at scale. LIKE is the right choice for the vast majority of pattern matching needs.
SIMILAR TO is a PostgreSQL-only operator that extends LIKE with a limited regex-like syntax: | for alternation, [] for character classes, * for zero or more repetitions, + for one or more, ? for zero or one. SIMILAR TO '%@(gmail|yahoo)\.com' matches email addresses from either domain. SIMILAR TO is anchored — the pattern must match the entire string. It is more powerful than LIKE but less commonly used because ILIKE handles most needs, and full regular expressions are more expressive when SIMILAR TO is not enough. SIMILAR TO is always a full scan — it offers no index benefit.
Regular expressions (REGEXP in MySQL, ~ in PostgreSQL) are the most powerful text matching tool — they support the full regular expression syntax: lookaheads, backreferences, named groups, character classes, quantifiers. They handle complex validation patterns (PAN card format, SSN format, phone number patterns), multi-pattern alternation, and any matching logic LIKE cannot express. Regular expressions are always slow — full scans, no index benefit — and have complex, easy-to-get-wrong syntax. Use them when LIKE cannot solve the problem and the table is small enough that the scan cost is acceptable. For production search on large tables, none of the three are the right choice — use full-text search indexes or a dedicated search engine instead.
// Part 13
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓LIKE is a WHERE operator that matches strings against a pattern rather than an exact value. It uses two wildcards: % (any sequence of zero or more characters) and _ (exactly one character).
- ✓Pattern position controls what is matched: prefix% = starts with, %suffix = ends with, %contains% = contains anywhere, prefix%suffix = starts and ends with specific values.
- ✓LIKE is case-sensitive in PostgreSQL and DuckDB. Use ILIKE (PostgreSQL/DuckDB) for case-insensitive matching. Use LOWER(column) LIKE LOWER(pattern) for cross-database compatibility.
- ✓NOT LIKE excludes matching rows. Like all NOT conditions, it silently excludes NULL rows — add OR column IS NULL if you need NULL rows in the result.
- ✓Starts-with patterns (LIKE 'prefix%') can use B-tree indexes — fast on large tables. Leading wildcard patterns (LIKE '%pattern') always require full table scans — slow on large tables.
- ✓To search for a literal % or _ character, use the ESCAPE clause: WHERE value LIKE '100\%%' ESCAPE '\'.
- ✓Multiple LIKE conditions combine with AND and OR like any other WHERE condition. Parentheses are essential when mixing AND and OR with LIKE.
- ✓LIKE with a dynamic pattern built from another column (WHERE name LIKE brand || '%') performs a starts-with check using the column value as the prefix — useful for data quality audits.
- ✓SIMILAR TO (PostgreSQL) adds regex-like operators to LIKE. REGEXP (MySQL) and ~ (PostgreSQL) support full regular expressions — most powerful but always slow, no index benefit.
- ✓For production full-text search on large tables, use PostgreSQL full-text search (tsvector/GIN indexes) or a dedicated search engine (Elasticsearch, Typesense). LIKE '%pattern%' does not scale beyond a few million rows for user-facing features.
What comes next
In Module 15, you learn the IN and BETWEEN operators — clean shorthand for multiple OR conditions and range checks that make complex WHERE clauses dramatically more readable.
Module 15 → IN and BETWEEN OperatorsDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.