2024-04-02 21:19:20 +02:00
---
tags:
- bash
- shell
- scripting
- glob
- globbing
- wildcards
- filename
- pattern
- matching
---
2023-07-05 11:43:35 +02:00
2024-04-02 21:19:20 +02:00
# Patterns and pattern matching
2023-07-05 11:43:35 +02:00
A pattern is a **string description** . Bash uses them in various ways:
2024-01-29 02:01:50 +01:00
- [Pathname expansion ](../syntax/expansion/globs.md ) (Globbing - matching
2023-07-05 11:43:35 +02:00
filenames)
- Pattern matching in [conditional
2024-01-29 02:01:50 +01:00
expressions](../syntax/ccmd/conditional_expression.md)
- [Substring removal ](../syntax/pe.md#substring_removal ) and [search and
replace](../syntax/pe.md#search_and_replace) in [Parameter
Expansion](../syntax/pe.md)
- Pattern-based branching using the [case command ](../syntax/ccmd/case.md )
2023-07-05 11:43:35 +02:00
The pattern description language is relatively easy. Any character
2024-03-30 19:22:45 +01:00
that's not mentioned below matches itself. The `NUL` character may not
2023-07-05 11:43:35 +02:00
occur in a pattern. If special characters are quoted, they\'re matched
literally, i.e., without their special meaning.
Do **not** confuse patterns with ** *regular expressions***, because they
share some symbols and do similar matching work.
## Normal pattern language
Sequence Description
---------- ----------------------------------------------------------------------------------------------------------------
`*` Matches **any string** , including the null string (empty string)
`?` Matches any **single character**
`X` Matches the character `X` which can be any character that has no special meaning
2024-03-30 19:22:45 +01:00
`\X` Matches the character `X` , where the character's special meaning is stripped by the backslash
2023-07-05 11:43:35 +02:00
`\\` Matches a backslash
`[...]` Defines a pattern **bracket expression** (see below). Matches any of the enclosed characters at this position.
### Bracket expressions
The bracket expression `[...]` mentioned above has some useful
applications:
Bracket expression Description
---------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
`[XYZ]` The \"normal\" bracket expression, matching either `X` , `Y` or `Z`
`[X-Z]` A range expression: Matching all the characters from `X` to `Y` (your current **locale** , defines how the characters are **sorted** !)
`[[:class:]]` Matches all the characters defined by a [POSIX(r) character class ](https://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap07.html#tag_07_03_01 ): `alnum` , `alpha` , `ascii` , `blank` , `cntrl` , `digit` , `graph` , `lower` , `print` , `punct` , `space` , `upper` , `word` and `xdigit`
`[^...]` A negating expression: It matches all the characters that are **not** in the bracket expression
`[!...]` Equivalent to `[^...]`
`[]...]` or `[-...]` Used to include the characters `]` and `-` into the set, they need to be the first characters after the opening bracket
`[=C=]` Matches any character that is eqivalent to the collation weight of `C` (current locale!)
`[[.SYMBOL.]]` Matches the collating symbol `SYMBOL`
### Examples
Some simple examples using normal pattern matching:
- Pattern `"Hello world"` matches
- `Hello world`
- Pattern `[Hh]"ello world"` matches
2024-04-01 06:10:32 +02:00
- => `Hello world`
- => `hello world`
2023-07-05 11:43:35 +02:00
- Pattern `Hello*` matches (for example)
2024-04-01 06:10:32 +02:00
- => `Hello world`
- => `Helloworld`
- => `HelloWoRlD`
- => `Hello`
2023-07-05 11:43:35 +02:00
- Pattern `Hello world[[:punct:]]` matches (for example)
2024-04-01 06:10:32 +02:00
- => `Hello world!`
- => `Hello world.`
- => `Hello world+`
- => `Hello world?`
2023-07-05 11:43:35 +02:00
- Pattern
`[[.backslash.]]Hello[[.vertical-line.]]world[[.exclamation-mark.]]`
matches (using [collation
symbols](https://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap07.html#tag_07_03_02_04))
2024-04-01 06:10:32 +02:00
- => `\Hello|world!`
2023-07-05 11:43:35 +02:00
## Extended pattern language
2024-01-29 02:01:50 +01:00
If you set the [shell option ](../internals/shell_options.md ) `extglob` , Bash
2023-07-05 11:43:35 +02:00
understands some powerful patterns. A `<PATTERN-LIST>` is one or more
patterns, separated by the pipe-symbol (`PATTERN|PATTERN`).
--------------------- ------------------------------------------------------------
`?(<PATTERN-LIST>)` Matches **zero or one** occurrence of the given patterns
`*(<PATTERN-LIST>)` Matches **zero or more** occurrences of the given patterns
`+(<PATTERN-LIST>)` Matches **one or more** occurrences of the given patterns
`@(<PATTERN-LIST>)` Matches **one** of the given patterns
`!(<PATTERN-LIST>)` Matches anything **except** one of the given patterns
--------------------- ------------------------------------------------------------
### Examples
**[Delete all but one specific file]{.underline}**
rm -f !(survivior.txt)
## Pattern matching configuration
### Related shell options
option classification description
------------------- ------------------------------------- -------------------------------------------------------------------------------
2024-01-29 02:01:50 +01:00
`dotglob` [globbing ](../syntax/expansion/globs.md ) see [Pathname expansion customization ](../syntax/expansion/globs.md#Customization )
2023-07-05 11:43:35 +02:00
`extglob` global enable/disable extended pattern matching language, as described above
2024-01-29 02:01:50 +01:00
`failglob` [globbing ](../syntax/expansion/globs.md ) see [Pathname expansion customization ](../syntax/expansion/globs.md#Customization )
`nocaseglob` [globbing ](../syntax/expansion/globs.md ) see [Pathname expansion customization ](../syntax/expansion/globs.md#Customization )
2023-07-05 11:43:35 +02:00
`nocasematch` pattern/string matching perform pattern matching without regarding the case of individual letters
2024-01-29 02:01:50 +01:00
`nullglob` [globbing ](../syntax/expansion/globs.md ) see [Pathname expansion customization ](../syntax/expansion/globs.md#Customization )
`globasciiranges` [globbing ](../syntax/expansion/globs.md ) see [Pathname expansion customization ](../syntax/expansion/globs.md#Customization )
2023-07-05 11:43:35 +02:00
## Bugs and Portability considerations
\* Counter-intuitively, only the `[!chars]` syntax for negating a
character class is specified by POSIX for shell pattern matching.
`[^chars]` is merely a commonly-supported extension. Even dash supports
`[^chars]` , but not posh.
\* All of the extglob quantifiers supported by bash were supported by
ksh88. The set of extglob quantifiers supported by ksh88 are identical
to those supported by Bash, mksh, ksh93, and zsh.
\* mksh does not support POSIX character classes. Therefore, character
ranges like `[0-9]` are somewhat more portable than an equivalent POSIX
class like `[:digit:]` .
\* Bash uses a custom runtime interpreter for pattern matching. (at
least) ksh93 and zsh translate patterns into regexes and then use a
regex compiler to emit and cache optimized pattern matching code. This
means Bash may be an order of magnitude or more slower in cases that
involve complex back-tracking (usually that means extglob quantifier
2024-03-30 19:22:45 +01:00
nesting). You may wish to use Bash's regex support (the `=~` operator)
2023-07-05 11:43:35 +02:00
if performance is a problem, because Bash will use your C library regex
implementation rather than its own pattern matcher.
TODO: describe the pattern escape bug
< https: / / gist . github . com / ormaaj / 6195070 >
### ksh93 extras
ksh93 supports some very powerful pattern matching features in addition
to those described above.
\* ksh93 supports arbitrary quantifiers just like ERE using the
`{from,to}(pattern-list)` syntax. `{2,4}(foo)bar` matches between 2-4
2024-03-30 19:22:45 +01:00
\"foo\"'s followed by \"bar\". `{2,}(foo)bar` matches 2 or more
\"foo\"'s followed by \"bar\". You can probably figure out the rest. So
2023-07-05 11:43:35 +02:00
far, none of the other shells support this syntax.
\* In ksh93, a `pattern-list` may be delimited by either `&` or `|` . `&`
means \"all patterns must be matched\" instead of \"any pattern\". For
example, `[[ fo0bar == @(fo[0-9]&+([[:alnum:]]))bar ]]` would be true
while `[[ f00bar == @(fo[0-9]&+([[:alnum:]]))bar ]]` is false, because
all members of the and-list must be satisfied. No other shell supports
this so far, but you can simulate some cases in other shells using
double extglob negation. The aforementioned ksh93 pattern is equivalent
in Bash to: `[[ fo0bar == !(!(fo[0-9])|!(+([[:alnum:]])))bar ]]` , which
is technically more portable, but ugly.
2024-03-30 19:22:45 +01:00
\* ksh93's [printf ](../commands/builtin/printf.md ) builtin can translate from
2023-07-05 11:43:35 +02:00
shell patterns to ERE and back again using the `%R` and `%P` format
specifiers respectively.
TODO: `~()` (and regex), `.sh.match` , backrefs, special `${var/.../...}`
behavior, `%()`