<p>Now for some enhancements that zsh has for using the forms of parameter
substitution I've just given as well as some similar but different ones.</p>
<p>One simple enhancement is that in addition to
`<code>${</code><em>param</em><code>=</code><em>value</em><code>}</code>' and `<code>${</code><em>param</em><code>:=</code><em>value</em><code>}</code>', zsh has
`<code>${</code><em>param</em><code>::=</code><em>value</em><code>}</code>' which performs an unconditional assignment
as well as sticking the value on the command line. It's not really any
different from using a normal assignment, then a normal parameter
substitution, except that zsh users like densely packed code.</p>
<p>All the assignment types are affected by the parameter flags `<code>A</code>' and
`<code>AA</code>' which tell the shell to perform array and associative array
assignment (in the second case, you need pairs of key/value elements as
usual). You need to be a little bit careful with array elements and word
splitting, however:</p>
<pre><code> % print -l ${(A)foo::=one two three four}
one two three four
% print ${#foo}
1
</code></pre>
<p>That made <code>$foo</code> an array all right, but treated the argument as a
scalar value and assigned it to the first element. There's a way round
this:</p>
<pre><code> % print -l ${(A)=foo::=one two three four}
one
two
three
four
% print ${#foo}
4
</code></pre>
<p>Here, the `<code>=</code>' <em>before</em> the parameter name has a completely different
effect from the others: it turns on word-splitting, just as if the
option <code>SH_WORD_SPLIT</code> were in effect. You may remember I went into this
in appalling detail in the section `Function parameters' in <a href="zshguide03.html#syntax">chapter
3</a>.</p>
<p>You should be careful, however, as more sophisticated attempts at
putting arrays inside parameter values can easily lead you astray. It's
usually much easier to use the `<em>array</em><code>=</code>(<em>...</em>)' or `<code>set -A</code><em>...</em>'
notations.</p>
<p>One extremely useful zsh enhancement is the notation `<code>${+foo}</code>' which
returns 1 if <code>$foo</code> is set and 0 if it isn't. You can use this in
arithmetic expressions. This is a much more flexible way of dealing with
possibly unset parameters than the more standard `<code>${foo?goodbye}</code>'
notation, so zsh programmers are better off using it. The
notation `plus foo' for `foo is set' should be fairly memorable, too.
A more standard way of doing this (noted by David Korn) is
`<code>0${foo+1}</code>', giving 0 if <code>$foo</code> is not set and 01 if it is.</p>
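<p>As a rough sketch of the more standard form just mentioned, here is
`<code>0${foo+1}</code>' used inside an arithmetic expression. It sticks to
syntax that bash and ksh also accept; the names <code>before</code> and
<code>after</code> are only for illustration.</p>

```shell
unset foo
before=$(( 0${foo+1} ))   # the text expands to 0: foo is unset
foo=''
after=$(( 0${foo+1} ))    # the text expands to 01: foo is set, even though empty
echo "$before $after"
```

<p>Note there is no colon in `<code>${foo+1}</code>', so an empty but set
parameter still counts as set.</p>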
<p><strong>Parameter flags and pattern substitutions</strong></p>
<p>Zsh increases the usefulness of the `top and tail' operators with some
of its parameter flags. Usually these show you what's left after the
removal of some matched portion. However, with the flag <code>(M)</code> the shell
will instead show you the matched portion itself. The flag <code>(R)</code> is the
opposite and shows the rest: that's not all that useful in the normal
case, since you get that by default. It only starts being useful when
you combine it with other flags.</p>
<p>Next, zsh allows you to match on substrings, not just on the head or
tail. You can do this by giving the flag <code>(S)</code> with either of the `<code>#</code>'
or `<code>%</code>' pattern-matching forms. The difference here is whether the
shell starts searching for a matching substring at the start or end of
the full string. Let's take</p>
<pre><code> foo='where I was huge lizards walked here and there'
</code></pre>
<p>and see what we get matching on `<code>h*e</code>':</p>
<p>There are some odd discrepancies at first sight, but here's what
happens. In the first case, `<code>#</code>' the shell looks forward until it
finds a match for `<code>h*e</code>', and takes the shortest, which is the `<code>he</code>'
in the first word. With `<code>##</code>', the match succeeds at the same point,
but the longest match extends to the `<code>e</code>' right at the end of the
string. With the other two forms, the shell starts scanning backwards
from the end, and stops as soon as it reaches a starting point which has
a match. For both `<code>%</code>' and `<code>%%</code>' this is the last `<code>h</code>', but the
former matches `<code>he</code>' and the latter matches `<code>here</code>'.</p>
<p>You can extend this by using the <code>(I)</code> flag to specify a numeric index.
The index needs to be delimited, conventionally, although not
necessarily, by colons. The shell will then scan forward or backward,
depending on the form used, until it has found the <code>(I)</code>'th match. Note
that it only ever counts a single match from each position, either the
longest or the shortest, so the <code>(I)</code>'th match starts from the <code>(I)</code>'th
position which has any match. Here's what happens when we remove all the
matches for `<code>#</code>' using the example above.</p>
<pre><code> % for (( i = 1; i <= 5; i++ )); do
for> print ${(SI:$i:)foo#h*e}
for> done
wre I was huge lizards walked here and there
where I was lizards walked here and there
where I was huge lizards walked re and there
where I was huge lizards walked here and tre
where I was huge lizards walked here and there
</code></pre>
<p>Each time we match and remove one of the possible `<code>h*e</code>' sets where
there is no `<code>e</code>' in the middle, moving from left to right. The last
time there was nothing left to match and the complete string was
returned. Note that the index we used was itself a parameter.</p>
<p>It's obvious what happens with `<code>##</code>': it will find matches at all the
same points, but they will all extend to the `<code>e</code>' at the end of the
string. It's probably less obvious what happens with `<code>%%</code>' and `<code>%</code>',
but if you try it you will find they produce just the same set of
matches as `<code>##</code>' and `<code>#</code>', respectively, but with the indices in the
reverse order (4 for 1, 3 for 2, etc.).</p>
<p>You can use the `<code>M</code>' flag to leave the matched portion rather than the
rest of the string, if you like. There are three other flags which let
you get the indices associated with the match instead of the string:
<code>(B)</code> for the beginning, using the usual zsh convention where the first
character is 1, <code>(E)</code> for the character <em>after</em> the end, and <code>(N)</code> for
the length, simply <code>E-B</code>. You can even have more than one of these; the
value substituted is a string with the given values with spaces between,
always in the order beginning, end, length.</p>
<p>There is a sort of opposite to the `<code>(S)</code>' flag, which instead of
matching substrings will only match the whole string; to do this, put a
colon before the `<code>#</code>'. Hence:</p>
<pre><code> % print ${foo:#w*g}
where I was huge lizards walked here and there
% print ${foo:#w*e}
%
</code></pre>
<p>The first one didn't match, because the `<code>g</code>' is not at the end; the
second one did, because there is an `<code>e</code>' at the end.</p>
<p><strong>Pattern replacement</strong></p>
<p>The most powerful of the parameter pattern-matching forms has been
borrowed from bash and ksh93; it doesn't occur in traditional Bourne
shells. Here, you use a pair of `<code>/</code>'s to indicate a pattern to be
replaced, and its replacement. Let's use the lizards again:</p>
<pre><code> % print ${foo/h*e/urgh}
wurgh
</code></pre>
<p>A bit incomprehensible: that's because like most pattern matchers it
takes the longest match unless told otherwise. In this case the <code>(S)</code>
flag has been pressed into service to mean not a substring (that's
automatic) but the shortest match:</p>
<pre><code> % print ${(S)foo/h*e/urgh}
wurghre I was huge lizards walked here and there
</code></pre>
<p>That only replaces the first match. This is where `<code>//</code>' comes in; it
replaces every match:</p>
<pre><code> % print ${(S)foo//h*e/urgh}
wurghre I was urgh lizards walked urghre and turghre
</code></pre>
<p>(No doubt you're starting to feel like a typical anachronistic Hollywood
cave-dweller already.) Note the syntax: it's a little bit like
substitution in <code>sed</code> or perl, but there's no slash at the end, and with
`<code>//</code>' only the first slash is doubled. It's a bit confusing that with
the other pattern expressions the single and double forms mean the
shortest and longest match, while here it's the flag <code>(S)</code> that makes
the difference.</p>
<p>The index flag <code>(I)</code> is useful here, too. In the case of `<code>/</code>', it
tells the shell which single match to substitute, and in the case of
`<code>//</code>' it tells the shell at which match to start: all matches starting
from that are replaced.</p>
<p>Overlapping matches are never replaced by `<code>//</code>'; once it has put the
new text in for a match, that section is not considered further and the
text just to its right is examined for matches. This is probably
familiar from other substitution schemes.</p>
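<p>A small sketch of the two points above, longest match by default and no
overlapping replacements. It uses only the plain `<code>/</code>' and
`<code>//</code>' forms, which zsh shares with bash and ksh93; the parameter
<code>x</code> and the result names are made up for the example.</p>

```shell
foo='where I was huge lizards walked here and there'
first=${foo/h*e/urgh}   # longest match: from the first h to the final e
x='aaa'
every=${x//aa/b}        # the first two a's are replaced; the scan resumes after them
echo "$first"
echo "$every"
```

<p>The second result keeps its trailing `<code>a</code>': once the first pair
has been replaced, the middle character is not reconsidered as the start of
another match.</p>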
<p>You may well be thinking `wouldn't it be good to be able to use the
matched text, or some part of it, in the replacement text?' This is what
you can do in sed with `<code>\1</code>' or `<code>&</code>' and in perl with `<code>$1</code>' and
`<code>$&</code>'. It turns out this <em>is</em> possible with zsh, due to part of the
more sophisticated pattern matching features. I'll talk about this when
we come on to patterns, since it's not really part of parameter
substitution, although it's designed to work well with that.</p>
<p><span id="l124"></span></p>
<h3 id="544-flags-for-options-splitting-and-joining"><a class="header" href="#544-flags-for-options-splitting-and-joining">5.4.4: Flags for options: splitting and joining</a></h3>
<p>There are three types of flag that don't look like flags, for historical
reasons; you've already seen them in <a href="zshguide03.html#syntax">chapter
3</a>. The first is the one that turns on the
<code>SH_WORD_SPLIT</code> option, <code>${=foo}</code>. Note that you can mix this with flags
that <em>do</em> look like flags, in parentheses, in which case the `<code>=</code>' must
come after the closing parenthesis. You can force the option to be
turned off for a single substitution by doubling the symbol:
`<code>${==foo}</code>'. However, you wouldn't do that unless the option was
already set, in which case you are probably trying to be compatible with
some other shell, and wouldn't want to use that form.</p>
<p>More control over splitting and joining is possible with three flags of
the more standard type, <code>(s)</code>, <code>(j)</code> and <code>(z)</code>. These do splitting
on a given string, joining with a given string, and splitting just the
way the shell does it, respectively. In the first two cases, you need to
specify the string in the same way as you specified the index for the
<code>(I)</code> flag. So, for example, here's how to turn <code>$PATH</code> into an ordinary
array without using <code>$path</code>:</p>
<pre><code> % print -l ${(s.:.)PATH}
/home/pws/bin
/usr/local/bin
/usr/sbin
/sbin
/bin
/usr/bin
/usr/X11R6/bin
/usr/games
</code></pre>
<p>Any character can follow the <code>(s)</code> or <code>(j)</code>; the string argument lasts
until the matching character, here `<code>.</code>'. If the character is one of
the bracket-like characters including `<code><</code>', the `matching' character
is the corresponding right bracket, e.g. `<code>${(s<:>)PATH}</code>' and
`<code>${(s(:))PATH}</code>' are both valid. This applies to all flags that need
arguments, including <code>(I)</code>.</p>
<p>Although the split or join string isn't a pattern, it doesn't have to be
a single character:</p>
<pre><code> % foo=(array of words)
% print ${(j.**.)foo}
array**of**words
</code></pre>
<p>The <code>(z)</code> flag doesn't take an argument. As it handles splitting on the
full shell definition of a word, it goes naturally with quoted
expressions, and I discussed above its use with the <code>(Q)</code> flag for
extracting words from a line with the quotes removed.</p>
<p>It's possible for the same parameter expression to have both splitting
and joining applied to it. This always occurs in the same order,
regardless of how you specify the flags: joining first, then splitting.
This is described in the (rather hairy) complete set of rules in the
manual entry for parameter substitution. There are one or two occasions
where this can be a bit surprising. One is when you have <code>SH_WORD_SPLIT</code>
set and try to join a string:</p>
<pre><code> % setopt shwordsplit
% foo=('another array' of 'words with spaces')
% print -l ${(j.:.)foo}
another
array:of:words
with
spaces
</code></pre>
<p>You might not have noticed if you didn't use the `<code>-l</code>' option to print,
but the spaces still caused word-splitting even though you asked for the
array to be joined with colons. To avoid this, either don't use
<code>SH_WORD_SPLIT</code> (my personal preference), or use quotes:</p>
<pre><code> % print -l "${(j.:.)foo}"
another array:of:words with spaces
</code></pre>
<p>The elements of an array would normally be joined by spaces in this
case, but the character specified by the <code>(j)</code> flag takes precedence. In
just the same way, if <code>SH_WORD_SPLIT</code> is in effect, any splitting string
given by <code>(s)</code> is used instead of the normal set of characters, which
are any characters that occur in the string <code>$IFS</code>, by default space,
tab, newline and NUL.</p>
<p>Specifying a split for a particular parameter substitution not only sets
the string to split on, but also ensures the split will take place even
if the expression is in double quotes.</p>
<p>To be clear about what's happening here: the quotes force the elements
to be joined with spaces, giving a single string, which is then split on
the original spaces as well as the one used to join the elements of the
array.</p>
<p>I will talk shortly about nested parameter substitution; you should also
note that splitting and joining will if necessary take place at all
levels of a nested substitution, not just the outermost one:</p>
<pre><code> % foo="three blind words"
% print ${#${(z)foo}}
3
</code></pre>
<p>This prints the number of elements in the innermost expression; because
of the <code>(z)</code> split, that has produced a three-element array.</p>
<p><span id="l125"></span></p>
<h3 id="545-flags-for-options-glob_subst-and-rc_expand_param"><a class="header" href="#545-flags-for-options-glob_subst-and-rc_expand_param">5.4.5: Flags for options: <code>GLOB_SUBST</code> and <code>RC_EXPAND_PARAM</code></a></h3>
<p>The other two flags that don't use parentheses affect options for single
substitutions, too. The second is the `<code>~</code>' flag that turns on
<code>GLOB_SUBST</code>, making the result of a parameter substitution eligible for
pattern matching. As the notation is supposed to indicate, it also makes
filename expansion possible, so</p>
<pre><code> % foo='~'
% print ${~foo}
/home/pws
</code></pre>
<p>It's that first `<code>~</code>' which is giving the home directory; the one in
the parameter expansion simply allows that to happen. If you have
<code>GLOB_SUBST</code> set, you can use `<code>${~~foo}</code>' to turn it off for one
substitution.</p>
<p>There's one other of these option flags: `<code>^</code>' forces on
<code>RC_EXPAND_PARAM</code> for the current substitution, and `<code>^^</code>' forces it
off. In <a href="zshguide03.html#syntax">chapter 3</a>, I showed how parameters
expanded with this option on fitted in with brace expansions.</p>
<p><span id="l126"></span></p>
<h3 id="546-yet-more-parameter-flags"><a class="header" href="#546-yet-more-parameter-flags">5.4.6: Yet more parameter flags</a></h3>
<p>Here are a few other parameter flags; I'm repeating some of these. A
very useful one is `<code>t</code>' to tell you the type of a parameter. This came
up in <a href="zshguide03.html#syntax">chapter 3</a> as well. Its most common use
is to test the basic type of the parameter before trying to use it:</p>
<pre><code> if [[ ${(t)myparam} != *assoc* ]]; then
# $myparam is not an associative array. Do something about it.
fi
</code></pre>
<p>Another very useful type is for left or right padding of a string, to a
specified length, and optionally with a specified fill string to use
instead of space; you can even specify a one-off string to go right next
to the string in question.</p>
<pre><code> foo='abcdefghij'
for (( i = 1; i <= 10; i++ )); do
goo=${foo[1,$i]}
print ${(l:10::X::Y:)goo} ${(r:10::X::Y:)goo}
done
</code></pre>
<p>prints out the rather pretty:</p>
<pre><code> XXXXXXXXYa aYXXXXXXXX
XXXXXXXYab abYXXXXXXX
XXXXXXYabc abcYXXXXXX
XXXXXYabcd abcdYXXXXX
XXXXYabcde abcdeYXXXX
XXXYabcdef abcdefYXXX
XXYabcdefg abcdefgYXX
XYabcdefgh abcdefghYX
Yabcdefghi abcdefghiY
abcdefghij abcdefghij
</code></pre>
<p>Note that those colons (which can be other characters, as I explained
for the <code>(s)</code> and <code>(j)</code> flags) always occur in pairs before and after
the argument, so that with three arguments, the colons in between are
doubled. You can miss out the `<code>:Y:</code>' part and the `<code>:X:</code>' part and
see what happens. The fill strings don't need to be single characters;
if they don't fit an exact number of times into the filler space, the
last repetition will be truncated on the end furthest from the parameter
argument being inserted.</p>
<p>Two flags tell the shell that you want something special done with
the value of the parameter substitution. The <code>(P)</code> flag forces the value
to be treated as a parameter name, so that you get the effect of a
double substitution:</p>
<pre><code> % final=string
% intermediate=final
% print ${(P)intermediate}
string
</code></pre>
<p>This is a bit as if <code>$intermediate</code> were what in ksh is called a
`nameref', a parameter that is marked as a reference to another
parameter. Zsh may eventually have those, too; there are places where
they are a good deal more convenient than the `<code>(P)</code>' flag.</p>
<p>A more powerful flag is <code>(e)</code>, which forces the value to be rescanned
for all forms of single-word substitution. For example,</p>
<pre><code> % foo='$(print $ZSH_VERSION)'
% print ${(e)foo}
4.0.2
</code></pre>
<p>made the value of <code>$foo</code> be re-examined, at which point the command
substitution was found and executed.</p>
<p>The remaining flags are a few simple special formatting tricks: order
array elements in normal lexical (character) order with <code>(o)</code>, order in
reverse order with <code>(O)</code>, do the same case-independently with <code>(oi)</code> or
<code>(Oi)</code> respectively, expand prompt `<code>%</code>'-escapes with <code>(%)</code> (easy to
remember), expand backslash escapes as <code>print</code> does with <code>(p)</code>, force all
characters to uppercase with <code>(U)</code> or lowercase with <code>(L)</code>, capitalise
the first character of the string or each array element with <code>(C)</code>, show
up special characters as escape sequences with <code>(V)</code>. That should be
enough to be getting on with.</p>
<p><span id="l127"></span></p>
<h3 id="547-a-couple-of-parameter-substitution-tricks"><a class="header" href="#547-a-couple-of-parameter-substitution-tricks">5.4.7: A couple of parameter substitution tricks</a></h3>
<p>I can't resist describing a couple of extras.</p>
<p>Zsh can do so much on parameter expressions that sometimes it's useful
even without a parameter! For example, here's how to get the length of
a fixed string without needing to put it into a parameter:</p>
<pre><code> % print ${#:-abcdefghijklm}
13
</code></pre>
<p>If the parameter whose name you haven't given has a zero length (it
does, because there isn't one), use the string after the `<code>:-</code>'
instead, and take its length. Note you need the colon, else you are
asking the shell to test whether a parameter is set, and it becomes
rather upset when it realises there isn't one to test. Other shells are
unlikely to tolerate any such syntactic outrages at all; the <code>#</code> in that
case is likely to be treated as <code>$#</code>, the number of shell arguments. But
zsh knows that's not going to have zero length, and assumes you know
what you're doing with the extra part; this is useful, but technically a
violation of the rules.</p>
<p>Sometimes you don't need anything more than the flags. The most useful
case is making the `fill' flags generate repeated words, with the
effect of perl's `<code>x</code>' operator (for those not familiar with perl, the
expression `<code>"string" x 3</code>' produces the string `stringstringstring').
Here, you need to remember that the fill width you specify is the total
width, not the number of repetitions, so you need to multiply it by the
fn: bad math expression: operand expected at `: 3 '
</code></pre>
<p>The expression before the `<code>?</code>' evaluates to zero if <code>$1</code> is not
present, and you expect the expression after the colon to be used in
that case. But actually it's too late by then; the arithmetic expression
parser has received `<code>0 ? : 3</code>', which doesn't make sense to it, hence
the error. So you need to put in `<code>${1:-0}</code>' for the second <code>$1</code>, too
--- or <code>${1:-32}</code>, or any other number, since it won't be evaluated if
<code>$1</code> is empty, it just needs to be parsed.</p>
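<p>Here is a minimal sketch of the corrected form. The name <code>fn</code> is
taken from the error message above, but the body is an assumption, since the
original definition isn't reproduced here; it uses <code>echo</code> so that
it can be tried in bash as well as zsh.</p>

```shell
fn() {
  # The default appears in both places so the parser always sees a number,
  # even when the function is called with no argument.
  echo $(( ${1:-0} ? ${1:-0} : 3 ))
}
fn 7    # argument present and non-zero, so it is used as is
fn      # no argument: the expression becomes 0 ? 0 : 3
```

<p>The two calls print 7 and 3 respectively.</p>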
<p>You should note that just as you can put numbers into scalar parameters
without needing any special handling, you can also do all the usual
string-related tricks on numeric parameters, since there is automatic
conversion in the other direction, too:</p>
<pre><code> % float foo
% zmodload -i zsh/mathfunc
% (( foo = 4 * atan(1.0) ))
% print $foo
3.141592654e+00
% print ${foo%%.*}${foo##*.[0-9]##}
3e+00
</code></pre>
<p>The argument <code>-i</code> to <code>zmodload</code> tells it not to complain if the math
library is already loaded. This gives us access to <code>atan</code>. Remember,
`<code>float</code>' declares a parameter whose output includes an exponent ---
you can actually convert it to a fixed point format on the fly using
`<code>typeset -F foo</code>', which retains the value but alters the output type.
The substitution uses some <code>EXTENDED_GLOB</code> chicanery: the final
`<code>[0-9]##</code>' matches one or more occurrences of any decimal digit. So
the head of the string value of <code>$foo</code> up to the last digit after the
decimal point is removed, and the remainder appended to whatever appears
before the decimal point.</p>
<p>Starting from 4.1.1, a calculator function called <code>zcalc</code> is bundled
with the shell. You type a standard arithmetic expression and the shell
evaluates the formula and prints it out. Lines already entered are
prefixed by a number, and you can use the positional parameter
corresponding to that number to retrieve that result for use in a new
formula. The function uses <code>vared</code> to read the formulae, so the full
shell editing mechanism is available. It will also read in
<code>zsh/mathfunc</code> if that is present.</p>
<p><span id="l133"></span></p>
<h2 id="57-brace-expansion-and-arrays"><a class="header" href="#57-brace-expansion-and-arrays">5.7: Brace Expansion and Arrays</a></h2>
<p>Brace expansion, which you met in <a href="zshguide03.html#syntax">chapter 3</a>,
appears in all csh derivatives, in some versions of ksh, and in bash, so
is fairly standard. However, there are some features and aspects of it
which are only found in zsh, which I'll describe here.</p>
<p>A complication occurs when arrays are involved. Normally, unquoted
arrays are put into a command line as if there is a break between
arguments when there is a new element, so</p>
<pre><code> % array=(three separate words)
% print -l before${array}after
beforethree
separate
wordsafter
</code></pre>
<p>unless the <code>RC_EXPAND_PARAM</code> option is set, which combines the before
and after parts with <em>each</em> element, so you get:</p>
<pre><code> % print -l before${^array}after
beforethreeafter
beforeseparateafter
beforewordsafter
</code></pre>
<p>--- the `<code>^</code>' character turns on the option just for that expansion,
as `<code>=</code>' does with <code>SH_WORD_SPLIT</code>. If you think of the character as a
correction to a proof, meaning `insert a new word between the others
here', it might help you remember (this was suggested by Bart Schaefer).</p>
<p>These two ways of expanding arrays interact differently with braces; the
more useful version here is when the <code>RC_EXPAND_PARAM</code> option is on.
Here the array acts as a sort of additional nesting:</p>
<pre><code> % array=(two three)
% print X{one,${^array}}Y
XoneY XtwoY XoneY XthreeY
</code></pre>
<p>with the <code>XoneY</code> tacked on each time, but because of the braces it
appears as a separate word, so there are four altogether.</p>
<p>If <code>RC_EXPAND_PARAM</code> is not set, you get something at first sight
slightly odd:</p>
<pre><code> % array=(two three)
% print X{one,$array}Y
X{one,two three}Y
</code></pre>
<p>What has happened here is that the <code>$array</code> has produced two words; the
first has `<code>X{one,</code>' tacked in front of the array's `<code>two</code>', while the
second likewise has `<code>}Y</code>' on the end of the array's `<code>three</code>'. So by
the time the shell comes to think about brace expansion, the braces are
in different words and don't do anything useful.</p>
<p>There's no obvious simple way of forcing the <code>$array</code> to be embedded in
the braces at the same level, instead of like an additional set of
braces. There are more complicated ways, of course.</p>
<pre><code> % array=(two three)
% print X${^=:-one $array}Y
XoneY XtwoY XthreeY
</code></pre>
<p>Yuk. We gave parameter substitution a string of words, the array with
<code>one</code> stuck in front, and told it to split them on spaces (this will
split on any extra spaces in elements of <code>$array</code>, unfortunately), while
setting <code>RC_EXPAND_PARAM</code>. The parameter flags are `<code>^=</code>'; the `<code>:-</code>'
is the usual `insert the following if the substitution has zero length'
operator. It's probably better just to create your own temporary array
and apply <code>RC_EXPAND_PARAM</code> to that. By the way, if you had
<code>RC_EXPAND_PARAM</code> set already, the last result would have been different
because the embedded <code>$array</code> would have been expanded together with the
`<code>one </code>' in front of it.</p>
<p>Braces allow numeric expressions; this works a little like in Perl:</p>
<pre><code> % print {1..10}a
1a 2a 3a 4a 5a 6a 7a 8a 9a 10a
</code></pre>
<p>and you can ask for the numbers to be padded with zeroes:</p>
<pre><code> % print {01..10}b
01b 02b 03b 04b 05b 06b 07b 08b 09b 10b
</code></pre>
<p>or have them in descending order:</p>
<pre><code> % print {10..1}c
10c 9c 8c 7c 6c 5c 4c 3c 2c 1c
</code></pre>
<p>Nesting this within other braces works in the expected way, but you
can't have any extra braces inside: the syntax is fixed to number, two
dots, number, and the numbers must be positive.</p>
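<p>To illustrate the nesting just described, here is a short sketch with a
numeric range placed inside an ordinary comma-separated brace list. The names
<code>nested</code> and <code>down</code> are invented for the example, and
<code>echo</code> is used so that the lines also work in bash.</p>

```shell
nested=$(echo {a,{1..3}}x)   # the range supplies three extra alternatives
down=$(echo {3..1})          # a descending range on its own
echo "$nested"
echo "$down"
```

<p>The first line prints `<code>ax 1x 2x 3x</code>' and the second
`<code>3 2 1</code>'.</p>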
<p>There's also an option <code>BRACE_CCL</code> which, if the braces aren't in either
of the above forms, expands single letters and ranges of letters:</p>
<pre><code> % setopt braceccl
% print 1{abw-z}2
1a2 1b2 1w2 1x2 1y2 1z2
</code></pre>
<p>An important point to be made about braces is that they are <em>not</em> part
of filename generation; they have nothing to do with pattern matching at
all. The shell blindly generates all the arguments you specify. If you
want to generate only some arguments, depending on what files are
matched, you should use the alternative-match syntax. Compare:</p>
<pre><code> % ls
file1
% print file(1|2)
file1
% print file{1,2}
file1 file2
</code></pre>
<p>The first matches any of `<code>file1</code>' or `<code>file2</code>' it happens to find in
the directory (regardless of other files). The second doesn't look at
files in the directory at all; it simply expands the braces according to
the rules given above.</p>
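<p>The comparison can be reproduced in a scratch directory along the following
lines. This is only a sketch: it assumes <code>mktemp</code> is available, and
it uses the bracket form `<code>file[12]</code>' in place of
`<code>file(1|2)</code>' so that it can be tried in bash too; with single
characters in each branch the two patterns match the same files.</p>

```shell
dir=$(mktemp -d)                        # scratch directory holding only file1
touch "$dir/file1"
globbed=$(cd "$dir" && echo file[12])   # pattern: only names that actually exist
braced=$(echo file{1,2})                # braces: generated whatever files exist
echo "$globbed"
echo "$braced"
rm -rf "$dir"
```

<p>The pattern yields just `<code>file1</code>', while the braces yield
`<code>file1 file2</code>' without consulting the directory at all.</p>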
<p>This point is particularly worthy of note if you have come from a
C-shell world, or use the <code>CSH_NULL_GLOB</code> option:</p>
<pre><code> csh% echo file{1,2}
file1 file2
csh% echo f*{1,2}
file1
</code></pre>
<p>(`<code>csh%</code>' is the prompt, to remind you if you're skipping through
without reading the text), where the difference occurs because in the
first case there was no pattern, so brace expansion was done on ordinary
words, while in the second case the `<code>*</code>' made pattern expansion
happen. In zsh, the sequence would be: `<code>f*{1,2}</code>' becomes `<code>f*1 f*2</code>'; the first becomes <code>file1</code> and the second fails to match. With
<code>CSH_NULL_GLOB</code> set, the failed match is simply removed; there is no
error because one pattern has succeeded in matching. This is presumably
the logic usually followed by the C shell. If you stick with
`<code>file(1|2)</code>' and `<code>f*(1|2)</code>' --- in this case you can simplify them
to `<code>file[12]</code>' and `<code>f*[12]</code>', but that's not true if you have more
than one character in either branch --- you are protected from this
difference.</p>
<p>Another very widely used zsh enhancement is the ability to select types
of file by using `glob qualifiers', a group of (rather terse) flags in
parentheses at the end of the pattern. Like recursive globbing, this
feature only applies for filename generation in the command line
(including an array assignment), not for other uses of patterns.</p>
<p>This feature requires the <code>BARE_GLOB_QUAL</code> option to be turned on, which
it usually is; the name implies that one day there may be another,
perhaps more ksh-like, way of doing the same thing with a more
indicative syntax than just a pair of parentheses.</p>
<p><strong>File types</strong></p>
<p>The simplest glob qualifiers are similar to what the completion system
appends at the end of file names when the <code>LIST_TYPES</code> option is on;
these are in turn similar to the indications used by `<code>ls -F</code>'. So</p>
<pre><code> % print *(.)
file1 file2 cmd1 cmd2
% print *(/)
dir1 dir2
% print *(*)
cmd1 cmd2
% print *(@)
symlink1 symlink2
</code></pre>
<p>where I've invented unlikely filenames with obvious types. <code>file1</code> and
<code>file2</code> were supposed to be just any old file; <code>(.)</code> picks up those but
also executable files. Sockets <code>(=)</code>, named pipes <code>(p)</code>, and device
files <code>(%)</code> including block <code>(%b)</code> and character <code>(%c)</code> special files
are the other types of file you can detect.</p>
<p>Associated with type, you can also specify the number of hard links to a
file: <code>(l2)</code> specifies exactly 2 links, <code>(l+3)</code> more than 3 links,
<code>(l-5)</code> fewer than 5.</p>
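<p>As a minimal sketch of the link-count qualifiers, assuming a scratch directory from <code>mktemp -d</code> and file names invented purely for illustration, something like this should behave as the comments suggest:</p>
<pre><code>tmp=$(mktemp -d)           # scratch directory (an assumption of this sketch)
cd $tmp
touch single original
ln original alias          # `original' and `alias' now share one inode
two_links=( *(l2) )        # exactly two hard links: alias original
one_link=( *(l-2) )        # fewer than two links: single
print -l $two_links
</code></pre>
<p>Both names of the linked file turn up, because the link count belongs to the file and not to any one of its names.</p>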
<p><strong>File permissions</strong></p>
<p>Actually, the <code>(*)</code> qualifier really applies to the file's permissions,
not its type, although it does require the file to be an executable
non-special file, not a directory nor anything wackier. More basic
qualifiers which apply just to the permissions of the files are <code>(r)</code>,
<code>(w)</code> and <code>(x)</code> for files readable, writeable and executable by the
owner; <code>(R)</code>, <code>(W)</code> and <code>(X)</code> correspond to those for world permissions,
while <code>(A)</code>, <code>(I)</code> and <code>(E)</code> do the job for group permissions --- sorry,
the Latin alphabet doesn't have middle case. You can specify permissions
more exactly with `<code>(f)</code>' for file permissions: the expression after
this can take various forms, but the easiest is probably a delimited
string, where the delimiters work just like the arguments for parameter
flags and the arguments, separated by commas, work just like symbolic
arguments to <code>chmod</code>; the example from the manual,</p>
<pre><code> print *(f:gu+w,o-rx:)
</code></pre>
<p>picks out files (of any type) which are writeable by the owner (`user')
and group, and neither readable nor executable by anyone else
(`other').</p>
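<p>Here is a small sketch of the permission qualifiers at work. The scratch directory and the file names <code>private</code> and <code>shared</code> are assumptions made up for the example; the modes are set explicitly so that your <code>umask</code> doesn't matter:</p>
<pre><code>tmp=$(mktemp -d)                 # scratch directory (an assumption of this sketch)
cd $tmp
touch private shared
chmod 600 private                # read and write for the owner only
chmod 664 shared                 # owner and group may write, others only read
group_writable=( *(I) )          # (I): writeable by group, so just shared
owner_only=( *(f:go-rwx:) )      # no bits at all for group or other: private
print $group_writable $owner_only
</code></pre>
<p>The <code>(f)</code> form is the one to reach for as soon as you need to test more than one bit at once.</p>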
<p><strong>File ownership</strong></p>
<p>You can match on the other three mode bits, setuid (<code>(s)</code>), setgid (<code>(S)</code>)
and sticky (<code>(t)</code>), but I'm not going to go into what those are if you
don't know; your system's manual page for <code>chmod</code> may (or may not)
explain.</p>
<p>Next, you can pick out files by owner; <code>(U)</code> and <code>(G)</code> say that you or
your group, respectively, owns the file --- really the effective user or
group ID, which is usually who you are logged in as, but this may be
altered by tricks such as a programme running setuid or setgid (the
things I'm not going to explain). More generally, <code>(u0)</code> says that the
file is owned by root and <code>(u501)</code> says it is owned by user ID 501; you
can use names if you delimit them, so <code>(u:pws:)</code> says that the owner
must be user <code>pws</code>; similarly for groups with <code>(g)</code>.</p>
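<p>A short sketch of the ownership tests, assuming a freshly created file in a scratch directory (so that it is certainly yours). It relies on the shell's <code>$EUID</code> parameter being expanded before the pattern is examined:</p>
<pre><code>tmp=$(mktemp -d)           # scratch directory (an assumption of this sketch)
cd $tmp
touch mine
owned=( *(U) )             # owned by the effective user ID
by_id=( *(u$EUID) )        # the same test spelled out with a numeric ID
print $owned $by_id
</code></pre>
<p>Both arrays should contain just <code>mine</code>.</p>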
<p><strong>File times</strong></p>
<p>You can also pick files by modification (<code>(m)</code>) or access (<code>(a)</code>) time,
either less than (<code>-</code>), exactly, or more than (<code>+</code>) a given length of time ago, which may be
measured in days (the default), months (<code>M</code>), weeks (<code>w</code>), hours (<code>h</code>),
minutes (<code>m</code>) or seconds (<code>s</code>). These must appear in the order <code>m</code> or
<code>a</code>, optional unit, optional plus or minus, number. Hence:</p>
<pre><code> print *(m1)
</code></pre>
<p>Files that were modified one day ago --- i.e. less than 48 but more than
24 hours ago.</p>
<pre><code> print *(aw-1)
</code></pre>
<p>Files accessed within the last week, i.e. less than 7 days ago.</p>
<p>In addition to <code>(m)</code> and <code>(a)</code>, there is also <code>(c)</code>, which is sometimes
said to refer to file creation, but it is actually something a bit less
useful, namely <em>inode</em> change. The inode is the structure on disk where
UNIX-like filing systems record the information about the location and
nature of the file. Information here can change when some aspect of the
file information, such as permissions, changes.</p>
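<p>To try the time qualifiers without waiting around, you can backdate a file with <code>touch -t</code>. The following is only a sketch; the scratch directory, the file names and the date are all invented for the purpose:</p>
<pre><code>tmp=$(mktemp -d)            # scratch directory (an assumption of this sketch)
cd $tmp
touch -t 200001010000 old   # pretend this was last modified on 1 January 2000
touch new                   # modified right now
recent=( *(mh-1) )          # modified less than an hour ago: new
stale=( *(m+30) )           # modified more than 30 days ago: old
print $recent $stale
</code></pre>
<p>Note the order inside the parentheses in <code>(mh-1)</code>: the <code>m</code>, then the unit, then the sign, then the number.</p>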
<p><strong>File size</strong></p>
<p>The qualifier <code>(L)</code> refers to the file size (`L' is actually for
length), by default in bytes, but it can be in kilobytes <code>(k)</code>,
megabytes <code>(m)</code>, or 512-byte blocks (<code>(p)</code>, unfortunately). Plus and minus
can be used in the same way as for times, so</p>
<pre><code> print *(Lk3)
</code></pre>
<p>gives files 3k large, i.e. larger than 2k but no larger than 3k, because sizes are rounded up to the next whole unit before the comparison, while</p>
<pre><code> print *(Lm+1)
</code></pre>
<p>gives files larger than a megabyte.</p>
<p>Note that file size applies to directories, too, although it's not very
useful. The size of directories is related to the number of slots for
files currently available inside the directory (at the highest level,
i.e. not counting children of children and deeper). This changes
automatically if necessary to make more space available.</p>
<p><strong>File matching properties</strong></p>
<p>There are a few qualifiers which affect option settings just for the
match in question: <code>(N)</code> turns on <code>NULL_GLOB</code>, so that the pattern
simply disappears from the command line if it fails to match; <code>(D)</code>
turns on <code>GLOB_DOTS</code>, to match even files beginning with a `<code>.</code>', as
described above; <code>(M)</code> or <code>(T)</code> turn on <code>MARK_DIRS</code> or <code>LIST_TYPES</code>, so
that the result has an extra character showing the type of a directory
only (in the first case) or of any special file (in the second); and
<code>(n)</code> turns on <code>NUMERIC_GLOB_SORT</code>, so that numbers in the filename are
sorted arithmetically --- so <code>10</code> comes after <code>1A</code>, because the 1 and 10
are compared before the next character is looked at.</p>
<p>Other than being local to the pattern qualified, there is no difference
in effect from setting the option itself.</p>
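<p>The two of these you are most likely to want in scripts are <code>(N)</code> and <code>(D)</code>. A minimal sketch, again assuming a scratch directory and made-up file names:</p>
<pre><code>tmp=$(mktemp -d)             # scratch directory (an assumption of this sketch)
cd $tmp
touch .hidden visible
none=( *.nosuchext(N) )      # no match: an empty array instead of an error
plain=( * )                  # dot files are skipped by default: visible
all=( *(D) )                 # with (D) they are included: .hidden visible
print ${#none} $plain $all
</code></pre>
<p>Using <code>(N)</code> in an array assignment like this is a handy way of testing whether anything matched at all, since you can simply look at the number of elements afterwards.</p>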
<p><strong>Combining qualifiers</strong></p>
<p>One of the reasons that some qualifiers have slightly obscure syntax is
that you can chain any number of them together, which requires that the
file has all of the given properties. In other words `<code>*(UWLk-10)</code>' are
files owned by you, world writeable and less than 10k in size.</p>
<p>You can negate a set of qualifiers by putting `<code>^</code>' in front of those,
so `<code>*(ULk-10^W)</code>' would specify the corresponding files which were not
world writeable. The `<code>^</code>' applies until the end of the flags, but you
can put in another one to toggle back to assertion instead of negation.</p>
<p>Also, you can specify alternatives; `<code>*(ULk-10,W)</code>' are files which
either are owned by you and are less than 10k, or are world writeable
--- note that the `and' has higher precedence than the `or'.</p>
<p>You can also toggle whether the assertions or negations made by
qualifiers apply to symbolic links, or the files found by following
symbolic links. The default is the former --- otherwise the <code>(@)</code>
qualifier wouldn't work on its own. If you precede qualifiers with <code>-</code>,
they follow symbolic links. So <code>*(-/)</code> matches all directories,
including those reached by a symbolic link (or more than one symbolic
link, up to the limit allowed by your system). As with `<code>^</code>', you can
toggle this off again with another `<code>-</code>'. To repeat what I said in
<a href="zshguide03.html#syntax">chapter 3</a>, you can't distinguish between the
other sort of links, hard links, and a real file entry, because a hard
link just supplies an alternative but equivalent name for a file.</p>
<p>There's a nice trick to find broken symlinks: the pattern `<code>**/*(-@)</code>'.
This is supposed to follow symlinks; but that `<code>@</code>' tells it to match
only on symlinks! There is only one case where this can succeed, namely
where the symlink is broken. (This was pointed out to me by Oliver
Kiddle.)</p>
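<p>The following sketch puts negation and link-following side by side. The directory layout is invented for the example, with one deliberately dangling link:</p>
<pre><code>tmp=$(mktemp -d)            # scratch directory (an assumption of this sketch)
cd $tmp
mkdir realdir
touch target
ln -s realdir dirlink       # symlink to a directory
ln -s target filelink       # symlink to a plain file
ln -s nowhere broken        # dangling symlink
dirs=( *(/) )               # directories themselves: realdir
dirs_followed=( *(-/) )     # following links: dirlink realdir
not_dirs=( *(^/) )          # anything not itself a directory
dangling=( *(-@) )          # still a symlink after following: broken
print -l $dangling
</code></pre>
<p>Here <code>$not_dirs</code> should hold <code>broken</code>, <code>dirlink</code>, <code>filelink</code> and <code>target</code>, since without the <code>-</code> a link to a directory is still just a link.</p>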
<p><strong>Sorting and indexing qualifiers</strong></p>
<p>Normally the result of filename generation is sorted by alphabetic order
of filename. The glob qualifiers <code>(o)</code> and <code>(O)</code> allow you to sort in
normal or reverse order of other things: <code>n</code> is for names, so <code>(on)</code>
gives the default behaviour while <code>(On)</code> is reverse order; <code>L</code>, <code>l</code>,
<code>m</code>, <code>a</code> and <code>c</code> refer to the same thing as the normal flags with those
letters, i.e. file size, number of links, and modification, access and
inode change times. Finally, <code>d</code> refers to subdirectory depth; this is
useful with recursive globbing to show a file tree ordered depth-first
(subdirectory contents appear before files in any given directory) or
depth-last.</p>
<p>Note that time ordering produces the most recent first as the standard
ordering (<code>(om)</code>, etc.), and oldest first as the reverse ordering
(<code>(Om)</code>, etc.). With size, smallest first is the normal ordering.</p>
<p>You can combine ordering criteria, with the most important coming first;
each criterion must be preceded by <code>o</code> or <code>O</code> to distinguish it from an
ordinary glob qualifier. Obviously, <code>n</code> serves as a complete
discriminator, since no two different files can have the same name, so
this must appear on its own or last. But it's a good idea, when doing
depth-first ordering, to use <code>odon</code>, so that files at a particular depth
appear in alphabetical order of names. Try</p>
<pre><code> print **/*(odon)
</code></pre>
<p>to see the effect, preferably somewhere above a fairly shallow directory
tree or it will take a long time.</p>
<p>There's an extra trick you can play with ordered files, which is to
extract a subset of them by indexing. This works just like arrays, with
individual elements and slices.</p>
<pre><code> print *([1])
</code></pre>
<p>This selects a single file, the first in alphabetic order since we
haven't changed the default ordering.</p>
<pre><code> print *(om[1,5])
</code></pre>
<p>This selects the five most recently modified files (or all files, if
there are five or fewer). Negative indices are understood, too:</p>
<pre><code> print *(om[1,-2])
</code></pre>
<p>selects all files but the oldest, assuming there are at least two.</p>
<p>Finally, a reminder that you can stick modifiers after qualifiers, or
indeed in parentheses without any qualifiers:</p>
<pre><code> print **/*(On:t)
</code></pre>
<p>sorts files in subdirectories into reverse order of name, but then
strips off the directory part of that name. Modifiers are applied right
at the end, after all file selection tasks.</p>
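<p>Sorting and indexing are easiest to see with files whose times you control. This is a sketch under that assumption: the three file names and their dates are made up, and set with <code>touch -t</code>:</p>
<pre><code>tmp=$(mktemp -d)                  # scratch directory (an assumption of this sketch)
cd $tmp
touch -t 202001010000 oldest
touch -t 202101010000 middle
touch -t 202201010000 newest
by_age=( *(om) )                  # most recent first: newest middle oldest
latest=( *(om[1]) )               # just the most recent: newest
all_but_oldest=( *(om[1,-2]) )    # drop the last one: newest middle
reversed=( *(On) )                # reverse name order: oldest newest middle
print -l $by_age
</code></pre>
<p>The index always applies to the list after sorting, which is why <code>(om[1])</code> is such a common idiom for `the file I just touched'.</p>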
<p><strong>Evaluating code as a test</strong></p>
<p>The most complicated effect is produced by the <code>(e)</code> qualifier, which is
followed by a string delimited in the now-familiar way by either
matching brackets of any of the four sorts or a pair of any other
characters. The string is evaluated as shell code; another layer of
quotes is stripped off, to make it easier to quote the code from
immediate expansion. The expression is evaluated separately for each
match found by the other parts of the pattern, with the parameter
<code>$REPLY</code> set to the filename found.</p>
<p>There are two ways to use <code>(e)</code>. First, you can simply rely on the
return code. So:</p>
<pre><code> print *(e:'[[ -d $REPLY ]]':)
print *(/)
</code></pre>
<p>are equivalent. Note the quotes around the expression, which are
necessary in addition to the delimiters (here `<code>:</code>') for expressions
with special characters or whitespace. In particular, <code>$REPLY</code> would
have been evaluated too early --- before file generation took place ---
if it hadn't been quoted.</p>
<p>Secondly, the expression can alter the value of <code>$REPLY</code> to alter the name
of the file. What's more, the expression can set <code>$reply</code> (which
overrides the use of <code>$REPLY</code>) to an array of files to be inserted into
the command line; it may be any size from zero items upward.</p>
<p>Here's the example in the manual:</p>
<pre><code> print *(e:'reply=(${REPLY}{1,2})':)
</code></pre>
<p>Note the string is delimited by colons <em>and</em> quoted. This takes each
file in the current directory, and for each returns a match which has
two entries, the filename with `<code>1</code>' appended and the filename with
`<code>2</code>' appended.</p>
<p>For anything more complicated than this, you should write a shell
function to use <code>$REPLY</code> and set that or <code>$reply</code>. Then you can replace
the whole expression in quotes with that name.</p>
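<p>Here is a sketch of what that might look like. The function names <code>is_text</code> and <code>with_backup</code> are hypothetical, invented for this example, as are the files; the first function relies only on its return status, the second sets <code>$reply</code>:</p>
<pre><code>tmp=$(mktemp -d)                   # scratch directory (an assumption of this sketch)
cd $tmp
touch a.txt b.log
# Hypothetical test: succeed only for names ending in .txt
is_text() { [[ $REPLY == *.txt ]] }
# Hypothetical rewrite: return each match plus a backup name for it
with_backup() { reply=( $REPLY $REPLY.bak ) }
texts=( *(e:is_text:) )            # a.txt
pairs=( *.log(e:with_backup:) )    # b.log b.log.bak
print -l $texts $pairs
</code></pre>
<p>Since the string between the delimiters is now a single word with nothing special in it, the extra layer of quotes is no longer needed.</p>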
<p><span id="l142"></span></p>
<h3 id="597-globbing-flags-alter-the-behaviour-of-matches"><a class="header" href="#597-globbing-flags-alter-the-behaviour-of-matches">5.9.7: Globbing flags: alter the behaviour of matches</a></h3>
<p>Another <code>EXTENDED_GLOB</code> feature is `globbing flags'. These are a bit
like the flags that can appear in perl regular expressions; instead of
making an assertion about the type of the resulting match, like glob
qualifiers do, they affect the way the match is performed. Thus they are
available for all uses of pattern matching --- though some flags are not
particularly useful with filename generation.</p>
<p>The syntax is borrowed from perl, although it's not the same: it looks
like `<code>(#X)</code>', where <code>X</code> is a letter, possibly followed by an argument
(currently only a number and only if the letter is `<code>a</code>'). Perl
actually uses `<code>?</code>' instead of `<code>#</code>'; what these have in common is
that they can't appear as valid pattern characters just after an open
parenthesis, since they apply to the pattern before. Zsh doesn't have
the rather technical flags that perl does (lookahead assertions and so
on); not surprisingly, its features are based around the shortcuts often
required by shell users.</p>
<p><strong>Mixed-case matches</strong></p>
<p>The simplest sort of globbing flag will serve as an example. You can
make a pattern, or a portion of a pattern, match case-insensitively with
the flag <code>(#i)</code>:</p>
<pre><code> [[ FOO = foo ]]
[[ FOO = (#i)foo ]]
</code></pre>
<p>Assuming you have <code>EXTENDED_GLOB</code> set so that the `<code>#</code>' is an active
pattern character, the first match fails while the second succeeds. I
mentioned portions of a pattern. You can put the flags at any point in
the pattern, and they last to the end either of the pattern or any
enclosing set of parentheses, so in</p>
<pre><code> [[ FOO = f(#i)oo ]]
[[ FOO = F(#i)oo ]]
</code></pre>
<p>once more the first match fails and the second succeeds. Alternatively,
you can put them in parentheses to limit their scope:</p>
<pre><code> [[ FOO = ((#i)fo)o ]]
[[ FOO = ((#i)fo)O ]]
</code></pre>
<p>gives a failure then a success again. Note that you need extra
parentheses; the ones around the flag just delimit that, and have no
grouping effect. This is different from Perl.</p>
<p>There are two other flags which work in exactly the same way: <code>(#l)</code> says that
only lowercase letters in the pattern match case-insensitively;
uppercase letters in the pattern only match uppercase letters in the
test string. This is a little like Emacs' behaviour when searching case
insensitively with the <code>case-fold-search</code> option variable set; if you
type an uppercase character, it will look only for an uppercase
character. However, Emacs has the additional feature that from that
point on the whole string becomes case-sensitive; zsh doesn't do that,
the flag applies strictly character by character.</p>
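<p>To compare the case flags directly, here is a sketch using a small helper. The function <code>matches</code> is hypothetical, written just for this example; it uses <code>${~2}</code> so that the pattern passed as a string is treated as a pattern:</p>
<pre><code>setopt extended_glob
# Hypothetical helper: does string $1 match pattern $2?
matches() {
  if [[ $1 = ${~2} ]]; then print yes; else print no; fi
}
whole=$(matches FOO '(#i)foo')       # yes: the flag covers everything
tail_only=$(matches FOO 'f(#i)oo')   # no: the f is still case-sensitive
grouped=$(matches FOO '((#i)fo)O')   # yes: the flag stops at the group
lower=$(matches Foo '(#l)foo')       # yes: lowercase f may match F
upper=$(matches Foo '(#l)FOO')       # no: uppercase O needs an uppercase O
print $whole $tail_only $grouped $lower $upper
</code></pre>
<p>The last two lines show the asymmetry of <code>(#l)</code>: only the lowercase letters in the pattern are relaxed.</p>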
<p>The third flag is <code>(#I)</code>, which turns case-insensitive matching off from
that point on. You won't often need this, and you can get the same
effect with grouping --- unless you are applying the case-insensitive
flag to multiple directories, since groups can't span more than one
directory. So</p>
<pre><code> print (#i)/a*/b*/(#I)c*
</code></pre>
<p>is equivalent to</p>
<pre><code> print /[aA]*/[bB]*/c*
</code></pre>
<p>Note that case-insensitive searching only applies to characters not in a
special pattern of some sort. In particular, ranges are not
automatically made case-insensitive; instead of `<code>(#i)[ab]*</code>', you must
use `<code>[abAB]*</code>'. This may be unexpected, but it's consistent with how
other flags, notably approximation, work.</p>
<p>You should be careful with matching multiple directories
case-insensitively. First,</p>
<pre><code> print (#i)~/.Z*
</code></pre>
<p>doesn't work. This is due to the order of expansions: filename expansion
of the tilde happens before pattern matching is ever attempted, and the
`<code>~</code>' isn't at the start where filename expansion needs to find it.
Instead, it is treated as the <code>EXTENDED_GLOB</code> exclusion operator, so the pattern
means an empty string which doesn't match `<code>/.Z*</code>' case-insensitively; in other
words, it can only ever match an empty string.</p>
<p>Hence you should put `<code>(#i)</code>' and any other globbing flags after the
first slash --- unless, for some reason, you <em>really</em> want the
expression to match `<code>/Home/PWS/</code>' etc. as well as `<code>/home/pws</code>'.</p>
<p>Second,</p>
<pre><code> print (#i)$HOME/.Z*
</code></pre>
<p>does work --- prints all files beginning `<code>.Z</code>' or `<code>.z</code>' in your home
directory --- but is inefficient. Assume <code>$HOME</code> expands to my home
directory, <code>/home/pws</code>. Then you are telling the shell it can match in
the directories `<code>/Home/PWS/</code>', `<code>/HOME/pWs</code>' and so on. There's no
quick way of doing this --- the shell has to look at every single entry
first in `<code>/</code>' and then in `<code>/home</code>' (assuming that's the only match
at that level) to check for matches. In summary, it's a good idea to use
the fact that the flag doesn't have to be at the beginning, and write
this as:</p>
<pre><code> print ~/(#i).Z*
</code></pre>
<p>Of course,</p>
<pre><code> print ~/.[zZ]*
</code></pre>
<p>would be easier and more standard in this oversimplified example.</p>
<p>On <code>Cygwin</code>, a UNIX-like layer running on top of, uh, a well known
graphical user interface claiming to be an operating system, filenames
are usually case insensitive anyway. Unfortunately, while Cygwin itself
is wise to this fact, zsh isn't, so it will do all that extra searching
when you give it the <code>(#i)</code> flag with an otherwise explicit string.</p>
<p>A piece of good news, however, is that matching of uppercase and
lowercase characters will handle non-ASCII character sets, provided your
system handles locales (or, to use the standard hieroglyphics, `i18n'
--- count the letters between `i' and `n' in `internationalization',
which may not even be a word anyway, and wince). In that case you or
your system administrator or the shell environment supplied by your
operating system vendor needs to set <code>$LC_ALL</code> or <code>$LC_CTYPE</code> to the
appropriate locale -- C for the default, <code>en</code> for English, <code>uk</code> for
Ukrainian (which I remember because it's confusing in the United
Kingdom), and so on.</p>
<p><strong>`Backreferences'</strong></p>
<p>The feature labelled as `backreferences' in the manual isn't really
that at all, which is my fault. Many regular expression matchers allow
you to refer back to bits already matched. For example, in Perl the
regular expression `<code>([A-Z]{3})\1</code>' says `match three uppercase
characters followed by the same three characters again'. The `<code>\1</code>' is a
backreference.</p>
<p>Zsh has a similar feature, but in fact you can't use it while matching a
single pattern; it just makes the characters matched by parentheses
available after a successful complete match. In this, it's a bit more
like Emacs's <code>match-beginning</code> and <code>match-end</code> functions.</p>
<p>You have to turn it on for each pattern with the globbing flag
`<code>(#b)</code>'. The reason for this is that it makes matches involving
parentheses a bit slower, and most of the time you use parentheses just
for ordinary filename generation where this feature isn't useful. Like
most of the other globbing flags, it can have a local effect: only
parentheses after the flag produce backreferences, and the effect is
local to enclosing parentheses (which don't feel the effect themselves).
You can also turn it off with `<code>(#B)</code>'.</p>
<p>What happens when a pattern with active parentheses matches is that the
elements of the arrays <code>$match</code>, <code>$mbegin</code> and <code>$mend</code> are set to reflect
each active parenthesis in turn --- names inspired by the corresponding
Emacs feature. The string matching the first pair of parentheses is
stored in the first element of <code>$match</code>, its start position in the
string is stored in the first element of <code>$mbegin</code>, and its end position
in the first element of <code>$mend</code>. The same happens for later matched parentheses.
The parentheses around any globbing flags do not count.</p>
<p><code>$mbegin</code> and <code>$mend</code> use the indexing convention currently in effect,
i.e. zero offset if <code>KSH_ARRAYS</code> is set, unit offset otherwise. This
means that if the string matched against is stored in the parameter
<code>$teststring</code>, then it will always be true that <code>${match[1]}</code> is the
same string as <code>${teststring[${mbegin[1]},${mend[1]}]}</code>, and so on. (I'm
assuming, as usual, that <code>KSH_ARRAYS</code> isn't set.) Unfortunately, this is
different from the way the <code>E</code> parameter flag works --- that substitutes
the index of the character after the end of the matched substring. Sorry! It's my
fault for not following that older convention; I thought the string
subscripting convention was more relevant.</p>
<p>An obvious use for this is to match directory and non-directory parts of
a filename:</p>
<pre><code> local match mbegin mend
if [[ /a/file/name = (#b)(*)/([^/]##) ]]; then
print -l ${match[1]} ${match[2]}
fi
</code></pre>
<p>prints `<code>/a/file</code>' and `<code>name</code>'. The second pair of parentheses, which
comes after a slash, matches any number of characters, but at least one, which are
not slashes, while the first matches anything --- remember slashes
aren't special in a pattern match of this form. Note that if this
appears in a function, it is a good idea to make the three parameters
local. You don't have to clear them, or even make them arrays. If the
match fails, they won't be touched.</p>
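<p>The same technique works for any string with a fixed separator. As a sketch, here is a hypothetical function <code>split_hostport</code> (the name and the `host:port' format are assumptions for the example) which keeps the three parameters local as suggested:</p>
<pre><code>setopt extended_glob
# Hypothetical helper: split "host:port" into its two parts.
split_hostport() {
  local match mbegin mend
  if [[ $1 = (#b)([^:]##):([0-9]##) ]]; then
    print -r -- "${match[1]} ${match[2]}"
  else
    return 1                      # not of the form host:port
  fi
}
parts=$(split_hostport example.org:8080)   # example.org 8080
print $parts
</code></pre>
<p>If the argument has no colon, or the part after it isn't a number, the function simply returns status 1 and prints nothing.</p>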
<p>There's a slightly simpler way of getting information about the match:
the flag <code>(#m)</code> puts the matched string, the start index, and the end index
for the <em>whole</em> match into the scalars <code>$MATCH</code>, <code>$MBEGIN</code> and <code>$MEND</code>.
It may not be all that obvious why this is useful. Surely the whole
pattern always matches the whole string? Actually, you've already seen
cases where this isn't true for parameter substitutions:</p>
<pre><code> local MATCH MBEGIN MEND string
string=aLOha
: ${(S)string##(#m)([A-Z]##)}
</code></pre>
<p>You'll find this sets <code>$MATCH</code> to <code>LO</code>, <code>$MBEGIN</code> to 2 and <code>$MEND</code> to 3.
In the parameter expansion, the <code>(S)</code> is for matching substrings, so
that the `<code>##</code>' match isn't anchored to the start of <code>$string</code>. The
pattern is <code>(#m)([A-Z]##)</code>, which means: turn on full-match
backreferencing and match any number of capital letters, but at least
one. This matches <code>LO</code>. Then the match parameters let you see where in
the test parameter the match occurred.</p>
<p>There's nothing to stop you using both these types of backreferences at
once, and you can specify multiple globbing flags in the short form
`<code>(#bm)</code>'. This will work with any combination of flags, except that
some such as `<code>(#bB)</code>' are obviously silly.</p>
<p>Because ordinary globbing produces a list of files, rather than just
one, this feature isn't very useful and is turned off. However, it <em>is</em>
possible to use backreferences in global substitutions and substitutions
on arrays; here are both at once:</p>
<pre><code> % array=(mananan then in gone June)
% print ${array//(#m)?n/${(C)MATCH[1]}n}
mAnAnAn thEn In gOne JUne
</code></pre>
<p>The substitution occurs separately on each element of the array, and at
each match in each element <code>$MATCH</code> gets set to what was matched. We use
this to capitalize every character that is followed by a lowercase
`<code>n</code>'. This will work with the <code>(#b)</code> form, too. The perl equivalent of