Table of Contents
- Chapter 3: Dealing with basic shell syntax
- 3.1: External commands
- 3.2: Builtin commands
- 3.2.1: Builtins for printing
- 3.2.2: Other builtins just for speed
- 3.2.3: Builtins which change the shell's state
- 3.2.4: cd and friends
- 3.2.5: Command control and information commands
- 3.2.6: Parameter control
- 3.2.7: History control commands
- 3.2.8: Job control and process control
- 3.2.9: Terminals, users, etc.
- 3.2.10: Syntactic oddments
- 3.2.11: More precommand modifiers: exec, noglob
- 3.2.12: Testing things
- 3.2.13: Handling options to functions and scripts
- 3.2.14: Random file control things
- 3.2.15: Don't watch this space, watch some other
- 3.2.16: And also
- 3.3: Functions
- 3.4: Aliases
- 3.5: Command summary
- 3.6: Expansions and quotes
- 3.7: Redirection: greater-thans and less-thans
- 3.8: Shell syntax: loops, (sub)shells and so on
- 3.9: Emulation and portability
- 3.10: Running scripts
Chapter 3: Dealing with basic shell syntax
This chapter is a more thorough examination of much of what appeared in chapter 2; to be more specific, I assume you're sitting in front of your terminal about to use the features you just set up in your initialisation files and want to know enough to get them going. Actually, you will probably spend most of the time editing command lines and in particular completing commands --- both of these activities are covered in later chapters. For now I'm going to talk about commands and the syntax that goes along with using them. This will let you write shell functions and scripts to do more of your work for you.
In the following there are often several consecutive paragraphs about quite minor features. If you find you read this all through the first time, maybe you need to get out more. Most people will probably find it better to skim through to find what the subject matter is, then come back if they later find they want to know more about a particular aspect of the shell's commands and syntax.
One aspect of the syntax is left to chapter 5: there's just so much to it, and it can be so useful if you know enough to get it right, that it can't all be squashed in here. The subject is expansion, covering a multitude of things such as parameter expansion, globbing and history expansions. You've already met the basics of these in chapter 2; but if you want to know how to pick a particular file with a globbing expression with pinpoint accuracy, or how to make a single parameter expansion reduce a long expression to the words you need, you should read that chapter; it's more or less self-contained, so you don't necessarily need to know everything in this one.
We start with the most basic issue in any command line interpreter, running commands. As you know, you just type words separated by spaces, where the first word is a command and the remainder are arguments to it. It's important to distinguish between the types of command.
3.1: External commands
External commands are the easiest, because they have the least interaction with the shell --- many of the commands provided by the shell itself, which are described in the next section, are built into the shell especially to avoid this difficulty.
The only major issue is therefore how to find them. This is done through
the parameters $path
and $PATH
, which, as I described in chapter
2, are tied together because although the first
one is more useful inside the shell --- being an array, its various
parts can be manipulated separately --- the second is the one that is
used by other commands called by the shell; in the jargon, $PATH
is
`exported to the environment', which means exactly that other commands
called by the shell can see its value.
So suppose your $path
contains
/home/pws/bin /usr/local/bin /bin /usr/bin
and you try to run `ls
'. The shell first looks in /home/pws/bin
for
a command called ls
, then in /usr/local/bin
, then in /bin
, where
it finds it, so it executes /bin/ls
. Actually, the operating system
itself knows about paths if you execute a command the right way, so the
shell doesn't strictly need to.
There is a subtlety here. The shell tries to remember where the commands
are, so it can find them again the next time. It keeps them in a
so-called `hash table', and you find the word `hash' all over the
place in the documentation: all it means is a fast way of finding some
value, given a particular key. In this case, given the name of a
command, the shell can find the path to it quickly. You can see this
table, in the form `key=
value', by typing `hash
'.
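For example, after the shell has found a few commands you might see something like this (just a sketch --- the names and paths will of course vary from system to system):
% hash
grep=/bin/grep
ls=/bin/ls
mv=/bin/mv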
In fact the shell only does this when the option HASH_CMDS
is set, as
it is by default. As you might expect, it stops searching when it finds
the directory with the command it's looking for. There is an extra
optimisation in the option HASH_ALL
, also set by default: when the
shell scans a directory to find a command, it will add all the other
commands in that directory to the hash table. This is sensible because
on most UNIX-like operating systems reading a whole lot of files in the
same directory is quite fast.
The way commands are stored has other consequences. In particular, zsh
won't look for a new command if it already knows where to find one. If I
put a new ls
command in /usr/local/bin
in the above example, zsh
would continue to use /bin/ls
(assuming it had already been found). To
fix this, there is the command rehash
, which actually empties the
command hash table, so that finding commands starts again from scratch.
Users of csh may remember having to type rehash
quite a lot with new
commands: it's not so bad in zsh, because if no command was already
hashed, or the existing one disappeared, zsh will automatically scan the
path again; furthermore, zsh performs a rehash
of its own accord if
$path
is altered. So adding a new duplicate command somewhere towards
the head of $path
is the main reason for needing rehash
.
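For example, suppose a new version of ls has just been installed in /usr/local/bin, which comes before /bin in the $path above, but the old location is still hashed (a sketch, using the paths from the earlier example):
% whence ls
/bin/ls
% rehash
% whence ls
/usr/local/bin/ls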
One thing that can happen if zsh hasn't filled its command hash table
and so doesn't know about all external commands is that the AUTO_CD
option, mentioned in the previous chapter and again below, can think you
are trying to change to a particular directory with the same name as the
command. This is one of the drawbacks of AUTO_CD
.
To be a little bit more technical, it's actually not so obvious that
command hashing is needed at all; many modern operating systems can find
commands quickly without it. The clincher in the case of zsh is that the
same hash table is necessary for command completion, a very commonly
used feature. If you type `compr<TAB>
', the shell completes this to
`compress
'. It can only do this if it has a list of commands to
complete, and this is the hash table. (In this case it didn't need to
know where to find the command, just its name, but it's only a little
extra work to store that too.) If you were following the previous
paragraphs, you'll realise zsh doesn't necessarily know all the
possible commands at the time you hit TAB
, because it only looks when
it needs to. For this purpose, there is another option, HASH_LIST_ALL
,
again set by default, which will make sure the command hash table is
full when you try to complete a command. It only needs to do this once
(unless you alter $path
), but it does mean the first command
completion is slow. If HASH_LIST_ALL
is not set, command completion is
not available: the shell could be rewritten to search the path
laboriously every single time you try to complete a command name, but it
just doesn't seem worth it.
The fact that $PATH
is passed on from the shell to commands called
from it (strictly only if the variable is marked for export, as it
usually is --- this is described in more detail with the typeset
family of builtin commands below) also has consequences. Some commands
call subcommands of their own using $PATH
. If you have that set to
something unusual, so that some of the standard commands can't be found,
it could happen that a command which is found nonetheless doesn't run
properly because it's searching for something it can't find in the path
passed down to it. That can lead to some strange and confusing error
messages.
One important thing to remember about external commands is that the shell continues to exist while they are running; it just hangs around doing nothing, waiting for the job to finish (though you can tell it not to, as we'll see). The command is given a completely new environment in which to run; changes in that don't affect the shell, which simply starts up where it left off after the command has run. So if you need to do something which changes the state of the shell, an external command isn't good enough. This brings us to builtin commands.
3.2: Builtin commands
Builtin commands, or builtins for short, are commands which are part of
the shell itself. Since builtins are necessary for controlling the
shell's own behaviour, introducing them actually serves as an
introduction to quite a lot of what is going on in the shell. So a fair
fraction of what would otherwise appear later in the chapter has
accumulated here, one way or another. This does make things a little
tricksy in places; count how many times I use the word `subtle
' and
keep it for your grandchildren to see.
I just described one reason for builtins, but there's a simpler one: speed. Going through the process of setting up an entirely new environment for the command at the beginning, swapping between this command and anything else which is being run on the computer, then destroying it again at the end is considerable overkill if all you want to do is, say, print out a message on the screen. So there are builtins for this sort of thing.
3.2.1: Builtins for printing
The commands `echo
' and `print
' are shell builtins; they just show
what you typed, after the shell has removed all the quoting. The
difference between the two is really historical: `echo
' came first,
and only handled a few simple options; ksh provided `print
', which
had more complex options and so became a different command. The
difference remains between the two commands in zsh; if you want wacky
effects, you should look to print
. Note that there is usually also an
external command called echo
, which may not be identical to zsh's;
there is no standard external command called print
, but if someone has
installed one on your system, the chances are it sends something to the
printer, not the screen.
One special effect is `print -z
' puts the arguments onto the editing
buffer stack, a list maintained by the shell of things you are about to
edit. Try:
print -z print -z print This is a line
(it may look as if something needs quoting, but it doesn't) and hit
return three times. The first time caused everything after the first
`print -z
' to appear for you to edit, and so on.
For something more useful, you can write functions that give you a line to edit:
fn() { print -z print The time now is $(date); }
Now when you type `fn
', the line with the date appears on the command
line for you to edit. The option `-s
' is a bit similar; the line
appears in the history list, so you will see it if you use up-arrow, but
it doesn't reappear automatically.
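For example:
print -s ls -l
executes nothing, but hitting up-arrow brings back `ls -l' as if you had just typed it.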
A few other useful options, some of which you've already seen, are:
- `-r': don't interpret special character sequences like `\n'
- `-P': use `%' as in prompts
- `-n': don't put a newline at the end in case there's more output to follow
- `-c': print the output in columns --- this means that `print -c *' has the effect of a sort of poor person's `ls', only faster
- `-l': use one line per argument instead of one column, which is sometimes useful for sticking lists into files, and for working out what part of an array parameter is in each element.
If you don't use the -r
option, there are a whole lot of special
character sequences. Many of these may be familiar to you from C.
- `\n': newline
- `\t': tab
- `\e' or `\E': escape character
- `\a': ring the bell (alarm), usually a euphemism for a hideous beep
- `\b': move back one character
- `\c': don't print a newline --- like the `-n' option, but embedded in the string; this alternative comes from Berkeley UNIX
- `\f': form feed, the phrase for `advance to next page' from the days when terminals were called teletypes, maybe more familiar to you as `^L'
- `\r': carriage return --- when printed, the annoying `^M's you get in DOS files, but actually rather useful with `print', since it will erase everything to the start of the line; the combination of the `-n' option and a `\r' at the start of the print string can give the illusion of a continuously changing status line
- `\v': vertical tab, which I for one have never used (I just tried it now and it behaved like a newline, only without assuming a carriage return, but that's up to your terminal).
In fact, you can get any of the 255 characters possible, although your
terminal may not like some or all of the ones above 127, by specifying a
number after the backslash. Normally this consists of three octal
digits, but you can use two hexadecimal digits after \x
instead --- so `\n
', `\012
' and `\x0a
' are all newlines. `\
'
itself escapes any other character, i.e. they appear as themselves even
if they normally wouldn't.
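For example, since `A' is 101 in octal and 41 in hexadecimal:
% print '\101\x42\x43'
ABC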
Two notes: first, don't get confused because `n
' is the fourteenth
letter of the alphabet; printing `\016
' (fourteen in octal) won't do
you any good. The remedy, after you discover your text is unreadable
(for VT100-like terminals including xterm), is to print `\017
'.
Secondly, those backslashes can land you in real quoting difficulties.
Normally a backslash on the command line escapes the next character ---
this is a different form of escaping to print
's --- so
print \n
doesn't produce a newline, it just prints out an `n
'. So you need to
quote that. This means
print \\
passes a single backslash to quote, and
print \\n
or
print '\n'
prints a newline (followed by the extra one that's usually there). To print a real backslash, you would thus need
print \\\\
Actually, you can get away with the two if there's nothing else after
--- print
just shrugs its shoulders and outputs what it's been given
--- but that's not a good habit to get into. There are other ways of
doing this: since single quotes quote anything, including backslashes
(they are the only way of making backslashes behave like normal
characters), and since the `-r
' option makes print treat characters
normally,
print -r '\'
has the same effect. But you need to remember the two levels of quoting
for backslashes. Quotes aren't special to print
, so
print \'
is good enough for printing a quote.
echotc
There's an oddity called `echotc
', which takes as its argument
`termcap' capabilities. This now lives in its own module,
zsh/termcap
.
Termcap is a now rather old-fashioned way of giving the commands
necessary for performing various standard operations on terminals:
moving the cursor, clearing to the end of the line, turning on standout
mode, and so on. It has now been replaced almost everywhere by
`terminfo', a completely different way of specifying capabilities, and
by `curses', a more advanced system for manipulating objects on a
character terminal. This means that the arguments you need to give to
echotc
can be rather hard to come by; try the termcap
manual page;
if there are two, it's probably the one in section five which gives the
codes, i.e. `man 5 termcap
' or `man -s 5 termcap
' on Solaris. Otherwise
you'll have to search the web. The reason the zsh
manual doesn't give
a list is that the shell only uses a few well-known sequences, and there
are very many others which will work with echotc
, because the
sequences are interpreted by the terminal, not the shell.
This chunk gives you a flavour:
zmodload -i zsh/termcap
echotc md
echo -n bold
echotc mr
echo -n reverse
echotc me
echo
First we make sure the module is loaded into the shell; on some older
operating systems, this only works if it was compiled in when zsh was
installed. The option -i
to zmodload
stops the shell from
complaining if the module was already loaded. This is a sensible way of
ensuring you have the right facilities available in a shell function,
since loading a module makes it available until it is explicitly
unloaded.
You should see `bold
' in bold characters, and `reverse
' in bold
reverse video. The `md
' capability turns on bold mode; `mr
' turns
on reverse video; `me
' turns off both modes. A more typical zsh way
of doing this is:
print -P '%Bbold%Sreverse%b%s'
which should show the same thing, but using prompt escapes --- prompts
are the most common use of special fonts. The `%S
' is because zsh
calls reverse `standout' mode, because it does. (On a colour xterm, you
may find `bold' is interpreted as `blue'.)
There's a lot more you can do with echotc
if you really try. The shell
has just acquired a way of printing terminfo sequences, predictably
called echoti
, although it's only available on systems where zsh needs
terminfo to compile --- this happens when the termcap code is actually a
part of terminfo. The good news about this is that terminfo tends to be
better documented, so you have a good chance of finding out the
capabilities you want from the terminfo
manual page. The echoti
command lives in another predictably named module, zsh/terminfo
.
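Here's the terminfo version of the chunk above; `bold', `rev' and `sgr0' (turn on bold, turn on reverse video, reset all attributes) are standard terminfo capability names, assuming your terminal's entry defines them:
zmodload -i zsh/terminfo
echoti bold
echo -n bold
echoti rev
echo -n reverse
echoti sgr0
echo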
3.2.2: Other builtins just for speed
There are only a few other builtins which are there just to make things go faster. Strictly, tests could go into this category, but as I explained in the last chapter it's useful to have tests in the form
if [[ $var1 = $var2 ]]; then
print doing something
fi
be treated as a special syntax by the shell, in case $var1
or $var2
expands to nothing which would otherwise confuse it. This example
consists of two features described below: the test itself, between the
double square brackets, which is true if the two substituted values are
the same string, and the `if
' construct which runs the commands in
the middle (here just the print
) if that test was true.
The builtins `true
' and `false
' do nothing at all, except return a
command status zero or one, respectively. They're just used as
placeholders: to run a loop forever --- while
will also be explained
in more detail later --- you use
while true; do
print doing something over and over
done
since the test always succeeds.
A synonym for `true
' is `:
'; it's often used in this form to give
arguments which have side effects but which shouldn't be used ---
something like
: ${param:=value}
which is a common idiom in all Bourne shell derivatives. In the
parameter expansion, $param
is given the value value
if it was empty
before, and left alone otherwise. Since that was the only reason for the
parameter expansion, you use :
to ignore the argument. Actually, the
shell blithely builds the command line --- the colon, followed by
whatever the value of $param
is, whether or not the assignment
happened --- then executes the command; it just so happens that `:
'
takes no notice of the arguments it was given. If you're switching from
ksh, you may expect certain synonyms like this to be aliases, rather
than builtins themselves, but in zsh they are actually builtins; there
are no aliases predefined by the shell. (You can still get rid of them
using `disable
', as described below.)
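For example, assuming $EDITOR is unset to start with:
% : ${EDITOR:=vi}
% print $EDITOR
vi
The first command sets the parameter as a side effect; the `:' simply swallows the expanded value.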
3.2.3: Builtins which change the shell's state
A more common use for builtins is that they change something inside the shell, or report information about what's going on in the shell. There is one vital thing to remember about external commands. It applies, too, to other cases we'll meet where the shell `forks', literally splitting itself into two parts, where the forked-off part behaves just like an external command. In both of these cases, the command is in a different process, UNIX's basic unit of things that run. (In fact, even Windows knows about processes nowadays, although they interact a little bit differently with one another.)
The vital thing is that no change in a separate process started by the shell affects the shell itself. The most common case of this is the current directory --- every process has its own current directory. You can see this by starting a new zsh:
% pwd # show the current directory
/home/pws
% zsh # start a new shell, which
# is a separate process
% cd tmp
% pwd # now I'm in a different
# directory...
/home/pws/tmp
% exit # leave the new shell...
% pwd # now I'm back where I was...
/home/pws
Hence the cd
command must be a shell builtin, or this would happen
every time you ran it.
Here's a more useful example. Putting parentheses around a command asks the shell to start a different process for it. That's useful when you specifically don't want the effects propagating back:
(cd some-other-dir; run-some-command)
runs the command, but doesn't change the directory the `real' shell is in, only its forked-off `subshell'. Hence,
% pwd
/home/pws
% (cd /; pwd)
/
% pwd
/home/pws
There's a more subtle case:
cd some-other-dir | print Hello
Remember, the `|
' (`pipe') connects the output of the first command
to the input of the next --- though actually no information is passed
that way in this example. In zsh, all but the last portion of the
`pipeline' thus created is run in different processes. Hence the cd
doesn't affect the main shell. I'll refer to it as the `parent' shell,
which is the standard UNIX language for processes; when you start
another command or fork off a subshell, you are creating `children'
(without meaning to be morbid, the children usually die first in this
case). Thus, as you would guess,
print Hello | cd some-other-dir
does have the effect of changing the directory. Note that other shells do this differently; it is always guaranteed to work this way in zsh, because many people rely on it for setting parameters, but many shells have the left hand of the pipeline being the bit that runs in the parent shell. If both sides of the pipe symbol are external commands of some sort, both will of course run in subprocesses.
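Here's the standard zsh idiom that relies on this; `read' (a builtin that sets a parameter from its input) runs in the parent shell because it's the last part of the pipeline, so the value is still there afterwards:
% print hello there | read first rest
% print $first
hello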
There are other ways you change the state of the shell, for example by declaring parameters of a particular type, or by telling it how to interpret certain commands, or, of course, by changing options. Here are the most useful, grouped in a vaguely logical fashion.
3.2.4: cd and friends
You will not by now be surprised to learn that the `cd
' command
changes directory. There is a synonym, `chdir
', which as far as I
know no-one ever uses. (It's the same name as the system call, so if you
had been programming in C or Perl and forgot that you were now using the
shell, you might use `chdir
'. But that seems a bit far-fetched.)
There are various extra features built into cd
and chdir
. First, if
you miss out the directory to which you want to change, you will be
taken to your home directory, although it's not as if `cd ~
' is all
that hard to type.
Next, the command `cd -
' is special: it takes you to the last
directory you were in. If you do a sequence of cd
commands, only the
immediately preceding directory is remembered; they are not stacked up.
Thirdly, there is a shortcut for changing between similarly named
directories. If you type `cd <old> <new>
', then the shell will look
for the first occurrence of the string `<old>
' in the current
directory, and try to replace it with `<new>
'. For example,
% pwd
/home/pws/src/zsh-3.0.8/Src
% cd 0.8 1.9
/home/pws/src/zsh-3.1.9/Src
The cd
command actually reported the new directory, as it usually does
if it's not entirely obvious where it's taken you.
Note that only the first match of <old>
is taken. It's an easy
mistake to think you can change from
/home/export1/pws/mydir1/something
to
/home/export1/pws/mydir2/something
with `cd 1 2
', but that first
`1
' messes it up. Arguably the shell could be smarter here. Of
course, `cd r1 r2
' will work in this case.
cd
's friend `pwd
' (print working directory) tells you what the
current working directory is; this information is also available in the
shell parameter $PWD
, which is special and automatically updated when
the directory changes. Later, when you know all about expansion, you
will find that you can do tricks with this to refer to other
directories. For example, ${PWD/old/new}
uses the parameter
substitution mechanism to refer to a different directory with old
replaced by new
--- and this time old
can be a pattern, i.e.
something with wildcard matches in it. So if you are in the
zsh-3.0.8/Src
directory as above and want to copy a file from the
zsh-3.1.9/Src
directory, you have a shorthand:
cp ${PWD/0.8/1.9}/myfile.c .
Symbolic links
Zsh tries to track directories across symbolic links. If you're not
familiar with these, you can think of them as a filename which behaves
like a pointer to another file (a little like Windows' shortcuts, though
UNIX has had them for much longer and they work better). You create them
like this (ln
is not a builtin command, but its use to make symbolic
links is very standard these days):
ln -s existing-file-name name-of-link
for example
ln -s /usr/bin/ln ln
creates a file called ln
in the current directory which does nothing
but point to the file /usr/bin/ln
. Symbolic links are very good at
behaving as much like the original file as you usually want; for
example, you can run the ln
link you've just created as if it were
/usr/bin/ln
. They show up differently in a long file listing with
`ls -l
', the last column showing the file they point to.
You can make them point to any sort of file at all, including directories, and that is why they are mentioned here. Suppose you create a symbolic link from your home directory to the root directory and change into it:
ln -s / ~/mylink
cd ~/mylink
If you don't know it's a link, you expect to be able to change to the
parent directory by doing `cd ..
'. However, the operating system ---
which just has one set of directories starting from /
and going down,
and ignores symbolic links after it has followed them, they really are
just pointers --- thinks you are in the root directory /
. This can be
confusing. Hence zsh tries to keep track of where you probably think
you are, rather than where the system does. If you type `pwd
', you
will see `/home/you/mylink
' (wherever your home directory is), not
`/
'; if you type `cd ..
', you will find yourself back in your home
directory.
You can turn all this second-guessing off by setting the option
CHASE_LINKS
; then `cd ~/mylink; pwd
' will show you to be in /
,
where changing to the parent directory has no effect; the parent of the
root directory is the root directory, except on certain slightly
psychedelic networked file systems. This does have advantages: for
example, `cd ~/mylink; ls ..
' always lists the root directory, not
your home directory, regardless of the option setting, because ls
doesn't know about the links you followed, only zsh does, and it treats
the ..
as referring to the root directory. Having CHASE_LINKS
set
allows `pwd
' to warn you about where the system thinks you are.
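Using the ~/mylink example from above, the difference looks like this:
% setopt chaselinks
% cd ~/mylink
% pwd
/
% unsetopt chaselinks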
An aside for non-UNIX-experts (over 99.9% of the population of the world
at the last count): I said `symbolic links' instead of just `links'
because there are others called `hard links'. This is what `ln
'
creates if you don't use the -s
option. A hard link is not so much a
pointer to a file as an alternative name for a file. If you do
ln myfile othername
ls -l
where myfile
already exists you can't tell which of myfile
and
othername
is the original --- and in fact the system doesn't care. You
can remove either, and the other will be perfectly happy as the name for
the file. This is pretty much how renaming files works, except that
creating the hard link is done for you in that case. Hard links have
limitations --- you can't link to directories, or to a file on another
disk partition (and if you don't know what a disk partition is, you'll
see what a limitation that can be). Furthermore, you usually want to
know which is the original and which is the link --- so for most users,
creating symbolic links is more useful. The only drawback is that
following the pointers is a tiny bit slower; if you think you can notice
the difference, you definitely ought to slow down a bit.
The target of a symbolic link, unlike a hard link, doesn't actually have
to exist and no checking is performed until you try to use the link. The
best thing to do is to run `ls -lL
' when you create the link; the
-L
part tells ls
to follow links, and if it worked you should see
that your link is shown as having exactly the same characteristics as
the file it points to. If it is still shown as a link, there was no such
file.
While I'm at it, I should point out one slight oddity with symbolic
links: the name of the file linked to (the first name), if it is not an
absolute path (beginning with /
after any ~
expansion), is treated
relative to the directory where the link is created --- not the current
directory when you run ln
. Here:
ln -s ../mydir ~/links/otherdir
the link otherdir
will refer to mydir
in its own parent directory,
i.e. ~/links
--- not, as you might think, the parent of the directory
where you were when you ran the command. What makes it worse is that the
second word, if it is not an absolute path, is interpreted relative to
the directory where you ran the command.
$cdpath and AUTO_CD
We're nowhere near the end of the magic you can do with directories yet
(and, in fact, I haven't even got to the zsh-specific parts). The next
trick is $cdpath
and $CDPATH
. They look a lot like $path
and
$PATH
which you met in the last chapter, and I mentioned them briefly
back in the last chapter in that context: $cdpath
is an array of
directories, while $CDPATH
is a colon-separated list behaving otherwise
like a scalar variable. They give a list of directories whose
subdirectories you may want to change into. If you use a normal cd
command (i.e. in the form `cd
dirname', and dirname does not
begin with a /
or ~), the shell will look through the directories in
$cdpath
to find one which contains the subdirectory dirname. If
$cdpath
isn't set, as you'd guess, it just uses the current directory.
Note that $cdpath
is always searched in order, and you can put a .
in it to represent the current directory. If you do, the current
directory will always be searched at that point, not necessarily
first, which may not be what you expect. For example, let's set up some
directories:
mkdir ~/crick ~/crick/dna
mkdir ~/watson ~/watson/dna
cdpath=(~/crick .)
cd ~/watson
cd dna
So I've moved to the directory ~/watson
, which contains the
subdirectory dna
, and done `cd dna
'. But because of $cdpath
, the
shell will look first in ~/crick
, and find the dna
there, and take
you to that copy of the self-reproducing directory, not the one in
~/watson
. Most people have .
at the start of their cdpath
for that
reason. However, at least cd
warns you --- if you tried it, you will
see that it prints the name of the directory it's picked in cases like
this.
In fact, if you don't have .
in your $cdpath at all, the shell will
always look there first; there's no way of making cd
never change to a
subdirectory of the current one, short of turning cd
into a function.
Some shells don't do this; they use the directories in $cdpath
, and
only those.
There's yet another shorthand, this time specific to zsh: the option
AUTO_CD
which I mentioned in the last chapter. That way a command
without any arguments which is really a directory will take you to that
directory. Normally that's perfect --- you would just get a `command
not found' message otherwise, and you might as well make use of the
option. Just occasionally, however, the name of a directory clashes with
the name of a command, builtin or external, or a shell function, and
then there can be some confusion: zsh will always pick the command as
long as it knows about it, but there are cases where it doesn't, as I
described above.
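As a quick sketch (the directory name `scratch' is made up, and the output assumes your home directory is /home/you; any name not shadowed by a command will do):
% setopt autocd
% cd ~
% mkdir scratch
% scratch
% pwd
/home/you/scratch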
What I didn't say in the last chapter is that AUTO_CD
respects
$cdpath
; in fact, it really is implemented so that `dirname' on its
own behaves as much like `cd
dirname' as is possible without tying
the shell's insides into knots.
The directory stack
One very useful facility that zsh inherited from the C-shell family
(traditional Korn shell doesn't have it) is the directory stack. This is
a list of directories you have recently been in. If you use the command
`pushd
' instead of `cd
', e.g. `pushd
dirname', then the
directory you are in is saved in this list, and you are taken to
dirname, using $CDPATH
just as cd
does. Then when you type
`popd
', you are taken back to where you were. The list can be as long
as you like; you can pushd
any number of directories, and each popd
will take you back through the list (this is how a `stack', or more
precisely a `last-in-first-out' stack usually operates in computer
jargon, hence the name `directory stack').
You can see the list --- which always starts with the current directory
--- with the dirs
command. So, for example:
cd ~
pushd ~/src
pushd ~/zsh
dirs
displays
~/zsh ~/src ~
and the next popd
will take you back to ~/src
. If you do it, you
will see that pushd
reports the list given by dirs
automatically as
it goes along; you can turn this off with the option PUSHD_SILENT
,
when you will have to rely on typing dirs
explicitly.
In fact, a lot of the use of this comes not from using simple pushd
and popd
combinations, but from two other features. First, `pushd
'
on its own swaps the top two directories on the stack. Second, pushd
with a numeric argument preceded by a `+
' or `-
' can take you to
one of the other directories in the list. The command `dirs -v
' tells
you the numbers you need; 0
is the current directory. So if you get,
0 ~/zsh
1 ~/src
2 ~
then `pushd +2
' takes you to ~
. (A little suspension of disbelief
that I didn't just use AUTO_CD
and type `..
' is required here.) If
you use a -
, it counts from the other end of the list; -0
(with
apologies to the numerate) is the last item, i.e. the same as ~
in
this case. Some people are used to having the `-
' and `+
'
arguments behave the other way around; the option PUSHD_MINUS
exists
for this.
Apart from PUSHD_SILENT
and PUSHD_MINUS
, there are a few other
relevant options. Setting PUSHD_IGNORE_DUPS
means that if you pushd
to a directory which is already somewhere in the list, the duplicate
entry will be silently removed. This is useful for most human operations
--- however, if you are using pushd
in a function or script to
remember previous directories for a future matching popd
, this can be
dangerous and you probably want to turn it off locally inside the
function.
AUTO_PUSHD
means that any directory-changing command, including an
auto-cd, is treated as a pushd
command with the target directory as
argument. Using this can make the directory stack get very long, and
there is a parameter $DIRSTACKSIZE
which you can set to specify a
maximum length. The oldest entry (the highest number in the `dirs -v
'
listing) is automatically removed when this length is exceeded. There is
no limit unless this is explicitly set.
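A typical interactive setup --- just a sketch, tune to taste --- might be:
setopt autopushd pushdignoredups
DIRSTACKSIZE=8
after which every directory change is pushed onto the stack, but `dirs -v' will never show more than eight entries.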
The final pushd
option is PUSHD_TO_HOME
. This makes pushd
on its
own behave like cd
on its own in that it takes you to your home
directory, instead of swapping the top two directories. Normally a
series of `pushd
' commands works pretty much like a series of `cd -
' commands, always taking you to the directory you were in before, with
the obvious difference that `cd -
' doesn't consult the directory
stack, it just remembers the previous directory automatically, and hence
it can confuse pushd
if you just use `cd -
' instead.
There's one remaining subtlety with pushd
, and that is what happens to
the rest of the list when you bring a particular directory to the front
with something like `pushd +2
'. Normally the list is simply cycled,
so the directories which were +3 and +4 are now right behind the new
head of the list, while the two directories which were ahead of it get
moved to the end. If the list before was:
dir1 dir2 dir3 dir4
then after pushd +2
you get
dir3 dir4 dir1 dir2
That behaviour changed during the lifetime of zsh, and some of us preferred the old behaviour, where that one directory was yanked to the front and the rest just closed the gap:
# Old behaviour
dir3 dir1 dir2 dir4
so that after a while you get a `greatest hits' group at the front of
the list. If you like this behaviour too (I feel as if I'd need to have
written papers on group theory to like the new behaviour) there is a
function pushd
supplied with the source code, although it's short
enough to repeat here --- this is in the form for autoloading in the zsh
fashion:
# pushd function to emulate the old zsh behaviour.
# With this, pushd +/-n lifts the selected element
# to the top of the stack instead of cycling
# the stack.
emulate -R zsh
setopt localoptions
if [[ ARGC -eq 1 && "$1" == [+-]<-> ]] then
setopt pushdignoredups
builtin pushd ~$1
else
builtin pushd "$@"
fi
The `&&
' is a logical `and', requiring both tests to be true. The
tests are that there is exactly one argument to the function, and that
it has the form of a `+
' or a `-
' followed by any number (`<->
'
is a special zsh pattern to match any number, an extension of forms like
`<1-100>
' which matches any number in the range 1 to 100 inclusive).
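You can try the pattern out directly, since the same matching is done for patterns on the right of `==' inside `[[ ... ]]':
% [[ +2 == [+-]<-> ]] && print matched
matched
% [[ 2+ == [+-]<-> ]] || print no match
no match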
Referring to other directories
Zsh has two ways of allowing you to refer to particular directories.
They have in common that they begin with a ~
(in very old versions of
zsh, the second form actually used an `=
', but the current way is
much more logical).
You will certainly be aware, because I've made a lot of use of it, that
a `~
' on its own or followed by a /
refers to your own home
directory. An extension of this --- again from the C-shell, although the
Korn shell has it too in this case --- is that ~name
can refer to the
home directory of any user on the system. So if your user name is pws
,
then ~
and ~pws
are the same directory.
Zsh has an extension to this; you can actually name your own directories. This was described in chapter 2, à propos of prompts, since that is the major use:
host% PS1='%~? '
~? cd zsh/Src
~/zsh/Src? zsrc=$PWD
~/zsh/Src? echo ~zsrc
/home/pws/zsh/Src
~zsrc?
Consult that chapter for the ways of forcing a parameter to be recognised as a named directory.
There's a slightly more sophisticated way of doing this directly:
hash -d zsrc=~/zsh/Src
makes ~zsrc
appear in prompts as before, and in this case there is no
parameter $zsrc
. This is the purist's way (although very few zsh users
are purists). You can guess what `unhash -d zsrc
' does; this works
with directories named via parameters, too, but leaves the parameter
itself alone.
It's possible to have a named directory with the same name as a user. In
that case `~name
' refers to the directory you named explicitly, and
there is no easy way of getting name
's home directory without removing
the name you defined.
If you're using named directories with one of the cd
-like commands or
AUTO_CD
, you can set the option CDABLEVARS
which allows you to omit
the leading ~
; `cd zsrc
' with this option would take you to
~zsrc
. The name is a historical artifact and now a misnomer; it really
is named directories, not parameters (i.e. variables), which are used.
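For example, using the named directory set up above:
setopt cdablevars
hash -d zsrc=~/zsh/Src
cd zsrc    # same as `cd ~zsrc'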
The second way of referring to directories with ~
's is to use numbers
instead of names: the numbers refer to directories in the directory
stack. So if dirs -v
gives you
0 ~zsf
1 ~src
then ~+1
and ~-0
(not very mathematical, but quite logical if you
think about it) refer to ~src
. In this case, unlike pushd arguments,
you can omit the +
and use ~1
. The option PUSHD_MINUS
is
respected. You'll see this was used in the pushd
function above: the
trick was that ~+3
, for example, refers to the same element as pushd +3
, hence pushd ~+3
pushed that directory onto the front of the list.
However, we set PUSHD_IGNORE_DUPS
, so that the value in the old
position was removed as well, giving us the effect we wanted of simply
yanking the directory to the front with no trick cycling.
3.2.5: Command control and information commands
Various builtins exist which control how you access commands, and which show you information about the commands which can be run.
The first two are strictly speaking `precommand modifiers' rather than
commands: that means that they go before a command line and modify its
behaviour, rather than being commands in their own right. If you put
`command
' in front of a command line, the command word (the next one
along) will be taken as the name of an external command, however it
would normally be interpreted; likewise, if you put `builtin
' in
front, the shell will try to run the command as a builtin command.
Normally, shell functions take precedence over builtins which take
precedence over external commands. So, for example, if your printer
control system has the command `enable
' (as many System V versions
do), which clashes with a builtin I am about to talk about, you can run
`command enable lp
' to enable a printer; otherwise, the builtin
enable would have been run. Likewise, if you have defined cd
to be a
function, but this time want to call the normal builtin cd
, you can
say `builtin cd mydir
'.
A common use for command
is inside a shell function of the same name.
Sometimes you want to enhance an ordinary command by sticking some extra
stuff around it, then calling that command, so you write a shell
function of the same name. To call the command itself inside the shell
function, you use `command
'. The following works, although it's
obviously not all that useful as it stands:
ls() {
command ls "$@"
}
so when you run `ls
', it calls the function, which calls the real
ls
command, passing on the arguments you gave it.
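Here's a sketch of a marginally more useful version; the options are for GNU ls, so adjust them to whatever your system's ls understands:
ls() {
  # classify entries and colour the output, then pass on any arguments
  command ls -F --color=auto "$@"
}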
You can gain longer lasting control over the commands which the shell
will run with the `disable
' and `enable
' commands. The first
normally takes builtin arguments; each such builtin will not be
recognised by the shell until you give an `enable
' command for it. So
if you want to be able to run the external enable
command and don't
particularly care about the builtin version, `disable enable
' (sorry
if that's confusing) will do the trick. Ha, you're thinking, you can't
run `enable enable
'. That's correct: some time in the dim and distant
past, `builtin enable enable
' would have worked, but currently it
doesn't; this may change, if I remember to change it. You can list all
disabled builtins with just `disable
' on its own --- most of the
builtins that do this sort of manipulation work like that.
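You can watch the effect with `whence -w', described below:
% disable echo
% whence -w echo
echo: command
% enable echo
% whence -w echo
echo: builtin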
You can manipulate other sets of commands with disable
and enable
by
giving different options: aliases with the option -a
, functions with
-f
, and reserved words with -r
. The first two you probably know
about, and I'll come to them anyway, but `reserved words' need
describing. They are essentially builtin commands which have some
special syntactic meaning to the shell, including some symbols such as
`{
' and `[[
'. They take precedence over everything else except
aliases --- in fact, since they're syntactically special, the shell
needs to know very early on that it has found a reserved word, it's no
use just waiting until it tries to execute a command. For example, if
the shell finds `[[
' it needs to know that everything until `]]
'
must be treated as a test rather than as ordinary command arguments.
Consequently, you wouldn't often want to disable a reserved word, since
the shell wouldn't work properly. The most obvious reason why you might
would be for compatibility with some other shell which didn't have one.
You can get a complete list with:
whence -wm '*' | grep reserved
which I'll explain below, since I'm coming to `whence
'.
Furthermore, I tend to find that if I want to get rid of aliases or
functions I use the commands `unalias
' and `unfunction
' to get rid
of them permanently, since I always have the original definitions stored
somewhere, so these two options may not be that useful either. Disabling
builtins is definitely the most useful of the four possibilities for
disable
.
External commands have to be manipulated differently. The types given
above are handled internally by the shell, so all it needs to do is
remember what code to call. With external commands, the issue instead is
how to find them. I mentioned rehash
above, but didn't tell you that
the hash
command, which you've already seen with the -d
option, can
be used to tell the shell how to find an external command:
hash foo=/path/to/foo
makes foo
execute the command using the path shown (which doesn't even
have to end in `foo
'). This is rather like an alias --- most people
would probably do this with an alias, in fact --- although a little
faster, though you're unlikely to notice the difference. You can remove
this with unhash
. One gotcha here is that if the path is rehashed,
either by calling rehash
or when you alter $path
, the entire hash
table is emptied, including anything you put in in this way; so it's not
particularly useful.
In the midst of all this, it's useful to be able to find out what the
shell thinks a particular command name does. The command `whence
'
tells you this; it also exists, with slightly different options, under
the names where
, which
and type
, largely to provide compatibility
with other shells. I'll just stick to whence
.
Its standard output isn't actually sparklingly interesting. If it's a command somehow known to the shell internally, it gets echoed back, with the alias expanded if it was an alias; if it's an external command it's printed with the full path, showing where it came from; and if it's not known the command returns status 1 and prints nothing.
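For example (assuming ls hasn't been redefined and lives in /bin on your system):
% whence ls
/bin/ls
% alias ll='ls -l'
% whence ll
ls -l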
You can make it more useful with the -v
or -c
options, which are
more verbose; the first prints out an information message, while the
second prints out the definitions of any functions it was asked about
(this is also the effect of using `which
' instead of `whence
). A
very useful option is -m
, which takes any arguments as patterns using
the usual zsh pattern format, in other words the same one used for
matching files. Thus
whence -vm "*"
prints out every command the shell knows about, together with what it thinks of it.
Note the quotes around the `*
' --- you have to remember these
anywhere where the pattern is not to be used to generate filenames on
the command line, but instead needs to be passed to the command to be
interpreted. If this seems a rather subtle distinction, think about what
would happen if you ran
# Oops. Better not try this at home.
# (Even better, don't do it at work either.)
whence -vm *
in a directory with the files `foo
' and (guess what) `bar
' in it.
The shell hasn't decided what command it's going to run when it first
looks at the command line; it just sees the `*
' and expands the line
to
whence -vm foo bar
which isn't what you meant.
There are a couple of other tricks worth mentioning: -p
makes the
shell search your path for them, even if the name is matched as
something else (say, a shell function). So if you have ls
defined as a
function,
whence -p ls
will still tell you what `command ls
' would find. Also, the option -a
searches for all commands; in the same example, this would show you both
the ls
command and the ls
function, whereas whence
would normally
only show the function because that's the one that would be run. The
-a
option also shows if it finds more than one external command in
your path.
Finally, the option -w
is useful because it identifies the type of a
command with a single word: alias
, builtin
, command
, function
,
hashed
, reserved
or none
. Most of those are obvious, with
command
being an ordinary external command; hashed
is an external
command which has been explicitly given a path with the hash
builtin,
and none
means it wasn't recognised as a command at all. Now you know
how we extracted the reserved words above.
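For example:
% whence -w if typeset ls
if: reserved
typeset: builtin
ls: command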
A close relative of whence
is functions
, which applies, of course,
to shell functions; it usually lists the definitions of all functions
given as arguments, but its relatives (of which autoload
is one)
perform various other tricks, to be described in the section on shell
functions below. Be careful with function
, without the `s', which is
completely different and not like command
or builtin
--- it is
actually a keyword used to define a function.
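For example, using the fn function defined near the start of the chapter (the exact layout of the output may vary):
% functions fn
fn () {
        print -z print The time now is $(date)
}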
3.2.6: Parameter control
There are various builtins for controlling the shell's parameters. You already know how to set and use parameters, but it's a good deal more complicated than that when you look at the details.
Local parameters
The principal command for manipulating the behaviour of parameters is
`typeset
'. Its easiest usage is to declare a parameter; you just give
it a list of parameter names, which are created as scalar parameters.
You can create parameters just by assigning to them, but the major point
of `typeset
' is that if a parameter is created that way inside a
function, the parameter is restored to its original value, or removed if
it didn't previously exist, at the end of the function --- in other
words, it has `local scope' like the variables which you declare in
most ordinary programming languages. In fact, to use the jargon it has
`dynamical' rather than `syntactic' scope, which means that the same
parameter is visible in any function called within the current one; this
is different from, say, C or FORTRAN where any function or subroutine
called wouldn't see any variable declared in the parent function.
The following makes this more concrete.
var='Original value'
subfn() {
print $var
}
fn() {
print $var
typeset var='Value in function'
print $var
subfn
}
fn
print $var
This chunk of code prints out
Original value
Value in function
Value in function
Original value
The first three chunks of the code just define the parameter $var
, and
two functions, subfn
and fn
. Then we call fn
. The first thing this
does is print out $var
, which gives `Original value
' since we
haven't changed the original definition. However, the typeset
next
does that; as you see, we can assign to the parameter during the
typeset. Thus when we print $var
out again, we get `Value in function
'. Then subfn
is called, which prints out the same value as
in fn
, because we haven't changed it --- this is where C or FORTRAN
would differ, and wouldn't recognise the variable because it hadn't been
declared in that function. Finally, fn
exits and the original value is
restored, and is printed out by the final `print
'.
Note the value changes twice: first at the typeset
, then again at the
end of fn
. The value of $var
at any point will be one of those two
values.
Although you can do assignments in a typeset
statement, you can't
assign to arrays (I already said this in the last chapter):
typeset var=(Doesn\'t work\!)
because the syntax with the parentheses is special; it only works when the line consists of nothing but assignments. However, the shell doesn't complain if you try to assign an array to a scalar, or vice versa; it just silently converts the type:
typeset var='scalar value'
var=(array value)
I put in the assignment in the typeset statement to rub the point in that it creates scalars, but actually the usual way of setting up an array in a function is
typeset var
var=()
which creates an empty scalar, then converts that to an empty array.
Recent versions of the shell have `typeset -a var
' to do that in one
go --- but you still can't assign to it in the same statement.
There are other catches associated with the fact that typeset
and its
relatives are just ordinary commands with ordinary sets of arguments.
Consider this:
% typeset var=`echo two words`
% print $var
two
What has happened to the `words
'? The answer is that backquote
substitution, to be discussed below, splits words when not quoted. So
the typeset
statement is equivalent to
% typeset var=two words
There are two ways to get round this; first, use an ordinary assignment:
% typeset var
% var=`echo two words`
which the shell can tell is a scalar assignment, and hence knows not to split words; or, second, quote the backquotes:
% typeset var="`echo two words`"
There are three important types we haven't talked about; all of these
can only be created with typeset
or one of the similar builtins I'll
list in a moment. They are integer types, floating point types, and
associative array types.
Numeric parameters
Integers are created with `typeset -i
', or `integer
' which is
another way of saying the same thing. They are used for arithmetic,
which the shell can do as follows:
integer i
(( i = 3 * 2 + 1 ))
The double parentheses surround a complete arithmetic expression: it
behaves as if it's quoted. The expression inside can be pretty much
anything you might be used to from arithmetic in other programming
languages. One important point to note is that parameters don't need to
have the $
in front, even when their value is being taken:
integer i j=12
(( i = 3 * ( j + 4 ) ** 2 ))
Here, j
will be replaced by 12 and $i
gets the value 768 (sixteen
squared times three). One thing you might not recognise is the **
,
which is the `to the power of' operator which occurs in FORTRAN and
Perl. Note that it's fine to have parentheses inside the double
parentheses --- indeed, you can even do
(( i = (3 * ( j + 4 )) ** 2 ))
and the shell won't get confused because it knows that any parentheses inside must be in balanced pairs (until you deliberately confuse it with your buggy code).
You would normally use `print $i
' to see what value had been given to
$i
, of course, and as you would expect it gets printed out as a
decimal number. However, typeset
allows you to specify another base
for printing out. If you do
typeset -i 16 i
print $i
after the last calculation, you should see 16#900
, which means 900 in
base 16 (hexadecimal). That's the only effect the option `-i 16
' has
on $i
--- you can assign to it and use it in arithmetical expressions
just as normal, but when you print it out it appears in this form. You
can use this base notation for inputting numbers, too:
(( i = 16#ff * 2#10 ))
which means 255 (ff
in hexadecimal) times 2 (10
in binary). The
shell understands C notation too, so `16#ff
' could have been
expressed `0xff
'.
Floating point variables are very similar. You can declare them with
`typeset -F
' or `typeset -E
'. The only difference between the two
is, again, on output; -F
uses a fixed point notation, while -E
uses
scientific (mnemonic: exponential) notation. The builtin `float
' is
equivalent to `typeset -E
' (because Korn shell does it, that's why).
Floating point expressions also work the way you are probably used to:
typeset -E e
typeset -F f
(( e = 32/3, f = 32.0/3.0 ))
print $e $f
prints
1.000000000e+01 10.6666666667
Various points: the `,
' can separate different expressions, just like
in C, so the e
and f
assignments are performed separately. The e
assignment was actually an integer division, because neither 32 nor 3 is
a floating point number, which must contain a dot. That means an integer
division was done, producing 10, which was then converted to a floating
point number only at the end. Again, this is just how grown-up languages
work, so it's no use cursing. The f
assignment was a full floating
point performance. Floating point parameters weren't available before
version 3.1.7
.
Although this is really a matter for a later chapter, there is a library
of floating point functions you can load (actually it's just a way of
linking in the system mathematical library). The usual incantation is
`zmodload zsh/mathfunc
'; you may not have `dynamic loading' of
libraries on your system, which may mean that doesn't work. If it does,
you can do things like
(( pi = 4.0 * atan(1.0) ))
Broadly, all the functions which appear in most system mathematical
libraries (see the manual page for math
) are available in zsh.
Like all other parameters created with typeset
or one of its cousins,
integer and floating point parameters are local to functions. You may
wonder how to create a global parameter (i.e. one which is valid outside
as well as inside the function) which has an integer or floating point
value. There's a recent addition to the shell (in version 3.1.6) which
allows this: use the flag -g
to typeset along with any others. For
example,
fn() {
typeset -Fg f
(( f = 42.75 ))
}
fn
print $f
If you try it, you will see the value of $f
has survived beyond the
function. The g
stands for global, obviously, although it's not quite
that simple:
fn() {
typeset -Fg f
}
outerfn() {
typeset f='scalar value'
fn
print $f
}
outerfn
The function outerfn
creates a local scalar value for f
; that's what
fn
sees. So it was not really operating on a `global' value, it just
didn't create a new one for the scope of fn
. The error message comes
because it tried to preserve the value of $f
while changing its type,
and the value wasn't a proper floating point expression. The error
message,
fn: bad math expression: operator expected at `value'
comes about because assigning to numeric parameters always does an
arithmetic evaluation. Operating on `scalar value
' it found
`scalar
' and assumed this was a parameter, then looked for an
operator like `+
' to come next; instead it found `value
'. If you
want to experiment, change the string to `scalar + value
' and set
`value=42
', or whatever, then try again. This is a little confusing
(which is a roundabout way of saying it confused me), but consistent
with how zsh usually treats parameters.
Actually, to a certain extent you don't need to use the integer and floating point parameters. Any time zsh needs a numeric expression it will force a scalar to the right value, and any time it produces a numeric expression and assigns it to a scalar, it will convert the result to a string. So
typeset num=3 # This is the *string* `3'.
(( num = num + 1 )) # But this works anyway
# ($num is still a string).
This can be useful if you have a parameter which is sometimes a number, sometimes a string, since zsh does all the conversion work for you. However, it can also be confusing if you always want a number, because zsh can't guess that for you; plus it's a little more efficient not to have to convert back and forth; plus you lose accuracy when you do, because if the number is stored as a string rather than in the internal numeric representation, what you say is what you get (although zsh tends to give you quite a lot of decimal places when converting implicitly to strings). Anyway, I'd recommend that if you know a parameter has to be an integer or floating point value you should declare it as such.
There is a builtin called let
to handle mathematical expressions, but
since
let "num = num + 1"
is equivalent to
(( num = num + 1 ))
and the second form is easier and more memorable, you probably won't need to use it. If you do, remember that (unlike BASIC) each mathematical expression should appear as one argument in quotes.
Associative arrays
The one remaining major type of parameter is the associative array; if you use Perl, you may call it a `hash', but we tend not to since that's really a description of how it's implemented rather than what it does. (All right, what it does is hash things. Now shut up.)
These have to be declared by a typeset statement --- there's no getting round it. There are some quite eclectic builtins that produce a filled-in associative array for you, but the only way to tell zsh you want your very own associative array is
typeset -A assoc
to create $assoc
. As to what it does, that's best shown by example:
typeset -A assoc
assoc=(one eins two zwei three drei)
print ${assoc[two]}
which prints `zwei
'. So it works a bit like an ordinary array, but
the numeric subscript of an ordinary array which would have appeared
inside the square bracket is replaced by the string key, in this case
two
. The array assignment was a bit deceptive; the `values' were
actually pairs, with `one
' being the key for the value `eins
', and
so on. The shell will complain if there are an odd number of elements in
such a list. This may also be familiar from Perl. You can assign values
one at a time:
assoc[four]=vier
and also unset one key/value pair:
unset 'assoc[one]'
where the quotes stop the square brackets from being interpreted as a pattern on the command line.
Expansion has been held over, but you might like to know about the ways of getting back what you put in. If you do
print $assoc
you just see the values --- that's exactly the same as with an ordinary array, where the subscripts 1, 2, 3, etc. aren't shown. Note they are in random order --- that's the other main difference from ordinary arrays; associative arrays have no notion of an order unless you explicitly sort them.
But here the keys may be just as interesting. So there is:
print ${(k)assoc}
print ${(kv)assoc}
giving (if you've followed through all the commands above):
four two three
four vier two zwei three drei
which print out the keys instead of the values, and the key and value pairs much as you entered them. You can see that, although the order of the pairs isn't obvious, it's the same each time. From this example you can work out how to copy an associative array into another one:
typeset -A newass
newass=(${(kv)assoc})
where the `(kv)
' is important --- as is the typeset
just before the
assignment, otherwise $newass
would be a badass ordinary array. You
can also prove that ${(v)assoc}
does what you would probably expect.
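One handy trick to know about now (loops are described later in this chapter): recent versions of the shell allow a `for' loop to take two loop variables at once, which fits the `(kv)' expansion perfectly:
for key val in ${(kv)assoc}; do
print "$key -> $val"
done
prints each key with its value, one pair per line.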
There are lots of other tricks, but they are mostly associated with
clever types of parameter expansion, to be described in chapter
5.
Other typeset and type tricks
There are variants of typeset
, some mentioned sporadically above.
There is nothing you can do with any of them that you can't do with
typeset
--- that wasn't always the case; we've tried to improve the
orthogonality of the options. They differ in the options which are set
by default, and the additional options which are allowed. Here's a list:
declare
, export
, float
, integer
, local
, readonly
. I won't
confuse you by describing all in detail; see the manual.
If there is an odd one out, it's export
, which not only marks a
parameter for export but has the -g
flag turned on by default, so that
that parameter is not local to the function; in other words, it's
equivalent to typeset -gx
. However, one holdover from the days when
the options weren't quite so logical is that typeset -x
behaves like
export
, in other words the -g
flag is turned on by default. You can
fix this by unsetting the option GLOBAL_EXPORT
--- the option only
exists for compatibility; logically it should always be unset. This is
partly because in the old days you couldn't export local parameters, so
typeset -x
either had to turn on -g
or turn off -x
; that was fixed
for the 3.1.9 release, and (for example) `local -x
' creates a local
parameter which is exported to the environment; both the parameter
itself, and the value in the environment, will be restored when the
function exits. The builtin local
is essentially a form of typeset
which renounces the -g
flag and all its works.
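Here's a sketch of how `local -x' behaves on a shell recent enough to have it; printenv is an external command found on most systems:
fn() {
local -x FOO='exported while fn runs'
printenv FOO
}
fn
print ${FOO-unset}
The call to printenv shows the value, since $FOO is in the environment inside fn; the final print shows `unset', since both the parameter and its environment entry vanish when fn returns.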
Another old restriction which has gone is that you couldn't make special
parameters, in particular $PATH
, local to a function; you just
modified the original parameter. Now if you say `typeset PATH
',
things happen the way you probably expect, with $PATH
having its usual
effect, and being restored to its old value when the function exits.
Since $PATH
is still special, though, you should make sure you assign
something to it in the function before calling external commands, else
it will be empty and no commands will be found. It's possible that you
specifically don't want some parameter you make local to have the
special property; 3.1.7 and after allow the typeset flag -h
to hide
the specialness for that parameter, so in `typeset -h PATH
', PATH
would be an ordinary variable for the duration of the enclosing
function. Internally, the same value as was previously set would
continue to be used for finding commands, but it wouldn't be exported.
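For example, assuming ls lives in /usr/bin on your system, a function can restrict its own command search path like this:
fn() {
typeset PATH=/usr/bin
ls
}
Note the assignment is made immediately, for the reason just given: a local $PATH left empty means no external commands will be found at all.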
The second main use of typeset
is to set attributes for the
parameters. In this case it can operate on an existing parameter, as
well as creating a new one. For example,
typeset -r msg='This is an important message.'
sets the readonly flag (-r) for the parameter msg
. If the parameter
didn't exist, it would be created with the usual scoping rules; but if
it did exist at the current level of scoping, it would be made readonly
with the value assigned to it, meaning you can't set that particular
copy of the parameter. For obvious reasons, it's normal to assign a
value to a readonly parameter when you first declare it. Here's a
reality check on how this affects scoping:
msg='This is an ordinary parameter'
fn() {
typeset msg='This is a local ordinary parameter'
print $msg
typeset -r msg='This is a local readonly parameter'
print $msg
msg='Watch me cause an error.'
}
fn
print $msg
msg='This version of the parameter'\
' can still be overwritten'
print $msg
outputs
This is a local ordinary parameter
This is a local readonly parameter
fn:5: read-only variable: msg
This is an ordinary parameter
This version of the parameter can still be overwritten
Unfortunately there was a bug with this code until recently --- thirty
seconds ago, actually: the second typeset
in fn
incorrectly added
the readonly flag to the existing msg
before attempting to set the
new value, which was wrong and inconsistent with what happens if you
create a new local parameter. Maybe it's reassuring that the shell can
get confused about local parameters, too. (I don't find it reassuring in
the slightest, since typeset
is one of the parts of the code where I
tend to fix the bugs, but maybe you do.)
Anyway, when the bug is fixed, you should get the output shown, because
the first typeset created a local variable which the second typeset made
readonly, so that the final assignment caused an error. Then the $msg
in the function went out of scope, and the ordinary parameter, with no
readonly restriction, was visible again.
I mentioned another special typeset option in the previous chapter:
typeset -T TEXINPUTS texinputs
to tie together the scalar $TEXINPUTS
and the array $texinputs
in
the same way that $PATH
and $path
work. This is a one-off; it's the
only time typeset
takes exactly two parameter names on the command
line. All other uses of typeset take a list of parameters to which any
flags given are applied. See the manual for the remaining flags,
although most of the more interesting ones have been discussed.
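To see the tying in action (the names follow the example above; the directories are invented):
typeset -T TEXINPUTS texinputs
texinputs=(. ~/tex)
print $TEXINPUTS
shows the two directories joined with a colon, just as $PATH would be.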
The other thing you need to know about flags is that you use them with a
`+
' sign to turn off the corresponding attribute. So
typeset +r msg
allows you to set $msg
again. From version 4.1
, you won't be able to
turn off the readonly attribute for a special parameter; that's because
there's too much scope for confusion, including attempting to set
constant strings in the code. For example, `$ZSH_VERSION
' always
prints a fixed string; attempting to change that is futile.
The final use of typeset is to list parameters. If you type `typeset
'
on its own, you get a complete list of parameters and their values. From
3.1.7, you can turn on the flag -H
for a parameter, which means to
hide its value while you're doing this. This can be useful for some of
the more enormous parameters, particularly special parameters which I'll
talk about in the section in chapter 7 on
modules, which tend to swamp the display typeset
produces.
You can also list parameters of a particular type, by listing the flags you want to know about. For example,
typeset -r
lists all readonly parameters. You might expect `typeset +r
' to list
parameters which don't have that attribute, but actually it lists the
same parameters but without showing their value. `typeset +
' lists
all parameters in this way.
Another good way of finding out about parameters is to use the special
expansion `${(t)param}', for example
print ${(t)PATH}
prints `scalar-export-special
': $PATH
is a scalar parameter, with
the -x
flag set, and has a special meaning to the shell. Actually,
`special
' means something a bit more than that: it means the internal
code to get and set the parameter behaves in a way which has side
effects, either to the parameter itself or elsewhere in the shell. There
are other parameters, like $HISTFILE
, which are used by the shell, but
which are get and set in a normal way --- they are only special in that
the value is looked at by the shell; and, after all, any old shell
function can do that, too. Contrast this with $PATH
which has all that
paraphernalia to do with hashing commands to take care of when it's set,
as I discussed above, and I hope you'll see the difference.
Reading into parameters
The `read
' builtin, as its name suggests, is the opposite to
`print
' (there's no `write
' command in the shell, though there is
often an external command of that name to send a message to another
user), but reading, unlike printing, requires something in the shell to
change to take the value, so unlike print
, read
is forced to be a
builtin. Inevitably, the values are read into a parameter. Normally they
are taken from standard input, very often the terminal (even if you're
running a script, unless you redirected the input). So the simplest case
is just
read param
and if you type a line, and hit return, it will be put into $param
,
without the final newline.
The read
builtin actually does a bit of processing on the input. It
will usually strip any initial or final whitespace (spaces or tabs) from
the line read in, though any in the middle are kept. You can read a set
of values separated by whitespace just by listing the parameters to
assign them to; the last parameter gets all the remainder of the line
without it being split. Very often it's easiest just to read into an
array:
% read -A array
this is a line typed in now, \
by me, in this space
% print ${array[1]} ${array[12]}
this space
(I'm assuming you're using the native zsh array format, rather than the
one set with KSH_ARRAYS
, and shall continue to assume this.)
It's useful to be able to print a prompt when you want to read
something. You can do this with `print -n
', but there's a shorthand:
% read line'?Please enter a line: '
Please enter a line: some words
% print $line
some words
Note the quotes surround the `?
' to prevent it being taken as part of
a pattern on the command line. You can quote the whole expression from
the beginning of `line
', if you like; I just write it like that
because I know parameter names don't need quoting, because they can't
have funny characters in. It's almost logical.
Another useful trick with read
is to read a single character; the
`-k
' option does this, and in fact you can stick a number immediately
after the `k
' which specifies a number to read. Even easier, the
`-q
' option reads a single character and returns status 0 if it was
y
or Y
, and status 1 otherwise; thus you can read the answer to
yes/no questions without using a parameter at all. Note, however, that
if you don't supply a parameter, the reply gets assigned in any case to
$REPLY
if it's a scalar --- as it is with -q
--- or $reply
if it's
an array --- i.e. if you specify -A
, but no parameter name. These are
more examples of the non-special parameters which the shell uses --- it
sets $REPLY
or $reply
, but only in the same way you would set them;
there are no side-effects.
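For example, a simple yes/no question might look like this:
if read -q 'REPLY?Are you sure? '; then
print Carrying on.
else
print Giving up.
fi
Here $REPLY ends up containing `y' or `n' as well, although the test itself only uses the return status.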
Like print
, read
has a -r
flag for raw mode. However, this just
has one effect for read
: without it, a \
at the end of the line
specifies that the next line is a continuation of the current one (you
can do this when you're typing at the terminal). With it, \
is not
treated specially.
Finally, a more sophisticated note about word-splitting. I said that,
when you are reading to many parameters or an array, the word is split
on whitespace. In fact the shell splits words on any of the characters
found in the (genuinely special, because it affects the shell's guts)
parameter $IFS
, which stands for `input field separator'. By default
--- and in the vast majority of uses --- it contains space, tab, newline
and a null character (character zero: if you know that these are usually
used to mark the end of strings, you might be surprised the shell
handles these as ordinary characters, but it does, although printing
them out usually doesn't show anything). However, you can set it to any
string: enter
fn() {
local IFS=:
read -A array
print -l $array
}
fn
and type
one word:two words:three words:four
The shell will show you what's in the array it's read, one `word' per line:
one word
two words
three words
four
You'll see the bananas, er, words (joke for the over-thirties) have been
treated as separated by a colon, not by whitespace. Making $IFS
local
didn't work in old versions of zsh, as with other specials; you had to
save it and restore it.
The read
command in zsh doesn't let you do line editing, which some
shells do. For that, you should use the vared
command, which runs the
line editor to edit a parameter, with the -c
option, which allows
vared
to create a new parameter. It also takes the option -p
to
specify a prompt, so one of the examples above can be rewritten
vared -c -p 'Please enter a line: ' line
which works rather like read but with full editing support. If you give
the option -h
(history), you can even retrieve values from previous
command lines. It doesn't have all the formatting options of read,
however, although when reading an array (use the option -a
with -c
if creating a new array) it will perform splitting.
Other builtins to control parameters
The remaining builtins which handle parameters can be dealt with more swiftly.
The builtin set
simply sets the special parameter which is passed as
an argument to functions or scripts, and which you access as $*
or
$@
, or $<number>
(Bourne-like format), or via $argv
(csh-like
format), known, however you set them, as the `positional parameters':
% set a whole load of words
% print $1
a
% print $*
a whole load of words
% print $argv[2,-2]
whole load of
It's exactly as if you were in a function and had called the function
with the arguments `a whole load of words
'. Actually, set can also be
used to set shell options, either as flags, e.g. `set -x
', or as
words after `-o', e.g. `set -o xtrace' does the same as the
previous example. It's generally easier to use setopt, and the upshot
is that you need to be careful when setting arguments this way in case
they begin with a `-'. Putting `--' before the real arguments
fixes this.
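A minimal illustration:
set -- -n -v
print -r -- $*
Here `-n' and `-v' become the first two positional parameters instead of being taken as options (and print needs the same trick for the same reason).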
One other use of set
is to set any array, via
set -A any_array words to assign to any_array
which is equivalent to (and the standard Korn shell version of)
any_array=(words to assign to any_array)
One case where the set
version is more useful is if the name of an
array itself comes from a parameter:
arrname=myarray
set -A $arrname words to assign
has no easy equivalent in the other form; the left hand side of an ordinary assignment won't expand a parameter:
# Doesn't work; syntax error
$arrname=(words to assign)
This worked in old versions of zsh, but that was on the non-standard
side. The eval
command, described below, gives another way around
this.
Next comes `shift
', which simply moves an array up one element,
deleting the original first one. Without an array name, it operates on
the positional parameters. You can also give it a number to shift other
than one, before the array name.
shift array
is equivalent to
array=(${array[2,-1]})
(almost --- I'll leave the subtleties here for the chapter on expansion)
which picks the second to last elements of the array and assigns them
back to the original array. Note, yet again, that shift
operates using
the name, not the value of the array, so no `$
' should appear in
front, otherwise you get something similar to the trick I showed for
`set -A
'.
Finally, unset
unsets a parameter, and I already showed you could
unset a key/value pair of an associative array. There is one subtlety to
be mentioned here. Normally, unset
just makes the parameter named
disappear off the face of the earth. However, if you call unset
in a
function, its ghost lives on in the sense that any parameter you create
in the same name will be scoped as the original parameter was. Hence:
var='global value'
fn() {
typeset var='local value'
unset var
var='what about this?'
}
fn
print $var
The final statement prints `global value
': even though the local copy
of $var
was unset, the shell remembers that it was local, so the
second $var
in the function is also local and its value disappears at
the end of the function.
3.2.7: History control commands
The easiest way to access the shell's command history is by editing it
directly. The second easiest way is to use the `!
'-history mechanism.
Other ways of manipulating it are based around the fc
builtin, which
probably once stood for something (according to Oliver Kiddle, `fix
command', which is as good as anything). I talked quite a bit about it
in the last chapter, and don't really have anything to add. Just note
that the two other commands based around it are history
and r
.
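Just to jog your memory, their relationship looks something like this (the output naturally depends on your own history):
fc -l -5
history -5
r
The first two both list the last few commands --- history is essentially fc -l --- and r re-executes the most recent command.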
3.2.8: Job control and process control
One of the major contributions of the C-shell was job control. You need to know about foreground and background tasks, and again I introduced these in the last chapter along with the options that control them. Here is an introduction to the relevant builtins.
You start a background job in two ways. First, directly, by putting an
`&
' after it:
sleep 10 &
and secondly by starting it in the normal way (i.e. in the foreground),
then typing ^Z
, and using the bg
command to put it in the
background. Between typing ^Z
and bg
, the job is still there, but is
not running; it is `suspended' or `stopped' (systems use different
descriptions for the same thing), waiting for you to decide what to do
with it. In either case, the job then continues without the shell
waiting for it. It will still try and read from or write to the terminal
if that's how you started it; you need to use the shell's redirection
facilities right at the start if you want to change that, there's
nothing you can do after the job has already started.
By the way, `sleep' isn't a builtin. Oddly enough, you can suspend a
builtin command or sequence of commands (such as a shell function) with
^Z
, although since the shell has to continue executing your commands
as well as being suspended, it does the only thing it can do --- fork,
so that the commands you suspend are put into the background. Probably
you will only rarely do this with builtins. No other shell, so far as I
know, has this feature.
A job will stop if it needs to read from the terminal. You see a message like:
[1] + 1348 suspended (tty input) jobname and arguments
which means the job is suspended very much like you had just typed ^Z
.
You need to bring the job into the foreground, as described below, so
that you can type something to it.
By the way, the key to type to suspend a command may not be ^Z
; it
usually is, but that can be changed. Run `stty -a
' and look for what
is listed after `susp =
' --- probably, but not necessarily, ^Z
. So
if you want to use another character --- it must be a single character;
this is handled deep in the terminal interface, not in the shell --- you
can run
stty susp '^]'
or whatever. You will note from the stty
output that various other job
control characters can be changed similarly. The stty
command is
external and its format for both output and input can vary quite a bit
from system to system.
Instead of putting the command into the background, you can bring it
back to the foreground again with fg
. This is useful for temporarily
stopping what you are doing so you can do something else. These days you
would probably do it in another window; in the old days when people
logged in from simple terminals this was even more useful. A typical
example of this is
more file # look at file
^Z # suspend
[1] + 8592 suspended more file # message printed
... # do something else
fg %1 # resume the `more'
The `%
' is the usual way of referring to jobs. The number after it is
what appeared in square brackets with the suspended message; I don't
know why the shell doesn't use the `%
' notation there, too. You also
see that with the `continued' message when you put something into the
background, and again at the end with the `done' message which tells
you a background job is finished. The `%
' can take other forms; the
most common is to follow it by the name of a command, such as `%more
'
in this case. The forms %+
and %-
refer to the most recent and
second most recent jobs --- the `+
' in the `suspended' message is
telling you that the more
job could be referred to like that.
Most of the job control commands will actually assume you are talking
about `%+
' if you don't give an argument, so assuming I hadn't
started any other commands in the background, I could just have put
`fg
' at the end of the sequence of commands above. This actually cuts
both ways: fg
is the default operation on jobs referred to with the
`%
' notation, so just typing `%1
' with no command name would have
worked, too.
You can jog your memory about what's going on with the `jobs
'
command. It looks like a series of messages of the form beginning with
the number in square brackets; usually the jobs will either be
`running' or `suspended'. This will tell you the numbers you need.
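Typical output looks something like this, though of course it depends on what you happen to have been running:
% jobs
[1]    running    sleep 60
[2]  - suspended  vi notes
[3]  + suspended  more file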
One other useful thing you can do with a job is to tell the shell to
forget about it. This is only really useful if it is already running in
the background; then you can run `disown
' with the job identifier.
It's useful for jobs you want to continue after you've logged out, as
well as jobs that have their own windows which you can therefore control
directly. With disowned jobs, the shell doesn't warn you that they are
still there when you log out. You can actually disown a background job
when you start it by putting `&|
' or `&!
' at the end of the line
instead of simply `&
'. Note that if the job was suspended when you
disowned it, it will stay disowned; this is pretty pointless, so you
probably should run `bg
' on it first.
The next most likely thing you want to do with a job is kill it, or
maybe suspend it when it's already in the background and you can't just
type ^Z
. This is where the kill
builtin comes in. There's more to
this than there is to the builtins mentioned above. First, you can use
kill
with other processes that weren't started from the current shell.
In that case, you would use a number to identify it, with no %
---
that's why the %
's were there in the other cases. Of course, you need
to find out the number; the usual way is with the ps
command, which is
not a builtin but which appears on all UNIX-like systems. As a stupid
example, here I start a disowned process which does very little, look
for it, then kill it:
% sleep 60 &|
% ps -f
UID PID PPID C STIME TTY TIME CMD
pws 623 614 0 22:12 pts/0 00:00:00 zsh
pws 8613 623 0 23:12 pts/0 00:00:00 sleep 60
pws 8615 623 0 23:12 pts/0 00:00:00 ps -f
% kill 8613
% ps -f
UID PID PPID C STIME TTY TIME CMD
pws 623 614 0 22:12 pts/0 00:00:00 zsh
pws 8616 623 0 23:12 pts/0 00:00:00 ps -f
The process has disappeared the second time I look. Notice that in the usual lugubrious UNIX way the shell didn't bother to tell you the process had been killed; however, it will report an error if it failed to send it the signal. Sending the signal is all the shell cares about; it won't warn you if the process decided it didn't want to die when told to, so it's still a good idea to check.
Sometimes you want to wait for a process to exit; the wait
builtin can
do this, and like kill
can take a process number as well as a job
number. However, that's a bit deceptive --- you can't actually wait for
a process which wasn't started directly from the shell. Indeed, the
mechanism for waiting is all bound up with the way UNIX handles
processes; unless its parent waits for it, a process becomes a `zombie'
and hangs around until the system's foster parent, the `init' process
(always process number 1) waits for it instead. It's all a little bit
baroque, but for the shell user, wait just means you can hang on until
something you started has finished. Indeed, that's how foreground
processes work: the shell in effect uses the internal version of wait
to hang around until the job exits. (Well, actually that's a lie; the
system wakes it up from whatever it's doing to tell it a child has
finished, so all it has to do is doze off to wait.)
Furthermore, you can wait for a process even if job control isn't
running. Job control, basically anything involving those %
's, is only
useful when you are sitting at a terminal fiddling with commands; it
doesn't operate when you run scripts, say. Then the shell has much less
freedom in how to control its jobs, but it can still wait for a
background process, and it can still use kill
on a process if it knows
its number. For this purpose, the shell stores the ID of the last
process started in the background in the parameter $!
; there's
probably a good reason for the `!
', but I don't know what it is. This
happens regardless of job control.
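A sketch of how this is used in a script:
sleep 2 &
pid=$!
print "started process $pid"
wait $pid
print "finished with status $?"
Here $pid is just a name I've chosen; $! and $? are the special parameters.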
Signals
The kill
command can do a good deal more than just kill a process.
That is the default action, which is why the command has that name. But
what it's really doing is sending a `signal' to a process. Signals are
the simplest way of communicating to another process; in fact, they are
about the only simple way if you haven't made special arrangements for
the process to read messages from you. Signal names are written like
SIGINT
, SIGTSTP
, SIGKILL
; to send a particular signal to a
process, you remove the SIG
, stick a hyphen in front, and use that as
the first argument to kill
, e.g.:
kill -KILL 8613
Some of the things you already know about are actually doing just that.
When you type ^C
to stop a process, you are actually sending it a
SIGINT
for `interrupt', as if you had done
kill -INT 8613
The usual signal sent by kill
is not, as you might have guessed,
SIGKILL
, but actually SIGTERM
for `terminate'; SIGKILL
is
stronger as the process can't block that signal, as it can with many
(we'll see how the shell can do that in a moment). It's familiar to UNIX
hackers as `kill -9
', because all the signals also have numbers. You
can see the list of signals in zsh by doing:
% print $signals
EXIT HUP INT QUIT ILL TRAP ABRT BUS FPE KILL USR1
SEGV USR2 PIPE ALRM TERM STKFLT CLD CONT STOP TSTP
TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH POLL PWR
UNUSED ZERR DEBUG
Your list will probably be different from mine; this is for Linux, and
the list is very system-specific, even though the first nine are
generally the same, and many of the others are virtually always present.
Actually, SIGEXIT
is an invention by the shell for you to allow the
shell to do something when a function exits (see the section on `traps'
below); you can't actually use `kill -EXIT
'. Thus SIGHUP
is the
first real signal, and indeed that's number one, so you have to shift
the contents of $signals
along one to get the right numbers. SIGTERM
and SIGINT
usually have the same effect, stopping the process, unless
that has decided to handle the signal some other way.
The last two signals are bogus, too: SIGZERR
is to allow the shell to
do something on an error (non-zero exit status), while with SIGDEBUG
you can do it on every command. Again, the `something' to be executed
is a `trap', as I'll discuss in a short while.
Typing ^Z
to suspend a process actually sends the process a SIGTSTP
(terminal stop, since it usually comes from the terminal), while
SIGSTOP
is similar but usually doesn't come from a terminal. Even
restarting a process as with bg
sends it a signal, in this case
SIGCONT
. It seems a bit odd to signal a process to restart; why can't
the operating system just restart it when you ask? The real answer is
probably that signals provide an easy way for you to talk to the
operating system without grovelling around in the dirt too much.
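You can send those signals by hand to background jobs, which gives you the effect of ^Z and bg without the terminal being involved:
sleep 100 &
kill -TSTP %1
kill -CONT %1
The first kill suspends the job; the second sets it running again, just as bg would.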
Before I talk about how you make the shell handle signals it receives,
there is one extra oddment: the suspend
builtin effectively sends the
shell a signal to suspend it, as if you'd typed ^Z
, though as you've
probably found by now that doesn't suspend the shell itself. It's only
useful to do this if the shell is running under some other programme,
else there's no way of restoring it and suspending is effectively the
same as exiting the shell. For this reason, the shell won't let you call
suspend
in a login shell, because it assumes that is running as the
top level (though in the previous chapter you learnt there's actually
nothing that special about login shells; you can start one just with
`zsh -l'). If you're logged in remotely via rsh
or ssh
, it's
usually more convenient to use the keystrokes `~^Z
' which those
define, rather than zsh's mechanism; they have to be at the beginning of
a line, so hit return first if necessary. This returns you to your local
terminal; you can resume the remote login with `fg
' just like any
other programme.
Traps
The way of making the shell handle signals is called `traps'. There are actually two mechanisms for this. I'll present the more standard one and then talk about the advantages and drawbacks of the other one at the end.
The standard version (shared with other shells) is via the `trap
'
builtin. The first argument is a chunk of shell code to execute, which
obviously needs to be quoted when you pass it as an argument, and the
remaining arguments are a list of signals to handle, minus the SIG
prefix. So:
trap "echo I\\'m trapped." INT
tells the shell what to do on SIGINT
, i.e. ^C
. Note the extra layer
of quoting: the double quotes surround the code, so that when they are
stripped trap
sees the chunk
echo I\'m trapped
Usually the shell would abort what it was doing and return to the main
prompt when you hit ^C
. Now, however, it will simply print the message
and carry on. You can try this, for example, with
read line
If you hit ^C
while it's waiting for input, you'll see the message go
up, but the shell will still wait for you to type a line.
A warning about this: ^C
is only trapped within the shell itself. If
you start up an external programme, it will have its own mechanism for
handling signals, and if it usually aborts on ^C
it still will. But
there's a sting in the tail: do
cat
which waits for input to output again (you need to use ^D
to exit
normally). If you type ^C
here, the command will be aborted, as I said
--- but you still get the message `I'm trapped
'. That's because the
shell is able to tell that the command got that particular signal, and
calls the trap when the cat
exits. Not all shells do this;
furthermore, some commands which handle signals themselves won't give
the shell enough information to know that a signal arrived, and in that
case the trap won't be called. Such commands are usually the more
sophisticated things like editors or screen managers or whatever; you
just have to find out by trial and error.
You can also make the shell ignore the signal completely. To do this, the first argument should be an empty string:
trap '' INT
Now ^C
will have no effect, and this time the effect is passed on
directly to commands called from the shell --- try the cat
example and
you won't be able to interrupt it; type ^D
or use the lesser known but
more powerful ^\
(control with backslash), which sends SIGQUIT
. If
it hasn't been disabled, this will also produce a file core
, which
contains debugging information about what the programme was doing when
it exited --- never call your own files core
. You can trap SIGQUIT
too, if you want. (The shell itself usually ignores SIGQUIT
; it's only
useful for external commands.)
Now the other sort of trap. I could have written for the first example:
TRAPINT() {
print I\'m trapped.
}
As you can see, this is just a function: functions beginning TRAP
are
special. However, it's a real function too; you can call it by hand with
the command `TRAPINT', and it will run perfectly happily with no funny
side effects.
There is a difference between the way the two types work. In the
`trap
' sort of trap, the code is evaluated just as if it
appeared as instructions to the shell at the point where the trap
happened. So if you were in a function, you would see the environment of
that function with its local variables; if you set a local variable with
typeset
, it would be visible in the function just as if it were
created there.
However, in the function type of trap, the code is provided with its own
function environment. Now if you use typeset
the parameter created is
local only to the trap. In most cases, that's all the difference there
is; it's up to you to decide which is more convenient. As you can see,
the function type of trap doesn't require the extra layer of quoting, so
looks a little smarter. Conveniently, the `trap
' command on its own
lists all traps in the form of the shell code you'd need to recreate
them, and you can see which sort is which.
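Here's a sketch of the scoping difference, using SIGUSR1, a signal set aside for users to play with:
fn() {
trap 'typeset msg="set by the trap"' USR1
kill -USR1 $$
print "${msg-msg not set}"
}
fn
The trap code runs as if it appeared inside fn, so the typeset creates a parameter local to fn and the print shows the message. Define the equivalent TRAPUSR1 function instead and the typeset is local to the trap function itself: the print would then show `msg not set'.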
There are two cases where the difference sticks out. One is that the function type has some extra wiring to allow you both to trap a signal, and pretend to anyone watching that the shell didn't handle it. An example will show this:
TRAPINT() {
print "Signal caught, stopping anyway."
return $(( 128 + $1 ))
}
That second line may look as rococo as the Amalienburg, but its
is this: $1
, the first argument to the function, is set to the number
of the signal. In this case it will be 2 because that's the standard
number for SIGINT
. That means the arithmetic substitution $((...))
returns 130, the command `return 130
' is executed, and the function
returns with status 130. Returning with non-zero status is special in
function traps: it tells the shell you want to abort the surrounding
command even though the trap was handled, and that you want the status
associated with that to be 130. It so happens that this is how UNIX
handles returns from normal traps. Without setting a trap, do
% cat
^C
% print $?
and you'll see that this, too, has given the status 130, 128 plus the
value of SIGINT
. So if you do have the trap set, you'll see the
message, but the command will abort --- even if it was running inside
the shell.
Try
% read line
^C
to see that happening. If you look at the status in $?
you'll find
it's actually 1, not 130; that's because the read
command, when it
aborted, overrode the return value from the trap. But it does that with
an untrapped ^C
, too, so that's not really an exception to what I've
just said.
If you've been paying attention, you'll realise that traps set with the
trap
builtin can't do it in quite this way, because the function they
return from would be whatever function you were in. You can see that:
trap 'echo Returning...; return;' INT
fn() {
print In fn...
read param
print Leaving fn..
}
If you run fn
and hit ^C
, the signal is trapped and the message
printed, but because of the return
, the shell quits fn
immediately
and you don't see the final message. If you missed out the `return;
'
(try it), the shell would carry on with the rest of fn
after you typed
something to read
. Of course you can use this mechanism to leave
functions after trapping a signal; it just so happens that in this case
the mechanism with TRAPINT
is a little closer to what untrapped
signals do and hence a little neater.
One final flourish of late Baroque splendour: the trap for SIGEXIT
,
the one called when a function (or the shell itself, in fact) exits is a
bit special because in the case of exiting a function it will be called
in the environment of the calling function. So if you need to do
something like set a local variable for an enclosing function you can
have
trap 'typeset param_in_enclosing_func=value' EXIT
do it for you; you couldn't do that with TRAPEXIT
because the code
would have its own function, so that even though it would be called
after the first function exited, it wouldn't run directly in the
enclosing one but in a separate TRAPEXIT
function. You can even set an
EXIT trap for the enclosing function by defining a nested `trap .. EXIT
' inside that trap itself.
I lied, because there is one more special thing about TRAPEXIT
: it's
always reset after you exit a function and the trap itself has been
called. Most traps just hang around until you explicitly unset them.
There is an option, LOCAL_TRAPS
, which makes traps set inside
functions as well insulated as possible from those outside, or inside
deeper functions. In other words, the old trap is saved and then
restored when you exit the function; the scoping works pretty much like
that for typeset
, and in the same way traps for the enclosing scope,
apart from any for EXIT
, remain in effect inside a function unless you
explicitly override them; and, again in the same way, if you unset it
inside the function it will still be restored on exit.
LOCAL_TRAPS
is the fixed behaviour of some other shells. In zsh,
without the option set:
trap 'echo Hi.' INT
fn() {
trap 'echo Bye.' INT
}
Calling fn
simply replaces the trap defined outside the function with
the one defined inside while:
trap 'echo Hi.' INT
fn() {
setopt localtraps
trap 'echo Bye.' INT
}
puts the original `Hi' trap back after the function exits.
I haven't told you how to unset a trap for good: the answer is
trap - INT
As you would guess, you can use unfunction
with function-type traps;
that will correctly remove the trap as well as deleting the function.
However, `trap -
' works with both, so that's the recommended way.
Limits on processes
One other way that jobs started by the shell can be controlled is by
using limits. These are actually limits set by the operating system, but
the shell gives you a way of controlling them: the limit
and unlimit
commands. Type `limit
' on its own to see a summary. I get:
cputime unlimited
filesize unlimited
datasize unlimited
stacksize 8MB
coredumpsize 0kB
memoryuse unlimited
maxproc 2048
descriptors 1024
memorylocked unlimited
addressspace unlimited
where the item on the left of each line is what is being limited, and on
the right is the value. The manual page to look at, at least on Linux,
is for the functions getrlimit and setrlimit; those are the functions the
shell calls when you run limit or unlimit.
In this case, the items are:
cputime: the total CPU time used by a process
filesize: the maximum size of a file
datasize: the maximum size of data in use by a programme
stacksize: the maximum size of the stack, which is the area of memory used to store information during function calls
coredumpsize: the maximum size of a core file, which is an image of memory left by a programme that crashes, allowing you to debug it with gdb, dbx, ddd or some other debugger
memoryuse: the maximum main memory, i.e. programme memory which is in active use and hasn't been `swapped out' to disk
maxproc: the maximum number of simultaneous processes
descriptors: the maximum number of simultaneously open files (`descriptors' are the internal mechanism for referring to an open file on UNIX-like systems)
memorylocked: the maximum amount of memory locked in (I don't know what that is, either)
addressspace: the total amount of virtual memory, i.e. any memory whether it is main memory, or refers to somewhere on a disk, or indeed anything else.
You may well see other names; the shell decides when it is compiled what limits are supported by the system.
Of those, the one I use most commonly is coredumpsize
: sometimes when
I'm debugging I want a crashed programme to produce a `core'
file so I can run gdb
or dbx
on it (`unlimit coredumpsize
'),
while other times they are just untidy (`limit coredumpsize 0
').
Probably you would only alter any of the others if you knew there was a
problem, for example a number-crunching programme used so much memory
that the rest of the system was badly affected and you wanted to limit
datasize
to 64 megabyte or whatever. You could write this as:
limit datasize 64m
There is a distinction made between `hard' and `soft' limits. Both
have the same effect on programmes, but you can remove or reduce `soft'
limits, while only the superuser (the system administrator's login,
root) can do that to `hard' limits. Usually, therefore, limit
and
unlimit
manipulate soft limits; to show or set hard limits, give the
option -h
. If I do `limit -h
', I get the same list of limits as
above, but with stacksize
and coredumpsize
unlimited --- that means
I can reduce or remove the limits on those if I want, they're just set
for my own convenience.
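For example, carrying on from the listing above:
limit coredumpsize 0
unlimit coredumpsize
limit -h coredumpsize 1m
The first two fiddle with the soft limit, which is freely reversible; the last sets the hard limit to one megabyte, and from then on only root can raise it again.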
Why is stacksize
set in this way? As I said, it refers to the memory
in which the functions in programmes store variables and any other local
information. If one function calls another, it uses more memory. You can
get into a situation where functions call themselves recursively and
there is no way out until the machine runs out of memory; limiting
stacksize
prevents this. You can actually see this with zsh itself
(probably better not to try this if you'd rather the shell you're
running didn't crash):
% fn() { fn; }
% fn
defines a function which keeps calling itself. To do this, all the functions inside zsh are calling themselves as well, using more and more stack memory. Actually, zsh uses other forms of memory inside each function and my version of zsh crashes due to exhaustion of that memory instead. However, it depends on the system how this works out.
Times
One way of returning information on process resources is with the
`times
' command. It simply shows the total CPU time used by the shell
and by the programmes called for it --- in that order, and without
description, so you need to remember. On each line, the first number is
the time spent in user space and the second is the time spent in system
space. If you're not concerned about the details of programmes the
difference is pretty irrelevant, but if you are, then the difference is
very roughly that between the time spent in the code you actually see
before you compile a programme, and the time spent in `hidden' code
where the system is doing something for you. It's not such an obvious
distinction, because many library routines, such as mathematical
functions, are run in user mode as no privileged access to internal bits
of the system is required. Typically, system time is concerned with the
details of input and output --- though even there it's not so simple,
because the C output routines printf
, puts
, fread
and others have
user mode code which then calls the system routines read
, write
and
so on.
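Output looks something like this (the numbers are invented):
% times
0m0.36s 0m0.54s
0m12.07s 0m1.43s
The first line is for the shell itself, the second for the programmes it has run, as described above.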
You can measure the time taken by a particular external command by
putting `time
', in the singular this time, in front of it; this is
essentially another precommand modifier, and is a shell reserved word
rather than a builtin. This gives fairly obvious information. You can
specify the information using the $TIMEFMT
parameter, which has its
own percent escapes, different from the ones used in prompts. It exists
partly because the shell allowed you to access all sorts of other
information about the process which ran, such as `page faults' ---
occasions when the system had to fetch a part of the programme or data
from disk because it wasn't in the main memory. However, that
disappeared because it was too much work to convert the feature to
configure itself automatically for different operating systems. It may
be time to resurrect it.
You can also force the time to be shown automatically by setting the
parameter $REPORTTIME
; if a command runs for more than this many
seconds, the $TIMEFMT
output will be shown automatically.
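Both are ordinary parameter assignments, so for example:
TIMEFMT='%J %U user %S system %E elapsed'
time sleep 1
REPORTTIME=10
The escapes shown are just a selection --- %J is the job name, %U, %S and %E the user, system and elapsed times; see the zshparam manual for the full list.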
3.2.9: Terminals, users, etc.
Watching for other users
Although this is more associated with parameters than builtins, the
`log
' command will tell you whether any of a group of people you want
to watch out for have logged in or out. To use this, you set the
$watch
array parameter to a list of user names, or `all
' for
everyone, or `notme
' for everyone except yourself. Even if you don't
use log
, any changes will be reported just before the shell prints a
prompt. It will be printed using the $WATCHFMT
parameter: once again,
this takes its own set of percent escapes, listed in the zshparam
manual.
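For example, with invented user names:
watch=(fred barney)
WATCHFMT='%n has %a %l from %m'
log
Here %n is the user name, %a the action (logged on or logged off), %l the terminal line and %m the host; the log at the end reports the current state straight away.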
ttyctl
There is a command ttyctl
which is designed to keep badly behaved
external commands from messing up the terminal settings. Most programmes
are careful to restore any settings they change, but there are
exceptions. After `ttyctl -f
', the terminal is frozen; zsh will
restore the settings, no matter what an external programme does with it.
This includes deliberate attempts to change the terminal settings with
the `stty
' command, so the default is unfrozen, `ttyctl -u
'.
3.2.10: Syntactic oddments
This section collects together a few builtins which, rather than controlling the behaviour of some feature of the shell, have some other special effect.
Controlling programme flow
The commands here are exit, return, break, continue and
source (or .): they determine what the shell does next. You've met
exit
--- leave the shell altogether --- and return
--- leave the
current function. Be very careful not to confuse them. Calling exit
in
a shell function is usually bad:
% fn() { exit; }
% fn
This makes your entire shell session go away, not just the function. If
you write C programmes, you should be very familiar with both, although
there is one difference in this case: return
at the top level in an
interactive shell actually does nothing, rather than leaving the shell
as you might expect. However, in a script, return outside a function
does cause the entire script to stop. The reason for this is that zsh
allows you to write autoloaded functions in the same form as scripts, so
that they can be used as either; this wouldn't work if return
did
nothing when the file was run as a script. Other shells don't do this:
return
does nothing at the top level of a script, as well as
interactively. However, other shells don't have the feature that
function definition files can be run as scripts, either.
The next two commands, break
and continue
, are to do with constructs
like `if
'-blocks and loops, and it will be much easier if I introduce
them when I talk about those below. They will also already be familiar
to C programmers. (If you are a FORTRAN programmer, however, continue
is not the statement you are familiar with; it is instead equivalent
to CYCLE
in FORTRAN90.)
The final pair of commands are .
and source
. They are similar to one
another and cause another file to be read as a stream of commands in the
current shell --- not as a script, for which a new shell would be
started which would finish at the end of the script. The two are
intended for running a series of commands which have some effect on the
current shell, exactly like the startup files. Indeed, it's a very
common use to have a call to one or other in a startup file; I have in
my ~/.zshrc
[[ -f ~/.aliasrc ]] && . ~/.aliasrc
which tests if the file ~/.aliasrc
exists, and if so runs the commands
in it; they are treated exactly as if they had appeared directly at that
point in .zshrc
.
Note that your $path
is used to find the file to read from; this is a
little surprising if you think of this as like a script being run, since
zsh doesn't search for a script, it uses the name exactly as you gave
it. In particular, if you don't have `.
' in your $path
and you use
the form `.
' rather than `source
' you will need to say explicitly
when you want to source a file in the current directory:
. ./file
otherwise it won't be found.
It's a little bit like running a function, with the file as the function
body. Indeed, the shell will set the positional parameters $*
in just
the same way. However, there's a crucial difference: there is no local
parameter scope. Any variables in a sourced file, as in one of the
startup files, are in the same scope as the point from which it was
started. You can, therefore, source a file from inside a function and
have the parameters in the sourced file local, but normally the only way
of having parameters only for use in a sourced file is to unset them
when you are finished.
The fact that both .
and source
exist is historical: the former
comes from the Bourne shell, and the latter from the C shell, which
seems deliberately to have done everything differently. The point noted
above, that source always searches the current directory (and searches
it first), is the only difference.
Re-evaluating an expression
Sometimes it's very useful to take a string and run it as if it were a
set of shell commands. This is what eval
does. More precisely, it
sticks the arguments together with spaces and calls them. In the case of
something like
eval print Hello.
this isn't very useful; that's no different from a simple
print Hello.
The difference comes when what's on the command line has something to be expanded, like a parameter:
param='print Hello.'
eval $param
Here, the $param
is expanded just as it would be for a normal command.
Then eval
gets the string `print Hello.
' and executes it as a shell
command line. Everything --- really everything --- that the shell would
normally do to execute a command line is done again; in effect, it's run
as a little function, except that no local context for parameters is
created. If this sounds familiar, that's because it's exactly the way
traps defined in the form
trap 'print Hello.' EXIT
are called. This is one simple way out of the hole you can sometimes get yourself into when you have a parameter which contains the name of another parameter, instead of some data, and you want to get your hands on the data:
# somewhere above...
origdata='I am data.'
# but all you know about is
paramname=origdata
# so to extract the data you can do...
eval data=\$$paramname
Now $data
contains the value you want. Make sure you understand the
series of expansions going on: this sort of thing can get very
confusing. First the command line is expanded just as normal. This turns
the argument to eval
into `data=$origdata
'. The `$
' that's still
there was quoted by a backslash; the backslash was stripped and the
`$
' left; the $paramname
was evaluated completely separately ---
quoted characters like the \$
don't have any effect on expansions ---
to give origdata
. Eval calls the new line `data=$origdata
' as a
command in its own right, with the now obvious effect. If you're even
slightly confused, the best thing to do is simply to quote everything
you don't want to be immediately expanded:
eval 'data=$'$paramname
or even
eval 'data=${'$paramname'}'
may perhaps make your intentions more obvious.
It's possible when you're starting out to confuse `eval
' with the
`...`
and $(...)
commands, which also take the command in the
middle `...
' and evaluate it as a command line. However, these two
(they're identical except for the syntax) then insert the output of that
command back into the command line, while eval
does no such thing; it
has no effect at all on where input and output go. Conversely, the two
forms of command substitution don't do an extra level of expansion.
Compare:
% foo='print bar'
% eval $foo
bar
with
% foo='print bar'
% echo $($foo)
zsh: command not found: print bar
The $(...) substitution took $foo
as the command line. As you are
now painfully aware, zsh doesn't split scalar parameters, so this was
turned into the single word `print bar
', which isn't a command. The
blank line is `echo
' printing the empty result of the failed
substitution.
3.2.11: More precommand modifiers: exec
, noglob
Sometimes you want to run a command instead of the shell. This sometimes happens when you write a shell script to process the arguments to an external command, or set parameters for it, then call that command. For example:
export MOZILLA_HOME=/usr/local/netscape
netscape "$@"
Run as a script, this sets an environment variable, then starts
netscape
. However, as always the shell waits for the command to
finish. That's rather wasteful here, since there's nothing more for the
shell to do; you'd rather it simply magically turned into the netscape
command. You can actually do this:
export MOZILLA_HOME=/usr/local/netscape
exec netscape "$@"
`exec
' tells the shell that it doesn't need to wait; it can just make
the command to run replace the shell. So this only uses a single
process.
Normally, you should be careful not to use exec
interactively, since
normally you don't want the shell to go away. One legitimate use is to
replace the current zsh with a brand new one if (say) you've set a whole
load of options you don't like and want to restore the ones you usually
have on startup:
exec zsh
Or you may have the bad taste to start a completely different shell
altogether. Conversely, a good piece of news about exec
is that it is
common to all shells, so you can use it from another shell to start zsh
in the way I've just shown.
Like `command
' and `builtin
', `exec
' is a `precommand
modifier' in that it alters the way a command line is interpreted.
Here's one more:
noglob print *
If you've remembered what `glob' means, this is all fairly obvious. It
instructs the shell not to turn the `*
' into a list of all the files
in the directory, but instead to let well alone. You can do this by
quoting the `*
', of course; often noglob
is used as part of an
alias to set up commands where you never need filename generation and
don't want to have to bother quoting everything. However, note that
noglob
has no effect on any other type of expansion: parameter
expansion and backquote (`....`
) expansion, for example, happen as
normal; the only thing that doesn't is turning patterns into a list of
matching files. So it doesn't take away the necessity of knowing the
rules of shell expansion. If you need that, the best thing to do is to
use read
or vared
(see below) to read a line into a parameter, which
you pass to your function:
read -r param
print $param
The -r
makes sure $param
is the unadulterated input.
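The alias trick mentioned above looks like this; locate is just a convenient example of an external command which does its own pattern matching:
alias locate='noglob locate'
locate *.tex
Now the `*.tex' reaches locate untouched, with no quoting needed.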
3.2.12: Testing things
I told you in the last chapter that the right way to write tests in
zsh was using the `[[ ... ]]' form, and why. So you can ignore the
two builtins `test' and `[', even though they're the ones that
resemble the Bourne shell. You can safely write
if [[ $foo = '' ]]; then
print The parameter foo is empty. O, misery me.
fi
or
if [[ -z $foo ]]; then
print Alack and alas, foo still has nothing in it.
fi
instead of monstrosities like
if test x$foo = x; then
echo The emptiness of foo. Yet are we not all empty\?
fi
because even if $foo does expand to an empty string, which is what is
implied if the tests are true, `[[ ... ]]' remembers there was
something there and gets the syntax right. Rather than a builtin,
this is actually a reserved word --- in fact it has to be, to be
syntactically special --- but you probably aren't too bothered about
the difference.
There are two sorts of tests, both shown above: those with three
arguments, and those with two. The three-argument forms all have some
comparison in the middle; in addition to `=' (or `==', which means
the same here, and which according to the manual page we should be
using, though none of us does), there are `!=' (not equal), `<',
`>', `<=' and `>='. All these do string comparisons, i.e. they
compare the sort order of the strings.
Since there are better ways of sorting things in zsh, the `=' and
`!=' forms are by far the most common. Actually, they do something a
bit more than string comparison: the expression on the right can be a
pattern. The patterns understood are just the same as for matching
filenames, except that `/' isn't special, so it can be matched by a
`*'. Note that, because `=' and `!=' are treated specially by the
shell, you shouldn't quote the patterns: you might think that unless
you do, they'll be turned into file names, but they won't. So
if [[ biryani = b* ]]; then
print Word begins with a b.
fi
works. If you'd written 'b*', including the quotes, it wouldn't have
been treated as a pattern; it would have tested for a string which
was exactly the two letters `b*' and nothing else. Pattern matching
like this can be very powerful. If you've done any Bourne shell
programming, you may remember the only way to use patterns there was
via the `case' construction: that's still in zsh (see below), and
uses the same sort of patterns, but the test form shown above is
often more useful.
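For comparison, the same test written as a `case' statement looks
like this; the patterns behave identically:
case biryani in
  (b*) print Word begins with a b.
       ;;
esac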
Then there are other three-argument tests which do numeric
comparison. Rather oddly, these use letters rather than mathematical
symbols: `-eq', `-lt' and `-le' test whether two numbers are equal,
or whether the first is less than, or less than or equal to, the
second. You can guess what `-gt' and `-ge' do. Note this is the
other way round to Perl, which much more logically uses `==' to test
for equality of numbers (not `=', since that's always an assignment
operator in Perl) and `eq' (minus the minus) to test for equality of
strings. Unfortunately we're now stuck with it this way round. If you
are only comparing numbers, it's better to use the `(( ... ))'
expression, because that has a proper understanding of arithmetic.
However,
if [[ $number -gt 3 ]]; then
print Wow, that\'s big
fi
and
if (( $number > 3 )); then
print Wow, that\'s STILL big
fi
are essentially equivalent. In the second case, the status is zero (true) if the number in the expression was non-zero (sorry if I'm confusing you again) and vice versa. This means that
if (( 3 )); then
print It seems that 3 is non-zero, Watson.
fi
is a perfectly valid test. As in C, the test operators in arithmetic
return 1 for true and 0 for false, i.e. `$number > 3' is 1 if
$number is greater than 3 and 0 otherwise; the inversion to shell
logic, zero for true, only occurs at the final step when the
expression has been completely evaluated and the `(( ... ))' command
returns. At least with `[[ ... ]]' you don't need to worry about the
extra negation; you can simply think in logical terms (although
that's hard enough for a lot of people).
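You can watch the inversion happen by printing `$?', the status of
the last command, after an arithmetic test:
% (( 3 > 1 )); print $?
0
% (( 1 > 3 )); print $?
1
The comparison inside evaluated to 1 in the first case, yet the
status reported by the shell is 0, and vice versa.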
Finally, there are a few other odd comparisons in the three-argument form:
if [[ file1 -nt file2 ]]; then
print file1 is newer than file2
fi
does the test implied by the example; there is also `-ot' to test for
an older file, as well as the little-used `-ef', which tests for an
`equivalent file', meaning that they refer to the same file --- in
other words, are linked; this can be a hard or a symbolic link, and
in the second case it doesn't matter which of the two is the symbolic
link. (If you were paying attention above, you'll know it can't
possibly matter in the first case.)
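A quick interactive sketch of `-ef' at work, with invented file
names:
% touch file1
% ln -s file1 link1
% [[ link1 -ef file1 ]] && print Same file
Same file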
In addition to these tests, which are pretty recognisable from most
programming languages --- although you'll just have to remember that
the `=' family compares strings and not numbers --- there is another
set largely peculiar to UNIXy scripting languages. These are all in
the form of a hyphen followed by a letter as the test, which always
takes a single argument. I showed one: `-z $var' tests whether
`$var' has zero length. Its opposite is `-n $var', which tests for
non-zero length. Perhaps this is as good a time as any to point out
that the arguments to these commands can be any single word
expression, not just variables or filenames. You are quite at liberty
to test
if [[ -z "$var is sqrt(`print bibble`)" ]]; then
print Flying pig detected.
fi
if you like. In fact, the tests are so eager to make sure that they only have a one word argument that they will treat things like arrays, which usually return a whole set of words, as if they were in double quotes, joining the bits with spaces:
array=(two words)
if [[ $array = 'two words' ]]; then
print "The array \$array is OK. O, joy."
fi
Apart from `-z' and `-n', most of the two-argument tests are to do
with files: `-e' tests that the file named next exists, whatever type
of file it is (it might be a directory or something weirder); `-f'
tests if it exists and is a regular file (so it isn't a directory or
anything weird this time); `-x' tests whether you can execute it.
There are all sorts of others, which are listed in the manual page,
for various properties of files. Then there are a couple of others:
`-o