A User's Guide to the Z-Shell

Peter Stephenson

2003/03/23

Table of Contents

Chapter 1: A short introduction

1.2: Versions of zsh

1.3: Conventions

1.4: Acknowledgments

Chapter 2: What to put in your startup files

2.1: Types of shell: interactive and login shells

2.1.1: What is a login shell? Simple tests

2.2: All the startup files

2.3: Options

2.4: Parameters

2.4.1: Arrays

2.5: What to put in your startup files

2.5.1: Compatibility options: SH_WORD_SPLIT and others

2.5.2: Options for csh junkies

2.5.3: The history mechanism: types of history

2.5.4: Setting up history

2.5.5: History options

2.5.6: Prompts

2.5.7: Named directories

2.5.8: `Go faster' options for power users

2.5.9: aliases

2.5.10: Environment variables

2.5.11: Path

2.5.12: Mail

2.5.13: Other path-like things

2.5.14: Version-specific things

2.5.15: Everything else

Chapter 3: Dealing with basic shell syntax

3.1: External commands

3.2: Builtin commands

3.2.1: Builtins for printing

3.2.2: Other builtins just for speed

3.2.3: Builtins which change the shell's state

3.2.4: cd and friends

3.2.5: Command control and information commands

3.2.6: Parameter control

3.2.7: History control commands

3.2.8: Job control and process control

3.2.9: Terminals, users, etc.

3.2.10: Syntactic oddments

3.2.11: More precommand modifiers: exec, noglob

3.2.12: Testing things

3.2.13: Handling options to functions and scripts

3.2.14: Random file control things

3.2.15: Don't watch this space, watch some other

3.2.16: And also

3.3: Functions

3.3.1: Loading functions

3.3.2: Function parameters

3.3.3: Compiling functions

3.4: Aliases

3.5: Command summary

3.6: Expansions and quotes

3.6.1: History expansion

3.6.2: Alias expansion

3.6.3: Process, parameter, command, arithmetic and brace expansion

3.6.4: Filename Expansion

3.6.5: Filename Generation

3.7: Redirection: greater-thans and less-thans

3.7.1: Clobber

3.7.2: File descriptors

3.7.3: Appending, here documents, here strings, read write

3.7.4: Clever tricks: exec and other file descriptors

3.7.5: Multios

3.8: Shell syntax: loops, (sub)shells and so on

3.8.1: Logical command connectors

3.8.2: Structures

3.8.3: Subshells and current shell constructs

3.8.4: Subshells and current shells

3.9: Emulation and portability

3.9.1: Differences in detail

3.9.2: Making your own scripts and functions portable

3.10: Running scripts

Chapter 4: The Z-Shell Line Editor

4.1: Introducing zle

4.1.1: The simple facts

4.1.2: Vi mode

4.2: Basic editing

4.2.1: Moving

4.2.2: Deleting

4.2.3: More deletion

4.3: Fancier editing

4.3.1: Options controlling zle

4.3.2: The minibuffer and extended commands

4.3.3: Prefix (digit) arguments

4.3.4: Words, regions and marks

4.3.5: Regions and marks

4.4: History and searching

4.4.1: Moving through the history

4.4.2: Searching through the history

4.4.3: Extracting words from the history

4.5: Binding keys and handling keymaps

4.5.1: Simple key bindings

4.5.2: Removing key bindings

4.5.3: Function keys and so on

4.5.4: Binding strings instead of commands

4.5.5: Keymaps

4.6: Advanced editing

4.6.1: Multi-line editing

4.6.2: The builtin vared and the function zed

4.6.3: The buffer stack

4.7: Extending zle

4.7.1: Widgets

4.7.2: Executing other widgets

4.7.3: Some special builtin widgets and their uses

4.7.4: Special parameters: normal text

4.7.5: Other special parameters

4.7.6: Reading keys and using the minibuffer

4.7.7: Examples

Chapter 5: Substitutions

5.1: Quoting

5.1.1: Backslashes

5.1.2: Single quotes

5.1.3: POSIX quotes

5.1.4: Double quotes

5.1.5: Backquotes

5.2: Modifiers and what they modify

5.3: Process Substitution

5.4: Parameter substitution

5.4.1: Using arrays

5.4.2: Using associative arrays

5.4.3: Substituted substitutions, top- and tailing, etc.

5.4.4: Flags for options: splitting and joining

5.4.5: Flags for options: GLOB_SUBST and RC_EXPAND_PARAM

5.4.6: Yet more parameter flags

5.4.7: A couple of parameter substitution tricks

5.4.8: Nested parameter substitutions

5.5: That substitution again

5.6: Arithmetic Expansion

5.6.1: Entering and outputting bases

5.6.2: Parameter typing

5.7: Brace Expansion and Arrays

5.8: Filename Expansion

5.9: Filename Generation and Pattern Matching

5.9.1: Comparing patterns and regular expressions

5.9.2: Standard features

5.9.3: Extensions usually available

5.9.4: Extensions requiring EXTENDED_GLOB

5.9.5: Recursive globbing

5.9.6: Glob qualifiers

5.9.7: Globbing flags: alter the behaviour of matches

5.9.8: The function zmv

Chapter 6: Completion, old and new

6.1: Completion and expansion

6.2: Configuring completion using shell options

6.2.1: Ambiguous completions

6.2.2: ALWAYS_LAST_PROMPT

6.2.3: Menu completion and menu selection

6.2.4: Other ways of changing completion behaviour

6.2.5: Changing the way completions are displayed

6.3: Getting started with new completion

6.4: How the shell finds the right completions

6.4.1: Contexts

6.4.2: Tags

6.5: Configuring completion using styles

6.5.1: Specifying completers and their options

6.5.2: Changing the format of listings: groups etc.

6.5.3: Styles affecting particular completions

6.6: Command widgets

6.6.1: _complete_help

6.6.2: _correct_word, _correct_filename, _expand_word

6.6.3: _history_complete_word

6.6.4: _most_recent_file

6.6.5: _next_tags

6.6.6: _bash_completions

6.6.7: _read_comp

6.6.8: _generic

6.6.9: predict-on, incremental-complete-word

6.7: Matching control and controlling where things are inserted

6.7.1: Case-insensitive matching

6.7.2: Matching option names

6.7.3: Partial word completion

6.7.4: Substring completion

6.7.5: Partial words with capitals

6.7.6: Final notes

6.8: Tutorial

6.8.1: The dispatcher

6.8.2: Subcommand completion: _arguments

6.8.3: Completing particular argument types

6.8.4: The rest

6.9: Writing new completion functions and widgets

6.9.1: Loading completion functions: compdef

6.9.2: Adding a set of completions: compadd

6.9.3: Functions for generating filenames, etc.

6.9.4: The zsh/parameter module

6.9.5: Special completion parameters and compset

6.9.6: Fancier completion: using the tags and styles mechanism

6.9.7: Getting the work done for you: handling arguments etc.

6.9.8: More completion utility functions

6.10: Finally

Chapter 7: Modules and other bits and pieces Not written

7.1: Control over modules: zmodload

7.1.1: Modules defining parameters

7.1.2: Low-level system interaction

7.1.3: ZFTP

7.2: Contributed bits

7.2.1: Prompt themes

7.3: What's new in 4.1

Appendix 1: Obtaining zsh and getting more information Not written



Chapter 1: A short introduction

The Z-Shell, `zsh' for short, is a command interpreter for UNIX systems, or in UNIX jargon, a `shell', because it wraps around the commands you use. More than that, however, zsh is a particularly powerful shell --- and it's free, and under regular maintenance --- with lots of interactive features allowing you to do the maximum work with the minimum fuss. Of course, for that you need to know what the shell can do and how, and that's what this guide is for.

The most basic basics: I shall assume you have access to a UNIX system, otherwise the rest of this is not going to be much use. You can also use zsh under Windows by installing Cygwin, which provides a UNIX-like environment for programmes --- given the weakness of the standard Windows command interpreter, this is a good thing to do. There are ports of older versions of zsh to Windows which run natively, i.e. without a UNIX environment, although these have a slightly different behaviour in some respects and I won't talk about them further.

I'll also assume some basic knowledge of UNIX; you should know how the filesystem works, i.e. what /home/users/pws/.zshrc and ../file mean, and some basic commands, for example ls, and you should have experience with using rm to delete completely the wrong file by accident, and that sort of thing. In something like `rm file', I will often refer to the `command' (rm, of course) and the `argument(s)' (anything else coming after the command which is used by it), and to the complete thing you typed in one go as the `command line'.

You're also going to need zsh itself; if you're reading this, you may well already have it, but if you don't, you or your system administrator should read Appendix A. For now, we'll suppose you're sitting in front of a terminal with zsh already running.

Now to the shell. After you log in, you probably see some prompt (a series of symbols on the screen indicating that you can input a command), such as `$' or `%', possibly with some other text in front --- later, we'll see how you can change that text in interesting ways. That prompt comes from the shell. Type `print hello', then backspace over `hello' and type `goodbye'. Now hit the `Return' key (or `Enter' key, I'll just say <RET> from now on, likewise <TAB> for the tab key, <SPC> for the space key); unless you have a serious practical-joker problem on your system, you will see `goodbye', and the shell will come back with another prompt. All of the time up to when you hit <RET>, you were interacting with the shell and its editor, called `Z-Shell Line Editor' or `zle' for short; only then did the shell go away and tell the print command to print out a message. So you can see that the shell is important.

However, if all you're doing is typing simple commands like that, why do you need anything complicated? In that case, you don't; but real life's not that simple. In the rest of this guide, I describe how, with zsh's help, you can:

  • customise the environment in which you work, by using startup files,
  • write your own commands to shorten tasks and store things in shell variables (`parameters') so you don't have to remember them,
  • use zle to minimise the amount of typing you have to do --- in zsh, you can even edit small files that way,
  • pick the files you want to use for a particular command such as mv or ls using zsh's very sophisticated filename generation (known colloquially as `globbing') system,
  • tell the editor what sort of arguments you use with particular commands, so that you only need to type part of the name and it will complete the rest, using zsh's unrivalled programmable completion system,
  • use the extra add-ons (`modules') supplied with the latest version of zsh to do other things you usually can't do in a shell at all.

That's only a tiny sample. Since there's so much to say, this guide will concentrate on the things zsh does best, and in particular the things it has which other shells don't. The next chapter gives a few of the basics, by trying to explain how to set the shell up the way you want it. Like the rest of the guide, it's not intended to be exhaustive, for which you should look at the shell manual.

Some other things you should probably know straight away. First, the shell is always running, even when the command you typed is running, too; the shell simply hangs around waiting for it to finish: you may know from other shells about putting commands in the background by putting an `&' after the command, which means that the shell doesn't wait for them to finish. The shell is there even if the command's in the foreground, but in this case doing nothing.

Second, it doesn't just run other people's commands, it has some of its own, called builtin commands or just builtins, and you can even add your own commands as lists of instructions to the shell called functions; builtins and functions always run in the shell itself. That's important to know, because things which don't run in the shell itself can't affect it, and hence can't alter parameters, functions, aliases, and all the other things I shall talk about.

If you want a basic grounding in how shells work, what their syntax is (i.e. how to write commands), and how to write scripts and functions, you should read one of the many books on the subject. In particular, you will get most out of a book that describes the Korn shell (ksh), as zsh is very similar to this --- so similar that it will be worth my while pointing out differences as we go along, since they can confuse ksh users. Recent versions of zsh can emulate ksh (strictly, the 1988 version of ksh, although more and more features from the 1993 version are appearing) quite closely, although it's not perfect, and less perfect the more closely you look. However, it's important to realise that if you just start up any old zsh there is no guarantee that it will be set up to work like ksh; unless you or your system administrator have changed some settings, it certainly won't be. You might not see that straight away, but it affects the shell in subtle ways. I will talk about emulation a bit more later on.

A few other shells are worth mentioning. The grandfather of all UNIX shells is sh, now known as the Bourne shell but originally just referred to as `the shell'. The story is similar to ksh: zsh can emulate sh quite closely (much more closely than ksh, since sh is considerably simpler), but in general you need to make sure it's set up to do that before you can be sure it will emulate sh.

You may also come across the `Bourne-Again Shell', bash. This is a freely-available enhancement of sh written by the GNU project --- but it is not always enhanced along the lines of ksh, and hence in many ways it is very different from zsh. On some free UNIX-like systems such as Linux/GNU (which is what people usually mean by Linux), the command sh is really bash, so there you should be extra careful when trying to ensure that something which runs under the so-called `sh' will also run under zsh. Some Linux systems also have another simpler Bourne shell clone, ash; as it's simpler, it's more like the original Bourne shell.

Some more modern operating systems talk about `the POSIX shell'. This is an attempt to standardize UNIX shells; it's most like the Korn shell, although, a bit confusingly, it's often just called sh, because the standard says that it should be. Usually, this just means you get a bit extra free with your sh and it still does what you expect. Zsh has made some attempts to fit the standard, but you have to tell it to --- again, simply starting up `zsh' will not have the right settings for that.

There is another common family of shells with, unfortunately, incompatible syntax. The source of this family is the C-Shell, csh, so called because its syntax looks more like the C programming language. This became widespread when the only other shell available was sh because csh had better interactive features, such as job control. It was then enhanced to make tcsh, which has many of the interactive features you will also find in zsh, and so became very popular. Despite these common features, the syntax of zsh is very different, so you should not try and use csh/tcsh commands beyond the very simplest in zsh; but if you are a tcsh user, you will find virtually every capability you are used to in zsh somewhere, plus a lot more.

1.2: Versions of zsh

At the time of writing, the most recent version of zsh available for widespread use was 4.0.6. You will commonly find two sets of older zsh's around. The 3.0 series, of which the last release was 3.0.9, was a stable release, with only bug fixes since the first release of zsh 3. The 3.1 series were beta versions, with lots of new features; the last of these, 3.1.9, was not so different from 4.0.1; the main change is that the shell has now been declared stable, so that as with zsh 3 there will be a set of bug fixes, labelled 4.0, and a set with new functions in, labelled 4.1. As 4.0 replaces all zsh 3 versions, I will try to keep things simple and talk about that; but every now and then it will be helpful to point out where older versions were different.

One notable feature of zsh is the completion of command line arguments. The system changed in 3.1.6 and 3.1.7 to make it a lot more configurable, and (provided you keep your wits about you) a little less obscure. I therefore won't describe the old completion system, which used the `compctl' command, in any detail; a very brief introduction is given in the zsh FAQ. The old system remains available, but I strongly recommend that new users start with the new one. See chapter 6 `Completion, old and new' for the lowdown on new-style completion.

There won't be a big difference between 4.0 and 4.1, just bug fixes and a few evolutionary changes, plus some extra modules. There will be some notes in chapter 7 about new features in 4.1, but nothing you write for 4.0 is likely to become obsolete in the foreseeable future.

1.3: Conventions

Most of what I say will be reasonably self-contained (which means I use phrases like `as I said before' and `as I'll discuss later on' more than a real stylist would like, and the number of times I refer to other chapters is excessive), but there are some points I should perhaps draw your attention to before you leap in.

I will often write chunks of code as you would put them in a file for execution (a `script' or a `function', the differences to be discussed passim):

  if [[ $ZSH_VERSION = 3.* ]]; then
    print This is a release of the third version of zsh.
  else
    print This is either very new or very old.
  fi

but sometimes I will show both what you type into a shell interactively, and what the shell throws back at you:

  % print $ZSH_VERSION
  3.1.9
  % print $CPUTYPE
  i586

Here, `%' shows the prompt the shell puts up to tell you it is expecting input (and the space immediately after is part of it). Actually, you probably see something before the percent sign like the name of the machine or your user name, or maybe something fancier. I've pruned it to the minimum to avoid confusion, and kept it as a reminder that this is the line you type.

If you're reading an electronic version of this guide, and want to copy lines with the `%' in front into a terminal to be executed, there's a neat way of doing this where you don't even have to edit the line first:

  alias %=' '

Then % at the start of a line is turned into nothing whatsoever; the space just indicates that any following aliases should be expanded. So the line `% print $CPUTYPE' will ignore the `%' and execute the rest of the line. (I hope it's obvious, but your own prompt is always ignored; this is just if you copy the prompts from the guide into the shell.)

There are lots of different types of object in zsh, but one of the most common is parameters, which I will always show with a `$' sign in front, like `$ZSH_VERSION', to remind you they are parameters. You need to remember that when you're setting or fiddling with the parameter itself, rather than its value, you omit the `$'. When you do and don't need it should become clearer as we go along.

The other objects I'll show specially are shell options --- choices about how the shell is to work --- which I write like this: `SH_WORD_SPLIT', `NO_NOMATCH', `ZLE'. Again, that's not the whole story since whenever the shell expects options you can write them in upper or lower case with as many or as few underscores as you like; and often in code chunks I'll use the simplest form instead: `shwordsplit', `nonomatch', `zle'. If you're philosophical you can think of it as expressing the category difference between talking about programming and actual programming, but really it's just me being inconsistent.

You may find it odd that I use three hyphens to signify a dash. That's actually a convention used in the printed version of this guide, which is made with LaTeX. One day, I will turn this into a macro and it will appear properly in other versions; but then, one day the universe will come to an end.

1.4: Acknowledgments

I am grateful for comments from various zsh users. In particular, I have had detailed comments and corrections from Bart Schaefer, Sven `Mr Completion' Wischnowsky and Oliver Kiddle. It's usual to add that any remaining errors are my own, but that's so stark staringly obvious as to be ridiculous. I mean, who wrote this? Never mind.

Most of this was written on one or another release of Linux Mandrake (a derivative of Red Hat), with the usual GNU and XFree86 tools. Since all of this was free, it only seems fair to say `thank you' for the gift. It also works a lot better than the operating system that came with this particular PC.


Chapter 2: What to put in your startup files

There are probably various changes you want to make to the shell's behaviour. All shells have `startup' files, containing commands which are executed as soon as the shell starts. Like many others, zsh allows each user to have their own startup files. In this chapter, I discuss the sorts of things you might want to put there. This will serve as an introduction to what the shell does; by the end, you should have an inkling of many of the things which will be discussed in more detail later on and why they are interesting. Sometimes you will find out more than you want to know, such as how zsh differs from other shells you're not going to use. Explaining the differences here saves me having to lie about how the shell works and correcting it later on: most people will simply want to know how the shell normally works, and note that there are other ways of doing it.

2.1: Types of shell: interactive and login shells

First, you need to know what is meant by an interactive and a login shell. Basically, the shell is just there to take a list of commands and run them; it doesn't really care whether the commands are in a file, or typed in at the terminal. In the second case, when you are typing at a prompt and waiting for each command to run, the shell is interactive; in the other case, when the shell is reading commands from a file, it is, consequently, non-interactive. A list of commands used in this second way --- typically by typing something like zsh filename, although there are shortcuts --- is called a script, as if the shell was acting in a play when it read from it (and shells can be real hams when it comes to playacting). When you start up a script from the keyboard, there are actually two zsh's around: the interactive one you're typing at, which is waiting for another, non-interactive one to finish running the script. Almost nothing that happens in the second one affects the first; they are different copies of zsh.

Remember that when I give examples for you to type, I often show them as they would appear in a script, without prompts in front. What you actually see on the screen if you type them in will have a lot more in front.

When you first log into the computer, the shell you are presented with is interactive, but it is also a login shell. If you type `zsh', it starts up a new interactive shell: because you didn't give it the name of a file with commands in, it assumes you are going to type them interactively. Now you've got two interactive shells at once, one waiting for the other: it doesn't sound all that useful, but there are times when you are going to make some radical changes to the shell's settings temporarily, and the easiest thing to do is to start another shell, do what you want to do, and exit back to the original, unaltered, shell --- so it's not as stupid as it sounds.

However, that second shell will not be a login shell. How does zsh know the difference? Well, the programme that logs you in after you type your password (called, predictably, login), actually sticks a `-' in front of the name of the shell, which zsh recognises. The other way of making a shell a login shell is to run it yourself with the option -l; typing `zsh -l' will start a zsh that also thinks it's a login shell, and later I'll explain how to turn on options within the shell, which you can do with the login option too. Otherwise, any zsh you start yourself will not be a login shell. If you are using X-Windows, and have a terminal emulator such as xterm running a shell, that is probably not a login shell. However, it's actually possible to get xterm to start a login shell by giving it the option -ls, so if you type `xterm -ls &', you will get a window running a login shell (the & means the shell in the first window doesn't wait for it to finish).

The first main difference between a login shell and any other interactive shell is the one to do with startup files, described below. The other one is what you do when you're finished. With a login shell you can type `logout' to exit the shell; with another you type `exit'. However, `exit' works for all shells, interactive, non-interactive, login, whatever, so a lot of people just use that. In fact, the only difference is that `logout' will tell you `not login shell' if you use it anywhere else and fail to exit. The command `bye' is identical to `exit', only shorter and less standard. So my advice is just to use `exit'.

As somebody pointed out to me recently, login shells don't have to be interactive. You can always start a shell in the two ways that make it a login shell; the ways that make it an interactive shell or not are independent. In fact, some start-up scripts for windowing systems run a non-interactive login shell to incorporate definitions from the appropriate login scripts before executing the commands to start the windowing session.

2.1.1: What is a login shell? Simple tests

Telling if the shell you are looking at is interactive is usually easy: if there's a prompt, it's interactive. As you may have gathered, telling if it's a login shell is more involved because you don't always know how the shell was started or if the option got changed. If you want to know, you can type the following (one line at a time if you like, see below),

  if [[ -o login ]]; then
    print yes
  else
    print no
  fi

which will print `yes' or `no' according to whether it's a login shell or not; the syntax will be explained as we go along. There are shorter ways of doing it, but this illustrates the commonest shell syntax for testing things, something you probably often want to do in a startup file. What you're testing goes inside the `[[ ... ]]'; in this case, the -o tells the shell to test an option, here login. The next line says what to do if the test succeeded; the line after the `else' what to do if the test failed. This syntax is virtually identical to ksh; in this guide, I will not give exhaustive details on the tests you can perform, since there are many of them, but just show some of the most useful. As always, see the manual --- in this case, `Conditional Expressions' in the zshmisc manual pages.

Although you usually know when a shell is interactive, in fact you can test that in exactly the same way, too: just use `[[ -o interactive ]]'. This is one option you can't change within the shell; if you turn off reading from the keyboard, where is the shell supposed to read from? But you can at least test it.

Aside for beginners in shell programming: maybe the semicolon looks a bit funny; that's because the `then' is really a separate command. The semicolon is just instead of putting it on a new line; the two are interchangeable. In fact, I could have written,

  if [[ -o login ]]; then; print yes; else; print no; fi

which does exactly the same thing. I could even have missed out the semicolons after `then' and `else', because the shell knows that a command must come after each of those --- though the semicolon or newline before the then is often important, because the shell does not know a command has to come next, and might mix up the then with the arguments of the command after the `if': it may look odd, but the `[[ ... ]]' is actually a command. So you will see various ways of dividing up the lines in shell programmes. You might also like to know that print is one of the builtin commands referred to before; in other words, the whole of that chunk of programme is executed by the shell itself. If you're using a newish version of the shell, you will notice that zsh tells you what it's waiting for, i.e. a `then' or an `else' clause --- see the explanation of $PS2 below for more on this. Finally, the spaces I put before the `print' commands were simply to make it look prettier; any number of spaces can appear before, after, or between commands and arguments, as long as there's at least one between ordinary words (the semicolon is recognised as special, so you don't need one before that, though it's harmless if you do put one in).

Second aside for users of sh: you may remember that tests in sh used a single pair of brackets, `if [ ... ]; then ...', or equivalently as a command called test, `if test ...; then ...'. The Korn shell was deliberately made to be different, and zsh follows that. The reason is that `[[' is treated specially, which allows the shell to do some extra checks and allows more natural syntax. For example, you may know that in sh it's dangerous to test a parameter which may be empty: `[ $var = foo ]' will fail if $var is empty, because in that case the word is missed out and the shell never knows it was supposed to be there; with `[[ ... ]]', this is quite safe because the shell is aware there's a word before the `=', even if it's empty. Also, you can use `&&' and `||' to mean logical `and' and `or', which agrees with the usual UNIX/C convention; in sh, they would have been taken as starting a new command, not as part of the test, and you have to use the less clear `-a' and `-o'. Actually, zsh provides the old form of test for backward compatibility, but things will work a lot more smoothly if you don't use it.
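
Here's a small illustration of that danger; it's contrived, of course, and the exact error message from the old-style test depends on your version of the shell, but the point should be clear:

  var=''
  if [[ $var = foo ]]; then print equal; else print not equal; fi
  # prints `not equal': the empty word is handled safely
  if [ $var = foo ]; then print equal; else print not equal; fi
  # old-style test: the empty $var vanishes before `[' sees it,
  # so this produces an error instead of a sensible answer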

2.2: All the startup files

Now here's a list of the startup files and when they're run. You'll see they fall into two classes: those in the /etc directory, which are put there by the system administrator and are run for all users, and those in your home directory, which zsh, like many shells, allows you to abbreviate to a `~'. It's possible that the latter files are somewhere else; type `print $ZDOTDIR' and if you get something other than a blank line, or an error message telling you the parameter isn't set, it's telling you a directory other than `~' where your startup files live. If $ZDOTDIR (another parameter) is not already set, you won't want to set it without a good reason.

  • /etc/zshenv
    Always run for every zsh.
  • ~/.zshenv
    Usually run for every zsh (see below).
  • /etc/zprofile
    Run for login shells.
  • ~/.zprofile
    Run for login shells.
  • /etc/zshrc
    Run for interactive shells.
  • ~/.zshrc
    Run for interactive shells.
  • /etc/zlogin
    Run for login shells.
  • ~/.zlogin
    Run for login shells.

Now you know what login and interactive shells are, this should be straightforward. You may wonder why there are both ~/.zprofile and ~/.zlogin, when they are both for login shells: the answer is the obvious one, that one is run before, one after ~/.zshrc. This is historical; Bourne-type shells run /etc/profile, and csh-type shells run ~/.login, and zsh tries to cover the bases with its own startup files.
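
If you want to watch this happening, one simple (if rather low-tech) way is to put a marker line at the top of each of your own startup files --- the wording is entirely up to you --- and then compare what a login shell (`zsh -l') and a plain interactive one (`zsh') print as they start:

  # at the top of ~/.zshenv
  print Reading .zshenv
  # at the top of ~/.zprofile
  print Reading .zprofile
  # ... and similarly for ~/.zshrc and ~/.zlogin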

The complication is hinted at by the `see below'. The file /etc/zshenv, as it says, is always run at the start of any zsh. However, if the option NO_RCS is set (or, equivalently, the RCS option is unset: I'll talk about options shortly, since they are important in startup files), none of the others are run. The most common way of setting this option is with a flag on the command line: if you start the shell as `zsh -f', the option becomes set, so only /etc/zshenv is run and the others are skipped. Often, scripts do this as a way of trying to get a basic shell with no frills, as I'll describe below; but if something is set in /etc/zshenv, there's no way to avoid it. This leads to the First Law of Zsh Administration: put as little as possible in the file /etc/zshenv, as every single zsh which starts up has to read it. In particular, if the script assumes that only the basic options are set and /etc/zshenv has altered them, it might well not work. So, at the absolute least, you should probably surround any option settings in /etc/zshenv with

  if [[ ! -o norcs ]]; then
    ... <commands to run if NO_RCS is not set, 
         such as setting options> ...
  fi

and your users will be eternally grateful. Settings for interactive shells, such as prompts, have no business in /etc/zshenv unless you really insist that all users have them as defaults for every single shell. Script writers who want to get round problems with options being changed in /etc/zshenv should put `emulate zsh' at the top of the script.

There are two files run at the end: ~/.zlogout and /etc/zlogout, in that order. As their names suggest, they are counterparts of the zlogin files, and therefore are only run for login shells --- though you can trick the shell by setting the login option. Note that whether you use exit, bye or logout to leave the shell does not affect whether these files are run: I wasn't lying (this time) when I said that the error message was the only difference between exit and logout. If you want to run a file at the end of any other type of shell, you can do it another way:

  TRAPEXIT() {
    # commands to run here, e.g. if you 
    # always want to run .zlogout:
    if [[ ! -o login ]]; then
      # don't do this in a login shell
      # because it happens anyway
      . ~/.zlogout
    fi
  }

If you put that in .zshrc, it will force .zlogout to be run at the end of all interactive shells. Traps will be mentioned later, but this is rather a one-off; it's really just a hack to get commands run at the end of the shell. I won't talk about logout files, however, since there's little that's standard to put in them; some people make them clear the screen to remove sensitive information with the `clear' command. Other than that, you might need to tidy a few files up when you exit.

2.3: Options

It's time to talk about options, since I've mentioned them several times. Each option describes one particular shell behaviour; they are all Boolean, i.e. can either be on or off, with no other state. They have short names and in the documentation and this guide they are written in uppercase with underscores separating the bits (except in actual code, where I'll write them in the short form). However, neither of those is necessary. In fact, NO_RCS and norcs and __N_o_R_c_S__ mean the same thing and are all accepted by the shell.

The second thing is that an option with `no' in front just means the opposite of the option without. I could also have written the test `[[ ! -o norcs ]]' as `[[ -o rcs ]]'; the `!' means `not', as in C. You can only have one `no'; `nonorcs' is meaningless. Unfortunately, there is an option `NOMATCH' which has `no' as part of its basic name, so in this case the opposite really is `NO_NOMATCH'; NOTIFY, of course, is also a full name in its own right.

The usual way to set and unset options is with the commands setopt and unsetopt which take a string of option names. Some options also have flags, like the `-f' for NO_RCS, which these commands also accept, but it's much clearer to use the full name and the extra time and space is negligible. The command `set -o' is equivalent to setopt; this comes from ksh. Note that set with no `-o' does something else --- that sets the positional parameters, which is zsh's way of passing arguments to scripts and functions.
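
For example, all of the following are equivalent ways of turning off the NOMATCH option, which we'll meet below; I'd suggest sticking to the first form, simply because it's the clearest:

  setopt nonomatch        # the usual zsh way
  setopt NO_NOMATCH       # the same option, just decorated differently
  set -o nonomatch        # the ksh form
  unsetopt nonomatch      # and this undoes it again (NOMATCH is back on)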

Almost everybody sets some options in their startup files. Since you want them in every interactive shell, at the least, the choice is between putting them in ~/.zshrc or ~/.zshenv. The choice really depends on how you use non-interactive shells. They can be started up in unexpected places. For example, if you use Emacs and run commands from inside it, such as grep, that will start a non-interactive shell, and may require some options. My rule of thumb is to put as many options as possible into ~/.zshrc, and transfer them to ~/.zshenv if I find I need them there. Some purists object to setting options in ~/.zshenv at all, since it affects scripts; but, as I've already hinted, you have to work a bit harder to make sure scripts are unaffected by that sort of thing anyway. In the following, I just assume they are going to be in ~/.zshrc.

2.4: Parameters

One more thing you'll need to know about in order to write startup files is parameters, also known as variables. These are mostly like variables in other programming languages. Simple parameters can be stored like this (an assignment):

  foo='This is a parameter.'

Note two things: first, there are no spaces around the `='. If there was a space before, zsh would think `foo' was the name of a command to execute; if there was a space after it, it would assign an empty string to the parameter foo. Second, note the use of quotes to stop the spaces inside the string having the same effect. Single quotes, as here, are the nuclear option of quotes: everything up to another single quote is treated as a simple string --- newlines, equal signs, unprintable characters, the lot, in this example all would be assigned to the variable; for example,

  foo='This is a parameter.
  This is still the same parameter.'

So they're the best thing to use until you know what you're doing with double quotes, which have extra effects. Sometimes you don't need them, for example,

  foo=oneword

because there's nothing in `oneword' to confuse the shell; but you could still put quotes there anyway.

Users of csh should note that you don't use `set' to set parameters. This is important because there is a set command, but it works differently --- if you try `set var="this won't work"', you won't get an error but you won't set the parameter, either. Type `print $1' to see what you did set instead.

To get back what was stored in a parameter, you use the name somewhere on the command line with a `$' tacked on the front --- this is called an expansion, or to be more precise, since there are other types of expansion, a parameter expansion. For example, after the first assignment above,

  print -- '$foo is "'$foo'"'

gives

  $foo is "This is a parameter."

so you can see what I meant about the effect of single quotes. Note the asymmetry --- there is no `$' when assigning the parameter, but there is a `$' in front when you want it expanded onto the command line. You may find the word `substitution' used instead of `expansion' sometimes; I'll try and stick with the terminology in the manual.

Two more things while we're at it. First, why did I put `--' after the print? That's because print, like many UNIX commands, can take options after it which begin with a `-'. `--' says that there are no more options; so if what you're trying to print begins with a `-', it will still print out. Actually, in this case you can see it doesn't, so you're safe; but it's a good habit to get into, and I wish I had. As always in zsh, there are exceptions; for example, if you use the -R option to print before the `--', it only recognizes BSD-style options, which means it doesn't understand `--'. Indeed, zsh programmers can be quite lax about standards and often use the old, but now non-standard, single `-' to show there are no more options. Currently, this works even after -R.
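
Here's a tiny example of the sort of thing that can go wrong; the value of $foo is contrived to look like an option to print:

  % foo='-n'
  % print $foo
  % print -- $foo
  -n

The first print swallowed the `-n' as an option (it means `don't put a newline at the end') and printed nothing at all; only with the `--' did the value appear.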

The next point is that I didn't put spaces between the single quotes and the $foo and it was still expanded --- expansion happens anywhere the parameter is not quoted; it doesn't have to be on its own, just separated from anything which might make it look like a different parameter. This is one of those things that can help make shell scripts look so barbaric.

As well as defining your own parameters, there are also a number which the shell sets itself, and some others which have a special effect when you set them. All the above still applies, though. For the rest of this guide, I will indicate parameters with the `$' stuck in front, to remind you what they are, but you should remember that the `$' is missing when you set them, or, indeed, any time when you're referring to the name of the parameter instead of its value.

2.4.1: Arrays

There is a special type of parameter called an array which zsh inherited from both ksh and csh. This is a slightly shaky marriage, since some of the things those two shells do with them are not compatible, and zsh has elements of both, so you need to be careful if you've used arrays in either. The option KSH_ARRAYS is something you can set to make them behave more like they do in ksh, but a lot of zsh users write functions and scripts assuming it isn't set, so it can be dangerous.

Unlike normal parameters (known as scalars), arrays have more than one word in them. In the examples above, we made the parameter $foo get a string with spaces in, but the spaces weren't significant. If we'd done

  foo=(This is a parameter.)

(note the absence of quotes), it would have created an array. Again, there must be no space between the `=' and the `(', though inside the parentheses spaces separate words just like they do on a command line. The difference isn't obvious if you try and print it --- it looks just the same --- but now try this:

  print -- ${foo[4]}

and you get `parameter.'. The array stores the words separately, and you can retrieve them separately by putting the number of the element of the array in square brackets. Note also the braces `{...}' --- zsh doesn't always require them, but they make things much clearer when things get complicated, and it's never wrong to put them in: you could have said `${foo}' when you wanted to print out the complete parameter, and it would be treated identically to `$foo'. The braces simply screen off the expansion from whatever else might be lying around to confuse the shell. It's useful too in expressions like `${foo}s' to keep the `s' from being part of the parameter name; and, finally, with KSH_ARRAYS set, the braces are compulsory, though unfortunately arrays are indexed from 0 in that case.
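
Here's the point about `${foo}s' in action; there's nothing up my sleeve, $foos simply isn't a parameter that exists:

  % foo=bar
  % print ${foo}s
  bars

whereas `print $foos' looks for a parameter called $foos, which doesn't exist, so you just get an empty line.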

You can use quotes when defining arrays; as before, this protects against the shell thinking the spaces are between different elements of the array. Try:

  foo=('first element' 'second element')
  print -- ${foo[2]}

Arrays are useful when the shell needs to keep a whole series of different things together, so we'll meet some you may want to put in a startup file. Users of ksh will have noticed that things are a bit different in zsh, but for now I'll just assume you're using the normal zsh way of doing things.

2.5: What to put in your startup files

At the last count there were over 130 options and several dozen parameters which are special to the shell, and many of them deal with things I won't talk about till much later. But as a guide to get you started, and an indication of what's to come, here are some options and parameters you might want to think about setting in ~/.zshrc.

2.5.1: Compatibility options: SH_WORD_SPLIT and others

I've already mentioned that zsh works differently from ksh, its nearest standard relative, and that some of these differences can be confusing to new users, for example the use of arrays. Some options like KSH_ARRAYS exist to allow you to have things work the ksh way. Most of these are fairly finicky, but one catches out a lot of people. Above, I said that after

  foo='This is a parameter.'

then $foo would be treated as one word. In traditional Bourne-like shells including sh, ksh and bash, however, the shell will split $foo on any spaces it finds. So if you run a command

  command $foo

then in zsh the command gets a single argument `This is a parameter.', but in the other shells it gets the first argument `This', the second argument `is', and so on. If you like this, or are so used to it it would be confusing to change, you should set the option SH_WORD_SPLIT in your ~/.zshrc. Most experienced zsh users use arrays when they want word splitting, since as I explained you have control over what is split and what is not; that's why SH_WORD_SPLIT is not set by default. Users of other shells just get used to putting things in double quotes,

  command "$foo"

which, unlike single quotes, allow the `$' to remain special, and have the side effect that whatever is in quotes will remain a single word (though there's an exception to that, too: the parameter $@).
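
To see the difference for yourself, here's a short experiment you can type in; `args' is just a throwaway function for counting arguments, not anything standard:

  args() { print $# arguments; }
  foo='This is a parameter.'
  args $foo           # zsh default: `1 arguments'
  setopt shwordsplit
  args $foo           # now it's split: `4 arguments'
  args "$foo"         # double quotes keep it together: `1 arguments'
  unsetopt shwordsplit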

There are a lot of other options doing similar things to keep users of standard shells happy. Many of them simply turn features off, because the other shell doesn't have them and hence unexpected things might happen, or simply tweak a feature which is a little different or doesn't usually matter. Currently such options include NO_BANG_HIST, BSD_ECHO (sh only), IGNORE_BRACES, INTERACTIVE_COMMENTS, KSH_OPTION_PRINT, NO_MULTIOS, POSIX_BUILTINS, PROMPT_BANG, SINGLE_LINE_ZLE (I've written them how they would appear as an argument to setopt to put the option the way the other shell expects, so some have `NO_' in front). Most people probably won't change those unless they notice something isn't working how they expect.

Some others have more noticeable effects. Here are a few of the ones most likely to make you scratch your head if you're changing from another Bourne-like shell.

BARE_GLOB_QUAL, GLOB_SUBST, SH_FILE_EXPANSION, SH_GLOB, KSH_GLOB

These are all to do with how pattern matching works. You probably already know that the pattern `*.c' will be expanded into all the files in the current directory ending in `.c'. Simple uses like this are the same in all shells, and the way filenames are expanded is often referred to as `globbing' for historical reasons (apparently it stood for `global replacement'), hence the name of some of these options.

However, zsh and ksh differ over more complicated patterns. For example, to match either file foo.c or file bar.c, in ksh you would say @(foo|bar).c. The usual zsh way of doing things is (foo|bar).c. To turn on the ksh way of doing things, set the option KSH_GLOB; to turn off the zsh way, set the options SH_GLOB and NO_BARE_GLOB_QUAL. The last of those turns off qualifiers, a very powerful way of selecting files by type (for example, directories or executable files) instead of by name which I'll talk about in chapter 5.
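
Assuming you have files called foo.c and bar.c sitting in the current directory, the two styles look like this:

  % print (foo|bar).c       # the usual zsh way
  bar.c foo.c
  % setopt kshglob
  % print @(foo|bar).c      # the ksh way, now also understood
  bar.c foo.c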

The other two need a bit more explanation. Try this:

  foo='*'
  print $foo

In zsh, you usually get a `*' printed, while in ksh the `*' is expanded to all the files in the directory, just as if you had typed `print *'. This is a little like SH_WORD_SPLIT, in that ksh is pretending that the value of $foo appears on the command line just as if you typed it, while zsh is using what you assigned to foo without allowing it to be changed any more. To allow the word to be expanded in zsh, too, you can set the option GLOB_SUBST. As with SH_WORD_SPLIT, the way around the ksh behaviour if you don't want the value changed is to use double quotes: "$foo".
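
Carrying on from that example, here's the effect of the option; the file names in the output are just whatever happens to be lying around, of course:

  % foo='*'
  % print $foo
  *
  % setopt globsubst
  % print $foo
  bar.c foo.c notes.txt
  % print "$foo"
  *
  % unsetopt globsubst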

You are less likely to have to worry about SH_FILE_EXPANSION. It determines when the shell expands things like ~/.zshrc to the full path, e.g. /home/user2/pws/.zshrc. In the case of zsh, this is usually done quite late, after most other forms of expansion such as parameter expansion. That means if you set GLOB_SUBST and do

  foo='~/.zshrc'
  print $foo

you would normally see the full path, starting with a `/'. If you also set SH_FILE_EXPANSION, however, the `~' is looked for much earlier, before $foo has been expanded --- at that point there is no `~' on the command line at all --- so `~/.zshrc' would be printed out literally. This (with both options) is the way ksh works. It also means I lied when I said ksh treats $foo exactly as if its value had been typed, because if you type print ~/.zshrc the `~' does get expanded. So you see how convenient lying is.

NOMATCH, BAD_PATTERN

These also relate to patterns which produce file names, but in this case they determine what happens when the pattern doesn't match a file for some reason. There are two possible reasons: either no file happened to match, or you didn't use a proper pattern. In both cases, zsh, unlike ksh, prints an error message. For example,

  % print nosuchfile*
  zsh: no matches found: nosuchfile*
  % print [-
  zsh: bad pattern: [-

(Remember the `%' lines are what you type, with a prompt in front which comes from the shell.) You can see there are two different error messages: you can stop the first by setting NO_NOMATCH, and the second by setting NO_BAD_PATTERN. In both cases, that makes the shell print out what you originally type without any expansion when there are no matching files.

BG_NICE, NOTIFY

All UNIX shells allow you to start a background job by putting `&' at the end of the line; then the shell doesn't wait for the job to finish, so you can type something else. In zsh, such jobs are usually run at a lower priority (a `higher nice value' in UNIX-speak), so that they don't use so much of the processor's time as foreground jobs (all the others, without the `&') do. This is so that jobs like editing or using the shell don't get slowed down, which can be highly annoying. You can turn this feature off by setting NO_BG_NICE.

When a background job finishes, zsh usually tells you immediately by printing a message, which interrupts whatever you're doing. You can stop this by setting NO_NOTIFY. Actually, this is an option in most versions of ksh, too, but it's a little less annoying in zsh because if it happens while you're typing something else to the shell, the shell will reprint the line you were on as far as you've got. For example:

  % sleep 3 &
  [1] 40366
  % print The quick brown
  [1]  + 40366 done       sleep 3
  % print The quick brown

The command sleep simply does nothing for however many seconds you tell it, but here it did it in the background (zsh printed a message to tell you). After you typed for three seconds, the job exited, and with NOTIFY set it printed out another message: the `done' is the key thing, as it tells you the job has finished. But zsh was smart enough to know the display was messed up, so it reprinted the line you were editing, and you can continue. If you were already running another programme in the foreground, however, that wouldn't know that zsh had printed the message, so the display would still be messed up.

HUP

Signals are the way of persuading a job to do something it doesn't want to, such as die; when you type ^C, it sends a signal (called SIGINT in this case) to the job. In zsh, if you have a background job running when the shell exits, the shell will assume you want that to be killed; in this case it is sent a particular signal called `SIGHUP' which stands for `hangup' (as in telephone, not as in Woody Allen) and is the UNIX equivalent of `time to go home'. If you often start jobs that should go on even when the shell has exited, then you can set the option NO_HUP, and background jobs will be left alone.
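
If you only want this for the occasional job, rather than setting NO_HUP for everything, you can use the disown builtin on a particular job; the command name and job number here are just for illustration:

  % some-long-job &
  [1] 12345
  % disown %1

After that the shell forgets about the job, so it won't be sent the SIGHUP when the shell exits.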

KSH_ARRAYS

I've already mentioned this, but here are the details. Suppose you have defined an array arr, for example with

  arr=(foo bar)

although the syntax in ksh, which zsh also allows, is

  set -A arr foo bar

In zsh, $arr gives out the whole array; in ksh it just produces the first element. In zsh, ${arr[1]} refers to the first element of the array, i.e. foo, while in ksh the first element is referred to as ${arr[0]} so that ${arr[1]} gives you bar. Finally, in zsh you can get away with $arr[1] to refer to an element, while ksh insists on the braces. By setting KSH_ARRAYS, zsh will switch to the ksh way of doing things. This is one option you need to be particularly careful about when writing functions and scripts.
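
Here's the same array looked at both ways; this is exactly the kind of shift that catches out functions written with the other convention in mind:

  arr=(foo bar)
  print ${arr[1]}     # zsh default: `foo'
  print $arr          # the whole array: `foo bar'
  setopt ksharrays
  print ${arr[0]}     # now indexing starts at zero: `foo'
  print ${arr[1]}     # so this is `bar'
  print $arr          # and this is just the first element: `foo'
  unsetopt ksharrays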

FUNCTION_ARG_ZERO

Shell functions are a useful way of specifying a set of commands to be run by the shell. Here's a simple example:

  % fn() { print My name is $0; }
  % fn
  My name is fn

Note the special syntax: the `()' appears after a function name to say you are defining one, then a set of commands appears between the `{ ... }'. When you type the name of the function, those commands are executed. If you know the programming language C, the syntax will be pretty familiar, although note that the `()' is a bit of a delusion: you might think you would put arguments to the function in there, but you can't, it must always appear simply as `()'. If you don't know C, it doesn't matter; nothing from C really applies in detail, it's just a superficial resemblance.

In this case, zsh printed the special parameter `$0' (`argument zero') and, as you see, that turned into the name of the function. Now $0 outside a function means the name of the shell, or the name of the script for a non-interactive shell, so if you type `print $0' it will probably say `zsh'. In most versions of ksh, this is $0's only use; it doesn't change in functions, and `fn' would print `ksh'. To get this behaviour, you can set NO_FUNCTION_ARG_ZERO. There's probably no reason why you would want to, but zsh functions quite often test their own name, so this is one reason why they might not work.

There's another difference when defining functions, irrespective of FUNCTION_ARG_ZERO: in zsh, you can get away without the final `;' before the end of the definition of fn, because it knows the `}' must finish the last command as well as the function; but ksh is not so forgiving here. Lots of syntactic know-alls will probably be able to tell you why that's a good thing, but fortunately I can't.

KSH_AUTOLOAD

There's an easy way of loading functions built into both ksh and zsh. Instead of putting them all together in a big startup file, you can put a single line in that,

  autoload fn

and the function `fn' will only be loaded when you run it by typing its name as a command. The shell needs to know where the function is stored. This is done by a special parameter called $fpath, an array which is a list of directories; it will search all the directories for a file called fn, and use that as the function definition. If you want to try this you can type `autoload fn; fpath=(. $fpath)' and write a file called fn in the current directory.

Unfortunately ksh and zsh disagree a bit about what should be in that file. The normal zsh way of doing things is just putting the body of the function there. So if the file fn is autoloadable and contains,

  # this is a simple function
  print My name is $0

then typing `fn' will have exactly the same effect as the function fn above, printing `My name is fn'. Zsh users tend to like this because the function is written the same way as a script; if instead you had typed zsh fn, to call the file as a script with a new copy of zsh of its own, it would have worked the same way. The first line is a comment; it's ignored, and in zsh it isn't even kept in memory with the loaded function, so adding explanatory comments is not only much clearer, it doesn't use any more memory either. It uses more disk space, of course, but nowadays even home PCs come with the sort of disk size which allows you a little indulgence with legibility.

However, ksh does things differently, and here the file fn needs to contain

  fn() {
    # this is a simple function
    print My name is $0
  }

in other words, exactly what you would type to define the function. The advantage of this form is that you can put other things in the file, which will then be run straight away and forgotten about, such as defining things that fn may need to use but which don't need to be redefined every single time you run fn. The option to force zsh to work the ksh way here is called KSH_AUTOLOAD. (If you wanted to try the second example, you would need to type `unfunction fn; autoload fn' to remove the function from memory and mark it for autoloading again.)

Actually, zsh is a little bit cleverer. If the option KSH_AUTOLOAD is not set, but the file contains just a function definition in the ksh form and nothing else (like the last one above, in fact), then zsh assumes that it needs to run the function just loaded straight away. The other possibility would be that you wanted to define a function which did nothing other than define a function of the same name, which is assumed to be unlikely --- and if you really want to do that, you will need to trick zsh by putting a do-nothing command in the same file, such as a `:' on the last line.

A final complication --- sorry, but this one actually happens --- is that sometimes in zsh you want to define not just the function to be called, but some others to help it along. Then you need to do this:

  fn() {
    # this is the function after which the file is named
  }
  helper() {
    # goodness knows what this does
  }
  fn "$@"
  # this actually calls the function the first time,
  # with any arguments passed (see the subsection
  # `Function Parameters' in the section `Functions'
  # of the next chapter for the "$@").

That last non-comment line is unnecessary with KSH_AUTOLOAD. The functions supplied with zsh assume that KSH_AUTOLOAD is not set, however, so you shouldn't turn it on unless you need to. You could just make fn into the whole body, as usual, and define helper inside that; the problem is that helper would be redefined each time you executed fn, which is inefficient. A better way of avoiding the problem would be to define helper as a completely separate function, itself autoloaded: in both zsh and ksh, it makes no difference whether a function is defined inside another function or outside it, unlike (say) Pascal or Scheme.

LOCAL_OPTIONS, LOCAL_TRAPS

These two options also refer to functions, and here the ksh way of doing things is usually preferable, so many people set at least LOCAL_OPTIONS in a lot of their functions. The first versions of zsh didn't have these, which is why you need to turn them on by hand.

If LOCAL_OPTIONS is set in a function (or was already set before the function, and not unset inside it), then any options which are changed inside the function will be put back the way they were when the function finishes. So

  fn() {
    setopt localoptions kshglob
    ...
  }

allows you to use a function with the ksh globbing syntax, but will make sure that the option KSH_GLOB is restored to whatever it was before when the function exits. This works even if the function was interrupted by typing ^C. Note that LOCAL_OPTIONS will itself be restored to the way it was.

The option LOCAL_TRAPS, which first appeared in version 3.1.6, is for a similar reason but refers to (guess what) traps, which are a way of stopping signals sent to the shell, for example by typing ^C to cancel something (SIGINT, short for `signal interrupt'), or ^Z to suspend it temporarily (SIGTSTP, `signal terminal stop'), or SIGHUP which we've already met, and so on. To do something of your own when the shell gets a ^C, you can do

  trap 'print I caught a SIGINT' INT

and the set of commands in quotes will be run when the ^C arrives (you can even try it without running anything). If the string is empty ('' with nothing between the quotes), the signal will be ignored; typing ^C has no effect. To put it back to normal, the command is `trap - INT'.
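
Collecting those commands together (there is nothing here that isn't described in the text above):

  trap 'print I caught a SIGINT' INT   # run the command whenever ^C arrives
  trap '' INT                          # now ignore ^C completely
  trap - INT                           # put ^C handling back to normal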

Traps are most useful in functions, where you may temporarily (say) not want things to stop when you hit ^C, or you may want to clear up something before returning from the function. So now you can guess what LOCAL_TRAPS does; with

  fn() {
    setopt localoptions localtraps
    trap '' INT
    ...
  }

the shell will ignore ^C's to the end of the function, but then put back the trap that was there before, or remove it completely if there was none. Traps are described in more detail in chapter 3.

There is a very convenient shorthand for making options and traps local, as well as for setting the others to their standard values: put `emulate -L zsh' at the start of a function. This sets the option values back to the ones set when zsh starts, but with LOCAL_OPTIONS and LOCAL_TRAPS set too, so you now know exactly how things are going to work for the rest of the function, whatever options are set in the outside world. In fact, this only changes the options which affect normal programming; you can set every option which it makes sense to set to its standard value with `emulate -RL zsh' (it doesn't, for example, make sense to change options like login at this point). Furthermore, you can make the shell behave as much like ksh as it knows how to by doing `emulate -L ksh', with or without the -R.
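
As a minimal sketch (the name fn is made up), an option set after `emulate -L zsh' stays confined to the function:

  fn() {
    emulate -L zsh       # standard zsh options, local to this function
    setopt kshglob       # in effect only until fn returns
    [[ -o kshglob ]] && print kshglob is set inside fn
  }
  fn
  [[ -o kshglob ]] || print kshglob is not set outside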

The -L option to emulate actually only appears in versions from 3.0.6 and 3.1.6. Before that you needed

  emulate zsh
  setopt localoptions

since localtraps didn't exist, and indeed doesn't exist in 3.0.6 either.

PROMPT_PERCENT, PROMPT_SUBST

As promised, setting prompts will be discussed later, but for now there are two ways of getting information into prompts, such as the parameter $PS1 which determines the usual prompt at the start of a new command line. One is by using percent escapes, which means a `%' followed by another character, maybe with a number between the two. For example, the default zsh prompt is `%m%# '. The first percent escape turns into the name of the host computer, the second usually turns into a `%', but a `#' for the superuser. However, ksh doesn't have these, so you can turn them off by setting NO_PROMPT_PERCENT.

The usual ksh way of doing things, on the other hand, is by putting parameters in the prompt to be substituted. To get zsh to do this, you have to set PROMPT_SUBST. Then assigning

  PS1='${PWD}% '

is another way of putting the name of the current directory (`$PWD' is presumably named after the command `pwd' to `print working directory') into the prompt. Note the single quotes, so that this happens when the prompt is shown, not when it is assigned. If they weren't there, or were double quotes, then the $PWD would be expanded to the directory when the assignment took place, probably your home directory, and wouldn't change to reflect the directory you were actually in. Of course, you need the quotes for the space, too, else it just gets swallowed up when the assignment is executed.

As there is potentially much more information available in parameters than the fixed number of predefined percent escapes, you may wish to set PROMPT_SUBST anyway. Furthermore, you can get the output of commands into prompts since other forms of expansion are done on them, not just that of parameters; in fact, prompts with PROMPT_SUBST are expanded pretty much the same as a string inside double quotes every time the prompt is displayed.
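
For instance, here is a hypothetical prompt which re-runs a command substitution every time the prompt is displayed; the file-counting command is purely for illustration:

  setopt promptsubst
  PS1='${PWD} [$(ls | wc -l) files] %# '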

RM_STAR_SILENT

Everybody at some time or another deletes more files than they mean to (and that's a gross understatement); my favourite is:

  rm *>o

That `>' should be a `.', but I still had the shift key pressed. This removes all files, echoing the output (there isn't any) into a file `o'. Delightfully, the empty file `o' is not removed. (Don't try this at home.)

There is a protection mechanism built into zsh to stop you deleting all the files in a directory by accident. If zsh finds that the command is `rm', and there is a `*' on the command line (there may be other stuff as well), then it will ask you if you really want to delete all those files. You can turn this off by setting RM_STAR_SILENT. Overreliance on this option is a bad idea; it's only a last line of defence.

SH_OPTION_LETTERS

Many options also have single letters to stand for them; you can set an option in this way by, for example, `set -f', which sets NO_RCS. However, even where sh, ksh and zsh share options, not all have the same letters. This option allows the single letter options to be more like those in sh and ksh. Look them up in the manual if you want to know, but I already recommended that you use the full names for options anyway.

SH_WORD_SPLIT

I've already talked about this, see above, but it's mentioned here so you don't forget it, since it's an important difference.

Starting zsh as ksh

Finally on the subject of compatibility, you might like to know that as well as `emulate' there is another way of forcing zsh to behave as much like sh or ksh as possible. This is by actually calling zsh under the name ksh. You don't need to rename zsh; making a link called ksh which points to the zsh binary is enough to convince it.
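
For example, assuming you have a ~/bin directory to put the link in, something like this is enough:

  ln -s =zsh ~/bin/ksh   # `=zsh' expands to the full path of the zsh command
  ~/bin/ksh              # starts a zsh which behaves as much like ksh as it can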

There is an easier way when you are doing this from within zsh itself. The parameter $ARGV0 is special; it is the value which will be passed as the first argument of a command which is run by the shell. Normally this is the name of the command, but it doesn't have to be since the command only finds out what it is after it has already been run. You can use it to trick a programme into thinking its name is different. So

  ARGV0=ksh zsh

will start a copy of zsh that tries to make itself like ksh. Note this doesn't work unless you're already in zsh, as the $ARGV0 won't be special.

I haven't mentioned putting a parameter assignment before a command name, but that simply assigns the parameter (strictly an environment variable in this case) for the duration of the command; the value $ARGV0 won't be set after that command (the ksh-like zsh) finishes, as you can easily test with print. While I'm here, I should mention a few of its other features. First, the parameter is automatically exported to the environment, meaning it's available for other programmes started by zsh (including, in this case, the new zsh) --- see the section on environment variables below. Second, this doesn't do what you might expect:

  FOO=bar print $FOO

because of the order of expansion: the command line and its parameters are expanded before execution, giving whatever value $FOO had before, probably none, then FOO=bar is put into the environment, and then the command is executed but doesn't use the new value of $FOO.

2.5.2: Options for csh junkies

As well as old ksh users, there are some options available to make old csh and tcsh users feel more at home. As you will already have noticed, the syntax is very different, so you are never going to feel completely at home and it might be best just to remember the fact. But here is a brief list. One of them, CSH_NULL_GLOB, is actually quite useful.

CSH_JUNKIE_HISTORY

Zsh has the old csh mechanism for referring to words on a previous command line using a `!'; it's less used, now the editor is more powerful, but is still a convenient shorthand for extracting short bits from the previous line. This mechanism is sometimes called bang-history, since busy people sometimes like to say `!' as `bang'. This option affects how a single `!' works. For example,

  % print foo bar
  % print open closed
  % print !-2:1 !:2

In the last line, `!-2' means two entries ago, i.e. the line `print foo bar'. The `:1' chooses the first word after the command, i.e. `foo'. In the second expression, no number is given after the `!'. Usually zsh interprets that to mean that the same item just selected, in this case -2, should be used. With CSH_JUNKIE_HISTORY set, it refers instead to the last command. Note that if you hadn't given that -2, it would refer to the last command in any case, although the explicit way of referring to the last command is `!!' --- you have to use that if there are no `:' bits following. In summary, zsh usually gives you `print foo bar'; with CSH_JUNKIE_HISTORY you get `print foo closed'.

There's another option controlling this, BANG_HIST. If you unset that, the mechanism won't work at all. There's also a parameter, $histchars. The first character is the main history expansion character, normally `!' of course; the second is for rapid substitutions (normally `^' --- use of this is described below); the third is the character introducing comments, normally `#'. Changing the third character is definitely not recommended. There's little real reason to change any.

CSH_JUNKIE_LOOPS

Normal zsh loops look something like this,

  while true; do
    print Never-ending story
  done

which just prints the message over and over (type it line-by-line at the prompt, if you like, then ^C to stop it). With CSH_JUNKIE_LOOPS set, you can instead do

  while true
    print Never-ending story
  end

which will, of course, make your zsh code unlike most other people's, so for most users it's best to learn the proper syntax.

CSH_NULL_GLOB

This is another of the family of options like NO_NOMATCH, already mentioned. In this case, if you have a command line consisting of a set of patterns, at least one of them must match at least one file, or an error is caused; any that don't match are removed from the command line. The default is that all of them have to match. There is one final member of this set of options, NULL_GLOB: all non-matching patterns are removed from the command line, no error is caused. As a summary, suppose you enter the command `print file1* file2*' and the directory contains just the file file1.c.

  1. By default, there must be files matching both patterns, so an error is reported.
  2. With NO_NOMATCH set, any patterns which don't match are left alone, so `file1.c file2*' is printed.
  3. With CSH_NULL_GLOB set, file1* matched, so file2* is silently removed; `file1.c' is reported. If that had not been there, an error would have been reported.
  4. With NULL_GLOB set, any patterns which don't match are removed, so again `file1.c' is printed, but in this case if that had not been there a blank line would have been printed, with no error.

CSH_NULL_GLOB is a good thing to have set since it can keep you on the straight and narrow without too many unwanted error messages, so this time it's not just for csh junkies.

CSH_JUNKIE_QUOTES

Here just for completeness. Csh and friends don't allow multiline quotes, as zsh does; if you don't finish a pair of quotes before a new line, csh will complain. This option makes zsh do the same. But multi-line quotes are very useful and very common in zsh scripts and functions; this is only for people whose minds have been really screwed up by using csh.

2.5.3: The history mechanism: types of history

The name `history mechanism' refers to the fact that zsh keeps a `history' of the commands you have typed. There are three ways of getting these back; all these use the same set of command lines, but the mechanisms for getting at them are rather different. For some reason, items in the history list (a complete line of input typed and executed at once) have become known as `events'.

Editing the history directly

First, you can use the editor; usually hitting up-arrow will take you to the previous line, and down-arrow takes you back. This is usually the easiest way, since you can see exactly what you're doing. I will say a great deal more about the editor in chapter 4; the first thing to know is that its basic commands work either like emacs, or like vi, so if you know one of those, you can start editing lines straight away. The shell tries to guess whether to use emacs or vi from the environment variables $VISUAL or $EDITOR, in that order; these traditionally hold the name of your preferred editor for programmes which need you to edit text. In the old days, $VISUAL was a full-screen editor and $EDITOR a line editor, like ed of blessed memory, but the distinction is now very blurred. If either contains the string vi, the line editor will start in vi mode, else it will start in emacs mode. If you're in the wrong mode, `bindkey -e' in ~/.zshrc takes you to emacs mode and `bindkey -v' to vi mode. For vi users, the thing to remember is that you start in insert mode, so type `ESC' to be able to enter vi commands.

`Bang'-history

Second, you can use the csh-style `bang-history' mechanism (unless you have set the option NO_BANG_HIST); the `bang' is the exclamation mark, `!', also known as `pling' or `shriek' (or factorial, but that's another story). Thus `!!' retrieves the last command line and executes it; `!-2' retrieves the second last. You can select words: `!!:1' picks the first word after the command of the last command (if you were paying attention above, you will note you just need one `!' in that case); 0 after colon would pick the command word itself; `*' picks all arguments after the command; `$' picks the last word. You can even have ranges: `!!:1-3' picks those three words, and things like `!!:3-$' work too.
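
Here is a short illustrative transcript of those word selectors (note the mechanism echoes the line it is about to execute, as explained below):

  % print one two three four
  one two three four
  % print !!:2-$
  print two three four
  two three four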

After the word selector, you can have a second set of colons and then some special commands called modifiers --- these can be very useful to remember, since they can be applied to parameters and file patterns too, so here are some more details. The `:t' (tail) modifier picks the last part of a filename, everything after the last slash; conversely, `:h' (head) picks everything before that. So with a history entry,

  % print /usr/bin/cat
  /usr/bin/cat
  % print !!:t
  print cat
  cat

Note two things: first, the bang-history mechanism always prints what it's about to execute. Secondly, you don't need the word selector; the shell can tell that the `:t' is a modifier, and assumes you want it applied to the entire previous command. (Be careful here, since actually the :t will reduce the expression to everything after the last slash in any word, which is a little unexpected.)

With parameters:

  % foo=/usr/bin/cat
  % print ${foo:h}
  /usr/bin

(you can usually omit the `{' and `}', but it's clearer and safer with them). And finally with files --- this won't work if you set NO_BARE_GLOB_QUAL for sh-like behaviour:

  % print /usr/bin/cat(:t)
  cat

where you need the parentheses to tell the shell the `:t' isn't just part of the file name.

For a complete list, see the zshexpn manual, or the section Modifiers in the printed or Info versions of the manual, but here are a few more of the most useful. `:r' removes the suffix of a file, turning file.c into file; `:l' and `:u' make the word(s) all lowercase or all uppercase; `:s/foo/bar/' substitutes the first occurrence of foo with bar in the word(s); `:gs/foo/bar' substitutes all occurrences (the `g' stands for global); `:&' repeats the last such substitution, even if you did it on a previous line; `:g&' also works. So

  % print this is this line
  this is this line
  % !!:s/this/that/
  print that is this line
  that is this line
  % print this is no longer this line
  this is no longer this line
  % !!:g&
  print that is no longer that line
  that is no longer that line

Finally, there is a shortcut: ^old^new^ is exactly equivalent to !!:s/old/new/; you can even put another modifier after it. The `^' is actually the second character of $histchars mentioned above. You can miss out the last `^' if there's nothing else to follow it. By the way, you can put modifiers together, but each one needs the colon with it: :t:r applied to `dir/file.c' produces `file', and repeated applications of :h get you shorter and shorter paths.
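
A quick illustration of the `^' shorthand:

  % print this is a line
  this is a line
  % ^line^paragraph
  print this is a paragraph
  this is a paragraph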

Before we leave bang-history, note the option HIST_VERIFY. If that's set, then after a substitution the line appears again with the changes, instead of being immediately printed and executed. As you just have to type <RET> to execute it, this is a useful trick to save you executing the wrong thing, which can easily happen with complicated bang-history lines; I have this set myself.

And one last tip: the shell's expansion and completion, which I will enthuse about at length later on, allows you to expand bang-history references straight away by hitting TAB immediately after you've typed the complete reference, and you can usually type control together with slash (on some keyboards, you are restricted to ^Xu) to put it back the way it was if you don't like the result --- this is part of the editor's `undo' feature.

Ksh-style history commands

The third form of history uses the fc builtin. It's the most cumbersome: you have to tell the command which complete lines to execute, and may be given a chance to edit them first (but using an external editor, not in the shell). You probably won't use it that way, but there are three things which are actually controlled by fc which you might use: first, the `r' command repeats the last command (ignoring r's), which is a bit like `!!'. Secondly, the command called `history' is also really fc in disguise. It gives you a list of recent commands. They have numbers next to them; you can use these with bang-history instead of using negative numbers to count backward in the way I originally explained, the advantage being they don't change as you enter more commands. You can give ranges of numbers to history, the first number for where to start listing, and the second where to stop: a particular example is `history 1', which lists all commands (even if the first command it still remembers is higher than 1; it just silently omits all those). The third use of fc is for reading and writing your history so you can keep it between sessions.

2.5.4: Setting up history

In fact, the shell is able to read and write history without being told. You need to tell it where to save the history, however, and for that you have to set the parameter $HISTFILE to the name of the file you want to use (a common choice is `~/.history'). Next, you need to set the parameter $SAVEHIST to the number of lines of your history you want saved. When these two are set, the shell will read $HISTSIZE lines from $HISTFILE at the start of an interactive session, and save the last $SAVEHIST lines you executed at the end of the session. For it to read or write in the middle, you will either need to set one of the options described below (INC_APPEND_HISTORY and SHARE_HISTORY), or use the fc command: fc -R and fc -W read and write the history respectively, while fc -A appends it to the file (although pruning it if it's longer than $SAVEHIST); fc -WI and fc -AI are similar, but the I means only write out events since the last time history was written.

There is a third parameter $HISTSIZE, which determines the number of lines the shell will keep within one session; except for special reasons which I won't talk about, you should set $SAVEHIST to be no more than $HISTSIZE, though it can be less. The default value for $HISTSIZE is 30, which is a bit stingy for the memory and disk space of today's computers; zsh users often use anything up to 1000. So a simple set of parameters to set in .zshrc is

  HISTSIZE=1000
  SAVEHIST=1000
  HISTFILE=~/.history

and that is enough to get things working. Note that you must set $SAVEHIST and $HISTFILE for automatic reading and writing of history lines to work.

2.5.5: History options

There are also many options affecting history; these increased substantially with version 3.1.6, which provided for the first time INC_APPEND_HISTORY, SHARE_HISTORY, HIST_EXPIRE_DUPS_FIRST, HIST_IGNORE_ALL_DUPS, HIST_SAVE_NO_DUPS and HIST_NO_FUNCTIONS. I have already described BANG_HIST, CSH_JUNKIE_HISTORY and HIST_VERIFY and I won't talk about them again.

APPEND_HISTORY, INC_APPEND_HISTORY, SHARE_HISTORY

Normally, when it writes a history file, zsh just overwrites everything that's there. APPEND_HISTORY allows it to append the new history to the old. The shell will make an effort not to write out lines which should be there already; this can get complicated if you have lots of zshs running in different windows at once. This option is a good one for most people to use. INC_APPEND_HISTORY means that instead of doing this when the shell exits, each line is added to the history in this way as it is executed; this means, for example, that if you start up a zsh inside the main shell its history will look like that of the main shell, which is quite useful. It also means the ordering of commands from different shells running at the same time is much more logical --- basically just the order they were executed --- so for 3.1.6 and higher this option is recommended.

SHARE_HISTORY takes this one stage further: as each line is added, the history file is checked to see if anything was written out by another shell, and if so it is included in the history of the current shell too. This means that zshs running in different windows but on the same host (or more generally with the same home directory) share the same history. Note that zsh tries not to confuse you by having unexpected history entries pop up: if you use !-style history, the commands from other sessions don't appear in the history list until you explicitly type the history command to display them, so that you can be sure what command you are actually reexecuting. The Korn shell always behaves as if SHARE_HISTORY is set, presumably because it doesn't store history internally.
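
So, with a recent enough zsh, a reasonable addition to the history settings in .zshrc is one of the following (you only need one of them, since the second includes the effect of the first):

  setopt incappendhistory
  # or, to see other sessions' commands as well:
  setopt sharehistory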

EXTENDED_HISTORY

This makes the format of the history entry more complicated: in addition to just the command, it saves the time when the command was started and how long it ran for. The history command takes three options which use this: history -d prints the start time of the command; history -f prints that as well as the date; history -D (which you can combine with -f or -d) prints the command's elapsed time. The date format can be changed with -E for European (day.month.year) and -i for international (year-month-day) formats. The main reasons why you wouldn't want to set this would be shortage of disk space, or because you wanted your history file to be read by another shell.

HIST_IGNORE_DUPS, HIST_IGNORE_ALL_DUPS, HIST_EXPIRE_DUPS_FIRST, HIST_SAVE_NO_DUPS, HIST_FIND_NO_DUPS

These options give ways of dealing with the duplicate lines that often appear in the history. The simplest is HIST_IGNORE_DUPS, which tells the shell not to store a history line if it's the same as the previous one, thus collapsing a lot of repeated commands down to one; this is a very good option to have set. It does nothing when duplicate lines are not adjacent, so for example alternating pairs of commands will always be stored. The next two options can help here: HIST_IGNORE_ALL_DUPS simply removes copies of lines still in the history list, keeping the newly added one, while HIST_EXPIRE_DUPS_FIRST is more subtle: it preferentially removes duplicates when the history fills up, but does nothing until then. HIST_SAVE_NO_DUPS means that whatever options are set for the current session, the shell is not to save duplicated lines more than once; and HIST_FIND_NO_DUPS means that even if duplicate lines have been saved, searches backwards with editor commands don't show them more than once.

HIST_ALLOW_CLOBBER, HIST_REDUCE_BLANKS

These allow the history mechanism to make changes to lines as they are entered. The first affects output redirections, where you use the symbol > to redirect the output of a command or set of commands to a named file, or use `>>' to append the output to that file. If you have the NO_CLOBBER option set, then

  touch newfile
  echo hello >newfile

fails, because the `touch' command has created newfile and NO_CLOBBER won't let you overwrite (clobber) it in the next line. With HIST_ALLOW_CLOBBER, the second line appears in the history as

  echo hello >|newfile

where the >| overrides NO_CLOBBER. So to get round the NO_CLOBBER you can just go back to the previous line and execute it without editing it.

The second option, HIST_REDUCE_BLANKS, will tidy up the line when it is entered into the history by removing any excess blanks that mean nothing to the shell. This can also mean that the line becomes a duplicate of a previous one even if it would not have been in its untidied form. It is smart enough not to remove blanks which are important, i.e. are quoted.

HIST_IGNORE_SPACE, HIST_NO_STORE, HIST_NO_FUNCTIONS

These three options allow you to say that certain lines shouldn't go into the history at all. HIST_IGNORE_SPACE means that lines which begin with a space don't go into the history; the idea is that you deliberately type a space, which is not otherwise significant to the shell, before entering any line you want to be forgotten immediately afterwards. In zsh 4.0.1 this is implemented so that you can always recall the immediately preceding line for editing, even if it had a space; but when the next line is executed and entered into the history, the line beginning with the space is forgotten.

HIST_NO_STORE tells the shell not to store history or fc commands, while HIST_NO_FUNCTIONS tells it not to store function definitions as these, though usually infrequent, can be tiresomely long. A function definition is anything beginning `function funcname {...' or `funcname () { ...'.

NO_HIST_BEEP

Finally, HIST_BEEP is used in the editor: if you try to scroll up or down beyond the end of the history list, the shell will beep. It is on by default, so use NO_HIST_BEEP to turn it off.

2.5.6: Prompts

Most people have some definitions in .zshrc for altering the prompt you see at the start of each line. I've already mentioned PROMPT_PERCENT (set by default) and PROMPT_SUBST (unset by default); I'll assume here you haven't changed these settings, and point out some of the possibilities with prompt escapes, sequences that start with a `%'. If you get really sophisticated, you might need to turn on PROMPT_SUBST.

The main prompt is in a parameter called either $PS1 or $PROMPT or $prompt; the reason for having all these names is historical --- they come from different shells --- so I'll just stick with the shortest. There is also $RPS1, which prints a prompt at the right of the screen. The point of this is that it automatically disappears if you type so far along the line that you run into it, so it can help make the best use of space for showing long things like directories.

$PS2 is shown when the shell is waiting for some more input, i.e. it knows that what you have typed so far isn't a complete line: it may contain the start of a quoted expression, but not the end, or the start of some syntactic structure which is not yet finished. Usually you will keep it different from $PS1, but all the same escapes are understood in all five prompts.

$PS3 is shown within a loop started by the shell's select mechanism, when the shell wants you to input a choice: see the zshmisc manual page as I won't say much about that.

$PS4 is useful in debugging: there is an option XTRACE which causes the shell to print out lines about to be executed, preceded by $PS4. Only from version 3.1.6 has it started to be substituted in the same way as the other prompts, though this turns out to be very useful --- see `Location in script or function' in the following list.

Here are some of the things you might want to include in your prompts. Note that you can try this out before you alter the prompt by using `print -P': this expands strings just as they are expanded in prompts. You will probably need to put the string in single quotes.

The time

Zsh allows you lots of different ways of putting the time into your prompt with percent escapes. The simplest are %t and %T, the time in 12 and 24 hour formats, and %*, the same as %T but with seconds; you can also have the date as (e.g.) `Wed 22' using %w, as `9/22/99' (US format) using %W, or as `99-09-22' (International format) using %D. However, there is another way of using %D to get many more possibilities: a following string in braces, `%D{...}' can contain a completely different set of percent escapes all of which refer to elements of the time and date. On most systems, the documentation for the strftime function will tell you what these are. zsh has a few of its own, given in the zshmisc manual page in the PROMPT EXPANSION section. For example, I use %D{%L:%M} which gives the time in hours and minutes, with the hours as a single digit for 1 to 9; it looks more homely to my unsophisticated eyes.

You can have more fun by using the `%(numX.true.false)' syntax, where X is one of t or T. For t, if the time in minutes is the same as num (default zero), then true is used as the text for this section of the prompt, while false is used otherwise. T does the same for hours. Hence

  PS1='%(t.Ding!.%D{%L:%M})%# '

prints the message `Ding!' at zero minutes past the hour, and a more conventional time otherwise. The `%#' is the standard sequence which prints a `#' if you are the superuser (root), or a `%' for everyone else, which occurs in a lot of people's prompts. Likewise, you could use `%(30t.Dong!....' for a message at half past the hour.

The current directory

The sequence `%~' prints out the directory, with any home or named directories (see below) shortened to the form starting with ~; the sequence `%/' doesn't do that shortening, so usually `%~' is better. Directories can be long, and there are various ways to deal with it. First, if you are using a windowing system you can put the directory in the title bar, rather than anywhere inside the window. Second, you can use $RPS1 which disappears when you type near it. Third, you can pick segments out of `%~' or `%/' by giving them a number after the `%': for example, `%1~' just picks out the last segment of the path to the current directory.

The fourth way gives you the most control. Prompts or parts of prompts, not just bits showing the directory, can be truncated to any length you choose. To truncate a path on the left, use something like `%10<...<%~'. That works like this: the `%<<' is the basic form for truncation. The 10 after the `%' says that anything following is limited to 10 characters, and the characters `...' are to be displayed whenever the prompt would otherwise be longer than that (you can leave this empty). This applies to anything following, so now the %~ can't be longer than 10 characters, otherwise it will be truncated (to 7 characters, once the `...' has been printed). You can turn off truncation with `%<<', i.e. no number after the `%'; truncation then applies to the entire region between where it was turned on and where it was turned off (this has changed from older versions of zsh, where it just applied to individual `%' constructs).
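
Putting that together, here is a sketch of a prompt where only the directory part is limited to ten characters; you can preview it with `print -P' as mentioned above:

  PS1='%10<...<%~%<< %# '
  print -P '%10<...<%~%<< %# '   # try it out without changing the prompt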

What are you waiting for?

The prompt $PS2 appears when the shell is waiting for you to finish entering something, and it's useful to know what the shell is waiting for. The sequence `%_' shows this. It's part of the default $PS2, which is `%_> '. Hence, if you type `if true; then' and <RET>, the prompt will say `then> '. You can also use it in the trace prompt, $PS4, to show the same information about what is being executed in a script or function, though as there is usually enough information there (as described next) it's not part of the default. In this case, a number after the `%' will limit the depth shown, so with `%1_' only the most recent thing will be mentioned.

Location in script or function

The default $PS4 contains `%N' and `%i', which tell you the name of the most recently started function, script, or sourced file, and the line number being executed inside it; they are not very useful in other prompts. However, `%i' in $PS1 will tell you the current interactive line number, which zsh keeps track of, though doesn't usually show you; the parameter $LINENO contains the same information.

Another point to bear in mind about `%i' is that the line number shown applies to the version of a function as it was first read in, not how it appears with the `functions' command, which is tidied up. If you use autoloaded functions, however, the file containing the function will usually be what you want to alter, so this shouldn't be a problem when debugging.

Remember, the $PS4 display only happens when the XTRACE option is set; as options may be local to functions, and always are to scripts, you will often need to put an explicit `setopt xtrace' at the top of whatever you are debugging. Alternatively, you can use `typeset -ft funcname' to turn on tracing for that function (something I only just discovered); use `typeset +ft funcname' to turn it off again.
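
For example, a minimal sketch with a throwaway function:

  fn() { print hello; print goodbye; }
  typeset -ft fn    # turn on tracing for fn alone
  fn                # each line is now shown preceded by the expanded $PS4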

Other bits and pieces

There are many other percent escapes described in the zshmisc manual page, mostly straightforward. For example, `%h' shows you the history entry number, useful if you are using bang-history; `%m' shows you the current host name up to any dot; `%n' shows the username.

There are two other features I happen to use myself. First, it's sometimes convenient to know when the last command failed. Every command returns a status, which is a number: zero for success, some other number for some type of failure. You can get this from the parameter `$?' or `$status' (again, they refer to the same thing). It's also available in the prompt as `%?', and there's also one of the so-called `ternary' expressions with parentheses I described for time, which pick different strings depending on a test. Here the test is, reasonably enough, `%(?...'. Putting these two together, you can get a message which is only displayed when the exit status is non-zero; I've put an extra set of parentheses around the number just to make it clearer, where the `)' needs to be turned into `%)' to stop it marking the end of the group:

  PS1='%(?..(%?%))%# '

It's also sometimes convenient to know if you're in a subshell, that is if you've started another shell within the main one by typing `zsh'. You can do this by using another ternary expression:

  PS1='%(2L.+.)%# '

This checks the parameter SHLVL, which is incremented every time a new zsh starts, so if there was already one running (which would have set SHLVL to 1), it will now be 2; and if SHLVL is at least 2, an extra `+' is printed in front of the prompt, otherwise nothing. If you're using a windowing system, you may need to turn the 2 into 3 as there may be a zsh already running when you first log in, so that the shells in the windows have SHLVL set to 2 already. This depends a good deal on how your windowing system is set up; finding out more is left as an exercise for the reader.

Colours

Many terminals can now display colours, and it is quite useful to be able to put these into prompts to distinguish those from the surrounding text. I often find a programme has just dumped a whole load of output on my terminal and it's not obvious where it starts. Being able to find the prompt just before helps a lot.

Colours, like bold or underlined text, use escape sequences which don't move the cursor. The golden rule for inserting any such escape sequences into prompts is to surround them with `%{' at the start and `%}' at the end. Otherwise, the shell will be confused about the length of the line. This affects what happens when the line editor needs to redraw the line, and also changes the position of the right prompt $RPS1, if you use that. You don't need that with the special sequences %B and %b, which start and stop bold text, because the shell already knows what to do with those; it's only random characters which you happen to know don't move the cursor, though the shell doesn't, that cause the problem.

In the case of colours, there is a shell function colors supplied with the standard distribution to help you. When loaded and run, it defines associative array parameters $fg and $bg which you use to extract the escape sequences for given colours, for example ${fg[red]}${bg[yellow]} produces the sequences for red text on a yellow background. So for example,

  PS1="%{${bg[white]}${fg[red]}%}%(?..(%?%))\ 
  %{${fg[yellow]}${bg[black]}%}%# "

produces a red-on-white `(1)' if the previous programme exited with status 1, but nothing if it exited with status 0, followed by a yellow-on-black `%' or `#' if you are the superuser. Note the use of the double quotes here to force the parameters to be expanded straight away --- the escape sequences are fixed, so they don't need to be re-extracted from the parameters every time the prompt is shown.
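
Note that $fg and $bg only exist after the colors function has been loaded and run, so somewhere earlier in your .zshrc you need:

  autoload -U colors
  colors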

Even if your terminal does support colour, there's no guarantee all the possibilities work, although the basic ANSI colour scheme is fairly standard. The colours understood are: cyan, white, yellow, magenta, black, blue, red, grey, green. You can also use `default', which puts the terminal back how it was to begin with. In addition, you can use the basic colours with the parameters $bg_bold and $fg_bold for bold varieties of the colours and $bg_no_bold and $fg_no_bold to switch explicitly back to non-bold.

Themes

There are also a set of themes provided as functions to set up your prompt to various predefined possibilities. These make use of the colours set up as described above. See the zshcontrib manual page for how to do this (search for `prompt themes').

2.5.7: Named directories

As already mentioned, `~/' at the start of a filename expands to your home directory. More generally, `~user/' allows you to refer to the home directory of any other user. Furthermore, zsh lets you define your own named directories which use this syntax. The basic idea is simple, since any parameter can be a named directory:

  dir=/tmp/mydir
  print ~dir

prints `/tmp/mydir'. So far, this isn't any different from using the parameter as $dir. The difference comes if you use the `%~' construct, described above, in your prompt. Then when you change into that directory, instead of seeing the message `/tmp/mydir', you will see the abbreviation `~dir'.

The shell will not register the name of the directory until you force it to by using `~dir' yourself at least once. You can do the following in your .zshrc:

  dir=/tmp/mydir
  bin=~/myprogs/bin
  : ~dir ~bin

where `:' is a command that does nothing --- but its arguments are checked for parameters and so on in the usual way, so that the shell can put dir and bin into its list of named directories. A simpler way of doing this is to set the option AUTO_NAME_DIRS; then any parameter created which refers to a directory will automatically be turned into a name. The directory must have an absolute path, i.e. its expanded value, after turning any `~'s at the start into full paths, must begin with a `/'. The parameter $PWD, which shows the current directory, is protected from being turned into ~PWD, since that would tell you nothing.
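
Here is a brief sketch of the automatic version (assuming the directory /tmp/mydir actually exists, so that you can change to it):

  setopt autonamedirs
  dir=/tmp/mydir    # with the option set, the assignment alone registers ~dir
  cd ~dir
  print -P %~       # shows `~dir' rather than /tmp/mydir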

2.5.8: `Go faster' options for power users

Here are a few more random options you might want to set in your .zshrc.

NO_BEEP

Normally zsh will beep if it doesn't like something. This can get extremely annoying; `setopt nobeep' will turn it off. I refer to this informally as the OPEN_PLAN_OFFICE_NO_VIGILANTE_ATTACKS option.

AUTO_CD

If this option is set, and you type something with no arguments which isn't a command, zsh will check to see if it's actually a directory. If it is, the shell will change to that directory. So `./bin' on its own is equivalent to `cd ./bin', as long as the directory `./bin' really exists. This is particularly useful in the form `..', which changes to the parent directory.

CD_ABLE_VARS

This is another way of saving typing when changing directory, though only one character. If a directory doesn't exist when you try to change to it, zsh will try and find a parameter of that name and use that instead. You can also have a `/' and other bits after the parameter. So `cd foo/dir', if there is no directory `foo' but there is a parameter $foo, becomes equivalent to `cd $foo/dir'.

EXTENDED_GLOB

Patterns, to match the name of files and other things, can be very sophisticated in zsh, but to get the most out of them you need to use this option, as otherwise certain features are not enabled, so that people used to simpler patterns (maybe just `*', `?' and `[...]') are not confused by strange happenings. I'll say much more about zsh's pattern features, but this is to remind you that you need this option if you're doing anything clever with `~', `#', `^' or globbing flags --- and also to remind you that those characters can have strange effects if you have the option set.

MULTIOS

I mentioned above that to get zsh to behave like ksh you needed to set NO_MULTIOS, but I didn't say what the MULTIOS option did. It has two different effects for output and input.

First, for output. Here it's an alternative to the tee programme. I've already mentioned, though not described in detail, that you could use >filename to tell the shell to send output into a file with a given name instead of to the terminal. With MULTIOS set, you can have more than one of those redirections on the command line:

  echo foo >file1 >file2

Here, `foo' will be written to both the named files; zsh copies the output. The pipe mechanism, which I'll describe better in chapter 3, is a sort of redirection into another programme instead of into a file: MULTIOS affects this as well:

  echo foo >file1 | sed 's/foo/bar/'

Here, `foo' is again written to file1, but is also sent into the pipe to the programme sed (`stream editor') which turns `foo' into `bar' and (since there is no output redirection in this part) prints it to the terminal.

Note that the second example above has several times been reported as a bug, often in a form like:

  some_command 2>&1 >/dev/null | sed 's/foo/bar/'

The intention here is presumably to send standard error to standard output (the `2>&1', a very commonly used shell hieroglyphic), and not send standard output anywhere (the `>/dev/null'). (If you haven't met the concept of `standard error', it's just another output channel which goes to the same place as normal output unless you redirect it; it's used, for example, to send error messages to the terminal even if your output is going somewhere else.) In this example, too, the MULTIOS feature forces the original standard output to go to the pipe. You can see this happening if we put in a version of `some_command':

  { echo foo error >&2; echo foo not error; } 2>&1 >/dev/null |
    sed 's/foo/bar/'

where you can consider the stuff inside the `{...}' as a black box that sends the message `foo error' to standard error, and `foo not error' to standard output. With MULTIOS, however, the result is

  bar error
  bar not error

because both have been sent into the pipe. Without MULTIOS you get the expected result,

  bar error

as any other Bourne-style shell would produce.

On input, MULTIOS arranges for a series of files to be read in order. This time it's a bit like using the programme cat, which combines all the files listed after it. In other words,

  cat file1 file2 | myprog

(where myprog is some programme that reads all the files sent to it as input) can be replaced by

  myprog <file1 <file2

which does the same thing. Once again, a pipe counts as a redirection, and the pipe is read from first, before any files listed after a `<':

  echo then this >testfile
  echo this first | cat <testfile
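
so the output is `this first' followed by `then this'.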

CORRECT, CORRECT_ALL

If you have CORRECT set, the shell will check all the commands you type and if they don't exist, but there is one with a similar name, it will ask you if you meant that one instead. You can type `n' for no, don't correct, just go ahead; `y' for yes, correct it then go ahead; `a' for abort, don't do anything; `e' for edit, return to the editor to edit the same line again. Users of the new completion system should note this is not the same correction you get there: it's just simple correction of commands.

CORRECT_ALL applies to all the words on the line. It's a little less useful, because currently the shell has to assume that they are supposed to be filenames, and will try to correct them if they don't exist as such, but of course many of the arguments to a command are not filenames. If particular commands generate too many attempts to correct their arguments, you can turn this off by putting `nocorrect' in front of the command name. An alias is a very good way of doing this, as described next.

2.5.9: aliases

An alias is used like a command, but it expands into some other text which is itself used as a command. For example,

  alias foo='print I said foo'
  foo

prints (guess what) `I said foo'. Note the syntax for definition --- you need the `=', and you need to make sure the whole alias is treated by the shell as one word; you can give a whole list of aliases to the same `alias' command. You may be able to think of some aliases you want to define in your startup files; .zshrc is probably the right place. If you have CORRECT_ALL set, the way to avoid the `mkdir' command spell-checking its arguments --- which is useless, because they have to be non-existent for the command to work --- is to define:

  alias mkdir='nocorrect mkdir'

This shows one useful feature about aliases: the alias can contain something of the same name as itself. When it is encountered in the expansion text (the right hand side), the shell knows it is not to expand the alias again, but this time to treat it as a real command. Note that functions do not have this property: functions are more powerful than aliases and in some cases it is useful for them to call themselves, but it's a common mistake to have functions call themselves over and over again unintentionally until the shell complains. I'll describe ways round this in chapter 3.

One other way functions are more powerful than aliases is that functions can take arguments while aliases can't --- in other words, there is no way of referring inside the alias to what follows it on the command line, unlike a function, and also unlike aliases in csh (because that has no functions, that's why). It is just blindly expanded, and the remainder of the command line stuck on the end. Hence aliases in zsh are usually kept for quite simple things, and functions are written for anything more complicated. You couldn't do that trick with `nocorrect' using a function, though, since the function is called too late: aliases are expanded straight away, so the nocorrect is found in time to be useful. You can almost think of them as just plain typing abbreviations.

Normal aliases only work when in command position, i.e. at the start of the command line (more strictly, when zsh is expecting a command). There are other things called `global aliases', which you define by the `-g' option to alias, which will be expanded at any position on the command line. You should think seriously before defining these, as they can have a drastic effect. Note, however, that quoting a word, or even a single character, will stop an alias being expanded for it.
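
For example, one commonly quoted (and fairly harmless) global alias pipes output into a pager; the name `L' is just a convention, not anything special:

  alias -g L='| less'
  # now `grep foo *.c L' runs `grep foo *.c | less'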

I only tend to use aliases in interactive shells, so I define them from .zshrc, but you may want to use .zshenv if you use aliases more widely. In fact, to keep my .zshrc neat I save all the aliases in a separate file called .aliasrc and in .zshrc I have:

  if [[ -r ~/.aliasrc ]]; then
    . ~/.aliasrc
  fi

which checks if there is a readable file ~/.aliasrc, and if there is, it runs it in exactly the same way the normal startup files are run. You can use `source' instead of `.' if it means more to you; `.' is the traditional Bourne and Korn shell name, however.

2.5.10: Environment variables

Often, the manual for a programme will tell you to define certain environment variables, usually a collection of uppercase letters with maybe numbers and the odd underscore. These can pass information to the programme without you needing to use extra arguments. In zsh, environment variables appear as ordinary shell parameters, although they have to be defined slightly differently: strictly, the environment is a special region outside the shell, and zsh has to be told to put a copy there as well as keeping one of its own. The usual syntax is

  export VARNAME='value'

in other words, like an ordinary assignment, but with `export' in front. Note there is no `$' before the name of the environment variable; all `export' and similar statements work the same way. The easiest place to put these is in .zshenv --- hence its name. Environment variables will be passed to any programmes run from a shell, so it may be enough to define them in .zlogin or .zprofile: however, any shell started for you non-interactively won't run those, and there are other possible problems if you use a windowing system which is started by a shell other than zsh or which doesn't run a shell start-up file at all --- I had to tweak mine to make it do so. So .zshenv is the safest place; it doesn't take long to define environment variables. Other people will no doubt give you completely contradictory views, but that's people for you.

Note that you can't export arrays. If you export a parameter, then assign an array to it, nothing will appear in the environment; you can use the external command `printenv VARNAME' (again no `$' because the command needs to know the name, not the value) to check. There's a more subtle problem with arrays, too. The export builtin is just a special case of the builtin typeset, which defines a variable without marking it for export to the environment. You might think you could do

  typeset array=(this doesn\'t work)

but you can't --- the special array syntax is only understood when the assignment does not follow a command, not in normal arguments like the case here, so you have to put the array assignment on the next line. This is a very easy mistake to make. More uses of typeset will be described in chapter 3; they include creating local parameters in functions, and defining special attributes (of which the `export' attribute is just one) for parameters.
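
To spell out the working version of the example above (the array name is purely illustrative):

  typeset array
  array=(now this works)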

2.5.11: Path

It helps to be able to find external programmes, i.e. anything not part of the shell, any command other than a builtin, function or alias. The $path array is used for this. Actually, what the system needs is the environment variable $PATH, which contains a list of directories in which to search for programmes, separated from each other by a colon. These directories are the individual components of the array $path. So if $path contains

  path=(/bin /usr/bin /usr/local/bin .)

then $PATH will automatically contain the effect of

  PATH=/bin:/usr/bin:/usr/local/bin:.

without you having to set that. The idea is simply that, while the system needs $PATH because it doesn't understand arrays, it's much more flexible to be able to use arrays within the shell and hence pretty much forget about the $PATH form.

Changes to the path are similar to changes to environment variables described above, so all that applies. There's a slight difficulty in setting $path in .zshenv, however, even though the reasons given above for doing so still apply. Usually, the path will be set for you, either by the system, or by the system administrator in one of the global start up files, and if you change path you will simply want to add to it. But if your .zshenv contains

  path=(~/bin ~/progs/bin $path)

--- which is the right way of adding something to the front of $path --- then every time .zshenv is called, ~/bin and ~/progs/bin are stuck in front, so if you start another zsh you will have two sets there.

You can add tests to see if something's already there, of course. Zsh conveniently allows you to test for the existence of elements in an array. By preceding an array index by (r) (for reverse), it will try to find a matching element and return that, else an empty string. Here's a way of doing that (but don't add this yet, see the next paragraph):

  for dir in ~/bin ~/progs/bin; do
    if [[ -z ${path[(r)$dir]} ]]; then
      path=($dir $path)
    fi 
  done

That for... do ... done is another special shell construct. It takes each thing after `in' and assigns it in turn to the parameter named before the `in' --- $dir, but because this is a form of assignment, the `$' is left off --- so the first time round it has the effect of dir=~/bin, and the next time dir=~/progs/bin. Then it executes what's in the loop. The test -z checks that what follows is empty: in this case it will be if the directory $dir is not yet in $path, so it goes ahead and adds it in front. Note that the directories get added in the reverse of the order they appear.

Actually, however, zsh takes all that trouble away from you. The incantation `typeset -U path', where the -U stands for unique, tells the shell that it should not add anything to $path if it's there already. To be precise, it keeps only the left-most occurrence, so if you added something at the end it will disappear and if you added something at the beginning, the old one will disappear. Thus the following works nicely in .zshenv:

  typeset -U path
  path=(~/bin ~/progs/bin $path)

and you can put down that `for' stuff as a lesson in shell programming. You can list all the variables which have uniqueness turned on by typing `typeset +U', with `+' instead of `-', because in the latter case the shell would show the values of the parameters as well, which isn't what you need here. The -U flag will also work with colon-separated arrays, like $PATH.

2.5.12: Mail

Zsh will check for new mail for you. If all you need is to be reminded of something arriving in your normal folder every now and then, you just need to set the parameter $MAIL to wherever that is: it's typically one of /usr/spool/mail, /var/spool/mail, or /var/mail.

The array $mailpath allows more possibilities. Like $path, it has a colleague in uppercase, $MAILPATH, which is a colon-separated array. The system doesn't need that, this time, so it's mainly there so that you can export it to another version of zsh, since exporting arrays won't work. As may by now be painfully clear, if you set it in .zshenv or .zshrc, you don't need to export it, because it's set in each instance of the shell. The elements of $mailpath work like $MAIL, so you can specify different places where mail arrives. That's most useful if you have a programme like filter or procmail running to redistribute arriving mail to different folders. You can specify a different message for each folder by putting `?message' at the end. For example, mine looks like this.

  mailpref=/temp/pws/Mail
  mailpath=($mailpref/newmail
            $mailpref/zsh-new'?New zsh mail' 
            $mailpref/list-new'?New list mail'
            $mailpref/urth-new'?New Urth mail')

Note that zsh knows the array isn't finished until the `)', even though the elements are on different lines; this is one very good reason for setting $mailpath rather than $MAILPATH, which needs one long chunk.

The other parameter of interest is $MAILCHECK, which gives the frequency in seconds when zsh should check for new mail. The default is 60. Actually, zsh only checks just after a command has finished running and it is about to print a prompt. Since checking files doesn't take long, you can usually set this to its minimum value, which is MAILCHECK=1; zero doesn't work because it switches off checking. One reason why you wouldn't want to do that might be because $MAIL and $mailpath can contain directories instead of ordinary files; these will be checked recursively for any files with something new in them, so this can be slow.

Finally, there is one associated option, MAIL_WARNING (though MAIL_WARN is also accepted for the same thing for reasons of compatibility with less grammatical shells). The shell remembers the state of the mail file from the last time it checked; next time it checks, it compares the dates. If there is no new mail, but the date of the file has changed anyway, it will print a warning message. This will happen if you read the mail with your mail reader and put the messages somewhere else. Presumably you know you did that, so the warning may not be all that useful.

2.5.13: Other path-like things

There are other pairs like $path and $PATH. I will keep back talk of $cdpath until I say more about the way zsh handles directories. When I mentioned $fpath, I didn't say there was also $FPATH, but there is. Then there is $manpath and $MANPATH; these aren't used by the shell at all, but $MANPATH, if exported, is used by the man external command, and $manpath gives an easier way to set it.

From 3.1.6 there is a mechanism to define your own such combinations; if this had been available before, there would have been no need to build in $manpath and $MANPATH. In .zshenv you would put,

  export -TU TEXINPUTS texinputs

to define such a pair. The -T (for tie) is the key to that; I've used `export' even though the basic variable declaration command is `typeset' because you nearly always want to get the colon-separated version ($TEXINPUTS here) visible to the environment, and I've set -U as described above for $path because it's a neat feature anyway. Now you can assign to the array $texinputs and let the programme (TeX or its derivatives) see $TEXINPUTS. Another useful variable to do this with is $LD_LIBRARY_PATH, which on most modern versions of UNIX (and Linux) tells the system where to find the libraries which provide extra functions when it runs a programme.
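
For example, to handle $LD_LIBRARY_PATH the same way (the lowercase name ld_library_path and the directory ~/lib are simply illustrative choices):

  export -TU LD_LIBRARY_PATH ld_library_path
  ld_library_path=(~/lib $ld_library_path)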

2.5.14: Version-specific things

Since zsh changes faster than almost any other command interpreter known to humankind, you will often find you need to find out what version you are using. This can get a bit verbose; indeed, the parameter you need to check, which is now $ZSH_VERSION, used simply to be called $VERSION before version 3.0. If you are not using legacy software of that kind, you can probably get away with tests like this:

  if [[ $ZSH_VERSION == 3.1.<5->* ||
        $ZSH_VERSION == 3.<2->* ||
        $ZSH_VERSION == <4->* ]]; then
    # set feature which appeared first in 3.1.5
  fi

It's like that to be futureproof: it says that if this is a 3.1 release, it has to be at least 3.1.5, but any 3.2 release (there weren't any), or any release 4 or later, will also be OK. The `<5->' etc. are advanced pattern matching tests: pattern matching uses the same symbols as globbing, but to test other things, here what's on the left of the `=='. This one matches any number which is at least 5, for example 6 or 10 or 252, but not 1 or 4. There are also development releases; nowadays the version numbers look like X.Y.Z-tag-N (tag is some short word, the others are numbers) but unless you're keeping up with development you won't need to look for those, since they aren't released officially. That `==' in the test could also be just `=', but the manual says the former is preferred, so I've used them here, even though people usually don't bother.

Version 4 of zsh provides a function is-at-least to do this for you: it looks only at the numbers X, Y and Z (and N if it exists), ignoring all letters and punctuation. You give it the minimum version of the shell you need and it returns true if the current shell is recent enough. For example, `is-at-least 3.1.6-pws-9' will return true if the current version of zsh is 3.1.6-dev-20 (or 3.1.9, or 4.0.1, and so on), which is the correct behaviour. As with any other shell function, you have to arrange for is-at-least to be autoloaded if you want to use it.
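
A typical use looks like this; it assumes your $fpath is already set up so that the function can be autoloaded:

  autoload -U is-at-least
  if is-at-least 3.1.6; then
    # safe to use features which appeared in 3.1.6
  fi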

2.5.15: Everything else

There are many other possibilities for things to go in startup files; in particular, I haven't touched on defining things for the line editor and setting up completion. There's quite a lot to explain for those, so I'll come back to those in the appropriate chapters. You just need to remember that all that stuff should go in .zshrc, since you need it for all interactive shells, and for no others.

Chapter 3: Dealing with basic shell syntax

This chapter is a more thorough examination of much of what appeared in chapter 2; to be more specific, I assume you're sitting in front of your terminal about to use the features you just set up in your initialisation files and want to know enough to get them going. Actually, you will probably spend most of the time editing command lines and in particular completing commands --- both of these activities are covered in later chapters. For now I'm going to talk about commands and the syntax that goes along with using them. This will let you write shell functions and scripts to do more of your work for you.

In the following there are often several consecutive paragraphs about quite minor features. If you find you read this all through the first time, maybe you need to get out more. Most people will probably find it better to skim through to find what the subject matter is, then come back if they later find they want to know more about a particular aspect of the shell's commands and syntax.

One aspect of the syntax is left to chapter 5: there's just so much to it, and it can be so useful if you know enough to get it right, that it can't all be squashed in here. The subject is expansion, covering a multitude of things such as parameter expansion, globbing and history expansions. You've already met the basics of these in chapter 2; but if you want to know how to pick a particular file with a globbing expression with pinpoint accuracy, or how to make a single parameter expansion reduce a long expression to the words you need, you should read that chapter; it's more or less self-contained, so you don't necessarily need to know everything in this one.

We start with the most basic issue in any command line interpreter, running commands. As you know, you just type words separated by spaces, where the first word is a command and the remainder are arguments to it. It's important to distinguish between the types of command.

3.1: External commands

External commands are the easiest, because they have the least interaction with the shell --- many of the commands provided by the shell itself, which are described in the next section, are built into the shell especially to avoid this difficulty.

The only major issue is therefore how to find them. This is done through the parameters $path and $PATH, which, as I described in chapter 2, are tied together because although the first one is more useful inside the shell --- being an array, its various parts can be manipulated separately --- the second is the one that is used by other commands called by the shell; in the jargon, $PATH is `exported to the environment', which means exactly that other commands called by the shell can see its value.

So suppose your $path contains

  /home/pws/bin /usr/local/bin /bin /usr/bin

and you try to run `ls'. The shell first looks in /home/pws/bin for a command called ls, then in /usr/local/bin, then in /bin, where it finds it, so it executes /bin/ls. Actually, the operating system itself knows about paths if you execute a command the right way, so the shell doesn't strictly need to.

There is a subtlety here. The shell tries to remember where the commands are, so it can find them again the next time. It keeps them in a so-called `hash table', and you find the word `hash' all over the place in the documentation: all it means is a fast way of finding some value, given a particular key. In this case, given the name of a command, the shell can find the path to it quickly. You can see this table, in the form `key=value', by typing `hash'.

In fact the shell only does this when the option HASH_CMDS is set, as it is by default. As you might expect, it stops searching when it finds the directory with the command it's looking for. There is an extra optimisation in the option HASH_ALL, also set by default: when the shell scans a directory to find a command, it will add all the other commands in that directory to the hash table. This is sensible because on most UNIX-like operating systems reading a whole lot of files in the same directory is quite fast.

The way commands are stored has other consequences. In particular, zsh won't look for a new command if it already knows where to find one. If I put a new ls command in /usr/local/bin in the above example, zsh would continue to use /bin/ls (assuming it had already been found). To fix this, there is the command rehash, which actually empties the command hash table, so that finding commands starts again from scratch. Users of csh may remember having to type rehash quite a lot with new commands: it's not so bad in zsh, because if no command was already hashed, or the existing one disappeared, zsh will automatically scan the path again; furthermore, zsh performs a rehash of its own accord if $path is altered. So adding a new duplicate command somewhere towards the head of $path is the main reason for needing rehash.
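
Here's a hypothetical illustration of that last case, using whence (described later in this chapter) to report what the shell would run. Suppose a new ls has just been installed in /usr/local/bin, which comes before /bin in $path:

  whence ls     # still reports /bin/ls, the hashed version
  rehash        # empty the command hash table
  whence ls     # now reports /usr/local/bin/ls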

One thing that can happen if zsh hasn't filled its command hash table and so doesn't know about all external commands is that the AUTO_CD option, mentioned in the previous chapter and again below, can think you are trying to change to a particular directory with the same name as the command. This is one of the drawbacks of AUTO_CD.

To be a little bit more technical, it's actually not so obvious that command hashing is needed at all; many modern operating systems can find commands quickly without it. The clincher in the case of zsh is that the same hash table is necessary for command completion, a very commonly used feature. If you type `compr<TAB>', the shell completes this to `compress'. It can only do this if it has a list of commands to complete, and this is the hash table. (In this case it didn't need to know where to find the command, just its name, but it's only a little extra work to store that too.) If you were following the previous paragraphs, you'll realise zsh doesn't necessarily know all the possible commands at the time you hit TAB, because it only looks when it needs to. For this purpose, there is another option, HASH_LIST_ALL, again set by default, which will make sure the command hash table is full when you try to complete a command. It only needs to do this once (unless you alter $path), but it does mean the first command completion is slow. If HASH_LIST_ALL is not set, command completion is not available: the shell could be rewritten to search the path laboriously every single time you try to complete a command name, but it just doesn't seem worth it.

The fact that $PATH is passed on from the shell to commands called from it (strictly only if the variable is marked for export, as it usually is --- this is described in more detail with the typeset family of builtin commands below) also has consequences. Some commands call subcommands of their own using $PATH. If you have that set to something unusual, so that some of the standard commands can't be found, it could happen that a command which is found nonetheless doesn't run properly because it's searching for something it can't find in the path passed down to it. That can lead to some strange and confusing error messages.

One important thing to remember about external commands is that the shell continues to exist while they are running; it just hangs around doing nothing, waiting for the job to finish (though you can tell it not to, as we'll see). The command is given a completely new environment in which to run; changes in that don't affect the shell, which simply starts up where it left off after the command has run. So if you need to do something which changes the state of the shell, an external command isn't good enough. This brings us to builtin commands.

3.2: Builtin commands

Builtin commands, or builtins for short, are commands which are part of the shell itself. Since builtins are necessary for controlling the shell's own behaviour, introducing them actually serves as an introduction to quite a lot of what is going on in the shell. So a fair fraction of what would otherwise appear later in the chapter has accumulated here, one way or another. This does make things a little tricksy in places; count how many times I use the word `subtle' and keep it for your grandchildren to see.

I just described one reason for builtins, but there's a simpler one: speed. Going through the process of setting up an entirely new environment for the command at the beginning, swapping between this command and anything else which is being run on the computer, then destroying it again at the end is considerable overkill if all you want to do is, say, print out a message on the screen. So there are builtins for this sort of thing.

3.2.1: Builtins for printing

The commands `echo' and `print' are shell builtins; they just show what you typed, after the shell has removed all the quoting. The difference between the two is really historical: `echo' came first, and only handled a few simple options; ksh provided `print', which had more complex options and so became a different command. The difference remains between the two commands in zsh; if you want wacky effects, you should look to print. Note that there is usually also an external command called echo, which may not be identical to zsh's; there is no standard external command called print, but if someone has installed one on your system, the chances are it sends something to the printer, not the screen.

One special effect is that `print -z' puts the arguments onto the editing buffer stack, a list maintained by the shell of things you are about to edit. Try:

  print -z print -z print This is a line

(it may look as if something needs quoting, but it doesn't) and hit return three times. The first time caused everything after the first `print -z' to appear for you to edit, and so on.

For something more useful, you can write functions that give you a line to edit:

  fn() { print -z print The time now is $(date); }

Now when you type `fn', the line with the date appears on the command line for you to edit. The option `-s' is a bit similar; the line appears in the history list, so you will see it if you use up-arrow, but it doesn't reappear automatically.

A few other useful options, some of which you've already seen, are

  • -r
    don't interpret special character sequences like `\n'
  • -P
    use `%' as in prompts
  • -n
    don't put a newline at the end in case there's more output to follow
  • -c
    print the output in columns --- this means that `print -c *' has the effect of a sort of poor person's `ls', only faster
  • -l
    use one line per argument instead of one column, which is sometimes useful for sticking lists into files, and for working out what part of an array parameter is in each element; there's a quick demonstration of this and -c just below.
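
Here's that demonstration of the last two options; nothing in it is specific to any particular setup:

  print -c *        # file names in columns, a poor person's ls
  print -l $path    # each directory in $path on its own line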

If you don't use the -r option, there are a whole lot of special character sequences. Many of these may be familiar to you from C.

  • \n
    newline
  • \t
    tab
  • \e or \E
    escape character
  • \a
    ring the bell (alarm), usually a euphemism for a hideous beep
  • \b
    move back one character.
  • \c
    don't print a newline --- like the -n option, but embedded in the string. This alternative comes from Berkeley UNIX.
  • \f
    form feed, the phrase for `advance to next page' from the days when terminals were called teletypes, maybe more familiar to you as ^L
  • \r
    carriage return --- when printed, the annoying ^M's you get in DOS files, but actually rather useful with `print', since it will erase everything to the start of the line. The combination of the -n option and a \r at the start of the print string can give the illusion of a continuously changing status line.
  • \v
    vertical tab, which I for one have never used (I just tried it now and it behaved like a newline, only without assuming a carriage return, but that's up to your terminal).

In fact, you can get any of the 255 characters possible, although your terminal may not like some or all of the ones above 127, by specifying a number after the backslash. Normally this consists of three octal digits, but you can use two hexadecimal digits after \x instead --- so `\n', `\012' and `\x0a' are all newlines. `\' itself escapes any other character, i.e. that character appears as itself even if it normally wouldn't.
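
For instance, both of the following print a capital `A', which is 65 in decimal, 101 in octal and 41 in hexadecimal:

  print '\101'
  print '\x41'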

Two notes: first, don't get confused because `n' is the fourteenth letter of the alphabet; printing `\016' (fourteen in octal) won't do you any good. The remedy, after you discover your text is unreadable (for VT100-like terminals including xterm), is to print `\017'.

Secondly, those backslashes can land you in real quoting difficulties. Normally a backslash on the command line escapes the next character --- this is a different form of escaping to print's --- so

  print \n

doesn't produce a newline, it just prints out an `n'. So you need to quote that. This means

  print \\

passes a single backslash to print, and

  print \\n

or

  print '\n'

prints a newline (followed by the extra one that's usually there). To print a real backslash, you would thus need

  print \\\\

Actually, you can get away with the two if there's nothing else after --- print just shrugs its shoulders and outputs what it's been given --- but that's not a good habit to get into. There are other ways of doing this: since single quotes quote anything, including backslashes (they are the only way of making backslashes behave like normal characters), and since the `-r' option makes print treat characters normally,

  print -r '\'

has the same effect. But you need to remember the two levels of quoting for backslashes. Quotes aren't special to print, so

  print \'

is good enough for printing a quote.

echotc

There's an oddity called `echotc', which takes as its argument `termcap' capabilities. This now lives in its own module, zsh/termcap.

Termcap is a now rather old-fashioned way of giving the commands necessary for performing various standard operations on terminals: moving the cursor, clearing to the end of the line, turning on standout mode, and so on. It has now been replaced almost everywhere by `terminfo', a completely different way of specifying capabilities, and by `curses', a more advanced system for manipulating objects on a character terminal. This means that the arguments you need to give to echotc can be rather hard to come by; try the termcap manual page; if there are two, it's probably the one in section five which gives the codes, i.e. `man 5 termcap' or `man -s 5 termcap' on Solaris. Otherwise you'll have to search the web. The reason the zsh manual doesn't give a list is that the shell only uses a few well-known sequences, and there are very many others which will work with echotc, because the sequences are interpreted by the terminal, not the shell.

This chunk gives you a flavour:

  zmodload -i zsh/termcap
  echotc md
  echo -n bold
  echotc mr
  echo -n reverse
  echotc me
  echo

First we make sure the module is loaded into the shell; on some older operating systems, this only works if it was compiled in when zsh was installed. The option -i to zmodload stops the shell from complaining if the module was already loaded. This is a sensible way of ensuring you have the right facilities available in a shell function, since loading a module makes it available until it is explicitly unloaded.

You should see `bold' in bold characters, and `reverse' in bold reverse video. The `md' capability turns on bold mode; `mr' turns on reverse video; `me' turns off both modes. A more typical zsh way of doing this is:

  print -P '%Bbold%Sreverse%b%s'

which should show the same thing, but using prompt escapes --- prompts are the most common use of special fonts. The `%S' is because zsh calls reverse `standout' mode, because it does. (On a colour xterm, you may find `bold' is interpreted as `blue'.)

There's a lot more you can do with echotc if you really try. The shell has just acquired a way of printing terminfo sequences, predictably called echoti, although it's only available on systems where zsh needs terminfo to compile --- this happens when the termcap code is actually a part of terminfo. The good news about this is that terminfo tends to be better documented, so you have a good chance of finding out the capabilities you want from the terminfo manual page. The echoti command lives in another predictably named module, zsh/terminfo.
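
If the module is available, the equivalent of the echotc chunk above looks like this; bold, rev and sgr0 are the standard terminfo capability names, assuming your terminal's entry defines them:

  zmodload -i zsh/terminfo
  echoti bold
  echo -n bold
  echoti rev
  echo -n reverse
  echoti sgr0
  echo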

3.2.2: Other builtins just for speed

There are only a few other builtins which are there just to make things go faster. Strictly, tests could go into this category, but as I explained in the last chapter it's useful to have tests in the form

  if [[ $var1 = $var2 ]]; then
    print doing something
  fi

be treated as a special syntax by the shell, in case $var1 or $var2 expands to nothing which would otherwise confuse it. This example consists of two features described below: the test itself, between the double square brackets, which is true if the two substituted values are the same string, and the `if' construct which runs the commands in the middle (here just the print) if that test was true.

The builtins `true' and `false' do nothing at all, except return a command status zero or one, respectively. They're just used as placeholders: to run a loop forever --- while will also be explained in more detail later --- you use

  while true; do
    print doing something over and over
  done

since the test always succeeds.

A synonym for `true' is `:'; it's often used in this form to give arguments which have side effects but which shouldn't be used --- something like

  : ${param:=value}

which is a common idiom in all Bourne shell derivatives. In the parameter expansion, $param is given the value value if it was empty before, and left alone otherwise. Since that was the only reason for the parameter expansion, you use : to ignore the argument. Actually, the shell blithely builds the command line --- the colon, followed by whatever the value of $param is, whether or not the assignment happened --- then executes the command; it just so happens that `:' takes no notice of the arguments it was given. If you're switching from ksh, you may expect certain synonyms like this to be aliases, rather than builtins themselves, but in zsh they are actually builtins; there are no aliases predefined by the shell. (You can still get rid of them using `disable', as described below.)

3.2.3: Builtins which change the shell's state

A more common use for builtins is that they change something inside the shell, or report information about what's going on in the shell. There is one vital thing to remember about external commands. It applies, too, to other cases we'll meet where the shell `forks', literally splitting itself into two parts, where the forked-off part behaves just like an external command. In both of these cases, the command is in a different process, UNIX's basic unit of things that run. (In fact, even Windows knows about processes nowadays, although they interact a little bit differently with one another.)

The vital thing is that no change in a separate process started by the shell affects the shell itself. The most common case of this is the current directory --- every process has its own current directory. You can see this by starting a new zsh:

  % pwd               # show the current directory
  ~
  % zsh               # start a new shell, which 
                      # is a separate process
  % cd tmp
  % pwd               # now I'm in a different
                      # directory...
  ~/tmp
  % exit              # leave the new shell...
  % pwd               # now I'm back where I was...
  ~

Hence the cd command must be a shell builtin, or this would happen every time you ran it.

Here's a more useful example. Putting parentheses around a command asks the shell to start a different process for it. That's useful when you specifically don't want the effects propagating back:

  (cd some-other-dir; run-some-command)

runs the command, but doesn't change the directory the `real' shell is in, only its forked-off `subshell'. Hence,

  % pwd
  ~
  % (cd /; pwd)
  /
  % pwd
  ~

There's a more subtle case:

  cd some-other-dir | print Hello

Remember, the `|' (`pipe') connects the output of the first command to the input of the next --- though actually no information is passed that way in this example. In zsh, all but the last portion of the `pipeline' thus created is run in different processes. Hence the cd doesn't affect the main shell. I'll refer to it as the `parent' shell, which is the standard UNIX language for processes; when you start another command or fork off a subshell, you are creating `children' (without meaning to be morbid, the children usually die first in this case). Thus, as you would guess,

  print Hello | cd some-other-dir

does have the effect of changing the directory. Note that other shells do this differently; it is always guaranteed to work this way in zsh, because many people rely on it for setting parameters, but in many other shells every part of the pipeline, including the last, runs in a subshell, so neither side would affect the parent. If both sides of the pipe symbol are external commands of some sort, both will of course run in subprocesses.
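
This is also why the common trick of reading a command's output into a parameter works in zsh: the read, being the last part of the pipeline, runs in the parent shell, so the parameter is still set afterwards.

  print Hello there | read greeting
  print $greeting     # shows `Hello there'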

There are other ways you change the state of the shell, for example by declaring parameters of a particular type, or by telling it how to interpret certain commands, or, of course, by changing options. Here are the most useful, grouped in a vaguely logical fashion.

3.2.4: cd and friends

You will not by now be surprised to learn that the `cd' command changes directory. There is a synonym, `chdir', which as far as I know no-one ever uses. (It's the same name as the system call, so if you had been programming in C or Perl and forgot that you were now using the shell, you might use `chdir'. But that seems a bit far-fetched.)

There are various extra features built into cd and chdir. First, if you miss out the directory to which you want to change, you will be taken to your home directory, although it's not as if `cd ~' is all that hard to type.

Next, the command `cd -' is special: it takes you to the last directory you were in. If you do a sequence of cd commands, only the immediately preceding directory is remembered; they are not stacked up.

Thirdly, there is a shortcut for changing between similarly named directories. If you type `cd <old> <new>', then the shell will look for the first occurrence of the string `<old>' in the current directory, and try to replace it with `<new>'. For example,

  % pwd
  ~/src/zsh-3.0.8/Src
  % cd 0.8 1.9
  ~/src/zsh-3.1.9/Src

The cd command actually reported the new directory, as it usually does if it's not entirely obvious where it's taken you.

Note that only the first match of <old> is taken. It's an easy mistake to think you can change from /home/export1/pws/mydir1/something to /home/export1/pws/mydir2/something with `cd 1 2', but that first `1' messes it up. Arguably the shell could be smarter here. Of course, `cd r1 r2' will work in this case.

cd's friend `pwd' (print working directory) tells you what the current working directory is; this information is also available in the shell parameter $PWD, which is special and automatically updated when the directory changes. Later, when you know all about expansion, you will find that you can do tricks with this to refer to other directories. For example, ${PWD/old/new} uses the parameter substitution mechanism to refer to a different directory with old replaced by new --- and this time old can be a pattern, i.e. something with wildcard matches in it. So if you are in the zsh-3.0.8/Src directory as above and want to copy a file from the zsh-3.1.9/Src directory, you have a shorthand:

  cp ${PWD/0.8/1.9}/myfile.c .

Symbolic links

Zsh tries to track directories across symbolic links. If you're not familiar with these, you can think of them as a filename which behaves like a pointer to another file (a little like Windows' shortcuts, though UNIX has had them for much longer and they work better). You create them like this (ln is not a builtin command, but its use to make symbolic links is very standard these days):

  ln -s existing-file-name name-of-link

for example

  ln -s /usr/bin/ln ln

creates a file called ln in the current directory which does nothing but point to the file /usr/bin/ln. Symbolic links are very good at behaving as much like the original file as you usually want; for example, you can run the ln link you've just created as if it were /usr/bin/ln. They show up differently in a long file listing with `ls -l', the last column showing the file they point to.

You can make them point to any sort of file at all, including directories, and that is why they are mentioned here. Suppose you create a symbolic link from your home directory to the root directory and change into it:

  ln -s / ~/mylink
  cd ~/mylink

If you don't know it's a link, you expect to be able to change to the parent directory by doing `cd ..'. However, the operating system --- which just has one set of directories starting from / and going down, and ignores symbolic links after it has followed them, they really are just pointers --- thinks you are in the root directory /. This can be confusing. Hence zsh tries to keep track of where you probably think you are, rather than where the system does. If you type `pwd', you will see `/home/you/mylink' (wherever your home directory is), not `/'; if you type `cd ..', you will find yourself back in your home directory.

You can turn all this second-guessing off by setting the option CHASE_LINKS; then `cd ~/mylink; pwd' will show you to be in /, where changing to the parent directory has no effect; the parent of the root directory is the root directory, except on certain slightly psychedelic networked file systems. This does have advantages: for example, `cd ~/mylink; ls ..' always lists the root directory, not your home directory, regardless of the option setting, because ls doesn't know about the links you followed, only zsh does, and it treats the .. as referring to the root directory. Having CHASE_LINKS set allows `pwd' to warn you about where the system thinks you are.

An aside for non-UNIX-experts (over 99.9% of the population of the world at the last count): I said `symbolic links' instead of just `links' because there are others called `hard links'. This is what `ln' creates if you don't use the -s option. A hard link is not so much a pointer to a file as an alternative name for a file. If you do

  ln myfile othername
  ls -l

where myfile already exists you can't tell which of myfile and othername is the original --- and in fact the system doesn't care. You can remove either, and the other will be perfectly happy as the name for the file. This is pretty much how renaming files works, except that creating the hard link is done for you in that case. Hard links have limitations --- you can't link to directories, or to a file on another disk partition (and if you don't know what a disk partition is, you'll see what a limitation that can be). Furthermore, you usually want to know which is the original and which is the link --- so for most users, creating symbolic links is more useful. The only drawback is that following the pointers is a tiny bit slower; if you think you can notice the difference, you definitely ought to slow down a bit.

The target of a symbolic link, unlike a hard link, doesn't actually have to exist and no checking is performed until you try to use the link. The best thing to do is to run `ls -lL' when you create the link; the -L part tells ls to follow links, and if it worked you should see that your link is shown as having exactly the same characteristics as the file it points to. If it is still shown as a link, there was no such file.

While I'm at it, I should point out one slight oddity with symbolic links: the name of the file linked to (the first name), if it is not an absolute path (beginning with / after any ~ expansion), is treated relative to the directory where the link is created --- not the current directory when you run ln. Here:

  ln -s ../mydir ~/links/otherdir

the link otherdir will refer to mydir in its own parent directory, i.e. ~/links --- not, as you might think, the parent of the directory where you were when you ran the command. What makes it worse is that the second word, if it is not an absolute path, is interpreted relative to the directory where you ran the command.

$cdpath and AUTO_CD

We're nowhere near the end of the magic you can do with directories yet (and, in fact, I haven't even got to the zsh-specific parts). The next trick is $cdpath and $CDPATH. They look a lot like $path and $PATH, which you met in the last chapter, where I also mentioned these two briefly: $cdpath is an array of directories, while $CDPATH is a colon-separated list behaving otherwise like a scalar variable. They give a list of directories whose subdirectories you may want to change into. If you use a normal cd command (i.e. in the form `cd dirname', where dirname does not begin with a / or a ~), the shell will look through the directories in $cdpath to find one which contains the subdirectory dirname. If $cdpath isn't set, as you'd guess, it just uses the current directory.

Note that $cdpath is always searched in order, and you can put a . in it to represent the current directory. If you do, the current directory will always be searched at that point, not necessarily first, which may not be what you expect. For example, let's set up some directories:

  mkdir ~/crick ~/crick/dna
  mkdir ~/watson ~/watson/dna
  cdpath=(~/crick .)
  cd ~/watson
  cd dna

So I've moved to the directory ~/watson, which contains the subdirectory dna, and done `cd dna'. But because of $cdpath, the shell will look first in ~/crick, and find the dna there, and take you to that copy of the self-reproducing directory, not the one in ~/watson. Most people have . at the start of their cdpath for that reason. However, at least cd warns you --- if you tried it, you will see that it prints the name of the directory it's picked in cases like this.

In fact, if you don't have . in your $cdpath at all, the shell will always look there first; there's no way of making cd never change to a subdirectory of the current one, short of turning cd into a function. Some shells don't do this; they use the directories in $cdpath, and only those.

There's yet another shorthand, this time specific to zsh: the option AUTO_CD which I mentioned in the last chapter. That way a command without any arguments which is really a directory will take you to that directory. Normally that's perfect --- you would just get a `command not found' message otherwise, and you might as well make use of the option. Just occasionally, however, the name of a directory clashes with the name of a command, builtin or external, or a shell function, and then there can be some confusion: zsh will always pick the command as long as it knows about it, but there are cases where it doesn't, as I described above.

What I didn't say in the last chapter is that AUTO_CD respects $cdpath; in fact, it really is implemented so that `dirname' on its own behaves as much like `cd dirname' as is possible without tying the shell's insides into knots.

The directory stack

One very useful facility that zsh inherited from the C-shell family (traditional Korn shell doesn't have it) is the directory stack. This is a list of directories you have recently been in. If you use the command `pushd' instead of `cd', e.g. `pushd dirname', then the directory you are in is saved in this list, and you are taken to dirname, using $CDPATH just as cd does. Then when you type `popd', you are taken back to where you were. The list can be as long as you like; you can pushd any number of directories, and each popd will take you back through the list (this is how a `stack', or more precisely a `last-in-first-out' stack usually operates in computer jargon, hence the name `directory stack').

You can see the list --- which always starts with the current directory --- with the dirs command. So, for example:

  cd ~
  pushd ~/src
  pushd ~/zsh
  dirs

displays

  ~/zsh ~/src ~

and the next popd will take you back to ~/src. If you do it, you will see that pushd reports the list given by dirs automatically as it goes along; you can turn this off with the option PUSHD_SILENT, when you will have to rely on typing dirs explicitly.

In fact, a lot of the use of this comes not from using simple pushd and popd combinations, but from two other features. First, `pushd' on its own swaps the top two directories on the stack. Second, pushd with a numeric argument preceded by a `+' or `-' can take you to one of the other directories in the list. The command `dirs -v' tells you the numbers you need; 0 is the current directory. So if you get,

  0       ~/zsh
  1       ~/src
  2       ~

then `pushd +2' takes you to ~. (A little suspension of disbelief that I didn't just use AUTO_CD and type `..' is required here.) If you use a -, it counts from the other end of the list; -0 (with apologies to the numerate) is the last item, i.e. the same as ~ in this case. Some people are used to having the `-' and `+' arguments behave the other way around; the option PUSHD_MINUS exists for this.

Apart from PUSHD_SILENT and PUSHD_MINUS, there are a few other relevant options. Setting PUSHD_IGNORE_DUPS means that if you pushd to a directory which is already somewhere in the list, the duplicate entry will be silently removed. This is useful for most human operations --- however, if you are using pushd in a function or script to remember previous directories for a future matching popd, this can be dangerous and you probably want to turn it off locally inside the function.

AUTO_PUSHD means that any directory-changing command, including an auto-cd, is treated as a pushd command with the target directory as argument. Using this can make the directory stack get very long, and there is a parameter $DIRSTACKSIZE which you can set to specify a maximum length. The oldest entry (the highest number in the `dirs -v' listing) is automatically removed when this length is exceeded. There is no limit unless this is explicitly set.
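
A sketch of how these might be combined in a startup file, for someone who likes every directory change recorded; the particular options and the size are just one possible preference:

  setopt autopushd pushdsilent pushdignoredups
  DIRSTACKSIZE=20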

The final pushd option is PUSHD_TO_HOME. This makes pushd on its own behave like cd on its own in that it takes you to your home directory, instead of swapping the top two directories. Normally a series of `pushd' commands works pretty much like a series of `cd -' commands, always taking you to the directory you were in before, with the obvious difference that `cd -' doesn't consult the directory stack, it just remembers the previous directory automatically, and hence it can confuse pushd if you just use `cd -' instead.

There's one remaining subtlety with pushd, and that is what happens to the rest of the list when you bring a particular directory to the front with something like `pushd +2'. Normally the list is simply cycled, so the directories which were +3, and +4 are now right behind the new head of the list, while the two directories which were ahead of it get moved to the end. If the list before was:

  dir1  dir2  dir3  dir4

then after pushd +2 you get

  dir3  dir4  dir1  dir2

That behaviour changed during the lifetime of zsh, and some of us preferred the old behaviour, where that one directory was yanked to the front and the rest just closed the gap:

  # Old behaviour
  dir3  dir1  dir2  dir4

so that after a while you get a `greatest hits' group at the front of the list. If you like this behaviour too (I feel as if I'd need to have written papers on group theory to like the new behaviour) there is a function pushd supplied with the source code, although it's short enough to repeat here --- this is in the form for autoloading in the zsh fashion:

  # pushd function to emulate the old zsh behaviour.
  # With this, pushd +/-n lifts the selected element
  # to the top of the stack instead of cycling
  # the stack.

  emulate -R zsh
  setopt localoptions

  if [[ ARGC -eq 1 && "$1" == [+-]<-> ]] then
          setopt pushdignoredups
          builtin pushd ~$1
  else
          builtin pushd "$@"
  fi

The `&&' is a logical `and', requiring both tests to be true. The tests are that there is exactly one argument to the function, and that it has the form of a `+' or a `-' followed by any number (`<->' is a special zsh pattern to match any number, an extension of forms like `<1-100>' which matches any number in the range 1 to 100 inclusive).

Referring to other directories

Zsh has two ways of allowing you to refer to particular directories. They have in common that they begin with a ~ (in very old versions of zsh, the second form actually used an `=', but the current way is much more logical).

You will certainly be aware, because I've made a lot of use of it, that a `~' on its own or followed by a / refers to your own home directory. An extension of this --- again from the C-shell, although the Korn shell has it too in this case --- is that ~name can refer to the home directory of any user on the system. So if your user name is pws, then ~ and ~pws are the same directory.

Zsh has an extension to this; you can actually name your own directories. This was described in chapter 2, à propos of prompts, since that is the major use:

  host% PS1='%~? '
  ~? cd zsh/Src
  ~/zsh/Src? zsrc=$PWD
  ~/zsh/Src? echo ~zsrc
  /home/pws/zsh/Src
  ~zsrc?

Consult that chapter for the ways of forcing a parameter to be recognised as a named directory.

There's a slightly more sophisticated way of doing this directly:

  hash -d zsrc=~/zsh/Src

makes ~zsrc appear in prompts as before, and in this case there is no parameter $zsrc. This is the purist's way (although very few zsh users are purists). You can guess what `unhash -d zsrc' does; this works with directories named via parameters, too, but leaves the parameter itself alone.

It's possible to have a named directory with the same name as a user. In that case `~name' refers to the directory you named explicitly, and there is no easy way of getting name's home directory without removing the name you defined.

If you're using named directories with one of the cd-like commands or AUTO_CD, you can set the option CDABLEVARS which allows you to omit the leading ~; `cd zsrc' with this option would take you to ~zsrc. The name is a historical artifact and now a misnomer; it really is named directories, not parameters (i.e. variables), which are used.
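
In other words, assuming ~zsrc has been set up as above, something like this works:

  setopt cdablevars
  cd zsrc    # same effect as `cd ~zsrc', unless a real directory
             # called zsrc exists in the current directory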

The second way of referring to directories with ~'s is to use numbers instead of names: the numbers refer to directories in the directory stack. So if dirs -v gives you

  0       ~zsf
  1       ~src

then ~+1 and ~-0 (not very mathematical, but quite logical if you think about it) refer to ~src. In this case, unlike pushd arguments, you can omit the + and use ~1. The option PUSHD_MINUS is respected. You'll see this was used in the pushd function above: the trick was that ~+3, for example, refers to the same element as pushd +3, hence pushd ~+3 pushed that directory onto the front of the list. However, we set PUSHD_IGNORE_DUPS, so that the value in the old position was removed as well, giving us the effect we wanted of simply yanking the directory to the front with no trick cycling.

3.2.5: Command control and information commands

Various builtins exist which control how you access commands, and which show you information about the commands which can be run.

The first two are strictly speaking `precommand modifiers' rather than commands: that means that they go before a command line and modify its behaviour, rather than being commands in their own right. If you put `command' in front of a command line, the command word (the next one along) will be taken as the name of an external command, however it would normally be interpreted; likewise, if you put `builtin' in front, the shell will try to run the command as a builtin command. Normally, shell functions take precedence over builtins which take precedence over external commands. So, for example, if your printer control system has the command `enable' (as many System V versions do), which clashes with a builtin I am about to talk about, you can run `command enable lp' to enable a printer; otherwise, the builtin enable would have been run. Likewise, if you have defined cd to be a function, but this time want to call the normal builtin cd, you can say `builtin cd mydir'.

A common use for command is inside a shell function of the same name. Sometimes you want to enhance an ordinary command by sticking some extra stuff around it, then calling that command, so you write a shell function of the same name. To call the command itself inside the shell function, you use `command'. The following works, although it's obviously not all that useful as it stands:

  ls() {
    command ls "$@"
  }

so when you run `ls', it calls the function, which calls the real ls command, passing on the arguments you gave it.

You can gain longer lasting control over the commands which the shell will run with the `disable' and `enable' commands. The first normally takes the names of builtins as arguments; each such builtin will not be recognised by the shell until you give an `enable' command for it. So if you want to be able to run the external enable command and don't particularly care about the builtin version, `disable enable' (sorry if that's confusing) will do the trick. Ha, you're thinking, you can't run `enable enable'. That's correct: some time in the dim and distant past, `builtin enable enable' would have worked, but currently it doesn't; this may change, if I remember to change it. You can list all disabled builtins with just `disable' on its own --- most of the builtins that do this sort of manipulation work like that.

You can manipulate other sets of commands with disable and enable by giving different options: aliases with the option -a, functions with -f, and reserved words with -r. The first two you probably know about, and I'll come to them anyway, but `reserved words' need describing. They are essentially builtin commands which have some special syntactic meaning to the shell, including some symbols such as `{' and `[['. They take precedence over everything else except aliases --- in fact, since they're syntactically special, the shell needs to know very early on that it has found a reserved word, it's no use just waiting until it tries to execute a command. For example, if the shell finds `[[' it needs to know that everything until `]]' must be treated as a test rather than as ordinary command arguments. Consequently, you wouldn't often want to disable a reserved word, since the shell wouldn't work properly. The most obvious reason why you might would be for compatibility with some other shell which didn't have one. You can get a complete list with:

  whence -wm '*' | grep reserved

which I'll explain below, since I'm coming to `whence'.

Furthermore, I tend to find that if I want to get rid of aliases or functions I use the commands `unalias' and `unfunction' to get rid of them permanently, since I always have the original definitions stored somewhere, so these two options may not be that useful either. Disabling builtins is definitely the most useful of the four possibilities for disable.

External commands have to be manipulated differently. The types given above are handled internally by the shell, so all it needs to do is remember what code to call. With external commands, the issue instead is how to find them. I mentioned rehash above, but didn't tell you that the hash command, which you've already seen with the -d option, can be used to tell the shell how to find an external command:

  hash foo=/path/to/foo

makes foo execute the command using the path shown (which doesn't even have to end in `foo'). This is rather like an alias --- most people would probably do this with an alias, in fact --- although a little faster, though you're unlikely to notice the difference. You can remove this with unhash. One gotcha here is that if the path is rehashed, either by calling rehash or when you alter $path, the entire hash table is emptied, including anything you put in in this way; so it's not particularly useful.

In the midst of all this, it's useful to be able to find out what the shell thinks a particular command name does. The command `whence' tells you this; it also exists, with slightly different options, under the names where, which and type, largely to provide compatibility with other shells. I'll just stick to whence.

Its standard output isn't actually sparklingly interesting. If it's a command somehow known to the shell internally, it gets echoed back, with the alias expanded if it was an alias; if it's an external command it's printed with the full path, showing where it came from; and if it's not known the command returns status 1 and prints nothing.

You can make it more useful with the -v or -c options, which are more verbose; the first prints out an information message, while the second prints out the definitions of any functions it was asked about (this is also the effect of using `which' instead of `whence'). A very useful option is -m, which takes any arguments as patterns using the usual zsh pattern format, in other words the same one used for matching files. Thus

  whence -vm "*"

prints out every command the shell knows about, together with what it thinks of it.

Note the quotes around the `*' --- you have to remember these anywhere where the pattern is not to be used to generate filenames on the command line, but instead needs to be passed to the command to be interpreted. If this seems a rather subtle distinction, think about what would happen if you ran

  # Oops.  Better not try this at home.
  # (Even better, don't do it at work either.)
  whence -vm *

in a directory with the files `foo' and (guess what) `bar' in it. The shell hasn't decided what command it's going to run when it first looks at the command line; it just sees the `*' and expands the line to

  whence -vm foo bar

which isn't what you meant.

There are a couple of other tricks worth mentioning: -p makes the shell search your path for the command, even if the name is matched as something else (say, a shell function). So if you have ls defined as a function,

  which -p ls

will still tell you what `command ls' would find. Also, the option -a searches for all commands; in the same example, this would show you both the ls command and the ls function, whereas whence would normally only show the function because that's the one that would be run. The -a option also shows all the matches if it finds more than one external command in your path.

Finally, the option -w is useful because it identifies the type of a command with a single word: alias, builtin, command, function, hashed, reserved or none. Most of those are obvious, with command being an ordinary external command; hashed is an external command which has been explicitly given a path with the hash builtin, and none means it wasn't recognised as a command at all. Now you know how we extracted the reserved words above.
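
Here's some sample output, from a shell where ls hasn't been redefined; your own aliases and functions will of course change the details:

  % whence -w ls typeset if nosuchcommand
  ls: command
  typeset: builtin
  if: reserved
  nosuchcommand: none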

A close relative of whence is functions, which applies, of course, to shell functions; it usually lists the definitions of all functions given as arguments, but its relatives (of which autoload is one) perform various other tricks, to be described in the section on shell functions below. Be careful with function, without the `s', which is completely different and not like command or builtin --- it is actually a keyword used to define a function.

3.2.6: Parameter control

There are various builtins for controlling the shell's parameters. You already know how to set and use parameters, but it's a good deal more complicated than that when you look at the details.

Local parameters

The principal command for manipulating the behaviour of parameters is `typeset'. Its easiest usage is to declare a parameter; you just give it a list of parameter names, which are created as scalar parameters. You can create parameters just by assigning to them, but the major point of `typeset' is that if a parameter is created that way inside a function, the parameter is restored to its original value, or removed if it didn't previously exist, at the end of the function --- in other words, it has `local scope' like the variables which you declare in most ordinary programming languages. In fact, to use the jargon it has `dynamical' rather than `syntactic' scope, which means that the same parameter is visible in any function called within the current one; this is different from, say, C or FORTRAN where any function or subroutine called wouldn't see any variable declared in the parent function.

The following makes this more concrete.

  var='Original value'
  subfn() {
    print $var
  }
  fn() {
    print $var
    typeset var='Value in function'
    print $var
    subfn
  }
  fn
  print $var

This chunk of code prints out

  Original value
  Value in function
  Value in function
  Original value

The first three chunks of the code just define the parameter $var, and two functions, subfn and fn. Then we call fn. The first thing this does is print out $var, which gives `Original value' since we haven't changed the original definition. However, the typeset next does that; as you see, we can assign to the parameter during the typeset. Thus when we print $var out again, we get `Value in function'. Then subfn is called, which prints out the same value as in fn, because we haven't changed it --- this is where C or FORTRAN would differ, and wouldn't recognise the variable because it hadn't been declared in that function. Finally, fn exits and the original value is restored, and is printed out by the final `print'.

Note the value changes twice: first at the typeset, then again at the end of fn. The value of $var at any point will be one of those two values.

Although you can do assignments in a typeset statement, you can't assign to arrays (I already said this in the last chapter):

  typeset var=(Doesn\'t work\!)

because the syntax with the parentheses is special; it only works when the line consists of nothing but assignments. However, the shell doesn't complain if you try to assign an array to a scalar, or vice versa; it just silently converts the type:

  typeset var='scalar value'
  var=(array value)

I put in the assignment in the typeset statement to rub the point in that it creates scalars, but actually the usual way of setting up an array in a function is

  typeset var
  var=()

which creates an empty scalar, then converts that to an empty array. Recent versions of the shell have `typeset -a var' to do that in one go --- but you still can't assign to it in the same statement.

There are other catches associated with the fact that typeset and its relatives are just ordinary commands with ordinary sets of arguments. Consider this:

  % typeset var=`echo two words`
  % print $var
  two

What has happened to the `words'? The answer is that backquote substitution, to be discussed below, splits words when not quoted. So the typeset statement is equivalent to

  % typeset var=two words

There are two ways to get round this; first, use an ordinary assignment:

  % typeset var
  % var=`echo two words`

which the shell treats as an ordinary scalar assignment, so it knows not to split words; or quote the backquotes,

  % typeset var="`echo two words`"

There are three important types we haven't talked about; all of these can only be created with typeset or one of the similar builtins I'll list in a moment. They are integer types, floating point types, and associative array types.

Numeric parameters

Integers are created with `typeset -i', or `integer' which is another way of saying the same thing. They are used for arithmetic, which the shell can do as follows:

  integer i
  (( i = 3 * 2 + 1 ))

The double parentheses surround a complete arithmetic expression: it behaves as if it's quoted. The expression inside can be pretty much anything you might be used to from arithmetic in other programming languages. One important point to note is that parameters don't need to have the $ in front, even when their value is being taken:

  integer i j=12
  (( i = 3 * ( j + 4 ) ** 2 ))

Here, j will be replaced by 12 and $i gets the value 768 (sixteen squared times three). One thing you might not recognise is the **, which is the `to the power of' operator which occurs in FORTRAN and Perl. Note that it's fine to have parentheses inside the double parentheses --- indeed, you can even do

  (( i = (3 * ( j + 4 )) ** 2 ))

and the shell won't get confused because it knows that any parentheses inside must be in balanced pairs (until you deliberately confuse it with your buggy code).

You would normally use `print $i' to see what value had been given to $i, of course, and as you would expect it gets printed out as a decimal number. However, typeset allows you to specify another base for printing out. If you do

  typeset -i 16 i
  print $i

after the last calculation, you should see 16#900, which means 900 in base 16 (hexadecimal). That's the only effect the option `-i 16' has on $i --- you can assign to it and use it in arithmetical expressions just as normal, but when you print it out it appears in this form. You can use this base notation for inputting numbers, too:

  (( i = 16#ff * 2#10 ))

which means 255 (ff in hexadecimal) times 2 (10 in binary). The shell understands C notation too, so `16#ff' could have been expressed `0xff'.

Floating point variables are very similar. You can declare them with `typeset -F' or `typeset -E'. The only difference between the two is, again, on output; -F uses a fixed point notation, while -E uses scientific (mnemonic: exponential) notation. The builtin `float' is equivalent to `typeset -E' (because Korn shell does it, that's why). Floating point expressions also work the way you are probably used to:

  typeset -E e
  typeset -F f
  (( e = 32/3, f = 32.0/3.0 ))
  print $e $f

prints

  1.000000000e+01 10.6666666667

Various points: the `,' can separate different expressions, just like in C, so the e and f assignments are performed separately. The e assignment was actually an integer division, because neither 32 nor 3 is a floating point number, which must contain a dot. That means an integer division was done, producing 10, which was then converted to a floating point number only at the end. Again, this is just how grown-up languages work, so it's no use cursing. The f assignment was a full floating point performance. Floating point parameters weren't available before version 3.1.7.

Although this is really a matter for a later chapter, there is a library of floating point functions you can load (actually it's just a way of linking in the system mathematical library). The usual incantation is `zmodload zsh/mathfunc'; you may not have `dynamic loading' of libraries on your system, which may mean that doesn't work. If it does, you can do things like

  (( pi = 4.0 * atan(1.0) ))

Broadly, all the functions which appear in most system mathematical libraries (see the manual page for math) are available in zsh.
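
Here's a small sketch using a couple of those functions (assuming the zmodload above succeeded; the parameter names are invented):

  zmodload zsh/mathfunc
  (( hyp = sqrt(3.0**2 + 4.0**2) ))   # hypotenuse of a 3-4-5 triangle
  (( e = exp(1.0) ))                  # the base of natural logarithms
  print $hyp $e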

Like all other parameters created with typeset or one of its cousins, integer and floating point parameters are local to functions. You may wonder how to create a global parameter (i.e. one which is valid outside as well as inside the function) which has an integer or floating point value. There's a recent addition to the shell (in version 3.1.6) which allows this: use the flag -g to typeset along with any others. For example,

  fn() {
    typeset -Fg f
    (( f = 42.75 ))
  }
  fn
  print $f

If you try it, you will see the value of $f has survived beyond the function. The g stands for global, obviously, although it's not quite that simple:

  fn() {
    typeset -Fg f
  }
  outerfn() {
    typeset f='scalar value'
    fn
    print $f
  }
  outerfn

The function outerfn creates a local scalar value for f; that's what fn sees. So fn was not really operating on a `global' value, it just didn't create a new one for the scope of fn. This time, running outerfn produces an error message, because fn tried to preserve the value of $f while changing its type, and the value wasn't a proper floating point expression. The error message,

  fn: bad math expression: operator expected at `value'

comes about because assigning to numeric parameters always does an arithmetic evaluation. Operating on `scalar value' it found `scalar' and assumed this was a parameter, then looked for an operator like `+' to come next; instead it found `value'. If you want to experiment, change the string to `scalar + value' and set `value=42', or whatever, then try again. This is a little confusing (which is a roundabout way of saying it confused me), but consistent with how zsh usually treats parameters.

Actually, to a certain extent you don't need to use the integer and floating point parameters. Any time zsh needs a numeric expression it will force a scalar to the right value, and any time it produces a numeric expression and assigns it to a scalar, it will convert the result to a string. So

  typeset num=3            # This is the *string* `3'.
  (( num = num + 1 ))      # But this works anyway
                           # ($num is still a string).

This can be useful if you have a parameter which is sometimes a number, sometimes a string, since zsh does all the conversion work for you. However, it can also be confusing if you always want a number, because zsh can't guess that for you; plus it's a little more efficient not to have to convert back and forth; plus you lose accuracy when you do, because if the number is stored as a string rather than in the internal numeric representation, what you say is what you get (although zsh tends to give you quite a lot of decimal places when converting implicitly to strings). Anyway, I'd recommend that if you know a parameter has to be an integer or floating point value you should declare it as such.

There is a builtin called let to handle mathematical expressions, but since

  let "num = num + 1"

is equivalent to

  (( num = num + 1 ))

and the second form is easier and more memorable, you probably won't need to use it. If you do, remember that (unlike BASIC) each mathematical expression should appear as one argument in quotes.

Associative arrays

The one remaining major type of parameter is the associative array; if you use Perl, you may call it a `hash', but we tend not to since that's really a description of how it's implemented rather than what it does. (All right, what it does is hash things. Now shut up.)

These have to be declared by a typeset statement --- there's no getting round it. There are some quite eclectic builtins that produce a filled-in associative array for you, but the only way to tell zsh you want your very own associative array is

  typeset -A assoc

to create $assoc. As to what it does, that's best shown by example:

  typeset -A assoc
  assoc=(one eins two zwei three drei)
  print ${assoc[two]}

which prints `zwei'. So it works a bit like an ordinary array, but the numeric subscript of an ordinary array which would have appeared inside the square bracket is replaced by the string key, in this case two. The array assignment was a bit deceptive; the `values' were actually pairs, with `one' being the key for the value `eins', and so on. The shell will complain if there are an odd number of elements in such a list. This may also be familiar from Perl. You can assign values one at a time:

  assoc[four]=vier

and also unset one key/value pair:

  unset 'assoc[one]'

where the quotes stop the square brackets from being interpreted as a pattern on the command line.

Expansion has been held over, but you might like to know about the ways of getting back what you put in. If you do

  print $assoc

you just see the values --- that's exactly the same as with an ordinary array, where the subscripts 1, 2, 3, etc. aren't shown. Note they are in random order --- that's the other main difference from ordinary arrays; associative arrays have no notion of an order unless you explicitly sort them.

But here the keys may be just as interesting. So there is:

  print ${(k)assoc}
  print ${(kv)assoc}

giving (if you've followed through all the commands above):

  four two three
  four vier two zwei three drei

which print out the keys instead of the values, and the key and value pairs much as you entered them. You can see that, although the order of the pairs isn't obvious, it's the same each time. From this example you can work out how to copy an associative array into another one:

  typeset -A newass
  newass=(${(kv)assoc})

where the `(kv)' is important --- as is the typeset just before the assignment, otherwise $newass would be a badass ordinary array. You can also prove that ${(v)assoc} does what you would probably expect. There are lots of other tricks, but they are mostly associated with clever types of parameter expansion, to be described in chapter 5.

Other typeset and type tricks

There are variants of typeset, some mentioned sporadically above. There is nothing you can do with any of them that you can't do with typeset --- that wasn't always the case; we've tried to improve the orthogonality of the options. They differ in the options which are set by default, and the additional options which are allowed. Here's a list: declare, export, float, integer, local, readonly. I won't confuse you by describing them all in detail; see the manual.

If there is an odd one out, it's export, which not only marks a parameter for export but has the -g flag turned on by default, so that that parameter is not local to the function; in other words, it's equivalent to typeset -gx. However, one holdover from the days when the options weren't quite so logical is that typeset -x behaves like export, in other words the -g flag is turned on by default. You can fix this by unsetting the option GLOBAL_EXPORT --- the option only exists for compatibility; logically it should always be unset. This is partly because in the old days you couldn't export local parameters, so typeset -x either had to turn on -g or turn off -x; that was fixed for the 3.1.9 release, and (for example) `local -x' creates a local parameter which is exported to the environment; both the parameter itself, and the value in the environment, will be restored when the function exits. The builtin local is essentially a form of typeset which renounces the -g flag and all its works.
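
Here's a hedged sketch of `local -x' doing just that (it needs 3.1.9 or later; the parameter name is invented):

  fn() {
    local -x GREETING='hello from fn'   # exported, but local to fn
    zsh -c 'print $GREETING'            # a child process sees it in its environment
  }
  fn
  print ${GREETING-unset}               # the old (unset) state is restored outside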

Another old restriction which has gone is that you couldn't make special parameters, in particular $PATH, local to a function; you just modified the original parameter. Now if you say `typeset PATH', things happen the way you probably expect, with $PATH having its usual effect, and being restored to its old value when the function exits. Since $PATH is still special, though, you should make sure you assign something to it in the function before calling external commands, else it will be empty and no commands will be found. It's possible that you specifically don't want some parameter you make local to have the special property; 3.1.7 and after allow the typeset flag -h to hide the specialness for that parameter, so in `typeset -h PATH', PATH would be an ordinary variable for the duration of the enclosing function. Internally, the same value as was previously set would continue to be used for finding commands, but it wouldn't be exported.
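
For example, a minimal sketch of making $PATH local --- note the assignment before anything external is run (the directories are only illustrative):

  fn() {
    typeset PATH=/usr/bin:/bin    # local copy; commands are found in these directories
    ls
  }                               # the original $PATH comes back when fn exits
  fn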

The second main use of typeset is to set attributes for the parameters. In this case it can operate on an existing parameter, as well as creating a new one. For example,

  typeset -r msg='This is an important message.'

sets the readonly flag (-r) for the parameter msg. If the parameter didn't exist, it would be created with the usual scoping rules; but if it did exist at the current level of scoping, it would be made readonly with the value assigned to it, meaning you can't set that particular copy of the parameter. For obvious reasons, it's normal to assign a value to a readonly parameter when you first declare it. Here's a reality check on how this affects scoping:

  msg='This is an ordinary parameter'
  fn() {
    typeset msg='This is a local ordinary parameter'
    print $msg
    typeset -r msg='This is a local readonly parameter'
    print $msg
    msg='Watch me cause an error.'
  }
  fn
  print $msg
  msg='This version of the parameter'\
' can still be overwritten'
  print $msg

outputs

  This is a local ordinary parameter
  This is a local readonly parameter
  fn:5: read-only variable: msg
  This is an ordinary parameter
  This version of the parameter can still be overwritten

Unfortunately there was a bug with this code until recently --- thirty seconds ago, actually: the second typeset in fn incorrectly added the readonly flag to the existing msg before attempting to set the new value, which was wrong and inconsistent with what happens if you create a new local parameter. Maybe it's reassuring that the shell can get confused about local parameters, too. (I don't find it reassuring in the slightest, since typeset is one of the parts of the code where I tend to fix the bugs, but maybe you do.)

Anyway, when the bug is fixed, you should get the output shown, because the first typeset created a local variable which the second typeset made readonly, so that the final assignment caused an error. Then the $msg in the function went out of scope, and the ordinary parameter, with no readonly restriction, was visible again.

I mentioned another special typeset option in the previous chapter:

  typeset -T TEXINPUTS texinputs

to tie together the scalar $TEXINPUTS and the array $texinputs in the same way that $PATH and $path work. This is a one-off; it's the only time typeset takes exactly two parameter names on the command line. All other uses of typeset take a list of parameters to which any flags given are applied. See the manual for the remaining flags, although most of the more interesting ones have been discussed.
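
As a quick illustration (the directories are made up):

  typeset -T TEXINPUTS texinputs
  texinputs=(. ~/tex/inputs /usr/local/share/texmf)
  print $TEXINPUTS                # the same directories, joined with colons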

The other thing you need to know about flags is that you use them with a `+' sign to turn off the corresponding attribute. So

  typeset +r msg

allows you to set $msg again. From version 4.1, you won't be able to turn off the readonly attribute for a special parameter; that's because there's too much scope for confusion, including attempting to set constant strings in the code. For example, `$ZSH_VERSION' always prints a fixed string; attempting to change that is futile.

The final use of typeset is to list parameters. If you type `typeset' on its own, you get a complete list of parameters and their values. From 3.1.7, you can turn on the flag -H for a parameter, which means to hide its value while you're doing this. This can be useful for some of the more enormous parameters, particularly special parameters which I'll talk about in the section in chapter 7 on modules, which tend to swamp the display typeset produces.

You can also list parameters of a particular type, by listing the flags you want to know about. For example,

  typeset -r

lists all readonly parameters. You might expect `typeset +r' to list parameters which don't have that attribute, but actually it lists the same parameters but without showing their value. `typeset +' lists all parameters in this way.

Another good way of finding out about parameters is to use the special expansion `${(t)param}', for example

  print ${(t)PATH}

prints `scalar-export-special': $PATH is a scalar parameter, with the -x flag set, and has a special meaning to the shell. Actually, `special' means something a bit more than that: it means the internal code to get and set the parameter behaves in a way which has side effects, either to the parameter itself or elsewhere in the shell. There are other parameters, like $HISTFILE, which are used by the shell, but which are read and set in a perfectly normal way --- they are only special in that the value is looked at by the shell; and, after all, any old shell function can do that, too. Contrast this with $PATH, which has all that paraphernalia to do with hashing commands to take care of when it's set, as I discussed above, and I hope you'll see the difference.
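
To try it on parameters of your own, you might do something like this; at the top level you should see `association' and `integer', with extra hyphenated words such as `-local' appearing if you do it inside a function:

  typeset -A cache
  typeset -i count
  print ${(t)cache} ${(t)count}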

Reading into parameters

The `read' builtin, as its name suggests, is the opposite to `print' (there's no `write' command in the shell, though there is often an external command of that name to send a message to another user), but reading, unlike printing, requires something in the shell to change to take the value, so unlike print, read is forced to be a builtin. Inevitably, the values are read into a parameter. Normally they are taken from standard input, very often the terminal (even if you're running a script, unless you redirected the input). So the simplest case is just

  read param

and if you type a line, and hit return, it will be put into $param, without the final newline.

The read builtin actually does a bit of processing on the input. It will usually strip any initial or final whitespace (spaces or tabs) from the line read in, though any in the middle are kept. You can read a set of values separated by whitespace just by listing the parameters to assign them to; the last parameter gets all the remainder of the line without it being split. Very often it's easiest just to read into an array:

  % read -A array
        this is a line typed in now, \ 
      by me,    in this   space
  % print ${array[1]} ${array[12]}
  this space

(I'm assuming you're using the native zsh array format, rather than the one set with KSH_ARRAYS, and shall continue to assume this.)
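
Reading into a list of scalars works the same way; here's a quick illustration, with the last parameter mopping up whatever is left of the line:

  % read first second rest
  one two three four five
  % print -l $first $second $rest
  one
  two
  three four five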

It's useful to be able to print a prompt when you want to read something. You can do this with `print -n', but there's a shorthand:

  % read line'?Please enter a line: '
  Please enter a line: some words
  % print $line
  some words

Note the quotes surround the `?' to prevent it being taken as part of a pattern on the command line. You can quote the whole expression from the beginning of `line', if you like; I just write it like that because I know parameter names don't need quoting, because they can't have funny characters in. It's almost logical.

Another useful trick with read is to read a single character; the `-k' option does this, and in fact you can stick a number immediately after the `k' which specifies a number to read. Even easier, the `-q' option reads a single character and returns status 0 if it was y or Y, and status 1 otherwise; thus you can read the answer to yes/no questions without using a parameter at all. Note, however, that if you don't supply a parameter, the reply gets assigned in any case to $REPLY if it's a scalar --- as it is with -q --- or $reply if it's an array --- i.e. if you specify -A, but no parameter name. These are more examples of the non-special parameters which the shell uses --- it sets $REPLY or $reply, but only in the same way you would set them; there are no side-effects.
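
Here's a minimal sketch of -q in action (the question is invented, and nothing is actually deleted):

  print -n 'Really delete everything? '
  if read -q; then
    print '\nOK, you asked for it...'
  else
    print '\nNothing deleted.'
  fi
  print "You typed: $REPLY"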

Like print, read has a -r flag for raw mode. However, this just has one effect for read: without it, a \ at the end of the line specifies that the next line is a continuation of the current one (you can do this when you're typing at the terminal). With it, \ is not treated specially.

Finally, a more sophisticated note about word-splitting. I said that, when you are reading to many parameters or an array, the word is split on whitespace. In fact the shell splits words on any of the characters found in the (genuinely special, because it affects the shell's guts) parameter $IFS, which stands for `input field separator'. By default --- and in the vast majority of uses --- it contains space, tab, newline and a null character (character zero: if you know that these are usually used to mark the end of strings, you might be surprised the shell handles these as ordinary characters, but it does, although printing them out usually doesn't show anything). However, you can set it to any string: enter

  fn() {
    local IFS=:
    read -A array
    print -l $array
  }
  fn

and type

  one word:two words:three words:four

The shell will show you what's in the array it's read, one `word' per line:

  one word
  two words
  three words
  four

You'll see the bananas, er, words (joke for the over-thirties) have been treated as separated by a colon, not by whitespace. Making $IFS local didn't work in old versions of zsh, as with other specials; you had to save it and restore it.

The read command in zsh doesn't let you do line editing, which some shells do. For that, you should use the vared command, which runs the line editor to edit a parameter, with the -c option, which allows vared to create a new parameter. It also takes the option -p to specify a prompt, so one of the examples above can be rewritten

  vared -c -p 'Please enter a line: ' line

which works rather like read but with full editing support. If you give the option -h (history), you can even retrieve values from previous command lines. It doesn't have all the formatting options of read, however, although when reading an array (use the option -a with -c if creating a new array) it will perform splitting.
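
For example, a sketch of editing an existing array in place:

  colours=(red green blue)
  vared colours          # edit the line; the result is split back into an array
  print -l $colours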

Other builtins to control parameters

The remaining builtins which handle parameters can be dealt with more swiftly.

The builtin set simply sets the special parameters which hold the arguments passed to functions or scripts, and which you access as $* or $@, or $<number> (Bourne-like format), or via $argv (csh-like format); however you access them, they are known as the `positional parameters':

  % set a whole load of words
  % print $1
  a
  % print $*
  a whole load of words
  % print $argv[2,-2]
  whole load of

It's exactly as if you were in a function and had called the function with the arguments `a whole load of words'. Actually, set can also be used to set shell options, either as flags, e.g. `set -x', or as words after `-o', e.g. `set -o xtrace' does the same as the previous example. It's generally easier to use setopt for that; the upshot is that you need to be careful when setting arguments this way in case they begin with a `-'. Putting `--' before the real arguments fixes this.
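
For example:

  set -- -x -y -z    # without the `--', -x would be taken as an option to set
  print -- $*        # print needs the same treatment, for the same reason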

One other use of set is to set any array, via

  set -A any_array words to assign to any_array

which is equivalent to (and the standard Korn shell version of)

  any_array=(words to assign to any_array)

One case where the set version is more useful is if the name of an array itself comes from a parameter:

  arrname=myarray
  set -A $arrname words to assign

has no easy equivalent in the other form; the left hand side of an ordinary assignment won't expand a parameter:

  # Doesn't work; syntax error
  $arrname=(words to assign)

This worked in old versions of zsh, but that was on the non-standard side. The eval command, described below, gives another way around this.
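
For the record, here is a hedged sketch of that eval workaround (the names are made up):

  arrname=myarray
  eval "${arrname}=(words to assign)"
  print $myarray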

Next comes `shift', which simply moves an array up one element, deleting the original first one. Without an array name, it operates on the positional parameters. You can also give it a number to shift other than one, before the array name.

  shift array

is equivalent to

  array=(${array[2,-1]})

(almost --- I'll leave the subtleties here for the chapter on expansion) which picks the second to last elements of the array and assigns them back to the original array. Note, yet again, that shift operates using the name, not the value of the array, so no `$' should appear in front, otherwise you get something similar to the trick I showed for `set -A'.
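
A quick illustration using the positional parameters:

  set one two three four
  shift 2            # throw away the first two
  print $*           # leaves `three four'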

Finally, unset unsets a parameter, and I already showed you could unset a key/value pair of an associative array. There is one subtlety to be mentioned here. Normally, unset just makes the parameter named disappear off the face of the earth. However, if you call unset in a function, its ghost lives on in the sense that any parameter you create in the same name will be scoped as the original parameter was. Hence:

  var='global value'
  fn() {
    typeset var='local value'
    unset var
    var='what about this?'
  }
  fn
  print $var

The final statement prints `global value': even though the local copy of $var was unset, the shell remembers that it was local, so the second $var in the function is also local and its value disappears at the end of the function.

3.2.7: History control commands

The easiest way to access the shell's command history is by editing it directly. The second easiest way is to use the `!'-history mechanism. Other ways of manipulating it are based around the fc builtin, which probably once stood for something (according to Oliver Kiddle, `fix command', which is as good as anything). I talked quite a bit about it in the last chapter, and don't really have anything to add. Just note that the two other commands based around it are history and r.

3.2.8: Job control and process control

One of the major contributions of the C-shell was job control. You need to know about foreground and background tasks, and again I introduced these in the last chapter along with the options that control them. Here is an introduction to the relevant builtins.

You start a background job in two ways. First, directly, by putting an `&' after it:

  sleep 10 &

and secondly by starting it in the normal way (i.e. in the foreground), then typing ^Z, and using the bg command to put it in the background. Between typing ^Z and bg, the job is still there, but is not running; it is `suspended' or `stopped' (systems use different descriptions for the same thing), waiting for you to decide what to do with it. In either case, the job then continues without the shell waiting for it. It will still try and read from or write to the terminal if that's how you started it; you need to use the shell's redirection facilities right at the start if you want to change that, there's nothing you can do after the job has already started.

By the way, `sleep' isn't a builtin. Oddly enough, you can suspend a builtin command or sequence of commands (such as a shell function) with ^Z, although since the shell has to continue executing your commands as well as being suspended, it does the only thing it can do --- fork, so that the commands you suspend are put into the background. Probably you will only rarely do this with builtins. No other shell, so far as I know, has this feature.

A job will stop if it needs to read from the terminal. You see a message like:

  [1]  + 1348 suspended (tty input)  jobname and arguments

which means the job is suspended very much as if you had just typed ^Z. You need to bring the job into the foreground, as described below, so that you can type something to it.

By the way, the key to type to suspend a command may not be ^Z; it usually is, but that can be changed. Run `stty -a' and look for what is listed after `susp =' --- probably, but not necessarily, ^Z. So if you want to use another character --- it must be a single character; this is handled deep in the terminal interface, not in the shell --- you can run

  stty susp '^]'

or whatever. You will note from the stty output that various other job control characters can be changed similarly. The stty command is external and its format for both output and input can vary quite a bit from system to system.

Instead of putting the command into the background, you can bring it back to the foreground again with fg. This is useful for temporarily stopping what you are doing so you can do something else. These days you would probably do it in another window; in the old days when people logged in from simple terminals this was even more useful. A typical example of this is

  more file                        # look at file
  ^Z                               # suspend
  [1] + 8592 suspended  more file  # message printed
  ...                              # do something else
  fg %1                            # resume the `more'

The `%' is the usual way of referring to jobs. The number after it is what appeared in square brackets with the suspended message; I don't know why the shell doesn't use the `%' notation there, too. You also see that with the `continued' message when you put something into the background, and again at the end with the `done' message which tells you a background job is finished. The `%' can take other forms; the most common is to follow it by the name of a command, such as `%more' in this case. The forms %+ and %- refer to the most recent and second most recent jobs --- the `+' in the `suspended' message is telling you that the more job could be referred to like that.

Most of the job control commands will actually assume you are talking about `%+' if you don't give an argument, so assuming I hadn't started any other commands in the background, I could just have put `fg' at the end of the sequence of commands above. This actually cuts both ways: fg is the default operation on jobs referred to with the `%' notation, so just typing `%1' with no command name would have worked, too.

You can jog your memory about what's going on with the `jobs' command. It looks like a series of messages of the form beginning with the number in square brackets; usually the jobs will either be `running' or `suspended'. This will tell you the numbers you need.

One other useful thing you can do with a job is to tell the shell to forget about it. This is only really useful if it is already running in the background; then you can run `disown' with the job identifier. It's useful for jobs you want to continue after you've logged out, as well as jobs that have their own windows which you can therefore control directly. With disowned jobs, the shell doesn't warn you that they are still there when you log out. You can actually disown a background job when you start it by putting `&|' or `&!' at the end of the line instead of simply `&'. Note that if the job was suspended when you disowned it, it will stay disowned; this is pretty pointless, so you probably should run `bg' on it first.
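
For example (assuming you have xclock or some other long-running command to hand):

  xclock &
  disown             # with no argument this acts on %+, the job just started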

The next most likely thing you want to do with a job is kill it, or maybe suspend it when it's already in the background and you can't just type ^Z. This is where the kill builtin comes in. There's more to this than there is to the builtins mentioned above. First, you can use kill with other processes that weren't started from the current shell. In that case, you would use a number to identify it, with no % --- that's why the %'s were there in the other cases. Of course, you need to find out the number; the usual way is with the ps command, which is not a builtin but which appears on all UNIX-like systems. As a stupid example, here I start a disowned process which does very little, look for it, then kill it:

  % sleep 60 &|
  % ps -f
  UID        PID  PPID  C STIME TTY          TIME CMD
  pws        623   614  0 22:12 pts/0    00:00:00 zsh
  pws       8613   623  0 23:12 pts/0    00:00:00 sleep 60
  pws       8615   623  0 23:12 pts/0    00:00:00 ps -f
  % kill 8613
  % ps -f
  UID        PID  PPID  C STIME TTY          TIME CMD
  pws        623   614  0 22:12 pts/0    00:00:00 zsh
  pws       8616   623  0 23:12 pts/0    00:00:00 ps -f

The process has disappeared the second time I look. Notice that in the usual lugubrious UNIX way the shell didn't bother to tell you the process had been killed; however, it will report an error if it failed to send the signal. Sending the signal is all the shell cares about; it won't warn you if the process decided it didn't want to die when told to, so it's still a good idea to check.

Sometimes you want to wait for a process to exit; the wait builtin can do this, and like kill can take a process number as well as a job number. However, that's a bit deceptive --- you can't actually wait for a process which wasn't started directly from the shell. Indeed, the mechanism for waiting is all bound up with the way UNIX handles processes; unless its parent waits for it, a process becomes a `zombie' and hangs around until the system's foster parent, the `init' process (always process number 1) waits for it instead. It's all a little bit baroque, but for the shell user, wait just means you can hang on until something you started has finished. Indeed, that's how foreground processes work: the shell in effect uses the internal version of wait to hang around until the job exits. (Well, actually that's a lie; the system wakes it up from whatever it's doing to tell it a child has finished, so all it has to do is doze off to wait.)

Furthermore, you can wait for a process even if job control isn't running. Job control, basically anything involving those %'s, is only useful when you are sitting at a terminal fiddling with commands; it doesn't operate when you run scripts, say. Then the shell has much less freedom in how to control its jobs, but it can still wait for a background process, and it can still use kill on a process if it knows its number. For this purpose, the shell stores the ID of the last process started in the background in the parameter $!; there's probably a good reason for the `!', but I don't know what it is. This happens regardless of job control.
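
Here's a small sketch of that, as it might appear in a script:

  sleep 30 &
  bgpid=$!                # remember the process ID of the background job
  # ... get on with something else ...
  wait $bgpid
  print "job $bgpid finished with status $?"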

Signals

The kill command can do a good deal more than just kill a process. That is the default action, which is why the command has that name. But what it's really doing is sending a `signal' to a process. Signals are the simplest way of communicating to another process; in fact, they are about the only simple way if you haven't made special arrangements for the process to read messages from you. Signal names are written like SIGINT, SIGTSTP, SIGKILL; to send a particular signal to a process, you remove the SIG, stick a hyphen in front, and use that as the first argument to kill, e.g.:

  kill -KILL 8613

Some of the things you already know about are actually doing just that. When you type ^C to stop a process, you are actually sending it a SIGINT for `interrupt', as if you had done

  kill -INT 8613

The usual signal sent by kill is not, as you might have guessed, SIGKILL, but actually SIGTERM for `terminate'; SIGKILL is stronger as the process can't block that signal, as it can with many (we'll see how the shell can do that in a moment). It's familiar to UNIX hackers as `kill -9', because all the signals also have numbers. You can see the list of signals in zsh by doing:

  % print $signals
  EXIT HUP INT QUIT ILL TRAP ABRT BUS FPE KILL USR1
  SEGV USR2 PIPE ALRM TERM STKFLT CLD CONT STOP TSTP
  TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH POLL PWR
  UNUSED ZERR DEBUG

Your list will probably be different from mine; this is for Linux, and the list is very system-specific, even though the first nine are generally the same, and many of the others are virtually always present. Actually, SIGEXIT is an invention of the shell's, to allow you to make the shell do something when a function exits (see the section on `traps' below); you can't actually use `kill -EXIT'. Thus SIGHUP is the first real signal, and indeed that's number one, so you have to shift the contents of $signals along one to get the right numbers. SIGTERM and SIGINT usually have the same effect, terminating the process, unless it has decided to handle the signal some other way.

The last two signals are bogus, too: SIGZERR is to allow the shell to do something on an error (non-zero exit status), while with SIGDEBUG you can do it on every command. Again, the `something' to be executed is a `trap', as I'll discuss in a short while.

Typing ^Z to suspend a process actually sends the process a SIGTSTP (terminal stop, since it usually comes from the terminal), while SIGSTOP is similar but usually doesn't come from a terminal. Even restarting a process as with bg sends it a signal, in this case SIGCONT. It seems a bit odd to signal a process to restart; why can't the operating system just restart it when you ask? The real answer is probably that signals provide an easy way for you to talk to the operating system without grovelling around in the dirt too much.

Before I talk about how you make the shell handle signals it receives, there is one extra oddment: the suspend builtin effectively sends the shell a signal to suspend it, as if you'd typed ^Z, though as you've probably found by now that doesn't suspend the shell itself. It's only useful to do this if the shell is running under some other programme, else there's no way of restoring it and suspending is effectively the same as exiting the shell. For this reason, the shell won't let you call suspend in a login shell, because it assumes that it is running as the top level (though in the previous chapter you learnt there's actually nothing that special about login shells; you can start one just with `zsh -l'). If you're logged in remotely via rsh or ssh, it's usually more convenient to use the keystrokes `~^Z' which those define, rather than zsh's mechanism; they have to be at the beginning of a line, so hit return first if necessary. This returns you to your local terminal; you can resume the remote login with `fg' just like any other programme.

Traps

The way of making the shell handle signals is called `traps'. There are actually two mechanisms for this. I'll present the more standard one and then talk about the advantages and drawbacks of the other one at the end.

The standard version (shared with other shells) is via the `trap' builtin. The first argument is a chunk of shell code to execute, which obviously needs to be quoted when you pass it as an argument, and the remaining arguments are a list of signals to handle, minus the SIG prefix. So:

  trap "echo I\\'m trapped." INT

tells the shell what to do on SIGINT, i.e. ^C. Note the extra layer of quoting: the double quotes surround the code, so that when they are stripped trap sees the chunk

  echo I\'m trapped

Usually the shell would abort what it was doing and return to the main prompt when you hit ^C. Now, however, it will simply print the message and carry on. You can try this, for example, with

  read line

If you hit ^C while it's waiting for input, you'll see the message go up, but the shell will still wait for you to type a line.

A warning about this: ^C is only trapped within the shell itself. If you start up an external programme, it will have its own mechanism for handling signals, and if it usually aborts on ^C it still will. But there's a sting in the tail: do

  cat

which waits for input to output again (you need to use ^D to exit normally). If you type ^C here, the command will be aborted, as I said --- but you still get the message `I'm trapped'. That's because the shell is able to tell that the command got that particular signal, and calls the trap when the cat exits. Not all shells do this; furthermore, some commands which handle signals themselves won't give the shell enough information to know that a signal arrived, and in that case the trap won't be called. Such commands are usually the more sophisticated things like editors or screen managers or whatever; you just have to find out by trial and error.

You can also make the shell ignore the signal completely. To do this, the first argument should be an empty string:

  trap '' INT

Now ^C will have no effect, and this time the effect is passed on directly to commands called from the shell --- try the cat example and you won't be able to interrupt it; type ^D or use the lesser known but more powerful ^\ (control with backslash), which sends SIGQUIT. If it hasn't been disabled, this will also produce a file core, which contains debugging information about what the programme was doing when it exited --- never call your own files core. You can trap SIGQUIT too, if you want. (The shell itself usually ignores SIGQUIT; it's only useful for external commands.)

Now the other sort of trap. I could have written for the first example:

  TRAPINT() {
    print I\'m trapped.
  }

As you can see, this is just a function: functions beginning TRAP are special. However, it's a real function too; you can call it by hand with the command `TRAPINT', and it will run perfectly happily with no funny side effects.

There is a difference between the way the two types work. In the `trap' sort of trap, the code is evaluated just as if it appeared as instructions to the shell at the point where the trap happened. So if you were in a function, you would see the environment of that function with its local variables; if you set a local variable with typeset, it would be visible in the function just as if it were created there.

However, in the function type of trap, the code is provided with its own function environment. Now if you use typeset the parameter created is local only to the trap. In most cases, that's all the difference there is; it's up to you to decide which is more convenient. As you can see, the function type of trap doesn't require the extra layer of quoting, so looks a little smarter. Conveniently, the `trap' command on its own lists all traps in the form of the shell code you'd need to recreate them, and you can see which sort is which.

There are two cases where the difference sticks out. One is that the function type has some extra wiring to allow you both to trap a signal, and pretend to anyone watching that the shell didn't handle it. An example will show this:

  TRAPINT() {
    print "Signal caught, stopping anyway."
    return $(( 128 + $1 ))
  }

That second line may look as rococo as the Amalienburg, but its meaning is this: $1, the first argument to the function, is set to the number of the signal. In this case it will be 2 because that's the standard number for SIGINT. That means the arithmetic substitution $((...)) returns 130, the command `return 130' is executed, and the function returns with status 130. Returning with non-zero status is special in function traps: it tells the shell you want to abort the surrounding command even though the trap was handled, and that you want the status associated with that to be 130. It so happens that this is how UNIX handles returns from normal traps. Without setting a trap, do

  % cat
  ^C
  % print $?

and you'll see that this, too, has given the status 130, 128 plus the value of SIGINT. So if you do have the trap set, you'll see the message, but the command will abort --- even if it was running inside the shell.

Try

  % read line
  ^C

to see that happening. If you look at the status in $? you'll find it's actually 1, not 130; that's because the read command, when it aborted, overrode the return value from the trap. But it does that with an untrapped ^C, too, so that's not really an exception to what I've just said.

If you've been paying attention, you'll realise that traps set with the trap builtin can't do it in quite this way, because the function they return from would be whatever function you were in. You can see that:

  trap 'echo Returning...; return;' INT
  fn() {
    print In fn...
    read param
    print Leaving fn..
  }

If you run fn and hit ^C, the signal is trapped and the message printed, but because of the return, the shell quits fn immediately and you don't see the final message. If you missed out the `return;' (try it), the shell would carry on with the rest of fn after you typed something to read. Of course you can use this mechanism to leave functions after trapping a signal; it just so happens that in this case the mechanism with TRAPINT is a little closer to what untrapped signals do and hence a little neater.

One final flourish of late Baroque splendour: the trap for SIGEXIT, the one called when a function (or the shell itself, in fact) exits is a bit special because in the case of exiting a function it will be called in the environment of the calling function. So if you need to do something like set a local variable for an enclosing function you can have

  trap 'typeset param_in_enclosing_func=value' EXIT

do it for you; you couldn't do that with TRAPEXIT because the code would have its own function, so that even though it would be called after the first function exited, it wouldn't run directly in the enclosing one but in a separate TRAPEXIT function. You can even set an EXIT trap for the enclosing function by defining a nested `trap .. EXIT' inside that trap itself.

I lied, because there is one more special thing about TRAPEXIT: it's always reset after you exit a function and the trap itself has been called. Most traps just hang around until you explicitly unset them. There is an option, LOCAL_TRAPS, which makes traps set inside functions as well insulated as possible from those outside, or inside deeper functions. In other words, the old trap is saved and then restored when you exit the function; the scoping works pretty much like that for typeset, and in the same way traps for the enclosing scope, apart from any for EXIT, remain in effect inside a function unless you explicitly override them; and, again in the same way, if you unset it inside the function it will still be restored on exit.

LOCAL_TRAPS is the fixed behaviour of some other shells. In zsh, without the option set:

  trap 'echo Hi.' INT
  fn() {
     trap 'echo Bye.' INT
  }

Calling fn simply replaces the trap defined outside the function with the one defined inside while:

  trap 'echo Hi.' INT
  fn() {
     setopt localtraps
     trap 'echo Bye.' INT
  }

puts the original `Hi' trap back after the function exits.

I haven't told you how to unset a trap for good: the answer is

  trap - INT

As you would guess, you can use unfunction with function-type traps; that will correctly remove the trap as well as deleting the function. However, `trap -' works with both, so that's the recommended way.

Limits on processes

One other way that jobs started by the shell can be controlled is by using limits. These are actually limits set by the operating system, but the shell gives you a way of controlling them: the limit and unlimit commands. Type `limit' on its own to see a summary. I get:

  cputime         unlimited
  filesize        unlimited
  datasize        unlimited
  stacksize       8MB
  coredumpsize    0kB
  memoryuse       unlimited
  maxproc         2048
  descriptors     1024
  memorylocked    unlimited
  addressspace    unlimited

where the item on the left of each line is what is being limited, and on the right is the value. The manual page to look at, at least on Linux, is the one for getrlimit (and its companion setrlimit); those are the functions the shell calls when you run limit or unlimit.

In this case, the items are:

  • cputime
    the total CPU time used by a process
  • filesize
    maximum size of a file
  • datasize
    the maximum size of data in use by a programme
  • stacksize
    the maximum size of the stack, which is the area of memory used to store information during function calls
  • coredumpsize
    the maximum size of a core file, which is an image of memory left by a programme that crashes, allowing you to debug it with gdb, dbx, ddd or some other debugger
  • memoryuse
    the maximum main memory, i.e. programme memory which is in active use and hasn't been `swapped out' to disk
  • maxproc
    the maximum number of simultaneous processes
  • descriptors
    the maximum number of simultaneously open files (`descriptors' are the internal mechanism for referring to an open file on UNIX-like systems)
  • memorylocked
    the maximum amount of memory locked in (I don't know what that is, either)
  • addressspace
    the total amount of virtual memory, i.e. any memory whether it is main memory, or refers to somewhere on a disk, or indeed anything else.

You may well see other names; the shell decides when it is compiled what limits are supported by the system.

Of those, the one I use most commonly is coredumpsize: sometimes when I'm debugging a programme I want a crash to produce a `core' file so I can run gdb or dbx on it (`unlimit coredumpsize'), while other times they are just untidy (`limit coredumpsize 0'). Probably you would only alter any of the others if you knew there was a problem, for example a number-crunching programme used so much memory that the rest of the system was badly affected and you wanted to limit datasize to 64 megabytes or whatever. You could write this as:

  limit datasize 64m

There is a distinction made between `hard' and `soft' limits. Both have the same effect on programmes, but you can remove or reduce `soft' limits, while only the superuser (the system administrator's login, root) can do that to `hard' limits. Usually, therefore, limit and unlimit manipulate soft limits; to show or set hard limits, give the option -h. If I do `limit -h', I get the same list of limits as above, but with stacksize and coredumpsize unlimited --- that means I can reduce or remove the limits on those if I want, they're just set for my own convenience.
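
For example, a hedged sketch of poking at one limit both ways:

  limit -h coredumpsize      # show the hard limit for core files
  limit coredumpsize 1m      # lower the soft limit
  unlimit coredumpsize       # raise it again, up to the hard limit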

Why is stacksize set in this way? As I said, it refers to the memory in which the functions in programmes store variables and any other local information. If one function calls another, it uses more memory. You can get into a situation where functions call themselves recursively and there is no way out until the machine runs out of memory; limiting stacksize prevents this. You can actually see this with zsh itself (probably better not to try this if you'd rather the shell you're running didn't crash):

  % fn() { fn; }
  % fn

defines a function which keeps calling itself. To do this, all the functions inside zsh are calling themselves as well, using more and more stack memory. Actually, zsh uses other forms of memory inside each function and my version of zsh crashes due to exhaustion of that memory instead. However, it depends on the system how this works out.

Times

One way of returning information on process resources is with the `times' command. It simply shows the total CPU time used by the shell and by the programmes called for it --- in that order, and without description, so you need to remember. On each line, the first number is the time spent in user space and the second is the time spent in system space. If you're not concerned about the details of programmes the difference is pretty irrelevant, but if you are, then the difference is very roughly that between the time spent in the code you actually see before you compile a programme, and the time spent in `hidden' code where the system is doing something for you. It's not such an obvious distinction, because many library routines, such as mathematical functions, are run in user mode as no privileged access to internal bits of the system is required. Typically, system time is concerned with the details of input and output --- though even there it's not so simple, because the C output routines printf, puts, fread and others have user mode code which then calls the system routines read, write and so on.

You can measure the time taken by a particular external command by putting `time', in the singular this time, in front of it; this is essentially another precommand modifier, and is a shell reserved word rather than a builtin. This gives fairly obvious information. You can specify the information using the $TIMEFMT parameter, which has its own percent escapes, different from the ones used in prompts. It exists partly because the shell allowed you to access all sorts of other information about the process which ran, such as `page faults' --- occasions when the system had to fetch a part of the programme or data from disk because it wasn't in the main memory. However, that disappeared because it was too much work to convert the feature to configure itself automatically for different operating systems. It may be time to resurrect it.

You can also force the time to be shown automatically by setting the parameter $REPORTTIME; if a command runs for more than this many seconds, the $TIMEFMT output will be shown automatically.
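
A sketch of both together, using a few of the commoner escapes (%J is the job name, %U, %S and %E the user, system and elapsed times):

  TIMEFMT='%J: %U user, %S system, %E elapsed'
  REPORTTIME=10         # report automatically on anything taking over ten seconds
  time sleep 1          # explicit report, using the format above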

3.2.9: Terminals, users, etc.

Watching for other users

Although this is more associated with parameters than builtins, the `log' command will tell you whether any of a group of people you want to watch out for have logged in or out. To use this, you set the $watch array parameter to a list of user names, or `all' for everyone, or `notme' for everyone except yourself. Even if you don't use log, any changes will be reported just before the shell prints a prompt. It will be printed using the $WATCHFMT parameter: once again, this takes its own set of percent escapes, listed in the zshparam manual.
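
For example (a sketch only; %n is the user name, %a the action, %l the tty and %T the time):

  watch=(notme)
  WATCHFMT='%n has %a %l at %T'
  log                   # show matching users currently logged in, in that format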

ttyctl

There is a command ttyctl which is designed to keep badly behaved external commands from messing up the terminal settings. Most programmes are careful to restore any settings they change, but there are exceptions. After `ttyctl -f', the terminal is frozen; zsh will restore the settings, no matter what an external programme does with it. This includes deliberate attempts to change the terminal settings with the `stty' command, so the default is unfrozen, `ttyctl -u'.

3.2.10: Syntactic oddments

This section collects together a few builtins which, rather than controlling the behaviour of some feature of the shell, have some other special effect.

Controlling programme flow

The commands here are exit, return, break, continue, and source or .: they determine what the shell does next. You've met exit --- leave the shell altogether --- and return --- leave the current function. Be very careful not to confuse them. Calling exit in a shell function is usually bad:

  % fn() { exit; }
  % fn

This makes your entire shell session go away, not just the function. If you write C programmes, you should be very familiar with both, although there is one difference in this case: return at the top level in an interactive shell actually does nothing, rather than leaving the shell as you might expect. However, in a script, return outside a function does cause the entire script to stop. The reason for this is that zsh allows you to write autoloaded functions in the same form as scripts, so that they can be used as either; this wouldn't work if return did nothing when the file was run as a script. Other shells don't do this: return does nothing at the top level of a script, as well as interactively. However, other shells don't have the feature that function definition files can be run as scripts, either.

The next two commands, break and continue, are to do with constructs like `if'-blocks and loops, and it will be much easier if I introduce them when I talk about those below. They will also already be familiar to C programmers. (If you are a FORTRAN programmer, however, continue is not the statement you are familiar with; it is instead equivalent to CYCLE in FORTRAN90.)

The final pair of commands are . and source. They are similar to one another and cause another file to be read as a stream of commands in the current shell --- not as a script, for which a new shell would be started which would finish at the end of the script. The two are intended for running a series of commands which have some effect on the current shell, exactly like the startup files. Indeed, it's a very common use to have a call to one or other in a startup file; I have in my ~/.zshrc

  [[ -f ~/.aliasrc ]] && . ~/.aliasrc

which tests if the file ~/.aliasrc exists, and if so runs the commands in it; they are treated exactly as if they had appeared directly at that point in .zshrc.

Note that your $path is used to find the file to read from; this is a little surprising if you think of this as like a script being run, since zsh doesn't search for a script, it uses the name exactly as you gave it. In particular, if you don't have `.' in your $path and you use the form `.' rather than `source' you will need to say explicitly when you want to source a file in the current directory:

  . ./file

otherwise it won't be found.

It's a little bit like running a function, with the file as the function body. Indeed, the shell will set the positional parameters $* in just the same way. However, there's a crucial difference: there is no local parameter scope. Any variables in a sourced file, as in one of the startup files, are in the same scope as the point from which it was started. You can, therefore, source a file from inside a function and have the parameters in the sourced file local, but normally the only way of having parameters only for use in a sourced file is to unset them when you are finished.
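
To see the positional parameters at work, here's a hedged sketch (the file name is made up):

  # In the file ./showargs:
  print "I was given $# arguments: $*"
  # Then, at the shell prompt or in another script:
  . ./showargs one two three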

The fact that both . and source exist is historical: the former comes from the Bourne shell, and the latter from the C shell, which seems deliberately to have done everything differently. The point noted above, that source always searches the current directory (and searches it first), is the only difference.

Re-evaluating an expression

Sometimes it's very useful to take a string and run it as if it were a set of shell commands. This is what eval does. More precisely, it sticks the arguments together with spaces and calls them. In the case of something like

  eval print Hello.

this isn't very useful; that's no different from a simple

  print Hello.

The difference comes when what's on the command line has something to be expanded, like a parameter:

  param='print Hello.'
  eval $param

Here, the $param is expanded just as it would be for a normal command. Then eval gets the string `print Hello.' and executes it as a shell command line. Everything --- really everything --- that the shell would normally do to execute a command line is done again; in effect, it's run as a little function, except that no local context for parameters is created. If this sounds familiar, that's because it's exactly the way traps defined in the form

  trap 'print Hello.' EXIT

are called. This is one simple way out of the hole you can sometimes get yourself into when you have a parameter which contains the name of another parameter, instead of some data, and you want to get your hands on the data:

  # somewhere above...
  origdata='I am data.'
  # but all you know about is
  paramname=origdata
  # so to extract the data you can do...
  eval data=\$$paramname

Now $data contains the value you want. Make sure you understand the series of expansions going on: this sort of thing can get very confusing. First the command line is expanded just as normal. This turns the argument to eval into `data=$origdata'. The `$' that's still there was quoted by a backslash; the backslash was stripped and the `$' left; the $paramname was evaluated completely separately --- quoted characters like the \$ don't have any effect on expansions --- to give origdata. Eval calls the new line `data=$origdata' as a command in its own right, with the now obvious effect. If you're even slightly confused, the best thing to do is simply to quote everything you don't want to be immediately expanded:

  eval 'data=$'$paramname

or even

  eval 'data=${'$paramname'}'

may perhaps make your intentions more obvious.

It's possible when you're starting out to confuse `eval' with the `...` and $(...) commands, which also take the command in the middle `...' and evaluate it as a command line. However, these two (they're identical except for the syntax) then insert the output of that command back into the command line, while eval does no such thing; it has no effect at all on where input and output go. Conversely, the two forms of command substitution don't do an extra level of expansion. Compare:

  % foo='print bar'
  % eval $foo
  bar

with

  % foo='print bar'
  % echo $($foo)
  zsh: command not found: print bar

The $(...) substitution took $foo as the command line. As you are now painfully aware, zsh doesn't split scalar parameters, so this was turned into the single word `print bar', which isn't a command. After the error message, echo still runs and prints a blank line: the empty result of the failed substitution.

3.2.11: More precommand modifiers: exec, noglob

Sometimes you want to run a command instead of the shell. This typically happens when you write a shell script to process the arguments to an external command, or to set parameters for it, then call that command. For example:

  export MOZILLA_HOME=/usr/local/netscape
  netscape "$@"

Run as a script, this sets an environment variable, then starts netscape. However, as always the shell waits for the command to finish. That's rather wasteful here, since there's nothing more for the shell to do; you'd rather it simply magically turned into the netscape command. You can actually do this:

  export MOZILLA_HOME=/usr/local/netscape
  exec netscape "$@"

`exec' tells the shell that it doesn't need to wait; it can just make the command to run replace the shell. So this only uses a single process.

You should be careful not to use exec interactively, since you don't usually want your interactive shell to go away. One legitimate use is to replace the current zsh with a brand new one if (say) you've set a whole load of options you don't like and want to restore the ones you usually have on startup:

  exec zsh

Or you may have the bad taste to start a completely different shell altogether. Conversely, a good piece of news about exec is that it is common to all shells, so you can use it from another shell to start zsh in the way I've just shown.

Like `command' and `builtin', `exec' is a `precommand modifier' in that it alters the way a command line is interpreted. Here's one more:

  noglob print *

If you've remembered what `glob' means, this is all fairly obvious. It instructs the shell not to turn the `*' into a list of all the files in the directory, but instead to let well alone. You can do this by quoting the `*', of course; often noglob is used as part of an alias to set up commands where you never need filename generation and don't want to have to bother quoting everything. However, note that noglob has no effect on any other type of expansion: parameter expansion and backquote (`...`) expansion, for example, happen as normal; the only thing that doesn't happen is turning patterns into a list of matching files. So it doesn't take away the necessity of knowing the rules of shell expansion. If you want to avoid expansion altogether, the best thing to do is to use read or vared (see below) to read a line into a parameter, which you pass to your function:

  read -r param
  print $param

The -r makes sure $param is the unadulterated input.
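A common way to exploit noglob in practice is the alias trick mentioned above: wrap a command whose arguments are patterns meant for the command itself, not for the shell. A sketch (assuming you have some such command, here locate, installed):

  alias locate='noglob locate'
  locate *config*     # the pattern reaches locate untouched by the shell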

3.2.12: Testing things

I told you in the last chapter that the right way to write tests in zsh was using the `[[ ... ]]' form, and why. So you can ignore the two builtins `test' and `[', even though they're the ones that resemble the Bourne shell. You can safely write

  if [[ $foo = '' ]]; then
    print The parameter foo is empty.  O, misery me.
  fi

or

  if [[ -z $foo ]]; then
    print Alack and alas, foo still has nothing in it.
  fi

instead of monstrosities like

  if test x$foo = x; then
    echo The emptiness of foo.  Yet are we not all empty\?
  fi

because even if $foo does expand to an empty string, which is what is implied if the tests are true, `[[ ... ]]' remembers there was something there and gets the syntax right. Rather than a builtin, this is actually a reserved word --- in fact it has to be, to be syntactically special --- but you probably aren't too bothered about the difference.
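Here's a quick illustration of why the old forms are risky (a sketch; the precise error message from `[' will vary):

  foo=
  if [ $foo = '' ]; then print empty; fi      # error: the empty $foo simply vanishes
  if [[ $foo = '' ]]; then print empty; fi    # prints `empty'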

There are two sorts of tests, both shown above: those with three arguments, and those with two. The three-argument forms all have some comparison in the middle; in addition to `=' (or `==', which means the same here, and which according to the manual page we should be using, though none of us does), there are `!=' (not equal), `<' and `>'. All these do string comparisons, i.e. they compare the sort order of the strings.
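For example (a trivial sketch), `<' compares lexically:

  if [[ apple < banana ]]; then
    print apple sorts before banana
  fi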

Since there are better ways of sorting things in zsh, the `=' and `!=' forms are by far the most common. Actually, they do something a bit more than string comparison: the expression on the right can be a pattern. The patterns understood are just the same as for matching filenames, except that `/' isn't special, so it can be matched by a `*'. Note that, because `=' and `!=' are treated specially by the shell, you shouldn't quote the patterns: you might think that unless you do, they'll be turned into file names, but they won't. So

  if [[ biryani = b* ]]; then
    print Word begins with a b.
  fi

works. If you'd written 'b*', including the quotes, it wouldn't have been treated as a pattern; it would have tested for a string which was exactly the two letters `b*' and nothing else. Pattern matching like this can be very powerful. If you've done any Bourne shell programming, you may remember the only way to use patterns there was via the `case' construction: that's still in zsh (see below), and uses the same sort of patterns, but the test form shown above is often more useful.

Then there are other three-argument tests which do numeric comparison. Rather oddly, these use letters rather than mathematical symbols: `-eq', `-lt' and `-le' compare if two numbers are equal, less than, or less than or equal, to one another. You can guess what `-gt' and `-ge' do. Note this is the other way round to Perl, which much more logically uses `==' to test for equality of numbers (not `=', since that's always an assignment operator in Perl) and `eq' (minus the minus) to test for equality of strings. Unfortunately we're now stuck with it this way round. If you are only comparing numbers, it's better to use the `(( ... ))' expression, because that has a proper understanding of arithmetic. However,

  if [[ $number -gt 3 ]]; then
    print Wow, that\'s big
  fi

and

  if (( $number > 3 )); then
    print Wow, that\'s STILL big
  fi

are essentially equivalent. In the second case, the status is zero (true) if the number in the expression was non-zero (sorry if I'm confusing you again) and vice versa. This means that

  if (( 3 )); then
    print It seems that 3 is non-zero, Watson.
  fi

is a perfectly valid test. As in C, the test operators in arithmetic return 1 for true and 0 for false, i.e. `$number > 3' is 1 if $number is greater than 3 and 0 otherwise; the inversion to shell logic, zero for true, only occurs at the final step when the expression has been completely evaluated and the `(( ... ))' command returns. At least with `[[ ... ]]' you don't need to worry about the extra negation; you can simply think in logical terms (although that's hard enough for a lot of people).
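Here's a small sketch showing the two logics side by side (the parameter names are arbitrary):

  number=5
  (( ok = number > 3 ))    # arithmetic logic: ok is set to 1 (true)
  print $ok $?             # prints `1 0': the status is 0 because the value was non-zero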

Finally, there are a few other odd comparisons in the three-argument form:

  if [[ file1 -nt file2 ]]; then
    print file1 is newer than file2
  fi

does the test implied by the example; there is also `-ot' to test for an older file, and the little-used `-ef' which tests for an `equivalent file', meaning that the two names refer to the same file --- in other words, are linked; this can be a hard or a symbolic link, and in the second case it doesn't matter which of the two is the symbolic link. (If you were paying attention above, you'll know it can't possibly matter in the first case.)
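For completeness, a sketch of the other two tests (the file names are made up):

  if [[ oldlog -ot newlog ]]; then
    print oldlog is older than newlog
  fi
  ln -s realfile linkname
  if [[ linkname -ef realfile ]]; then
    print linkname and realfile refer to the same file
  fi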

In addition to these tests, which are pretty recognisable from most programming languages --- although you'll just have to remember that the `=' family compares strings and not numbers --- there is another set which is largely peculiar to UNIXy scripting languages. These are all in the form of a hyphen followed by a letter as the test, which always takes a single argument. I showed one: `-z $var' tests whether `$var' has zero length. Its opposite is `-n $var', which tests for non-zero length. Perhaps this is as good a time as any to point out that the arguments to these commands can be any single word expression, not just variables or filenames. You are quite at liberty to test

  if [[ -z "$var is sqrt(`print bibble`)" ]]; then
    print Flying pig detected.
  fi

if you like. In fact, the tests are so eager to make sure that they only have a single-word argument that they will treat things like arrays, which usually return a whole set of words, as if they were in double quotes, joining the bits with spaces:

  array=(two words)
  if [[ $array = 'two words' ]]; then
    print "The array \$array is OK.  O, joy."
  fi

Apart from `-z' and `-n', most of the two-argument tests are to do with files: `-e' tests that the file named next exists, whatever type of file it is (it might be a directory or something weirder); `-f' tests if it exists and is a regular file (so it isn't a directory or anything weird this time); `-x' tests whether you can execute it. There are all sorts of others which are listed in the manual page for various properties of files. Then there are a couple of others: ``-o