Convert howto pages to Markdown

This commit is contained in:
flokoe 2023-07-05 11:10:03 +02:00
parent 024e1bcc0e
commit 2d52763c75
10 changed files with 2712 additions and 0 deletions

docs/howto/calculate-dc.md
# Calculating with dc
## Introduction
dc(1) is a non-standard, but commonly found, reverse-polish Desk
Calculator. According to Ken Thompson, "dc is the oldest language on
Unix; it was written on the PDP-7 and ported to the PDP-11 before Unix
[itself] was ported".
Historically, the standard bc(1) has been implemented as a *front-end to
dc*.
## Simple calculation
In brief, *reverse polish notation* means the numbers are put on the
stack first, then an operation is applied to them. Instead of writing
`1+1`, you write `1 1+`.
By default `dc`, unlike `bc`, doesn't print anything; the result is
pushed onto the stack. You have to use the "p" command to print the
element at the top of the stack. Thus a simple operation looks like:
$ dc <<< '1 1+pq'
2
I used a "here string", present in bash 3.x, ksh93 and zsh. If your
shell doesn't support this, you can use `echo '1 1+p' | dc`, or if you
have GNU `dc`, you can use `dc -e '1 1+p'`.
Of course, you can also just run `dc` and enter the commands.
The classic operations are:
- addition: `+`
- subtraction: `-`
- division: `/`
- multiplication: `*`
- remainder (modulo): `%`
- exponentiation: `^`
- square root: `v`
GNU `dc` adds a couple more.
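A few of the classic operations in action (each command prints the top of the stack with `p`):

``` bash
dc <<< '2 3+p'   # prints 5
dc <<< '10 3%p'  # prints 1
dc <<< '2 10^p'  # prints 1024
dc <<< '81vp'    # prints 9
```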
To input a negative number you need to use the `_` (underscore)
character:
$ dc <<< '1_1-p'
2
You can use the *digits* `0` to `9` and the *letters* `A` to `F` as
numbers, and a dot (`.`) as a decimal point. The `A` to `F` **must** be
capital letters in order not to be confused with the commands specified
with lower case characters. A number with a letter is considered
hexadecimal:
dc <<< 'Ap'
10
The **output** is converted to **base 10** by default.
## Scale And Base
`dc` is a calculator with arbitrary precision; by default this precision
is 0, thus `dc <<< "5 4/p"` prints "1".
We can increase the precision using the `k` command. It pops the value
at the top of the stack and uses it as the precision argument:
dc <<< '2k5 4/p' # prints 1.25
dc <<< '4k5 4/p' # prints 1.2500
dc <<< '100k 2vp'
1.4142135623730950488016887242096980785696718753769480731766797379907\
324784621070388503875343276415727
dc supports *large* precision arguments.
You can change the base used to output (*print*) the numbers with `o`
and the base used to input (*type*) the numbers with `i`:
dc << EOF
20p # prints 20, output is in base 10
16o # the output base is now 16
20p # prints 14, i.e. 20 in hex
16i # the input base is now 16 (hex)
p   # prints 14 again; printing doesn't modify the number on the stack
10p # prints 10: input 10 in hex is 16 decimal, shown as 10 in base 16
EOF
Note: when the input base is modified, it affects all subsequent
numbers, including the arguments of `i` itself:
dc << EOF
16i 16o # base is 16 for input and output
10p # prints 10
10i # ! sets the input base to 10 hex, i.e. 16 decimal
17p # prints 17
EOF
This code prints 17, while we might think that `10i` reverts the input base
to 10, in which case the number would be converted to hex and printed as
11. The problem is that 10 was typed while the input base was 16, thus the
base was set to 10 hexadecimal, i.e. 16 decimal.
dc << EOF
16i16o 10p # prints 10
Ai # sets the input base to A in hex, i.e. 10 decimal
17p # prints 11 in base 16
EOF
## Stack
There are two basic commands to manipulate the stack:
- `d` duplicates the top of the stack
- `c` clears the stack
$ dc << EOF
2 # put 2 on the stack
d # duplicate i.e. put another 2 on the stack
*p # multiply and print
c p # clear and print
EOF
4
dc: stack empty
`c p` results in an error, as we would expect, as `c` removes everything
on the stack. *Note: we can use `#` to put comments in the script.*
If you are lost, you can inspect (i.e. print) the stack using the
command `f`. The stack remains unchanged:
dc <<< '1 2 d 4+f'
6
2
1
Note how the first element that will be popped from the stack is printed
first; if you are used to an HP calculator, it's the reverse.
Don't hesitate to put `f` in the examples of this tutorial; it doesn't
change the result, and it's a good way to see what's going on.
## Registers
The GNU `dc` manual says that dc has at least **256 registers**,
depending on the range of unsigned char. I'm not sure how you are
supposed to use the NUL byte. Using a register is easy:
dc <<EOF
12 # put 12 on the stack
sa # remove it from the stack (s), and put it in register 'a'
10 # put 10 on the stack
la # read (l) the value of register 'a' and push it on the stack
+p # add the 2 values and print
EOF
The above snippet uses newlines to embed comments, but it doesn't
really matter; you can use `echo '12sa10la+p' | dc`, with the same
results.
A register can contain more than just a value: **each register is a
stack of its own**.
dc <<EOF
12sa #store 12 in 'a'
6Sa # with a capital S the 6 is removed
# from the main stack and pushed on the 'a' stack
lap # prints 6, the value at the top of the 'a' stack
lap # still prints 6
Lap # prints 6 also but with a capital L, it pushes the value in 'a'
# to the main stack and pulls it from the 'a' stack
lap # prints 12, which is now at the top of the stack
EOF
## Macros
`dc` lets you push arbitrary strings onto the stack when the strings are
enclosed in `[]`. You can print one with `p`: `dc <<< '[Hello World!]p'`,
and you can evaluate one with `x`: `dc <<< '[1 2+]xp'`.
This is not that interesting until combined with registers. First,
let's say we want to calculate the cube of a number (don't forget to
include `f` if you get lost!):
dc << EOF
3 # push our number on the stack
d # duplicate it i.e. push 3 on the stack again
d**p # duplicate again and calculate the product and print
EOF
Now we have several cubes to calculate; we could use `dd**` several
times, or use a macro.
dc << EOF
[dd**] # push a string
sa # save it in register a
3 # push 3 on the stack
lax # push the string "dd**" on the stack and execute it
p # print the result
4laxp # same operation for 4, in one line
EOF
## Conditionals and Loops
`dc` can execute a macro stored in a register using the `lR x` combo,
but it can also execute macros conditionally. `>a` will execute the
macro stored in the register `a` if the top of the stack is *greater
than* the second element of the stack. Note: the top of the stack
contains the last entry, so when written, the comparison appears as the
reverse of what we are used to reading:
dc << EOF
[[Hello World]p] sR # store in 'R' a macro that prints Hello World
2 1 >R # does nothing: 1 is at the top, 2 is the second element
1 2 >R # prints Hello World
EOF
Some `dc` implementations have `>R <R =R`; GNU `dc` has some more, check
your manual.
Note that the test "consumes" its operands: the first two elements are
popped off the stack (you can verify that
`dc <<< "[f]sR 2 1 >R 1 2 >R f"` doesn't print anything).
Have you noticed how we can *include* a macro (string) in a macro? And
as `dc` relies on a stack we can, in fact, use the macro recursively
(have your favorite control-c key combo ready ;)):
dc << EOF
[ [Hello World] p # our macro starts by printing Hello World
lRx ] # and then executes the macro in R
sR # we store it in the register R
lRx # and finally executes it.
EOF
We have recursion, we have tests, so we have loops:
dc << EOF
[ li # put our index i on the stack
p # print it, to see what's going on
1 - # we decrement the index by one
si # store decremented index (i=i-1)
0 li >L # if i > 0 then execute L
] sL # store our macro with the name L
10 si # let's give to our index the value 10
lLx # and start our loop
EOF
Of course code written this way is far too easy to read! Make sure to
remove all those extra spaces, newlines and comments:
dc <<< '[lip1-si0li>L]sL10silLx'
dc <<< '[p1-d0<L]sL10lLx' # use the stack instead of a register
I'll let you figure out the second example; it's not hard, it uses the
stack instead of a register for the index.
## Next
Check your dc manual; I haven't described everything, like arrays (only
documented with "; : are used by bc(1) for array operations" on
Solaris, probably because `echo '1 0:a 0Sa 2 0:a La 0;ap' | dc`
results in *Segmentation Fault (core dump)*; the latest Solaris uses
GNU dc).
You can find more info and dc programs here:
- <http://en.wikipedia.org/wiki/Dc_(Unix)>
More examples, as well as a dc implementation in Python, here:
- <http://en.literateprograms.org/Category:Programming_language:dc>
- <http://en.literateprograms.org/Desk_calculator_%28Python%29>
The manual for the 1971 dc from Bell Labs:
- <http://cm.bell-labs.com/cm/cs/who/dmr/man12.ps> (dead link)

# Collapsing Functions
## What is a "Collapsing Function"?
A collapsing function is a function whose behavior changes depending
upon the circumstances under which it's run. Function collapsing is
useful when you find yourself repeatedly checking a variable whose value
never changes.
## How do I make a function collapse?
Function collapsing requires some static feature in the environment. A
common example is a script that gives the user the option of having
"verbose" output.
#!/bin/bash
[[ $1 = -v || $1 = --verbose ]] && verbose=1
chatter() {
if [[ $verbose ]]; then
chatter() {
echo "$@"
}
chatter "$@"
else
chatter() {
:
}
fi
}
echo "Waiting for 10 seconds."
for i in {1..10}; do
chatter "$i"
sleep 1
done
## How does it work?
The first time you run chatter(), the function redefines itself based on
the value of verbose. Thereafter, chatter doesn\'t check \$verbose, it
simply is. Further calls to the function reflect its collapsed nature.
If verbose is unset, chatter will echo nothing, with no extra effort
from the developer.
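For example, if the script above is saved as `verbose.sh` (a filename assumed here for illustration), the collapsed function behaves like this:

``` bash
$ ./verbose.sh
Waiting for 10 seconds.
$ ./verbose.sh -v
Waiting for 10 seconds.
1
2
3
# ...and so on, one number per second, up to 10
```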
## More examples
FIXME Add more examples!
# Somewhat more portable find -executable
# FIXME/UNTESTED (I don't have access to all of the different versions of find.)
# Usage: find PATH ARGS -- use find like normal, except use -executable instead of
# various versions of -perm /+ blah blah and hacks
find() {
hash find || { echo 'find not found!'; exit 1; }
# We can be pretty sure "$0" should be executable.
if [[ $(command find "$0" -executable 2> /dev/null) ]]; then
unset -f find # We can just use the command find
elif [[ $(command find "$0" -perm /u+x 2> /dev/null) ]]; then
find() {
typeset arg args
for arg do
[[ $arg = -executable ]] && args+=(-perm /u+x) || args+=("$arg")
done
command find "${args[@]}"
}
elif [[ $(command find "$0" -perm +u+x 2> /dev/null) ]]; then
find() {
typeset arg args
for arg do
[[ $arg = -executable ]] && args+=(-perm +u+x) || args+=("$arg")
done
command find "${args[@]}"
}
else # Last resort
find() {
typeset arg args
for arg do
[[ $arg = -executable ]] && args+=(-exec test -x {} \; -print) || args+=("$arg")
done
command find "${args[@]}"
}
fi
find "$@"
}
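A usage sketch: once the wrapper is defined (e.g. sourced from a script), it is called exactly like the real `find`, and the collapsed version rewrites `-executable` into whatever this system's `find` understands:

``` bash
# list executable regular files under /usr/local/bin
find /usr/local/bin -type f -executable
```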
#!/bin/bash
# Using collapsing functions to turn debug messages on/off
[ "--debug" = "$1" ] && dbg=echo || dbg=:
# From now on if you use $dbg instead of echo, you can select if messages will be shown
$dbg "This message will only be displayed if --debug is specified at the command line

docs/howto/conffile.md
# Config files for your script
## General
For this task, you don't have to write large parser routines (unless
you want it 100% secure or you want a special file syntax); you can use
the Bash source command. The file to be sourced should be formatted in
`KEY="VALUE"` format, otherwise bash will try to interpret commands:
#!/bin/bash
echo "Reading config...." >&2
source /etc/cool.cfg
echo "Config for the username: $cool_username" >&2
echo "Config for the target host: $cool_host" >&2
So, where do these variables come from? If everything works fine, they
are defined in /etc/cool.cfg, which is a file that's sourced into the
current script or shell. Note: this is **not** the same as executing
this file as a script! The sourced file most likely contains something
like:
cool_username="guest"
cool_host="foo.example.com"
These are normal statements understood by Bash, nothing special. Of
course (and, a big disadvantage under normal circumstances) the sourced
file can contain **everything** that Bash understands, including
malicious code!
The `source` command is also available under the name `.` (dot). The
usage of the dot is identical:
#!/bin/bash
echo "Reading config...." >&2
. /etc/cool.cfg # note the space between the dot and the leading slash of /etc/cool.cfg
echo "Config for the username: $cool_username" >&2
echo "Config for the target host: $cool_host" >&2
## Per-user configs
There's also a way to provide a system-wide config file in /etc and a
custom config in the user's home directory to override system-wide
defaults. In the following example, the if/then construct is used to
check for the existence of a user-specific config:
#!/bin/bash
echo "Reading system-wide config...." >&2
. /etc/cool.cfg
if [ -r ~/.coolrc ]; then
echo "Reading user config...." >&2
. ~/.coolrc
fi
## Secure it
As mentioned earlier, the sourced file can contain anything a Bash
script can. Essentially, it **is** an included Bash script. That creates
security issues. A malicious person can "execute" arbitrary code when
your script is sourcing its config file. You might want to allow only
constructs in the form `NAME=VALUE` in that file (variable assignment
syntax) and maybe comments (though technically, comments are
unimportant). Imagine the following "config file", containing some
malicious code:
# cool config file for my even cooler script
username=god_only_knows
hostname=www.example.com
password=secret ; echo rm -rf ~/*
parameter=foobar && echo "You've been pwned!";
# hey look, weird code follows...
echo "I am the skull virus..."
echo rm -fr ~/*
mailto=netadmin@example.com
You don't want these `echo` commands (which could be any other
commands!) to be executed. One way to be a bit safer is to filter only
the constructs you want, write the filtered results to a new file and
source the new file. We also need to be sure something nefarious hasn't
been added to the end of one of our NAME=VALUE parameters, perhaps using
`;` or `&&` command separators. In those cases, perhaps it is simplest to
just ignore the line entirely. Egrep (`grep -E`) will help us here; it
filters by pattern:
#!/bin/bash
configfile='/etc/cool.cfg'
configfile_secured='/tmp/cool.cfg'
# check if the file contains something we don't want
if egrep -q -v '^#|^[^ ]*=[^;&]*' "$configfile"; then
echo "Config file is unclean, cleaning it..." >&2
# filter the original to a new file
egrep '^#|^[^ ]*=[^;&]*' "$configfile" > "$configfile_secured"
configfile="$configfile_secured"
fi
# now source it, either the original or the filtered variant
source "$configfile"
**To make clear what it does:** egrep checks if the file
contains something we don't want; if yes, egrep filters the file and
writes the filtered contents to a new file. The variable `configfile`
is then switched to point at the filtered file, which is sourced
as if it were the original file.
This filter allows only `NAME=VALUE` and comments in the file, but it
doesn't prevent all methods of code execution. I will address that
later.
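For example, command substitution slips straight through this filter, because `$( )` contains neither `;` nor `&&`. A hypothetical config line:

``` bash
# passes the NAME=VALUE check above, yet executes code when sourced
hostname=$(some_evil_command)
```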

# Dissect a bad oneliner
``` bash
$ ls *.zip | while read i; do j=`echo $i | sed 's/.zip//g'`; mkdir $j; cd $j; unzip ../$i; cd ..; done
```
This is an actual one-liner someone asked about in `#bash`. **There are
several things wrong with it. Let's break it down!**
``` bash
$ ls *.zip | while read i; do ...; done
```
(Please read <http://mywiki.wooledge.org/ParsingLs>.) This command
executes `ls` on the expansion of `*.zip`. Assuming there are filenames
in the current directory that end in '.zip', ls will give a
human-readable list of those names. The output of ls is not for parsing.
But in sh and bash alike, we can loop safely over the glob itself:
``` bash
$ for i in *.zip; do j=`echo $i | sed 's/.zip//g'`; mkdir $j; cd $j; unzip ../$i; cd ..; done
```
Let's break it down some more!
``` bash
j=`echo $i | sed 's/.zip//g'` # where $i is some name ending in '.zip'
```
The goal here seems to be to get the filename without its `.zip` extension.
In fact, there is a POSIX(r)-compliant command to do this: `basename`.
The implementation here is suboptimal in several ways, but the only
thing that's genuinely error-prone with this is "`echo $i`". Echoing
an *unquoted* variable means
[wordsplitting](/syntax/expansion/wordsplit) will take place, so any
whitespace in `$i` will essentially be normalized. In `sh` it is
necessary to use an external command and a subshell to achieve the goal,
but we can eliminate the pipe (subshells, external commands, and pipes
carry extra overhead when they launch, so they can really hurt
performance in a loop). Just for good measure, let's use the more
readable, [modern](/syntax/expansion/cmdsubst) `$()` construct instead
of the old style backticks:
``` bash
sh $ for i in *.zip; do j=$(basename "$i" ".zip"); mkdir $j; cd $j; unzip ../$i; cd ..; done
```
In Bash we don't need the subshell or the external basename command.
See [Substring removal with parameter
expansion](/syntax/pe#substring_removal):
``` bash
bash $ for i in *.zip; do j="${i%.zip}"; mkdir $j; cd $j; unzip ../$i; cd ..; done
```
Let's keep going:
``` bash
$ mkdir $j; cd $j; ...; cd ..
```
As a programmer, you **never** know the situation under which your
program will run. Even if you do, the following best practice will never
hurt: when a command depends on the success of previous
commands, check for success! You can do this with the "`&&`"
conjunction; that way, if the previous command fails, bash will not try
to execute the following command(s). It's fully POSIX(r). Oh, and
remember what I said about [wordsplitting](/syntax/expansion/wordsplit)
in the previous step? Well, if you don't quote `$j`, wordsplitting can
happen again.
``` bash
$ mkdir "$j" && cd "$j" && ... && cd ..
```
That's almost right, but there's one problem: what happens if `$j`
contains a slash? Then `cd ..` will not return to the original
directory. That's wrong! `cd -` causes cd to return to the previous
working directory, so it's a much better choice:
``` bash
$ mkdir "$j" && cd "$j" && ... && cd -
```
(If it occurred to you that I forgot to check for success after `cd -`,
good job! You could do this with `{ cd - || break; }`, but I'm going to
leave that out because it's verbose and I think it's likely that we
will be able to get back to our original working directory without a
problem.)
So now we have:
``` bash
sh $ for i in *.zip; do j=$(basename "$i" ".zip"); mkdir "$j" && cd "$j" && unzip ../$i && cd -; done
```
``` bash
bash $ for i in *.zip; do j="${i%.zip}"; mkdir "$j" && cd "$j" && unzip ../$i && cd -; done
```
Let's throw the `unzip` command back in the mix:
``` bash
mkdir "$j" && cd "$j" && unzip ../$i && cd -
```
Well, besides word splitting, there's nothing terribly wrong with this.
Still, did it occur to you that unzip might already be able to target a
directory? There isn't a standard for the `unzip` command, but all the
implementations I've seen can do it with the `-d` flag. So we can drop
the cd commands entirely:
``` bash
$ mkdir "$j" && unzip -d "$j" "$i"
```
``` bash
sh $ for i in *.zip; do j=$(basename "$i" ".zip"); mkdir "$j" && unzip -d "$j" "$i"; done
```
``` bash
bash $ for i in *.zip; do j="${i%.zip}"; mkdir "$j" && unzip -d "$j" "$i"; done
```
There! That's as good as it gets.

docs/howto/edit-ed.md
# Editing files via scripts with ed
## Why ed?
Like `sed`, `ed` is a line editor. However, if you try to change file
contents with `sed`, and the file is open elsewhere and read by some
process, you will find out that GNU `sed` and its `-i` option will not
allow you to edit the file. There are circumstances where you may need
that, e.g. editing active and open files, the lack of GNU, or other
`sed`, with \"in-place\" option available.
Why `ed`?
- maybe your `sed` doesn't support in-place edit
- maybe you need to be as portable as possible
- maybe you need to really edit in-file (and not create a new file
like GNU `sed`)
- last but not least: standard `ed` has very good editing and
addressing possibilities, compared to standard `sed`

Don't get me wrong, this is **not** meant as an anti-`sed` article! It's
just meant to show you another way to do the job.
## Commanding ed
Since `ed` is an interactive text editor, it reads and executes commands
that come from `stdin`. There are several ways to feed our commands to
ed:
**Pipelines**
echo '<ED-COMMANDS>' | ed <FILE>
To inject the needed newlines, etc., it may be easier to use the builtin
command `printf` ("help printf"). Shown here as an example Bash
function to prefix text to file content:
# insertHead "$text" "$file"
insertHead() {
printf '%s\n' H 1i "$1" . w | ed -s "$2"
}
**Here-strings**
ed <FILE> <<< '<ED-COMMANDS>'
**Here-documents**
ed <FILE> <<EOF
<ED-COMMANDS>
EOF
Which one you prefer is your choice. I will use here-strings, since
they look best here IMHO.
There are other ways to provide input to `ed`. For example, process
substitution. But these should be enough for daily needs.
Since `ed` wants commands separated by newlines, I'll use a special
Bash quoting method, the C-like strings `$'TEXT'`, as it can interpret a
set of various escape sequences and special characters. I'll use the
`-s` option to make it less verbose.
## The basic interface
Check the `ed` manpage for details.
Similar to `vi` or `vim`, `ed` has a "command mode" and an
"interactive mode". For non-interactive use, the command mode is the
usual choice.
Commands to `ed` have a simple and regular structure: zero, one, or two
addresses followed by a single-character command, possibly followed by
parameters to that command. These addresses specify one or more lines in
the text buffer. Every command that requires addresses has default
addresses, so the addresses can often be omitted.
The line addressing is relative to the *current line*. If the edit
buffer is not empty, the initial value for the *current line* shall be
the last line in the edit buffer, otherwise zero. Generally, the
*current line* is the last line affected by a command. All addresses can
only address single lines, not blocks of lines!
Line addresses or commands using *regular expressions* interpret POSIX
Basic Regular Expressions (BRE). A null BRE is used to reference the
most recently used BRE. Since `ed` addressing is only for single lines,
no RE can ever match a newline.
## Debugging your ed scripts
By default, `ed` is not very talkative and will simply print a "?"
when an error occurs. Interactively you can use the `h` command to get a
short message explaining the last error. You can also turn on a mode
that makes `ed` automatically print this message with the `H` command.
It is a good idea to always add this command at the beginning of your ed
scripts:
$ ed -s file <<< $'H\n,df'
?
script, line 2: Invalid command suffix
While working on your script, you might make errors and destroy your
file, so you might be tempted to test your script with something like:
# Works, but there is a better way
# copy my original file
cp file file.test
# try my script on the file
ed -s file.test <<< $'H\n<ed commands>\nw'
# see the results
cat file.test
There is a much better way, though: you can use the ed command `p` to
print the file. Now your testing would look like:
the `,` (comma) in front of the `p` command is a shortcut for `1,$`,
which defines an address range from the first to the last line; `,p` thus
means print the whole file, after it has been modified. When your script
runs successfully, you only have to replace the `,p` by a `w`.
Of course, even if the file is not modified by the `p` command, **it's
always a good idea to have a backup copy!**
## Editing your files
Most of these things can be done with `sed`. But there are also things
that can't be done in `sed`, or can only be done with very complex code.
### Simple word substitutions
Like `sed`, `ed` also knows the common `s/FROM/TO/` command, and it can
also take line-addresses. **If no substitution is made on the addressed
lines, it's considered an error.**
#### Substitutions through the whole file
ed -s test.txt <<< $',s/Windows(R)-compatible/POSIX-conform/g\nw'
**Note:** The comma as a single address operator is an alias for
`1,$` ("all lines").
#### Substitutions in specific lines
On a line containing `fruits`, do the substitution:
ed -s test.txt <<< $'/fruits/s/apple/banana/g\nw'
On the 5th line after the line containing `fruits`, do the substitution:
ed -s test.txt <<< $'/fruits/+5s/apple/banana/g\nw'
### Block operations
#### Delete a block of text
The simple one is a well-known (by position) block of text:
# delete lines number 2 to 4 (2, 3, 4)
ed -s test.txt <<< $'2,4d\nw'
This deletes all lines matching a specific regular expression:
# delete all lines matching foobar
ed -s test.txt <<< $'g/foobar/d\nw'
`g/regexp/` applies the command following it to all the lines matching
the regexp.
#### Move a block of text
...using the `m` command: `<ADDRESS> m <TARGET-ADDRESS>`
This is definitely something that can't be done easily with sed.
# moving lines 5-9 to the end of the file
ed -s test.txt <<< $'5,9m$\nw'
# moving lines 5-9 to line 3
ed -s test.txt <<< $'5,9m3\nw'
#### Copy a block of text
...using the `t` command: `<ADDRESS> t <TARGET-ADDRESS>`
You use the `t` command just like you use the `m` (move) command.
# make a copy of lines 5-9 and place it at the end of the file
ed -s test.txt <<< $'5,9t$\nw'
# make a copy of lines 5-9 and place it at line 3
ed -s test.txt <<< $'5,9t3\nw'
#### Join all lines
...but leave the final newline intact. This is done by an extra
command: `j` (join).
ed -s file <<< $'1,$j\nw'
Compared with two other methods (using `tr` or `sed`, sketched below),
you don't have to delete all newlines and manually add one at the end.
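For comparison, sketches of those two methods (the second one assumes GNU `sed`):

``` bash
# tr: delete every newline, then add the final one back
{ tr -d '\n' < file; echo; }

# GNU sed: slurp all lines into the pattern space, then remove the newlines
sed ':a;N;$!ba;s/\n//g' file
```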
### File operations
#### Insert another file
How do you insert another file? As with `sed`, you use the `r` (read)
command. That inserts another file at the line before the last line (and
prints the result to stdout - `,p`):
ed -s FILE1 <<< $'$-1 r FILE2\n,p'
To compare, here's a possible `sed` solution which must use Bash
arithmetic and the external program `wc`:
sed "$(($(wc -l < FILE1)-1))r FILE2" FILE1
# UPDATE here's one which uses GNU sed's "e" parameter for the s-command
# it executes the commands found in pattern space. I'll take that as a
# security risk, but well, sometimes GNU > security, you know...
sed '${h;s/.*/cat FILE2/e;G}' FILE1
Another approach, in two invocations of sed, that avoids the use of
external commands completely:
sed $'${s/$/\\n-||-/;r FILE2\n}' FILE1 | sed '0,/-||-/{//!h;N;//D};$G'
## Pitfalls
### ed is not sed
ed and sed might look similar, but the same command(s) might act
differently:
**`/foo/d`**
In sed, `/foo/d` will delete all lines matching foo. In ed, the commands
are not repeated on each line, so this command will search for the next
line matching foo and delete it. If you want to delete all lines matching
foo, or do a substitution on all lines matching foo, you have to tell ed
about it with the g (global) command:
echo $'1\n1\n3' > file
#replace all lines matching 1 by "replacement"
ed -s file <<< $'g/1/s/1/replacement/\n,p'
#replace the first line matching 1 by "replacement"
#(because it starts searching from the last line)
ed -s file <<< $'s/1/replacement/\n,p'
**An error stops the script**
You might think that it's not a problem, and that the same thing happens
with sed, and you're right, with the exception that if ed does not find
a pattern it's an error, while sed just continues with the next line.
For instance, let's say that you want to change foo to bar on the first
line of the file and add something after the next line; ed will stop if
it cannot find foo on the first line, sed will continue.
#Gnu sed version
sed -e '1s/foo/bar/' -e '$a\something' file
#First ed version, does nothing if foo is not found on the first line:
ed -s file <<< $'H\n1s/foo/bar/\na\nsomething\n.\nw'
If you want the same behaviour, you can use g/foo/ to trick ed: g/foo/
will apply the command to all lines matching foo, thus the substitution
will succeed and ed will not produce an error when foo is not found:
#Second version will add the line with "something" even if foo is not found
ed -s file <<< $'H\n1g/foo/s/foo/bar/\na\nsomething\n.\nw'
In fact, even a substitution that fails after a g/ / command does not
seem to cause an error, i.e. you can use a trick like g/./s/foo/bar/ to
attempt the substitution on all non-blank lines.
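A quick sketch of that trick, using `,p` to preview the result as suggested in the debugging section:

``` bash
# try s/foo/bar/ on every non-blank line; no error even if foo never occurs
ed -s file <<< $'H\ng/./s/foo/bar/\n,p'
```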
### here documents
**Shell parameters are expanded**
If you don't quote the delimiter, `$` has a special meaning. This sounds
obvious, but it's easy to forget this fact when you use addresses like
`$-1` or commands like `$a`. Either quote the `$` or the delimiter:
#fails
ed -s file << EOF
$a
last line
.
w
EOF
#ok
ed -s file << EOF
\$a
last line
.
w
EOF
#ok again
ed -s file << 'EOF'
$a
last line
.
w
EOF
**\_\_ \".\" is not a command \_\_**
The . used to terminate the command \"a\" must be the only thing on the
line. take care if you indent the commands:
#ed doesn't care about the spaces before the commands, but the . must be the only thing on the line:
ed -s file << EOF
a
my content
.
w
EOF
## Simulate other commands
Keep in mind that in all the examples below, the entire file will be
read into memory.
### A simple grep
ed -s file <<< 'g/foo/p'
# equivalent
ed -s file <<< 'g/foo/'
The name `grep` is derived from the notation `g/RE/p` (global => regular
expression => print). Ref:
<http://www.catb.org/~esr/jargon/html/G/grep.html>
### wc -l
Since the default for the `ed` "print line number" command is the last
line, a simple `=` (equal sign) will print this line number and thus the
number of lines of the file:
ed -s file <<< '='
### cat
Yes, it's a joke...
ed -s file <<< $',p'
...but a similar thing to `cat`, showing line-endings and escapes, can be
done with the `list` command (`l`):
ed -s file <<< $',l'
FIXME to be continued
## Links
Reference:
- [Gnu ed](http://www.gnu.org/software/ed/manual/ed_manual.html) - if
we had to guess, you're probably using this one.
- POSIX
[ed](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html#tag_20_38),
[ex](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ex.html#tag_20_40),
and
[vi](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/vi.html#tag_20_152)
- <http://sdf.lonestar.org/index.cgi?tutorials/ed> - ed cheatsheet on
sdf.org
Misc info / tutorials:
- [How can I replace a string with another string in a variable, a
stream, a file, or in all the files in a
directory?](http://mywiki.wooledge.org/BashFAQ/021) - BashFAQ
- <http://wolfram.schneider.org/bsd/7thEdManVol2/edtut/edtut.pdf> -
Old but still relevant ed tutorial.

# Small getopts tutorial
## Description
**Note that** `getopts` is neither able to parse GNU-style long options
(`--myoption`) nor XF86-style long options (`-myoption`). So, when you
want to parse command line arguments in a professional ;-) way,
`getopts` may or may not work for you. Unlike its older brother `getopt`
(note the missing *s*!), it's a shell builtin command. The advantages
are:
- No need to pass the positional parameters through to an external
program.
- Being a builtin, `getopts` can set shell variables to use for
parsing (impossible for an *external* process!)
- There's no need to argue with several `getopt` implementations
which had buggy concepts in the past (whitespace, ...)
- `getopts` is defined in POSIX(r).
------------------------------------------------------------------------
Some other methods to parse positional parameters - using neither
**getopt** nor **getopts** - are described in: [How to handle positional
parameters](/scripting/posparams).
### Terminology
It's useful to know what we're talking about here, so let's see...
Consider the following command line:
mybackup -x -f /etc/mybackup.conf -r ./foo.txt ./bar.txt
These are all positional parameters, but they can be divided into
several logical groups:
- `-x` is an **option** (aka **flag** or **switch**). It consists of a
dash (`-`) followed by **one** character.
- `-f` is also an option, but this option has an associated **option
argument** (an argument to the option `-f`): `/etc/mybackup.conf`.
The option argument is usually the argument following the option
itself, but that isn't mandatory. Joining the option and option
argument into a single argument `-f/etc/mybackup.conf` is valid.
- `-r` depends on the configuration. In this example, `-r` doesn't
take arguments so it's a standalone option like `-x`.
- `./foo.txt` and `./bar.txt` are remaining arguments without any
associated options. These are often used as **mass-arguments**. For
example, the filenames specified for `cp(1)`, or arguments that
don't need an option to be recognized because of the intended
behavior of the program. POSIX(r) calls them **operands**.
To give you an idea about why `getopts` is useful: the above command
line is equivalent to:
mybackup -xrf /etc/mybackup.conf ./foo.txt ./bar.txt
which is complex to parse without the help of `getopts`.
The option flags can be **upper- and lowercase** characters, or
**digits**. It may recognize other characters, but that's not
recommended (usability and maybe problems with special characters).
### How it works
In general you need to call `getopts` several times. Each time it will
use the next positional parameter and a possible argument, if parsable,
and provide it to you. `getopts` will not change the set of positional
parameters. If you want to shift them, it must be done manually:
shift $((OPTIND-1))
# now do something with $@
Since `getopts` sets an exit status of *FALSE* when there's nothing
left to parse, it's easy to use in a while-loop:
while getopts ...; do
...
done
`getopts` will parse options and their possible arguments. It will stop
parsing on the first non-option argument (a string that doesn't begin
with a hyphen (`-`) and isn't an argument for any option in front of
it). It will also stop parsing when it sees `--` (double-hyphen),
which means [end of options](/dict/terms/end_of_options).
### Used variables
| variable | description |
|----------|-------------|
| [OPTIND](/syntax/shellvars#OPTIND) | Holds the index of the next argument to be processed. This is how `getopts` "remembers" its own status between invocations. Also useful for shifting the positional parameters after processing with `getopts`. `OPTIND` is initially set to 1, and **needs to be re-set to 1 if you want to parse anything again with getopts**. |
| [OPTARG](/syntax/shellvars#OPTARG) | This variable is set to any argument for an option found by `getopts`. It also contains the option flag of an unknown option. |
| [OPTERR](/syntax/shellvars#OPTERR) | (Values 0 or 1) Indicates if Bash should display error messages generated by the `getopts` builtin. The value is initialized to **1** on every shell startup, so be sure to always set it to **0** if you don't want to see annoying messages! **`OPTERR` is not specified by POSIX for the `getopts` builtin utility, only for the C `getopt()` function in `unistd.h` (`opterr`).** `OPTERR` is bash-specific and not supported by shells such as ksh93, mksh, zsh, or dash. |
`getopts` also uses these variables for error reporting (they're set to
value combinations which aren't possible in normal operation).
### Specify what you want
The base-syntax for `getopts` is:
getopts OPTSTRING VARNAME [ARGS...]
where:

| token | description |
|-------|-------------|
| `OPTSTRING` | tells `getopts` which options to expect and where to expect arguments (see below) |
| `VARNAME` | tells `getopts` which shell variable to use for option reporting |
| `ARGS` | tells `getopts` to parse these optional words instead of the positional parameters |
#### The option-string
The option-string tells `getopts` which options to expect and which of
them must have an argument. The syntax is very simple: every option
character is simply named as is. This example-string would tell
`getopts` to look for `-f`, `-A` and `-x`:
getopts fAx VARNAME
When you want `getopts` to expect an argument for an option, just place
a `:` (colon) after the proper option flag. If you want `-A` to expect
an argument (i.e. to become `-A SOMETHING`) just do:
getopts fA:x VARNAME
If the **very first character** of the option-string is a `:` (colon),
which would normally be nonsense because there's no option letter
preceding it, `getopts` switches to "**silent error reporting mode**".
In production scripts, this is usually what you want because it allows
you to handle errors yourself without being disturbed by annoying
messages.
#### Custom arguments to parse
The `getopts` utility parses the [positional
parameters](/scripting/posparams) of the current shell or function by
default (which means it parses `"$@"`).
You can give your own set of arguments to the utility to parse. Whenever
additional arguments are given after the `VARNAME` parameter, `getopts`
doesn't try to parse the positional parameters, but these given words.
This way, you are able to parse any option set you like, here for
example from an array:
while getopts :f:h opt "${MY_OWN_SET[@]}"; do
...
done
A call to `getopts` **without** these additional arguments is
**equivalent** to explicitly calling it with `"$@"`:
getopts ... "$@"
### Error Reporting
Regarding error-reporting, there are two modes `getopts` can run in:
- verbose mode
- silent mode
For production scripts I recommend the silent mode: everything
looks more professional when you don't see annoying standard
messages, and it's easier to handle, since the failure cases are
indicated in a uniform way.
#### Verbose Mode
| situation | result |
|-----------|--------|
| invalid option | `VARNAME` is set to `?` (question-mark) and `OPTARG` is unset |
| required argument not found | `VARNAME` is set to `?` (question-mark), `OPTARG` is unset and an *error message is printed* |
#### Silent Mode
| situation | result |
|-----------|--------|
| invalid option | `VARNAME` is set to `?` (question-mark) and `OPTARG` is set to the (invalid) option character |
| required argument not found | `VARNAME` is set to `:` (colon) and `OPTARG` contains the option character in question |
## Using it
### A first example
Enough said - action!
Let's play with a very simple case: only one option (`-a`) expected,
without any arguments. Also we disable the *verbose error handling* by
preceding the whole option string with a colon (`:`):
``` bash
#!/bin/bash
while getopts ":a" opt; do
case $opt in
a)
echo "-a was triggered!" >&2
;;
\?)
echo "Invalid option: -$OPTARG" >&2
;;
esac
done
```
I put that into a file named `go_test.sh`, which is the name you'll see
below in the examples.
Let's do some tests:
#### Calling it without any arguments
$ ./go_test.sh
$
Nothing happened? Right. `getopts` didn't see any valid or invalid
options (letters preceded by a dash), so it wasn't triggered.
#### Calling it with non-option arguments
$ ./go_test.sh /etc/passwd
$
Again \-\-- nothing happened. The **very same** case: `getopts` didn\'t
see any valid or invalid options (letters preceded by a dash), so it
wasn\'t triggered.
The arguments given to your script are of course accessible as `$1` -
`${N}`.
#### Calling it with option-arguments
Now let's trigger `getopts`: provide options.
First, an **invalid** one:
$ ./go_test.sh -b
Invalid option: -b
$
As expected, `getopts` didn't accept this option and acted as described
above: it placed `?` into `$opt` and the invalid option character (`b`)
into `$OPTARG`. With our `case` statement, we were able to detect this.
Now, a **valid** one (`-a`):
$ ./go_test.sh -a
-a was triggered!
$
You see, the detection works perfectly. The `a` was put into the
variable `$opt` for our case statement.
Of course it's possible to **mix valid and invalid** options when
calling:
$ ./go_test.sh -a -x -b -c
-a was triggered!
Invalid option: -x
Invalid option: -b
Invalid option: -c
$
Finally, it's of course possible to give our option **multiple
times**:
$ ./go_test.sh -a -a -a -a
-a was triggered!
-a was triggered!
-a was triggered!
-a was triggered!
$
The last examples lead us to some points you may consider:
- **invalid options don't stop the processing**: If you want to stop
the script, you have to do it yourself (`exit` in the right place)
- **multiple identical options are possible**: If you want to disallow
these, you have to check manually (e.g. by setting a variable or so;
see the sketch below)
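A minimal sketch of such a manual check, extending the example script with a guard variable:

``` bash
while getopts ":a" opt; do
  case $opt in
    a)
      if [[ $a_seen ]]; then
        echo "-a may only be given once" >&2
        exit 1
      fi
      a_seen=1
      echo "-a was triggered!" >&2
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      ;;
  esac
done
```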
### An option with argument
Let's extend our example from above. Just a little bit:
- `-a` now takes an argument
- on an error, the parsing exits with `exit 1`
``` bash
#!/bin/bash
while getopts ":a:" opt; do
case $opt in
a)
echo "-a was triggered, Parameter: $OPTARG" >&2
;;
\?)
echo "Invalid option: -$OPTARG" >&2
exit 1
;;
:)
echo "Option -$OPTARG requires an argument." >&2
exit 1
;;
esac
done
```
Let's do the very same tests we did in the last example:
#### Calling it without any arguments
$ ./go_test.sh
$
As above, nothing happened. It wasn't triggered.
#### Calling it with non-option arguments
$ ./go_test.sh /etc/passwd
$
The **very same** case: it wasn't triggered.
#### Calling it with option-arguments
**Invalid** option:
$ ./go_test.sh -b
Invalid option: -b
$
As expected, as above, `getopts` didn't accept this option and acted
as programmed.
**Valid** option, but without the mandatory **argument**:
$ ./go_test.sh -a
Option -a requires an argument.
$
The option was okay, but the argument is missing.
Let's provide **the argument**:
$ ./go_test.sh -a /etc/passwd
-a was triggered, Parameter: /etc/passwd
$
## See also
- Internal: [posparams](/scripting/posparams)
- Internal: [case](/syntax/ccmd/case)
- Internal: [while_loop](/syntax/ccmd/while_loop)
- POSIX
[getopts(1)](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html#tag_20_54)
and
[getopt(3)](http://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html)
- [parse CLI
ARGV](https://stackoverflow.com/questions/192249/how-do-i-parse-command-line-arguments-in-bash)
- [handle command-line arguments (options) to a
script](http://mywiki.wooledge.org/BashFAQ/035)

docs/howto/mutex.md
# Lock your script (against parallel execution)
## Why lock?
Sometimes there's a need to ensure only one copy of a script runs, i.e.
prevent two or more copies running simultaneously. Imagine an important
cronjob doing something very important, which will fail or corrupt data
if two copies of the called program were to run at the same time. To
prevent this, a form of `MUTEX` (**mutual exclusion**) lock is needed.
The basic procedure is simple: the script checks if a specific condition
(locking) is present at startup; if yes, it's locked and the script
doesn't start.
This article describes locking with common UNIX(r) tools. There are
other special locking tools available, but they're not standardized, or
worse yet, you can't be sure they're present when you want to run your
scripts. **A tool designed specifically for this purpose does the
job much better than general-purpose code.**
### Other, special locking tools
As mentioned above, a special tool for locking is the preferred
solution. Race conditions are avoided, as is the need to work around
specific limits.
- `flock`: <http://www.kernel.org/pub/software/utils/script/flock/>
- `solo`: <http://timkay.com/solo/>
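For example, with `flock(1)` from util-linux, the whole check-and-lock step can be a few lines at the top of the script (a sketch; the lock file path is an assumption):

``` bash
# open a file descriptor on the lock file and try to acquire an
# exclusive lock without blocking; a second instance exits immediately
exec 9>/var/lock/myscript.lock || exit 1
if ! flock -n 9; then
    echo "Lock failed - exit" >&2
    exit 1
fi
# ...the rest of the script runs while holding the lock...
```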
## Choose the locking method
The best way to set a global lock condition is the UNIX(r) filesystem.
Variables aren't enough, as each process has its own private variable
space, but the filesystem is global to all processes (yes, I know about
chroots, namespaces, ... special cases). You can "set" several things
in the filesystem that can be used as locking indicators:
- create files
- update file timestamps
- create directories
To create a file or set a file timestamp, usually the command `touch` is
used. The following problem is implied: a locking mechanism checks for
the existence of the lockfile; if no lockfile exists, it creates one and
continues. Those are **two separate steps**! That means it's **not an
atomic operation**. There's a small amount of time between checking and
creating where another instance of the same script could perform
locking (because when it checked, the lockfile wasn't there)! In that
case you would have 2 instances of the script running, both thinking
they are successfully locked, and can operate without colliding. Setting
the timestamp is similar: one step to check the timestamp, a second step
to set the timestamp.
**Conclusion:** We need an operation that does the check and the locking
in one step.
A simple way to get that is to create a **lock directory** - with the
mkdir command. It will:

- create a given directory only if it does not exist, and set a
successful exit code
- set an unsuccessful exit code if an error occurs - for example, if
the directory specified already exists
With mkdir it seems we have our two steps in one simple operation. A
(very!) simple locking code might look like this:
``` bash
if mkdir /var/lock/mylock; then
echo "Locking succeeded" >&2
else
echo "Lock failed - exit" >&2
exit 1
fi
```
In case `mkdir` reports an error, the script will exit at this point -
**the MUTEX did its job!**
*If the directory is removed after setting a successful lock, while the
script is still running, the lock is lost. Doing chmod -w for the parent
directory containing the lock directory can be done, but it is not
atomic. Maybe a while loop checking continuously for the existence of the
lock in the background, and sending a signal such as USR1 if the
directory is not found, could be done. The signal would need to be
trapped. I am sure there is a better solution than this
suggestion* --- *[sn18](sunny_delhi18@yahoo.com) 2009/12/19 08:24*
**Note:** While perusing the Internet, I found some people asking if the
`mkdir` method works "on all filesystems". Well, let's say it should.
The syscall under `mkdir` is guaranteed to work atomically in all cases,
at least on Unices. Two examples of problems are NFS filesystems and
filesystems on cluster servers. With those two scenarios, dependencies
exist related to the mount options and implementation. However, I
successfully use this simple method on an Oracle OCFS2 filesystem in a
4-node cluster environment. So let's just say "it should work under
normal conditions".
Another atomic method is setting the `noclobber` shell option
(`set -C`). That will cause redirection to fail if the file the
redirection points to already exists (using diverse `open()` methods).
For example:
``` bash
if ( set -o noclobber; echo "locked" > "$lockfile") 2> /dev/null; then
trap 'rm -f "$lockfile"; exit $?' INT TERM EXIT
echo "Locking succeeded" >&2
rm -f "$lockfile"
else
echo "Lock failed - exit" >&2
exit 1
fi
```
Another explanation of this basic pattern using `set -C` can be found
[here](http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html#tag_23_02_07).
## An example
This code was taken from a production-grade script that controls PISG to
create statistical pages from my IRC logfiles. There are some
differences compared to the very simple example above:
- the locking stores the process ID of the locked instance
- if a lock fails, the script tries to find out if the locked instance
is still active (unreliable!)
- traps are created to automatically remove the lock when the script
terminates, or is killed
Details on how the script is killed aren't given; only code relevant to
the locking process is shown:
``` bash
#!/bin/bash
# lock dirs/files
LOCKDIR="/tmp/statsgen-lock"
PIDFILE="${LOCKDIR}/PID"
# exit codes and text
ENO_SUCCESS=0; ETXT[0]="ENO_SUCCESS"
ENO_GENERAL=1; ETXT[1]="ENO_GENERAL"
ENO_LOCKFAIL=2; ETXT[2]="ENO_LOCKFAIL"
ENO_RECVSIG=3; ETXT[3]="ENO_RECVSIG"
###
### start locking attempt
###
trap 'ECODE=$?; echo "[statsgen] Exit: ${ETXT[ECODE]}($ECODE)" >&2' 0
echo -n "[statsgen] Locking: " >&2
if mkdir "${LOCKDIR}" &>/dev/null; then
# lock succeeded, install signal handlers before storing the PID just in case
# storing the PID fails
trap 'ECODE=$?;
echo "[statsgen] Removing lock. Exit: ${ETXT[ECODE]}($ECODE)" >&2
rm -rf "${LOCKDIR}"' 0
echo "$$" >"${PIDFILE}"
# the following handler will exit the script upon receiving these signals
# the trap on "0" (EXIT) from above will be triggered by this trap's "exit" command!
trap 'echo "[statsgen] Killed by a signal." >&2
exit ${ENO_RECVSIG}' 1 2 3 15
echo "success, installed signal handlers"
else
# lock failed, check if the other PID is alive
OTHERPID="$(cat "${PIDFILE}")"
# if cat isn't able to read the file, another instance is probably
# about to remove the lock -- exit, we're *still* locked
# Thanks to Grzegorz Wierzowiecki for pointing out this race condition on
# http://wiki.grzegorz.wierzowiecki.pl/code:mutex-in-bash
if [ $? != 0 ]; then
echo "lock failed, PID ${OTHERPID} is active" >&2
exit ${ENO_LOCKFAIL}
fi
if ! kill -0 $OTHERPID &>/dev/null; then
# lock is stale, remove it and restart
echo "removing stale lock of nonexistant PID ${OTHERPID}" >&2
rm -rf "${LOCKDIR}"
echo "[statsgen] restarting myself" >&2
exec "$0" "$@"
else
# lock is valid and OTHERPID is active - exit, we're locked!
echo "lock failed, PID ${OTHERPID} is active" >&2
exit ${ENO_LOCKFAIL}
fi
fi
```
## Related links
- [BashFAQ/045](http://mywiki.wooledge.org/BashFAQ/045)
- [Implementation of a shell locking
utility](http://wiki.grzegorz.wierzowiecki.pl/code:mutex-in-bash)
- [Wikipedia article on File
Locking](http://en.wikipedia.org/wiki/File_locking), including a
discussion of potential
[problems](http://en.wikipedia.org/wiki/File_locking#Problems) with
flock and certain versions of NFS.

docs/howto/pax.md
# pax - the POSIX archiver
pax can do a lot of fancy stuff, feel free to contribute more awesome
pax tricks!
## Introduction
The POSIX archiver, `pax`, is an attempt at a standardized archiver with
the best features of `tar` and `cpio`, able to handle all common archive
types.
However, this is **not a manpage**. It will **not** list all possible
options, and it will **not** give you detailed information about `pax`.
It's only an introduction.
This article is based on the debianized Berkeley implementation of
`pax`, but implementation-specific things should be tagged as such.
Unfortunately, the Debian package doesn\'t seem to be maintained
anymore.
## Overview
### Operation modes
There are four basic operation modes to *list*, *read*, *write* and
*copy* archives. They're switched with combinations of the `-r` and `-w`
command line options:

| Mode  | RW-Options      |
|-------|-----------------|
| List  | *no RW-options* |
| Read  | `-r`            |
| Write | `-w`            |
| Copy  | `-r -w`         |
#### List
In *list mode*, `pax` writes the list of archive members to standard
output (a table of contents). If a pattern match is specified on the
command line, only matching filenames are printed.
#### Read
*Read* an archive. `pax` will read archive data and extract the members
to the current directory. If a pattern match is specified on the command
line, only matching filenames are extracted.
When reading an archive, the archive type is determined from the archive
data.
#### Write
*Write* an archive, which means create a new one or append to an
existing one. All files and directories specified on the command line
are inserted into the archive. The archive is written to standard output
by default.
If no files are specified on the command line, filenames are read from
`STDIN`.
The write mode is the only mode where you need to specify the archive
type with `-x <TYPE>`, e.g. `-x ustar`.
#### Copy
*Copy* mode is similar to `cpio` passthrough mode. It provides a way to
replicate a complete or partial file hierarchy (with all the `pax`
options, e.g. rewriting groups) to another location.
### Archive data
When you don't specify anything special, `pax` will attempt to read
archive data from standard input (read/list modes) and write archive
data to standard output (write mode). This ensures `pax` can be easily
used as part of a shell pipe construct, e.g. to read a compressed
archive that's decompressed in the pipe.
The option to specify the pathname of a file to be archived is `-f`.
This file will be used as input or output, depending on the operation
(read/write/list).
When pax reads an archive, it tries to guess the archive type. However,
in *write* mode, you must specify which type of archive to append using
the `-x <TYPE>` switch. If you omit this switch, a default archive will
be created (POSIX says it's implementation-defined; Berkeley `pax`
creates `ustar` if no options are specified).
The following archive formats are supported (Berkeley implementation):
| Format  | Description                |
|---------|----------------------------|
| ustar   | POSIX TAR format (default) |
| cpio    | POSIX CPIO format          |
| tar     | classic BSD TAR format     |
| bcpio   | old binary CPIO format     |
| sv4cpio | SVR4 CPIO format           |
| sv4crc  | SVR4 CPIO format with CRC  |
Berkeley `pax` supports options `-z` and `-j`, similar to GNU `tar`, to
filter archive files through GZIP/BZIP2.
### Matching archive members
In *read* and *list* modes, you can specify patterns to determine which
files to list or extract.
- the pattern notation is the one known by a POSIX-shell, i.e. the one
known by Bash without `extglob`
- if the specified pattern matches a complete directory, it affects
all files and subdirectories of the specified directory
- if you specify the `-c` option, `pax` will invert the matches, i.e.
it matches all filenames **except** those matching the specified
patterns
- if no patterns are given, `pax` will \"match\" (list or extract) all
files from the archive
- **To avoid conflicts with shell pathname expansion, it's wise to
quote patterns!**
#### Some assorted examples of patterns
pax -r <myarchive.tar 'data/sales/*.txt' 'data/products/*.png'
pax -r <myarchive.tar 'data/sales/year_200[135].txt'
# should be equivalent to
pax -r <myarchive.tar 'data/sales/year_2001.txt' 'data/sales/year_2003.txt' 'data/sales/year_2005.txt'
## Using pax
This is a brief description of using `pax` as a normal archiver,
as you would use `tar`.
### Creating an archive
This task is done with the basic syntax:
# archive contents to stdout
pax -w >archive.tar README.txt *.png data/
# equivalent, extract archive contents directly to a file
pax -w -x ustar -f archive.tar README.txt *.png data/
`pax` is in *write* mode, the given filenames are packed into an
archive:
- `README.txt` is a normal file, it will be packed
- `*.png` is a pathname glob **for your shell**, the shell will
substitute all matching filenames **before** `pax` is executed. The
result is a list of filenames that will be packed like the
`README.txt` example above
- `data/` is a directory. **Everything** in this directory will be
packed into the archive, i.e. not just an empty directory
When you specify the `-v` option, `pax` will write the pathnames of the
files inserted into the archive to `STDERR`.
When, and only when, no filename arguments are specified, `pax` attempts
to read filenames from `STDIN`, separated by newlines. This way you can
easily combine `find` with `pax`:
find . -name '*.txt' | pax -wf textfiles.tar -x ustar
### Listing archive contents
The standard output format for listing archive members is simply to
print each filename on a separate line. But the output format can be
customized to include permissions, timestamps, etc. with the
`-o listopt=<FORMAT>` specification. The syntax of the format
specification is strongly derived from the `printf(3)` format
specification.
**Unfortunately** the `pax` utility delivered with Debian doesn\'t seem
to support these extended listing formats.
However, `pax` lists archive members in a `ls -l`-like format, when you
give the `-v` option:
pax -v <myarchive.tar
# or, of course
pax -vf myarchive.tar
### Extracting from an archive
You can extract all files, or files (not) matching specific patterns
from an archive using constructs like:
# "normal" extraction
pax -rf myarchive.tar '*.txt'
# with inverted pattern
pax -rf myarchive.tar -c '*.txt'
### Copying files
To copy directory contents to another directory, similar to a `cp -a`
command, use:
mkdir destdir
pax -rw dir destdir #creates a copy of dir in destdir/, i.e. destdir/dir
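If you also want to preserve ownership, permissions and timestamps as
far as possible (closer to what `cp -a` does), add the standard `-p e`
(\"preserve everything\") option; a sketch:

    # -pe asks pax to preserve user, group, mode and timestamps where it can
    pax -rw -pe dir destdir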
### Copying files via ssh
To copy directory contents to another directory on a remote system, use:
    pax -w localdir | ssh user@host "cd distantdir && pax -r -v"
    pax -w localdir | gzip | ssh user@host "cd distantdir && gunzip | pax -r -v" # compress the sent data

These commands create a copy of `localdir` in `distantdir` (i.e.
`distantdir/localdir`) on the remote machine.
## Advanced usage
### Backup your daily work
[**Note:**]{.underline} `-T` is an extension and is not defined by
POSIX.
Say you have write-access to a fileserver mounted on your filesystem
tree. In *copy* mode, you can tell `pax` to copy only files that were
modified today:
mkdir /n/mybackups/$(date +%A)/
pax -rw -T 0000 data/ /n/mybackups/$(date +%A)/
This is done using the `-T` switch, which normally lets you specify a
time window; here we give only the start time, which means \"today at
midnight\".
When you execute this \"very simple backup\" after your daily work, you
will have a copy of the modified files.
[**Note:**]{.underline} The `%A` format from `date` expands to the name
of the current day, localized, e.g. \"Friday\" (en) or \"Mittwoch\"
(de).
The same, but with an archive, can be accomplished by:
pax -w -T 0000 -f /n/mybackups/$(date +%A)
In this case, the day-name is an archive-file (you don\'t need a
filename extension like `.tar` but you can add one, if desired).
### Changing filenames while archiving
`pax` is able to rewrite filenames while archiving or while extracting
from an archive. This example creates a tar archive containing the
`holiday_2007/` directory, but the directory name inside the archive
will be `holiday_pics/`:
pax -x ustar -w -f holiday_pictures.tar -s '/^holiday_2007/holiday_pics/' holiday_2007/
The option responsible for the string manipulation is
`-s <REWRITE-SPECIFICATION>`. It takes a string rewrite specification of
the form `/OLD/NEW/[gp]`, where `OLD` is an `ed(1)`-like basic regular
expression (BRE); overall it behaves much like the popular sed construct
`s/from/to/`. Any non-null character can be used as a delimiter, so to
mangle pathnames (which contain slashes) you could use
`#/old/path#/new/path#`.
The optional `g` and `p` flags are used to apply substitution
**(g)**lobally to the line or to **(p)**rint the original and rewritten
strings to `STDERR`.
Multiple `-s` options can be specified on the command line. They are
applied to the pathname strings of the files or archive members. This
happens in the order they are specified.
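Rewriting also works in *read* mode; for instance, to rename the
directory once more while extracting (a sketch; `photos_2007/` is an
arbitrary target name):

    # extract, renaming holiday_pics/ to photos_2007/ on the way out
    pax -r -f holiday_pictures.tar -s',^holiday_pics/,photos_2007/,'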
### Excluding files from an archive
The `-s` option seen above can also be used to exclude files from an
archive: the substitution simply has to result in a null string. For
example, let\'s say you want to exclude all the CVS directories to
create a source code archive. We replace any pathname containing
`/CVS/` with nothing; note the `.*` on each side, they are needed
because we must match the entire pathname.
pax -w -x ustar -f release.tar -s',.*/CVS/.*,,' myapplication
You can use several `-s` options; for instance, let\'s say you also
want to exclude files ending in `~`:

    pax -w -x ustar -f release.tar -s',.*/CVS/.*,,' -s',.*~$,,' myapplication
This can also be done while reading an archive. For instance, suppose
you have an archive containing a \"usr\" and an \"etc\" directory, but
you want to extract only the \"usr\" directory:
pax -r -f archive.tar -s',^etc/.*,,' #the etc/ dir is not extracted
### Getting archive filenames from STDIN
Like `cpio`, pax can read filenames from standard input (`stdin`).
This provides great flexibility; for example, a `find(1)` command may
select files/directories in ways pax can\'t do itself. In **write**
mode (creating an archive) or **copy** mode, when no filenames are
given, pax expects to read filenames from standard input. For example:
# Back up config files changed less than 3 days ago
find /etc -type f -mtime -3 | pax -x ustar -w -f /backups/etc.tar
# Copy only the directories, not the files
mkdir /target
find . -type d -print | pax -r -w -d /target
# Back up anything that changed since the last backup
find . -newer /var/run/mylastbackup -print0 |
pax -0 -x ustar -w -d -f /backups/mybackup.tar
touch /var/run/mylastbackup
The `-d` option tells pax **not** to recurse into directories it reads
(`cpio`-style). Without `-d`, pax recurses into all directories
(`tar`-style).
**Note**: the `-0` option is not standard, but is present in some
implementations.
## From tar to pax
`pax` can handle the `tar` archive format; if you want to switch to
the standard tool, an alias like:
alias tar='echo USE PAX, idiot. pax is the standard archiver!; # '
in your `~/.bashrc` can be useful :-D.
Here is a quick table comparing (GNU) `tar` and `pax` to help you to
make the switch:
TAR PAX Notes
------------------------------------- ------------------------------------------ -----------------------------------------------------------------------
`tar xzvf file.tar.gz` `pax -rvz -f file.tar.gz` `-z` is an extension, POSIXly: `gunzip <file.tar.gz | pax -rv`
`tar czvf archive.tar.gz path ...` `pax -wvz -f archive.tar.gz path ...` `-z` is an extension, POSIXly: `pax -wv path | gzip > archive.tar.gz`
`tar xjvf file.tar.bz2` `bunzip2 <file.tar.bz2 | pax -rv`
`tar cjvf archive.tar.bz2 path ...` `pax -wv path | bzip2 > archive.tar.bz2`
`tar tzvf file.tar.gz` `pax -vz -f file.tar.gz` `-z` is an extension, POSIXly: `gunzip <file.tar.gz | pax -v`
`pax` might not create ustar (`tar`) archives by default, but rather
its own pax format; add `-x ustar` if you want to ensure pax creates
tar archives!
## Implementations
- [AT&T AST toolkit](http://www2.research.att.com/sw/download/) \|
[manpage](http://www2.research.att.com/~gsf/man/man1/pax.html)
- [Heirloom toolchest](http://heirloom.sourceforge.net/index.html) \|
[manpage](http://heirloom.sourceforge.net/man/pax.1.html)
- [OpenBSD pax](http://www.openbsd.org/cgi-bin/cvsweb/src/bin/pax/) \|
[manpage](http://www.openbsd.org/cgi-bin/man.cgi?query=pax&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html)
- [MirBSD pax](https://launchpad.net/paxmirabilis) \|
[manpage](https://www.mirbsd.org/htman/i386/man1/pax.htm) - Debian
bases their package upon this.
- [SUS pax
specification](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html)
# Illustrated Redirection Tutorial
![](keywords>bash shell scripting tutorial redirection redirect file descriptor)
This tutorial is not a complete guide to redirection: it does not
cover here documents, here strings, named pipes etc. I just hope
it\'ll help you to understand what things like `3>&2`, `2>&1` or
`1>&3-` do.
# stdin, stdout, stderr
When Bash starts, normally, 3 file descriptors are opened, `0`, `1` and
`2` also known as standard input (`stdin`), standard output (`stdout`)
and standard error (`stderr`).
For example, with Bash running in a Linux terminal emulator, you\'ll
see:
# lsof +f g -ap $BASHPID -d 0,1,2
COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME
bash 12135 root 0u CHR RW,LG 136,13 0t0 16 /dev/pts/5
bash 12135 root 1u CHR RW,LG 136,13 0t0 16 /dev/pts/5
bash 12135 root 2u CHR RW,LG 136,13 0t0 16 /dev/pts/5
This `/dev/pts/5` is a pseudo terminal used to emulate a real terminal.
Bash reads (`stdin`) from this terminal and prints via `stdout` and
`stderr` to this terminal.
--- +-----------------------+
standard input ( 0 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| /dev/pts/5 |
--- +-----------------------+
When a command, a compound command, a subshell etc. is executed, it
inherits these file descriptors. For instance `echo foo` will send the
text `foo` to the file descriptor `1` inherited from the shell, which is
connected to `/dev/pts/5`.
# Simple Redirections
## Output Redirection \"n\> file\"
`>` is probably the simplest redirection.
`echo foo > file`
The `> file` after the command alters the file descriptors belonging
to the command `echo`. It changes file descriptor `1` (`> file` is the
same as `1>file`) so that it points to the file `file`. The descriptors
will look like:
--- +-----------------------+
standard input ( 0 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| file |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| /dev/pts/5 |
--- +-----------------------+
Now characters written by our command, `echo`, that are sent to the
standard output, i.e., the file descriptor `1`, end up in the file named
`file`.
In the same way, `command 2> file` will change the standard error and
will make it point to `file`. Standard error is used by applications to
print errors.
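For instance (a small sketch; the filename is arbitrary):

    # error messages go to errors.log, normal output still goes to the terminal
    ls /tmp /doesnotexist 2>errors.log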
What will `command 3> file` do? It will open a new file descriptor
pointing to `file`. The command will then start with:
--- +-----------------------+
standard input ( 0 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
new descriptor ( 3 ) ---->| file |
--- +-----------------------+
What will the command do with this descriptor? It depends. Often
nothing. We will see later why we might want other file descriptors.
## Input Redirection \"n\< file\"
When you run a command using `command < file`, it changes the file
descriptor `0` so that it looks like:
--- +-----------------------+
standard input ( 0 ) <----| file |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| /dev/pts/5 |
--- +-----------------------+
If the command reads from `stdin`, it now will read from `file` and not
from the console.
As with `>`, `<` can be used to open a new file descriptor for reading,
`command 3<file`. Later we will see how this can be useful.
## Pipes \|
What does this `|` do? Among other things, it connects the standard
output of the command on the left to the standard input of the command
on the right. That is, it creates a special file, a pipe, which is
opened as a write destination for the left command, and as a read source
for the right command.
echo foo | cat
     ---       +--------------+              ---       +--------------+
    ( 0 ) ---->| /dev/pts/5   |     ------> ( 0 ) ---->| pipe (read)  |
     ---       +--------------+    /         ---       +--------------+
                                  /
     ---       +--------------+  /           ---       +--------------+
    ( 1 ) ---->| pipe (write) | /           ( 1 ) ---->| /dev/pts/5   |
     ---       +--------------+              ---       +--------------+

     ---       +--------------+              ---       +--------------+
    ( 2 ) ---->| /dev/pts/5   |             ( 2 ) ---->| /dev/pts/5   |
     ---       +--------------+              ---       +--------------+
This is possible because the redirections are set up by the shell
**before** the commands are executed, and the commands inherit the file
descriptors.
# More On File Descriptors
## Duplicating File Descriptor 2\>&1
We have seen how to open (or redirect) file descriptors. Let us see how
to duplicate them, starting with the classic `2>&1`. What does this
mean? That something written on the file descriptor `2` will go where
file descriptor `1` goes. In a shell, `command 2>&1` is not a very
interesting example, so we will use `ls /tmp/ doesnotexist 2>&1 | less`:
ls /tmp/ doesnotexist 2>&1 | less
     ---       +--------------+              ---       +--------------+
    ( 0 ) ---->| /dev/pts/5   |     ------> ( 0 ) ---->| from the pipe|
     ---       +--------------+    /  --->   ---       +--------------+
                                  /  /
     ---       +--------------+  /  /        ---       +--------------+
    ( 1 ) ---->| to the pipe  | /  /        ( 1 ) ---->| /dev/pts/5   |
     ---       +--------------+   /          ---       +--------------+
                                 /
     ---       +--------------+ /            ---       +--------------+
    ( 2 ) ---->| to the pipe  |/            ( 2 ) ---->| /dev/pts/5   |
     ---       +--------------+              ---       +--------------+
Why is it called *duplicating*? Because after `2>&1`, we have two file
descriptors pointing to the same file. Take care not to call this
\"File Descriptor Aliasing\": if we later redirect `stdout` to a file
`B`, file descriptor `2` will still be opened on the file `A` where it
was. This is often misunderstood by people wanting to redirect both
standard output and standard error to a file. Continue reading for
more on this.
So if you have two file descriptors `s` and `t` like:
--- +-----------------------+
a descriptor ( s ) ---->| /some/file |
--- +-----------------------+
--- +-----------------------+
a descriptor ( t ) ---->| /another/file |
--- +-----------------------+
Then using `t>&s` (where `t` and `s` are numbers) means:
> Copy whatever file descriptor `s` contains into file descriptor `t`
So you got a copy of this descriptor:
--- +-----------------------+
a descriptor ( s ) ---->| /some/file |
--- +-----------------------+
--- +-----------------------+
a descriptor ( t ) ---->| /some/file |
--- +-----------------------+
Internally, each of these is represented by a file descriptor obtained
from the operating system\'s `open` call; it is essentially a handle to
the underlying file, which has been opened for reading (`stdin`, file
descriptor `0`) or writing (`stdout`/`stderr`).
Note that the file read/write position is shared by the two
descriptors. If you have already read a line from `s`, then after
`t>&s`, if you read a line from `t`, you will get the second line of
the file. Similarly for output file descriptors: writing a line via
`s` and then a line via `t` appends both lines, in order, to the same
file.
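You can observe this shared position in Bash (a small sketch; `file`
must contain at least two lines):

    exec 3<file 4<&3     # open file on fd 3, make fd 4 a copy of fd 3
    read -r -u 3 first   # reads line 1 via fd 3, advancing the shared offset
    read -r -u 4 second  # reads line 2 via fd 4, not line 1
    echo "$first / $second"
    exec 3<&- 4<&-       # close both descriptors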
**Note:** The syntax is somewhat confusing in that you would think
that the arrow would point in the direction of the copy, but it\'s
reversed. So it\'s effectively `target>&source`.
Here is a simple (albeit slightly contrived) example:
exec 3>&1 # Copy 1 into 3
exec 1> logfile # Make 1 opened to write to logfile
lotsa_stdout # Outputs to fd 1, which writes to logfile
exec 1>&3 # Copy 3 back into 1
echo Done # Output to original stdout
## Order Of Redirection, i.e., \"\> file 2\>&1\" vs. \"2\>&1 \>file\"
While it doesn\'t matter where the redirections appear on the command
line, their order does matter. They are set up from left to right.
- `2>&1 >file`
A common error is to do `command 2>&1 > file` to redirect both
`stderr` and `stdout` to `file`. Let\'s see what\'s going on. First we
type the command in our terminal, where the descriptors look like this:
--- +-----------------------+
standard input ( 0 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| /dev/pts/5 |
--- +-----------------------+
Then our shell, Bash, sees `2>&1`, so it makes `2` a copy of `1`, and
the file descriptors look like this:
--- +-----------------------+
standard input ( 0 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| /dev/pts/5 |
--- +-----------------------+
That\'s right, nothing has changed, 2 was already pointing to the same
place as 1. Now Bash sees `> file` and thus changes `stdout`:
--- +-----------------------+
standard input ( 0 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| file |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| /dev/pts/5 |
--- +-----------------------+
And that\'s not what we want.
- `>file 2>&1`
Now let\'s look at the correct `command >file 2>&1`. We start as in the
previous example, and Bash sees `> file`:
--- +-----------------------+
standard input ( 0 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| file |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| /dev/pts/5 |
--- +-----------------------+
Then it sees our duplication `2>&1`:
--- +-----------------------+
standard input ( 0 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| file |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| file |
--- +-----------------------+
And voila, both `1` and `2` are redirected to file.
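You can check the difference yourself (a sketch; `/doesnotexist` is
just a name that triggers an error message):

    # correct order: both output and errors end up in log
    ls /tmp /doesnotexist >log 2>&1

    # wrong order: errors still reach the terminal, only output lands in log
    ls /tmp /doesnotexist 2>&1 >log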
## Why sed \'s/foo/bar/\' file \>file Doesn\'t Work
This is a common error: we want to modify a file using something that
reads from a file and writes the result to `stdout`, so we redirect
`stdout` to the file we want to modify. The problem is that, as we
have seen, the redirections are set up before the command is actually
executed.

So **BEFORE** `sed` starts, standard output has already been
redirected, with the additional side effect that, because we used `>`,
\"file\" gets truncated. When `sed` starts to read the file, it
contains nothing.
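A common workaround is to write to a temporary file and then replace
the original (a sketch; with GNU sed you could also use the
nonstandard `-i` option):

    sed 's/foo/bar/' file >file.tmp && mv file.tmp file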
## exec
In Bash the `exec` built-in replaces the shell with the specified
program. So what does this have to do with redirection? `exec` also
allows us to manipulate the file descriptors: if you don\'t specify a
program, the redirections after `exec` modify the file descriptors of
the current shell.
For example, all the commands after `exec 2>file` will have file
descriptors like:
--- +-----------------------+
standard input ( 0 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| file |
--- +-----------------------+
All the errors sent to `stderr` by the commands after the
`exec 2>file` will go to the file, just as if you had the command in a
script and ran `myscript 2>file`.
`exec` is useful if, for instance, you want to log the errors the
commands in your script produce: just add `exec 2>myscript.errors` at
the beginning of your script.
Let\'s see another use case. We want to read a file line by line; this
is easy, we just do:
while read -r line;do echo "$line";done < file
Now we want to pause after printing each line, waiting for the user to
press a key:
while read -r line;do echo "$line"; read -p "Press any key" -n 1;done < file
And, surprise, this doesn\'t work. Why? Because the file descriptors
of the while loop look like:
--- +-----------------------+
standard input ( 0 ) ---->| file |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| /dev/pts/5 |
--- +-----------------------+
Our `read` inherits these descriptors, and so does the prompt command
(`read -p "Press any key" -n 1`); both thus read from `file` and not
from our terminal.
A quick look at `help read` tells us that we can specify a file
descriptor from which `read` should read. Cool. Now let\'s use `exec` to
get another descriptor:
exec 3<file
while read -u 3 line;do echo "$line"; read -p "Press any key" -n 1;done
Now the file descriptors look like:
--- +-----------------------+
standard input ( 0 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard output ( 1 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
standard error ( 2 ) ---->| /dev/pts/5 |
--- +-----------------------+
--- +-----------------------+
new descriptor ( 3 ) ---->| file |
--- +-----------------------+
and it works.
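A variant keeps the redirection local to the loop instead of using
`exec` (a sketch):

    # fd 3 is open only for the duration of the loop; fd 0 stays on the terminal
    while read -r -u 3 line; do
        echo "$line"
        read -p "Press any key" -n 1
    done 3<file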
## Closing The File Descriptors
Closing a file descriptor is easy: just make it a duplicate of `-`.
For instance, let\'s close `stdin` (`<&-`) and `stderr` (`2>&-`):
bash -c '{ lsof -a -p $$ -d0,1,2 ;} <&- 2>&-'
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
bash 10668 pgas 1u CHR 136,2 4 /dev/pts/2
We see that inside the `{}` only `1` is still open.
Though the OS will probably clean up the mess, it is perhaps a good idea
to close the file descriptors you open. For instance, if you open a file
descriptor with `exec 3>file`, all the commands afterwards will inherit
it. It\'s probably better to do something like:
exec 3>file
.....
#commands that use 3
.....
exec 3>&-
#we don't need 3 any more
I\'ve seen some people using this as a way to discard, say, `stderr`,
using something like `command 2>&-`. Though it might work, I\'m not
sure if you can expect all applications to behave correctly with a
closed `stderr`. When in doubt, I use `2>/dev/null`.
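For example (a sketch; `pattern` is a placeholder):

    # silence complaints about unreadable directories
    grep -r pattern /etc 2>/dev/null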
# An Example
This example comes from [this post
(ffe4c2e382034ed9)](http://groups.google.com/group/comp.unix.shell/browse_thread/thread/64206d154894a4ef/ffe4c2e382034ed9#ffe4c2e382034ed9)
on the comp.unix.shell group:
{
{
cmd1 3>&- |
cmd2 2>&3 3>&-
} 2>&1 >&4 4>&- |
cmd3 3>&- 4>&-
} 3>&2 4>&1
The redirections are processed from left to right, but as the file
descriptors are inherited we will also have to work from the outer to
the inner contexts. We will assume that we run this command in a
terminal. Let\'s start with the outer `{ } 3>&2 4>&1`.
--- +-------------+ --- +-------------+
( 0 ) ---->| /dev/pts/5 | ( 3 ) ---->| /dev/pts/5 |
--- +-------------+ --- +-------------+
--- +-------------+ --- +-------------+
( 1 ) ---->| /dev/pts/5 | ( 4 ) ---->| /dev/pts/5 |
--- +-------------+ --- +-------------+
--- +-------------+
( 2 ) ---->| /dev/pts/5 |
--- +-------------+
We only made 2 copies of `stderr` and `stdout`. `3>&1 4>&1` would have
produced the same result here because we ran the command in a terminal
and thus `1` and `2` go to the terminal. As an exercise, you can start
with `1` pointing to `file.stdout` and `2` pointing to `file.stderr`;
you will then see why these redirections are very nice.
Let\'s continue with the right part of the second pipe:
`| cmd3 3>&- 4>&-`
--- +-------------+
( 0 ) ---->| 2nd pipe |
--- +-------------+
--- +-------------+
( 1 ) ---->| /dev/pts/5 |
--- +-------------+
--- +-------------+
( 2 ) ---->| /dev/pts/5 |
--- +-------------+
It inherits the previous file descriptors, closes 3 and 4 and sets up a
pipe for reading. Now for the left part of the second pipe
`{...} 2>&1 >&4 4>&- |`
--- +-------------+ --- +-------------+
( 0 ) ---->| /dev/pts/5 | ( 3 ) ---->| /dev/pts/5 |
--- +-------------+ --- +-------------+
--- +-------------+
( 1 ) ---->| /dev/pts/5 |
--- +-------------+
--- +-------------+
( 2 ) ---->| 2nd pipe |
--- +-------------+
First, file descriptor `1` is connected to the pipe (`|`), then `2` is
made a copy of `1` and thus is also connected to the pipe (`2>&1`),
then `1` is made a copy of `4` (`>&4`), then `4` is closed. These are
the file descriptors of the inner `{}`. Let\'s go inside and have a
look at the right part of the first pipe: `| cmd2 2>&3 3>&-`
--- +-------------+
( 0 ) ---->| 1st pipe |
--- +-------------+
--- +-------------+
( 1 ) ---->| /dev/pts/5 |
--- +-------------+
--- +-------------+
( 2 ) ---->| /dev/pts/5 |
--- +-------------+
It inherits the previous file descriptors, connects 0 to the 1st pipe,
the file descriptor 2 is made a copy of 3, and 3 is closed. Finally, for
the left part of the pipe:
--- +-------------+
( 0 ) ---->| /dev/pts/5 |
--- +-------------+
--- +-------------+
( 1 ) ---->| 1st pipe |
--- +-------------+
--- +-------------+
( 2 ) ---->| 2nd pipe |
--- +-------------+
It also inherits the file descriptors of the left part of the 2nd
pipe; file descriptor `1` is connected to the first pipe, and `3` is
closed.
The purpose of all this becomes clear if we take only the commands:
cmd2
--- +-------------+
-->( 0 ) ---->| 1st pipe |
/ --- +-------------+
/
/ --- +-------------+
cmd 1 / ( 1 ) ---->| /dev/pts/5 |
/ --- +-------------+
/
--- +-------------+ / --- +-------------+
( 0 ) ---->| /dev/pts/5 | / ( 2 ) ---->| /dev/pts/5 |
--- +-------------+ / --- +-------------+
/
--- +-------------+ / cmd3
( 1 ) ---->| 1st pipe | /
--- +-------------+ --- +-------------+
------------>( 0 ) ---->| 2nd pipe |
--- +-------------+ / --- +-------------+
( 2 ) ---->| 2nd pipe |/
--- +-------------+ --- +-------------+
( 1 ) ---->| /dev/pts/5 |
--- +-------------+
--- +-------------+
( 2 ) ---->| /dev/pts/5 |
--- +-------------+
As said previously, as an exercise, you can start with `1` open on a
file and `2` open on another file to see how the `stdout` of `cmd2`
and `cmd3` goes to the original `stdout` and how the `stderr` goes to
the original `stderr`.
# Syntax
I used to have trouble choosing between `0&<3` `3&>1` `3>&1` `->2`
`-<&0` `&-<0` `0<&-` etc. (I think probably because the syntax is more
representative of the result, i.e., the redirection, than of what is
done, i.e., opening, closing, or duplicating file descriptors).
If this fits your situation, then maybe the following \"rules\" will
help you. A redirection always has the form:
lhs op rhs
- `lhs` is always a file descriptor, i.e., a number:
    - Either we want to open, duplicate, move, or close it. If the op
      is `<` then there is an implicit `0`; if it\'s `>` or `>>`,
      there is an implicit `1`.
- `op` is `<`, `>`, `>>`, `>|`, or `<>`:
    - `<` if the file descriptor in `lhs` will be read, `>` if it will
      be written, `>>` if data is to be appended to the file, `>|` to
      overwrite an existing file even if `noclobber` is set, or `<>`
      if it will be both read and written.
- `rhs` is the thing that the file descriptor will describe:
- It can be the name of a file, the place where another descriptor
goes (`&1`), or, `&-`, which will close the file descriptor.
You might not like this description, and find it a bit incomplete or
inexact, but I think it really helps to easily find that, say `&->0` is
incorrect.
### A note on style
The shell is pretty loose about what it considers a valid redirect.
While opinions probably differ, this author has some (strong)
recommendations:
- **Always** keep redirections \"tightly grouped\" \-- that is, **do
not** include whitespace anywhere within the redirection syntax
except within quotes if required on the RHS (e.g. a filename that
contains a space). Since shells fundamentally use whitespace to
delimit fields in general, it is visually much clearer for each
redirection to be separated by whitespace, but grouped in chunks
that contain no unnecessary whitespace.
- **Do** always put a space between each redirection, and between the
argument list and the first redirect.
- **Always** place redirections together at the very end of a command
after all arguments. Never precede a command with a redirect. Never
put a redirect in the middle of the arguments.
- **Never** use the Csh-style `>&foo` shorthand or Bash\'s `&>foo`
  equivalent. Use the long form `>foo 2>&1`. (see:
  [obsolete](obsolete))
# Good! This is clearly a simple command with two arguments and four redirections
cmd arg1 arg2 <myFile 3<&1 2>/dev/null >&2
# Good!
{ cmd1 <<<'my input'; cmd2; } >someFile
# Bad. Is the "1" a file descriptor or an argument to cmd? (answer: it's the FD). Is the space after the herestring part of the input data? (answer: No).
# The redirects are also not delimited in any obvious way.
cmd 2>& 1 <<< stuff
# Hideously Bad. It's difficult to tell where the redirects are and whether they're even valid redirects.
# This is in fact one command with one argument, an assignment, and three redirects.
foo=bar<baz bork<<< blarg>bleh
# Conclusion
I hope this tutorial worked for you.
I lied, I did not explain `1>&3-`, go check the manual ;-)
Thanks to Stéphane Chazelas from whom I stole both the intro and the
example.

The intro is inspired by the following introduction; you\'ll find a
nice exercise there too:
- [A Detailed Introduction to I/O and I/O
Redirection](http://tldp.org/LDP/abs/html/ioredirintro.html)
The last example comes from this post:
- [comp.unix.shell: piping stdout and stderr to different
processes](http://groups.google.com/group/comp.unix.shell/browse_thread/thread/64206d154894a4ef/ffe4c2e382034ed9#ffe4c2e382034ed9)
# See also
- Internal: [Redirection syntax overview](/syntax/redirection)
One of the simplest ways to check your bash/sh scripts is to run them
and check the output or the result. This tutorial shows how to use the
[bashtest](https://github.com/pahaz/bashtest) tool for testing your
scripts.
### Write a simple utility
We have a simple **stat.sh** script:
    #!/usr/bin/env bash

    # default to the current directory if no argument is given
    if [ -z "$1" ]
    then
        DIR=./
    else
        DIR=$1
    fi

    echo "Evaluate *.py statistics"
    FILES=$(find "$DIR" -name '*.py' | wc -l)
    LINES=$( (find "$DIR" -name '*.py' -print0 | xargs -0 cat) | wc -l )
    echo "PYTHON FILES: $FILES"
    echo "PYTHON LINES: $LINES"
This script counts the number of Python files and the number of lines
of Python code under a directory. We can use it like
**./stat.sh \<dir\>**.
### Create a test suite

Now let\'s make a test suite for **stat.sh**: a directory **testsuit**
which contains some test Python files.
**testsuit/main.py**
import foo
print(foo)
**testsuit/foo.py**
BAR = 1
BUZ = BAR + 2
Ok! Our test suite is ready! We have 2 Python files which contain 4
lines of code.
### Write bashtests
Let\'s write the tests. A test is just a shell command together with
its expected output. Create the file **tests.bashtest**:
$ ./stat.sh testsuit/
Evaluate *.py statistics
PYTHON FILES: 2
PYTHON LINES: 4
This is our test, and it is simple. Try to run it:
# install bashtest if required!
$ pip install bashtest
# run tests
$ bashtest *.bashtest
1 items passed all tests:
1 tests in tests.bashtest
1 tests in 1 items.
1 passed and 0 failed.
Test passed.
That\'s all. We wrote one test. You can write more tests if you want:
$ ls testsuit/
foo.py main.py
$ ./stat.sh testsuit/
Evaluate *.py statistics
PYTHON FILES: 2
PYTHON LINES: 4
And run tests again:
$ bashtest *.bashtest
1 items passed all tests:
2 tests in tests.bashtest
2 tests in 1 items.
2 passed and 0 failed.
Test passed.
You can find more **.bashtest** examples in the [bashtest github
repo](https://github.com/pahaz/bashtest). You can also ask a question
or report a bug
[here](https://github.com/pahaz/bashtest/issues).
Happy testing!