mirror of
https://github.com/flokoe/bash-hackers-wiki.git
synced 2024-11-01 06:53:05 +01:00
Convert howto pages to Markdown
This commit is contained in:
parent
024e1bcc0e
commit
2d52763c75
281
docs/howto/calculate-dc.md
Normal file
@ -0,0 +1,281 @@
|
||||
# Calculating with dc
|
||||
|
||||
![](keywords>bash shell scripting arithmetic calculate)
|
||||
|
||||
## Introduction
|
||||
|
||||
dc(1) is a non standard, but commonly found, reverse-polish Desk
|
||||
Calculator. According to Ken Thompson, \"dc is the oldest language on
|
||||
Unix; it was written on the PDP-7 and ported to the PDP-11 before Unix
|
||||
\[itself\] was ported\".
|
||||
|
||||
Historically the standard bc(1) has been implemented as a *front-end to
|
||||
dc*.
|
||||
|
||||
## Simple calculation
|
||||
|
||||
In brief, the *reverse polish notation* means the numbers are put on the
|
||||
stack first, then an operation is applied to them. Instead of writing
|
||||
`1+1`, you write `1 1+`.
|
||||
|
||||
By default `dc`, unlike `bc`, doesn\'t print anything, the result is
|
||||
pushed on the stack. You have to use the \"p\" command to print the
|
||||
element at the top of the stack. Thus a simple operation looks like:
|
||||
|
||||
$ dc <<< '1 1+pq'
|
||||
2
|
||||
|
||||
I used a \"here string\" present in bash 3.x, ksh93 and zsh. if your
|
||||
shell doesn\'t support this, you can use `echo '1 1+p' | dc` or if you
|
||||
have GNU `dc`, you can use `dc -e '1 1 +p`\'.
|
||||
|
||||
Of course, you can also just run `dc` and enter the commands.
|
||||
|
||||
The classic operations are:
|
||||
|
||||
- addition: `+`
|
||||
- subtraction: `-`
|
||||
- division: `/`
|
||||
- multiplication: `*`
|
||||
- remainder (modulo): `%`
|
||||
- exponentiation: `^`
|
||||
- square root: `v`
|
||||
|
||||
GNU `dc` adds a couple more.
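
For example, chaining a few of these (a small sketch): the first line
computes (3+4)*2, the second raises 16 to the power of 2 and then takes
the square root again:

    dc <<< '3 4+2*p'   # prints 14
    dc <<< '16 2^vp'   # prints 16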
|
||||
|
||||
To input a negative number you need to use the `_` (underscore)
|
||||
character:
|
||||
|
||||
$ dc <<< '1_1-p'
|
||||
2
|
||||
|
||||
You can use the *digits* `0` to `9` and the *letters* `A` to `F` as
|
||||
numbers, and a dot (`.`) as a decimal point. The `A` to `F` **must** be
|
||||
capital letters in order not to be confused with the commands specified
|
||||
with lower case characters. A number with a letter is considered
|
||||
hexadecimal:
|
||||
|
||||
dc <<< 'Ap'
|
||||
10
|
||||
|
||||
The **output** is converted to **base 10** by default.
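
For example, the hexadecimal digit `F` comes out as decimal 15:

    dc <<< 'Fp'
    15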
|
||||
|
||||
## Scale And Base
|
||||
|
||||
`dc` is a calculator with arbitrary precision; by default this precision
is 0, thus `dc <<< "5 4/p"` prints "1".
|
||||
|
||||
We can increase the precision using the `k` command. It pops the value
|
||||
at the top of the stack and uses it as the precision argument:
|
||||
|
||||
dc <<< '2k5 4/p' # prints 1.25
|
||||
dc <<< '4k5 4/p' # prints 1.2500
|
||||
dc <<< '100k 2vp'
|
||||
1.4142135623730950488016887242096980785696718753769480731766797379907\
|
||||
324784621070388503875343276415727
|
||||
|
||||
dc supports *large* precision arguments.
|
||||
|
||||
You can change the base used to output (*print*) the numbers with `o`
|
||||
and the base used to input (*type*) the numbers with `i`:
|
||||
|
||||
dc << EOF
|
||||
20p  # prints 20, output is in base 10
16o  # the output base is now 16
20p  # prints 14, in hex
16i  # the input base is now 16
p    # prints 14, this doesn't modify the number in the stack
10p  # prints 10, the output is done in base 16
|
||||
EOF
|
||||
|
||||
Note: when the input value is modified, the base is modified for all
|
||||
commands, including `i`:
|
||||
|
||||
dc << EOF
|
||||
16i 16o # base is 16 for input and output
|
||||
10p # prints 10
|
||||
10i # ! set the base to 10 i.e. to 16 decimal
|
||||
17p # prints 17
|
||||
EOF
|
||||
|
||||
This code prints 17, while we might think that `10i` reverts the input
base back to 10 and thus the number should be converted to hex and
printed as 11. The problem is that 10 was typed while the input base was
16, thus the input base was set to 10 hexadecimal, i.e. 16 decimal.
|
||||
|
||||
dc << EOF
|
||||
16i16o10p # input and output base are now 16, prints 10
Ai        # set the input base to A in hex, i.e. 10 decimal
17p       # prints 11, the output is still in base 16
|
||||
EOF
|
||||
|
||||
## Stack
|
||||
|
||||
There are two basic commands to manipulate the stack:
|
||||
|
||||
- `d` duplicates the top of the stack
|
||||
- `c` clears the stack
|
||||
|
||||
|
||||
$ dc << EOF
|
||||
2 # put 2 on the stack
|
||||
d # duplicate i.e. put another 2 on the stack
|
||||
*p # multiply and print
|
||||
c p # clear and print
|
||||
EOF
|
||||
4
|
||||
dc: stack empty
|
||||
|
||||
`c p` results in an error, as we would expect, as c removes everything
|
||||
on the stack. *Note: we can use `#` to put comments in the script.*
|
||||
|
||||
If you are lost, you can inspect (i.e. print) the stack using the
|
||||
command `f`. The stack remains unchanged:
|
||||
|
||||
dc <<< '1 2 d 4+f'
|
||||
6
|
||||
2
|
||||
1
|
||||
|
||||
Note how the first element that will be popped from the stack is printed
first; if you are used to an HP calculator, it's the reverse.
|
||||
|
||||
Don\'t hesitate to put `f` in the examples of this tutorial, it doesn\'t
|
||||
change the result, and it\'s a good way to see what\'s going on.
|
||||
|
||||
## Registers
|
||||
|
||||
The GNU `dc` manual says that dc has at least **256 registers**
|
||||
depending on the range of unsigned char. I\'m not sure how you are
|
||||
supposed to use the NUL byte. Using a register is easy:
|
||||
|
||||
dc <<EOF
|
||||
12 # put 12 on the stack
|
||||
sa # remove it from the stack (s), and put it in register 'a'
|
||||
10 # put 10 on the stack
|
||||
la # read (l) the value of register 'a' and push it on the stack
|
||||
+p # add the 2 values and print
|
||||
EOF
|
||||
|
||||
The above snippet uses newlines to embed comments, but it doesn\'t
|
||||
really matter, you can use `echo '12sa10la+p'| dc`, with the same
|
||||
results.
|
||||
|
||||
The register can contain more than just a value, **each register is a
|
||||
stack on its own**.
|
||||
|
||||
dc <<EOF
|
||||
12sa #store 12 in 'a'
|
||||
6Sa # with a capital S the 6 is removed
|
||||
# from the main stack and pushed on the 'a' stack
|
||||
lap # prints 6, the value at the top of the 'a' stack
|
||||
lap # still prints 6
|
||||
Lap # prints 6 also but with a capital L, it pushes the value in 'a'
|
||||
# to the main stack and pulls it from the 'a' stack
|
||||
lap # prints 12, which is now at the top of the stack
|
||||
EOF
|
||||
|
||||
## Macros
|
||||
|
||||
`dc` lets you push arbitrary strings on the stack when the strings are
enclosed in `[]`. You can print them with `p`: `dc <<< '[Hello World!]p'`,
and you can evaluate them with `x`: `dc <<< '[1 2+]xp'`.
|
||||
|
||||
This is not that interesting until combined with registers. First,
let's say we want to calculate the cube of a number (don't forget to
include `f` if you get lost!):
|
||||
|
||||
dc << EOF
|
||||
3 # push our number on the stack
|
||||
d # duplicate it i.e. push 3 on the stack again
|
||||
d**p # duplicate again and calculate the product and print
|
||||
EOF
|
||||
|
||||
Now we have several cubes to calculate, we could use `dd**` several
|
||||
times, or use a macro.
|
||||
|
||||
dc << EOF
|
||||
[dd**] # push a string
|
||||
sa # save it in register a
|
||||
3 # push 3 on the stack
|
||||
lax # push the string "dd**" on the stack and execute it
|
||||
p # print the result
|
||||
4laxp # same operation for 4, in one line
|
||||
EOF
|
||||
|
||||
## Conditionals and Loops
|
||||
|
||||
`dc` can execute a macro stored in a register using the `lR x` combo,
|
||||
but it can also execute macros conditionally. `>a` will execute the
|
||||
macro stored in the register `a`, if the top of the stack is *greater
|
||||
than* the second element of the stack. Note: the top of the stack
|
||||
contains the last entry. When written, it appears as the reverse of what
|
||||
we are used to reading:
|
||||
|
||||
dc << EOF
|
||||
[[Hello World]p] sR # store in 'R' a macro that prints Hello World
|
||||
2 1 >R # do nothing 1 is at the top 2 is the second element
|
||||
1 2 >R # prints Hello World
|
||||
EOF
|
||||
|
||||
Some `dc` implementations have `>R <R =R`; GNU `dc` has some more, check
your manual. Note that the test "consumes" its operands: the first 2
elements are popped off the stack (you can verify that
`dc <<< "[f]sR 2 1 >R 1 2 >R f"` doesn't print anything).
|
||||
|
||||
Have you noticed how we can *include* a macro (string) in a macro? And
as `dc` relies on a stack, we can, in fact, use the macro recursively
(have your favorite control-c key combo ready ;)):
|
||||
|
||||
dc << EOF
|
||||
[ [Hello World] p # our macro starts by printing Hello World
|
||||
lRx ] # and then executes the macro in R
|
||||
sR # we store it in the register R
|
||||
lRx # and finally executes it.
|
||||
EOF
|
||||
|
||||
We have recursion, we have tests, we have loops:
|
||||
|
||||
dc << EOF
|
||||
[ li # put our index i on the stack
|
||||
p # print it, to see what's going on
|
||||
1 - # we decrement the index by one
|
||||
si # store decremented index (i=i-1)
|
||||
0 li >L # if i > 0 then execute L
|
||||
] sL # store our macro with the name L
|
||||
|
||||
10 si # let's give to our index the value 10
|
||||
lLx # and start our loop
|
||||
EOF
|
||||
|
||||
Of course, code written this way is far too easy to read! Make sure to
remove all those extra spaces, newlines and comments:
|
||||
|
||||
dc <<< '[lip1-si0li>L]sL10silLx'
|
||||
dc <<< '[p1-d0<L]sL10lLx' # use the stack instead of a register
|
||||
|
||||
I\'ll let you figure out the second example, it\'s not hard, it uses the
|
||||
stack instead of a register for the index.
|
||||
|
||||
## Next
|
||||
|
||||
Check your dc manual; I haven't described everything, like arrays (only
documented with "; : are used by bc(1) for array operations" on
Solaris, probably because `echo '1 0:a 0Sa 2 0:a La 0;ap' | dc`
results in *Segmentation Fault (core dump)*; the latest Solaris uses
GNU dc).
|
||||
|
||||
You can find more info and dc programs here:
|
||||
|
||||
- <http://en.wikipedia.org/wiki/Dc_(Unix)>
|
||||
|
||||
And more examples, as well as a dc implementation in Python, here:
|
||||
|
||||
- <http://en.literateprograms.org/Category:Programming_language:dc>
|
||||
- <http://en.literateprograms.org/Desk_calculator_%28Python%29>
|
||||
|
||||
The manual for the 1971 dc from Bell Labs:
|
||||
|
||||
- <http://cm.bell-labs.com/cm/cs/who/dmr/man12.ps> (dead link)
|
98
docs/howto/collapsing_functions.md
Normal file
@ -0,0 +1,98 @@
|
||||
# Collapsing Functions
|
||||
|
||||
![](keywords>bash shell scripting example function collapse)
|
||||
|
||||
## What is a \"Collapsing Function\"?
|
||||
|
||||
A collapsing function is a function whose behavior changes depending
|
||||
upon the circumstances under which it\'s run. Function collapsing is
|
||||
useful when you find yourself repeatedly checking a variable whose value
|
||||
never changes.
|
||||
|
||||
## How do I make a function collapse?
|
||||
|
||||
Function collapsing requires some static feature in the environment. A
|
||||
common example is a script that gives the user the option of having
|
||||
\"verbose\" output.
|
||||
|
||||
#!/bin/bash
|
||||
|
||||
[[ $1 = -v || $1 = --verbose ]] && verbose=1
|
||||
|
||||
chatter() {
|
||||
if [[ $verbose ]]; then
|
||||
chatter() {
|
||||
echo "$@"
|
||||
}
|
||||
chatter "$@"
|
||||
else
|
||||
chatter() {
|
||||
:
|
||||
}
|
||||
fi
|
||||
}
|
||||
|
||||
echo "Waiting for 10 seconds."
|
||||
for i in {1..10}; do
|
||||
chatter "$i"
|
||||
sleep 1
|
||||
done
|
||||
|
||||
## How does it work?
|
||||
|
||||
The first time you run chatter(), the function redefines itself based on
|
||||
the value of verbose. Thereafter, chatter doesn't check $verbose, it
|
||||
simply is. Further calls to the function reflect its collapsed nature.
|
||||
If verbose is unset, chatter will echo nothing, with no extra effort
|
||||
from the developer.
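
To see the collapse happen, you can dump the function definition after
the first call; a minimal sketch (the `chatter()` pattern is the one from
above, `declare -f` simply prints the current definition):

    verbose=1

    chatter() {
        if [[ $verbose ]]; then
            chatter() { echo "$@"; }
            chatter "$@"
        else
            chatter() { :; }
        fi
    }

    chatter "first call"   # prints "first call" and redefines chatter
    declare -f chatter     # now shows only the one-line echo version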
|
||||
|
||||
## More examples
|
||||
|
||||
FIXME Add more examples!
|
||||
|
||||
# Somewhat more portable find -executable
|
||||
# FIXME/UNTESTED (I don't have access to all of the different versions of find.)
|
||||
# Usage: find PATH ARGS -- use find like normal, except use -executable instead of
|
||||
# various versions of -perm /+ blah blah and hacks
|
||||
find() {
|
||||
hash find || { echo 'find not found!'; exit 1; }
|
||||
# We can be pretty sure "$0" should be executable.
|
||||
if [[ $(command find "$0" -executable 2> /dev/null) ]]; then
|
||||
unset -f find # We can just use the command find
|
||||
elif [[ $(command find "$0" -perm /u+x 2> /dev/null) ]]; then
|
||||
find() {
|
||||
typeset arg args
|
||||
for arg do
|
||||
[[ $arg = -executable ]] && args+=(-perm /u+x) || args+=("$arg")
|
||||
done
|
||||
command find "${args[@]}"
|
||||
}
|
||||
elif [[ $(command find "$0" -perm +u+x 2> /dev/null) ]]; then
|
||||
find() {
|
||||
typeset arg args
|
||||
for arg do
|
||||
[[ $arg = -executable ]] && args+=(-perm +u+x) || args+=("$arg")
|
||||
done
|
||||
command find "${args[@]}"
|
||||
}
|
||||
else # Last resort
|
||||
find() {
|
||||
typeset arg args
|
||||
for arg do
|
||||
[[ $arg = -executable ]] && args+=(-exec test -x {} \; -print) || args+=("$arg")
|
||||
done
|
||||
command find "${args[@]}"
|
||||
}
|
||||
fi
|
||||
find "$@"
|
||||
}
|
||||
|
||||
#!/bin/bash
|
||||
# Using collapsing functions to turn debug messages on/off
|
||||
|
||||
[ "--debug" = "$1" ] && dbg=echo || dbg=:
|
||||
|
||||
|
||||
# From now on if you use $dbg instead of echo, you can select if messages will be shown
|
||||
|
||||
$dbg "This message will only be displayed if --debug is specified at the command line
|
109
docs/howto/conffile.md
Normal file
@ -0,0 +1,109 @@
|
||||
# Config files for your script
|
||||
|
||||
![](keywords>bash shell scripting config files include configuration)
|
||||
|
||||
## General
|
||||
|
||||
For this task, you don't have to write large parser routines (unless
you want it 100% secure or you want a special file syntax) - you can use
the Bash source command. The file to be sourced should be formatted in
key="value" format, otherwise bash will try to interpret commands:
|
||||
|
||||
#!/bin/bash
|
||||
echo "Reading config...." >&2
|
||||
source /etc/cool.cfg
|
||||
echo "Config for the username: $cool_username" >&2
|
||||
echo "Config for the target host: $cool_host" >&2
|
||||
|
||||
So, where do these variables come from? If everything works fine, they
|
||||
are defined in /etc/cool.cfg which is a file that\'s sourced into the
|
||||
current script or shell. Note: this is **not** the same as executing
|
||||
this file as a script! The sourced file most likely contains something
|
||||
like:
|
||||
|
||||
cool_username="guest"
|
||||
cool_host="foo.example.com"
|
||||
|
||||
These are normal statements understood by Bash, nothing special. Of
|
||||
course (and, a big disadvantage under normal circumstances) the sourced
|
||||
file can contain **everything** that Bash understands, including
|
||||
malicious code!
|
||||
|
||||
The `source` command also is available under the name `.` (dot). The
|
||||
usage of the dot is identical:
|
||||
|
||||
#!/bin/bash
|
||||
echo "Reading config...." >&2
|
||||
. /etc/cool.cfg #note the space between the dot and the leading slash of /etc/cool.cfg
|
||||
echo "Config for the username: $cool_username" >&2
|
||||
echo "Config for the target host: $cool_host" >&2
|
||||
|
||||
## Per-user configs
|
||||
|
||||
There's also a way to provide a system-wide config file in /etc and a
custom config in ~/ (the user's home) to override system-wide defaults.
In the following example, the if/then construct is used to check for the
existence of a user-specific config:
|
||||
|
||||
#!/bin/bash
|
||||
echo "Reading system-wide config...." >&2
|
||||
. /etc/cool.cfg
|
||||
if [ -r ~/.coolrc ]; then
|
||||
echo "Reading user config...." >&2
|
||||
. ~/.coolrc
|
||||
fi
|
||||
|
||||
## Secure it
|
||||
|
||||
As mentioned earlier, the sourced file can contain anything a Bash
|
||||
script can. Essentially, it **is** an included Bash script. That creates
|
||||
security issues. A malicious person can "execute" arbitrary code when
|
||||
your script is sourcing its config file. You might want to allow only
|
||||
constructs in the form `NAME=VALUE` in that file (variable assignment
|
||||
syntax) and maybe comments (though technically, comments are
|
||||
unimportant). Imagine the following \"config file\", containing some
|
||||
malicious code:
|
||||
|
||||
# cool config file for my even cooler script
|
||||
username=god_only_knows
|
||||
hostname=www.example.com
|
||||
password=secret ; echo rm -rf ~/*
|
||||
parameter=foobar && echo "You've been pwned!";
|
||||
# hey look, weird code follows...
|
||||
echo "I am the skull virus..."
|
||||
echo rm -fr ~/*
|
||||
mailto=netadmin@example.com
|
||||
|
||||
You don\'t want these `echo`-commands (which could be any other
|
||||
commands!) to be executed. One way to be a bit safer is to filter only
|
||||
the constructs you want, write the filtered results to a new file and
|
||||
source the new file. We also need to be sure something nefarious hasn\'t
|
||||
been added to the end of one of our name=value parameters, perhaps using
|
||||
; or && command separators. In those cases, perhaps it is simplest to
|
||||
just ignore the line entirely. Egrep (`grep -E`) will help us here, it
|
||||
filters by description:
|
||||
|
||||
#!/bin/bash
|
||||
configfile='/etc/cool.cfg'
|
||||
configfile_secured='/tmp/cool.cfg'
|
||||
|
||||
# check if the file contains something we don't want
|
||||
if egrep -q -v '^#|^[^ ]*=[^;]*' "$configfile"; then
|
||||
echo "Config file is unclean, cleaning it..." >&2
|
||||
# filter the original to a new file
|
||||
egrep '^#|^[^ ]*=[^;&]*' "$configfile" > "$configfile_secured"
|
||||
configfile="$configfile_secured"
|
||||
fi
|
||||
|
||||
# now source it, either the original or the filtered variant
|
||||
source "$configfile"
|
||||
|
||||
**[To make clear what it does:]{.underline}** egrep checks if the file
contains something we don't want; if so, egrep filters the file and
writes the filtered contents to a new file, and the variable
`configfile` is pointed at that new file. The file named by that
variable is then sourced, as if it were the original file.
|
||||
|
||||
This filter allows only `NAME=VALUE` and comments in the file, but it
|
||||
doesn\'t prevent all methods of code execution. I will address that
|
||||
later.
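
A stricter alternative (a sketch, not part of the filtering approach
above) is to never source the file at all: read it line by line and
accept only plain `NAME=VALUE` pairs, so nothing from the config file is
ever executed:

    #!/bin/bash
    configfile='/etc/cool.cfg'   # path taken from the examples above

    while IFS='=' read -r key value; do
        case $key in
            ''|'#'*) continue ;;           # skip blank lines and comments
            *[!A-Za-z0-9_]*) continue ;;   # reject keys with suspicious characters
        esac
        printf -v "$key" '%s' "$value"     # plain assignment, never executed as code
    done < "$configfile"

    echo "Config for the username: $cool_username" >&2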
|
125
docs/howto/dissectabadoneliner.md
Normal file
@ -0,0 +1,125 @@
|
||||
# Dissect a bad oneliner
|
||||
|
||||
``` bash
|
||||
$ ls *.zip | while read i; do j=`echo $i | sed 's/.zip//g'`; mkdir $j; cd $j; unzip ../$i; cd ..; done
|
||||
```
|
||||
|
||||
This is an actual one-liner someone asked about in `#bash`. **There are
|
||||
several things wrong with it. Let\'s break it down!**
|
||||
|
||||
``` bash
|
||||
$ ls *.zip | while read i; do ...; done
|
||||
```
|
||||
|
||||
(Please read <http://mywiki.wooledge.org/ParsingLs>.) This command
|
||||
executes `ls` on the expansion of `*.zip`. Assuming there are filenames
|
||||
in the current directory that end in \'.zip\', ls will give a
|
||||
human-readable list of those names. The output of ls is not for parsing.
|
||||
But in sh and bash alike, we can loop safely over the glob itself:
|
||||
|
||||
``` bash
|
||||
$ for i in *.zip; do j=`echo $i | sed 's/.zip//g'`; mkdir $j; cd $j; unzip ../$i; cd ..; done
|
||||
```
|
||||
|
||||
Let\'s break it down some more!
|
||||
|
||||
``` bash
|
||||
j=`echo $i | sed 's/.zip//g'` # where $i is some name ending in '.zip'
|
||||
```
|
||||
|
||||
The goal here seems to be to get the filename without its `.zip` extension.
In fact, there is a POSIX(r)-compliant command to do this: `basename`.
The implementation here is suboptimal in several ways, but the only
|
||||
thing that\'s genuinely error-prone with this is \"`echo $i`\". Echoing
|
||||
an *unquoted* variable means
|
||||
[wordsplitting](/syntax/expansion/wordsplit) will take place, so any
|
||||
whitespace in `$i` will essentially be normalized. In `sh` it is
|
||||
necessary to use an external command and a subshell to achieve the goal,
|
||||
but we can eliminate the pipe (subshells, external commands, and pipes
|
||||
carry extra overhead when they launch, so they can really hurt
|
||||
performance in a loop). Just for good measure, let\'s use the more
|
||||
readable, [modern](/syntax/expansion/cmdsubst) `$()` construct instead
|
||||
of the old style backticks:
|
||||
|
||||
``` bash
|
||||
sh $ for i in *.zip; do j=$(basename "$i" ".zip"); mkdir $j; cd $j; unzip ../$i; cd ..; done
|
||||
```
|
||||
|
||||
In Bash we don\'t need the subshell or the external basename command.
|
||||
See [Substring removal with parameter
|
||||
expansion](/syntax/pe#substring_removal):
|
||||
|
||||
``` bash
|
||||
bash $ for i in *.zip; do j="${i%.zip}"; mkdir $j; cd $j; unzip ../$i; cd ..; done
|
||||
```
|
||||
|
||||
Let\'s keep going:
|
||||
|
||||
``` bash
|
||||
$ mkdir $j; cd $j; ...; cd ..
|
||||
```
|
||||
|
||||
As a programmer, you **never** know the situation under which your
|
||||
program will run. Even if you do, the following best practice will never
|
||||
hurt: When a following command depends on the success of a previous
|
||||
command(s), check for success! You can do this with the \"`&&`\"
|
||||
conjunction, that way, if the previous command fails, bash will not try
|
||||
to execute the following command(s). It\'s fully POSIX(r). Oh, and
|
||||
remember what I said about [wordsplitting](/syntax/expansion/wordsplit)
|
||||
in the previous step? Well, if you don\'t quote `$j`, wordsplitting can
|
||||
happen again.
|
||||
|
||||
``` bash
|
||||
$ mkdir "$j" && cd "$j" && ... && cd ..
|
||||
```
|
||||
|
||||
That\'s almost right, but there\'s one problem \-- what happens if `$j`
|
||||
contains a slash? Then `cd ..` will not return to the original
|
||||
directory. That\'s wrong! `cd -` causes cd to return to the previous
|
||||
working directory, so it\'s a much better choice:
|
||||
|
||||
``` bash
|
||||
$ mkdir "$j" && cd "$j" && ... && cd -
|
||||
```
|
||||
|
||||
(If it occurred to you that I forgot to check for success after cd -,
|
||||
good job! You could do this with `{ cd - || break; }`, but I\'m going to
|
||||
leave that out because it\'s verbose and I think it\'s likely that we
|
||||
will be able to get back to our original working directory without a
|
||||
problem.)
|
||||
|
||||
So now we have:
|
||||
|
||||
``` bash
|
||||
sh $ for i in *.zip; do j=$(basename "$i" ".zip"); mkdir "$j" && cd "$j" && unzip ../$i && cd -; done
|
||||
```
|
||||
|
||||
``` bash
|
||||
bash $ for i in *.zip; do j="${i%.zip}"; mkdir "$j" && cd "$j" && unzip ../$i && cd -; done
|
||||
```
|
||||
|
||||
Let\'s throw the `unzip` command back in the mix:
|
||||
|
||||
``` bash
|
||||
mkdir "$j" && cd "$j" && unzip ../$i && cd -
|
||||
```
|
||||
|
||||
Well, besides word splitting, there\'s nothing terribly wrong with this.
|
||||
Still, did it occur to you that unzip might already be able to target a
|
||||
directory? There isn\'t a standard for the `unzip` command, but all the
|
||||
implementations I\'ve seen can do it with the -d flag. So we can drop
|
||||
the cd commands entirely:
|
||||
|
||||
``` bash
|
||||
$ mkdir "$j" && unzip -d "$j" "$i"
|
||||
```
|
||||
|
||||
``` bash
|
||||
sh $ for i in *.zip; do j=$(basename "$i" ".zip"); mkdir "$j" && unzip -d "$j" "$i"; done
|
||||
```
|
||||
|
||||
``` bash
|
||||
bash $ for i in *.zip; do j="${i%.zip}"; mkdir "$j" && unzip -d "$j" "$i"; done
|
||||
```
|
||||
|
||||
There! That\'s as good as it gets.
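
If you want to harden it a little more (a sketch, not from the original
discussion): with `nullglob` the loop body is skipped entirely when no
`.zip` files match, instead of iterating once over the literal string
`*.zip`:

``` bash
shopt -s nullglob
for i in *.zip; do
    j="${i%.zip}"
    mkdir "$j" && unzip -d "$j" "$i"
done
```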
|
389
docs/howto/edit-ed.md
Normal file
@ -0,0 +1,389 @@
|
||||
# Editing files via scripts with ed
|
||||
|
||||
![](keywords>bash shell scripting arguments file editor edit ed sed)
|
||||
|
||||
## Why ed?
|
||||
|
||||
Like `sed`, `ed` is a line editor. However, if you try to change file
contents with `sed`, and the file is open elsewhere and read by some
process, you will find out that GNU `sed` and its `-i` option will not
allow you to edit the file. There are circumstances where you may need
that, e.g. editing active and open files, or the lack of a GNU or other
`sed` with an "in-place" option available.
|
||||
|
||||
Why `ed`?
|
||||
|
||||
- maybe your `sed` doesn\'t support in-place edit
|
||||
- maybe you need to be as portable as possible
|
||||
- maybe you need to really edit in-file (and not create a new file
|
||||
like GNU `sed`)
|
||||
- last but not least: standard `ed` has very good editing and
|
||||
addressing possibilities, compared to standard `sed`
|
||||
|
||||
Don\'t get me wrong, this is **not** meant as anti-`sed` article! It\'s
|
||||
just meant to show you another way to do the job.
|
||||
|
||||
## Commanding ed
|
||||
|
||||
Since `ed` is an interactive text editor, it reads and executes commands
|
||||
that come from `stdin`. There are several ways to feed our commands to
|
||||
ed:
|
||||
|
||||
**[Pipelines]{.underline}**
|
||||
|
||||
echo '<ED-COMMANDS>' | ed <FILE>
|
||||
|
||||
To inject the needed newlines, etc., it may be easier to use the builtin
command `printf` (see `help printf`). Shown here as an example Bash
function to prefix text to file content:
|
||||
|
||||
|
||||
# insertHead "$text" "$file"
|
||||
|
||||
insertHead() {
|
||||
printf '%s\n' H 1i "$1" . w | ed -s "$2"
|
||||
}
|
||||
|
||||
**[Here-strings]{.underline}**
|
||||
|
||||
ed <FILE> <<< '<ED-COMMANDS>'
|
||||
|
||||
**[Here-documents]{.underline}**
|
||||
|
||||
ed <FILE> <<EOF
|
||||
<ED-COMMANDS>
|
||||
EOF
|
||||
|
||||
Which one you prefer is your choice. I will use the here-strings, since
|
||||
it looks best here IMHO.
|
||||
|
||||
There are other ways to provide input to `ed`. For example, process
|
||||
substitution. But these should be enough for daily needs.
|
||||
|
||||
Since `ed` wants commands separated by newlines, I\'ll use a special
|
||||
Bash quoting method, the C-like strings `$'TEXT'`, as it can interpret a
|
||||
set of various escape sequences and special characters. I\'ll use the
|
||||
`-s` option to make it less verbose.
|
||||
|
||||
## The basic interface
|
||||
|
||||
Check the `ed` manpage for details
|
||||
|
||||
Similar to `vi` or `vim`, `ed` has a \"command mode\" and an
|
||||
\"interactive mode\". For non-interactive use, the command mode is the
|
||||
usual choice.
|
||||
|
||||
Commands to `ed` have a simple and regular structure: zero, one, or two
|
||||
addresses followed by a single-character command, possibly followed by
|
||||
parameters to that command. These addresses specify one or more lines in
|
||||
the text buffer. Every command that requires addresses has default
|
||||
addresses, so the addresses can often be omitted.
|
||||
|
||||
The line addressing is relative to the *current line*. If the edit
|
||||
buffer is not empty, the initial value for the *current line* shall be
|
||||
the last line in the edit buffer, otherwise zero. Generally, the
|
||||
*current line* is the last line affected by a command. All addresses can
|
||||
only address single lines, not blocks of lines!
|
||||
|
||||
Line addresses or commands using *regular expressions* interpret POSIX
|
||||
Basic Regular Expressions (BRE). A null BRE is used to reference the
|
||||
most recently used BRE. Since `ed` addressing is only for single lines,
|
||||
no RE can ever match a newline.
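
A few addressing examples (a sketch; `file` stands for any text file):

    ed -s file <<< '2,4p'     # print lines 2 to 4
    ed -s file <<< '/foo/p'   # print the next line matching the BRE "foo"
    ed -s file <<< '$-2,$p'   # print the last three lines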
|
||||
|
||||
## Debugging your ed scripts
|
||||
|
||||
By default, `ed` is not very talkative and will simply print a \"?\"
|
||||
when an error occurs. Interactively you can use the `h` command to get a
|
||||
short message explaining the last error. You can also turn on a mode
|
||||
that makes `ed` automatically print this message with the `H` command.
|
||||
It is a good idea to always add this command at the beginning of your ed
|
||||
scripts:
|
||||
|
||||
bash > ed -s file <<< $'H\n,df'
|
||||
?
|
||||
script, line 2: Invalid command suffix
|
||||
|
||||
While working on your script, you might make errors and destroy your
file, so you might be tempted to test your script by doing something like:
|
||||
|
||||
# Works, but there is better
|
||||
|
||||
# copy my original file
|
||||
cp file file.test
|
||||
|
||||
# try my script on the file
|
||||
ed -s file.test <<< $'H\n<ed commands>\nw'
|
||||
|
||||
# see the results
|
||||
cat file.test
|
||||
|
||||
There is a much better way, though: you can use the ed command `p` to
print the file, so your testing would look like:
|
||||
|
||||
ed -s file <<< $'H\n<ed commands>\n,p'
|
||||
|
||||
The `,` (comma) in front of the `p` command is a shortcut for `1,$`,
which defines an address range from the first to the last line; `,p` thus
means print the whole file, after it has been modified. When your script
runs successfully, you only have to replace the `,p` by a `w`.
|
||||
|
||||
Of course, even if the file is not modified by the `p` command, **it\'s
|
||||
always a good idea to have a backup copy!**
|
||||
|
||||
## Editing your files
|
||||
|
||||
Most of these things can be done with `sed`. But there are also things
|
||||
that can\'t be done in `sed` or can only be done with very complex code.
|
||||
|
||||
### Simple word substitutions
|
||||
|
||||
Like `sed`, `ed` also knows the common `s/FROM/TO/` command, and it can
|
||||
also take line-addresses. **If no substitution is made on the addressed
|
||||
lines, it\'s considered an error.**
|
||||
|
||||
#### Substitutions through the whole file
|
||||
|
||||
ed -s test.txt <<< $',s/Windows(R)-compatible/POSIX-conform/g\nw'
|
||||
|
||||
[Note:]{.underline} The comma as single address operator is an alias for
|
||||
`1,$` (\"all lines\").
|
||||
|
||||
#### Substitutions in specific lines
|
||||
|
||||
On a line containing `fruits`, do the substitution:
|
||||
|
||||
ed -s test.txt <<< $'/fruits/s/apple/banana/g\nw'
|
||||
|
||||
On the 5th line after the line containing `fruits`, do the substitution:
|
||||
|
||||
ed -s test.txt <<< $'/fruits/+5s/apple/banana/g\nw'
|
||||
|
||||
### Block operations
|
||||
|
||||
#### Delete a block of text
|
||||
|
||||
The simple one is a well-known (by position) block of text:
|
||||
|
||||
# delete lines number 2 to 4 (2, 3, 4)
ed -s test.txt <<< $'2,4d\nw'
|
||||
|
||||
This deletes all lines matching a specific regular expression:
|
||||
|
||||
# delete all lines matching foobar
|
||||
ed -s test.txt <<< $'g/foobar/d\nw'
|
||||
|
||||
`g/regexp/` applies the command following it to all the lines matching
the regexp.
|
||||
|
||||
#### Move a block of text
|
||||
|
||||
\...using the `m` command: `<ADDRESS> m <TARGET-ADDRESS>`
|
||||
|
||||
This is definitely something that can\'t be done easily with sed.
|
||||
|
||||
# moving lines 5-9 to the end of the file
|
||||
ed -s test.txt <<< $'5,9m$\nw'
|
||||
|
||||
# moving lines 5-9 to line 3
|
||||
ed -s test.txt <<< $'5,9m3\nw'
|
||||
|
||||
#### Copy a block of text
|
||||
|
||||
\...using the `t` command: `<ADDRESS> t <TARGET-ADDRESS>`
|
||||
|
||||
You use the `t` command just like you use the `m` (move) command.
|
||||
|
||||
# make a copy of lines 5-9 and place it at the end of the file
|
||||
ed -s test.txt <<< $'5,9t$\nw'
|
||||
|
||||
# make a copy of lines 5-9 and place it at line 3
|
||||
ed -s test.txt <<< $'5,9t3\nw'
|
||||
|
||||
#### Join all lines
|
||||
|
||||
\...but leave the final newline intact. This is done by an extra
|
||||
command: `j` (join).
|
||||
|
||||
ed -s file <<< $'1,$j\nw'
|
||||
|
||||
Compared with two other methods (using `tr` or `sed`), you don\'t have
|
||||
to delete all newlines and manually add one at the end.
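
For comparison, a sketch of those two methods (the `tr` variant deletes
the final newline as well, so it has to be added back by hand; the `sed`
loop is the usual GNU-style idiom):

    tr -d '\n' < file; echo
    sed ':a;N;$!ba;s/\n//g' file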
|
||||
|
||||
### File operations
|
||||
|
||||
#### Insert another file
|
||||
|
||||
How do you insert another file? As with `sed`, you use the `r` (read)
|
||||
command. That inserts another file at the line before the last line (and
|
||||
prints the result to stdout - `,p`):
|
||||
|
||||
ed -s FILE1 <<< $'$-1 r FILE2\n,p'
|
||||
|
||||
To compare, here\'s a possible `sed` solution which must use Bash
|
||||
arithmetic and the external program `wc`:
|
||||
|
||||
sed "$(($(wc -l < FILE1)-1))r FILE2" FILE1
|
||||
|
||||
# UPDATE here's one which uses GNU sed's "e" parameter for the s-command
|
||||
# it executes the commands found in pattern space. I'll take that as a
|
||||
# security risk, but well, sometimes GNU > security, you know...
|
||||
sed '${h;s/.*/cat FILE2/e;G}' FILE1
|
||||
|
||||
Another approach, in two invocations of sed, that avoids the use of
|
||||
external commands completely:
|
||||
|
||||
sed $'${s/$/\\n-||-/;r FILE2\n}' FILE1 | sed '0,/-||-/{//!h;N;//D};$G'
|
||||
|
||||
## Pitfalls
|
||||
|
||||
### ed is not sed
|
||||
|
||||
ed and sed might look similar, but the same command(s) might act
|
||||
differently:
|
||||
|
||||
**/foo/d**
|
||||
|
||||
In sed, /foo/d will delete all lines matching foo; in ed the commands are
not repeated on each line, so this command will search for the next line
matching foo and delete it. If you want to delete all lines matching
foo, or do a substitution on all lines matching foo, you have to tell ed
about it with the g (global) command:
|
||||
|
||||
echo $'1\n1\n3' > file
|
||||
|
||||
#replace all lines matching 1 by "replacement"
|
||||
ed -s file <<< $'g/1/s/1/replacement/\n,p'
|
||||
|
||||
#replace the first line matching 1 by "replacement"
|
||||
#(because it starts searching from the last line)
|
||||
ed -s file <<< $'s/1/replacement/\n,p'
|
||||
|
||||
**an error stops the script**
|
||||
|
||||
You might think that it\'s not a problem and that the same thing happens
|
||||
with sed and you\'re right, with the exception that if ed does not find
|
||||
a pattern it\'s an error, while sed just continues with the next line.
|
||||
For instance, let\'s say that you want to change foo to bar on the first
|
||||
line of the file and add something after the next line, ed will stop if
|
||||
it cannot find foo on the first line, sed will continue.
|
||||
|
||||
#Gnu sed version
|
||||
sed -e '1s/foo/bar/' -e '$a\something' file
|
||||
|
||||
#First ed version, does nothing if foo is not found on the first line:
|
||||
ed -s file <<< $'H\n1s/foo/bar/\na\nsomething\n.\nw'
|
||||
|
||||
If you want the same behaviour you can use g/foo/ to trick ed. g/foo/
|
||||
will apply the command on all lines matching foo, thus the substitution
|
||||
will succeed and ed will not produce an error when foo is not found:
|
||||
|
||||
#Second version will add the line with "something" even if foo is not found
|
||||
ed -s file <<< $'H\n1g/foo/s/foo/bar/\na\nsomething\n.\nw'
|
||||
|
||||
In fact, even a substitution that fails after a g/ / command does not
seem to cause an error, i.e. you can use a trick like g/./s/foo/bar/ to
attempt the substitution on all non-blank lines.
|
||||
|
||||
### here documents
|
||||
|
||||
**shell parameters are expanded**
|
||||
|
||||
If you don't quote the delimiter, `$` has a special meaning. This sounds
obvious, but it's easy to forget this fact when you use addresses like
`$-1` or commands like `$a`. Either quote the `$` or the delimiter:
|
||||
|
||||
#fails
|
||||
ed -s file << EOF
|
||||
$a
|
||||
last line
|
||||
.
|
||||
w
|
||||
EOF
|
||||
|
||||
#ok
|
||||
ed -s file << EOF
|
||||
\$a
|
||||
last line
|
||||
.
|
||||
w
|
||||
EOF
|
||||
|
||||
#ok again
|
||||
ed -s file << 'EOF'
|
||||
$a
|
||||
last line
|
||||
.
|
||||
w
|
||||
EOF
|
||||
|
||||
**\_\_ \".\" is not a command \_\_**
|
||||
|
||||
The . used to terminate the command \"a\" must be the only thing on the
|
||||
line. take care if you indent the commands:
|
||||
|
||||
#ed doesn't care about the spaces before the commands, but the . must be the only thing on the line:
|
||||
ed -s file << EOF
|
||||
a
|
||||
my content
|
||||
.
|
||||
w
|
||||
EOF
|
||||
|
||||
## Simulate other commands
|
||||
|
||||
Keep in mind that in all the examples below, the entire file will be
|
||||
read into memory.
|
||||
|
||||
### A simple grep
|
||||
|
||||
ed -s file <<< 'g/foo/p'
|
||||
|
||||
# equivalent
|
||||
ed -s file <<< 'g/foo/'
|
||||
|
||||
The name `grep` is derived from the notation `g/RE/p` (global => regular
expression => print). Ref:
<http://www.catb.org/~esr/jargon/html/G/grep.html>
|
||||
|
||||
### wc -l
|
||||
|
||||
Since the default for the `ed` \"print line number\" command is the last
|
||||
line, a simple `=` (equal sign) will print this line number and thus the
|
||||
number of lines of the file:
|
||||
|
||||
ed -s file <<< '='
|
||||
|
||||
### cat
|
||||
|
||||
Yeah, it's a joke...
|
||||
|
||||
ed -s file <<< $',p'
|
||||
|
||||
\...but a similar thing to `cat` showing line-endings and escapes can be
|
||||
done with the `list` command (l):
|
||||
|
||||
ed -s file <<< $',l'
|
||||
|
||||
FIXME to be continued
|
||||
|
||||
## Links
|
||||
|
||||
Reference:
|
||||
|
||||
- [Gnu ed](http://www.gnu.org/software/ed/manual/ed_manual.html) - if
|
||||
we had to guess, you\'re probably using this one.
|
||||
- POSIX
|
||||
[ed](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html#tag_20_38),
|
||||
[ex](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ex.html#tag_20_40),
|
||||
and
|
||||
[vi](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/vi.html#tag_20_152)
|
||||
- <http://sdf.lonestar.org/index.cgi?tutorials/ed> - ed cheatsheet on
|
||||
sdf.org
|
||||
|
||||
Misc info / tutorials:
|
||||
|
||||
- [How can I replace a string with another string in a variable, a
|
||||
stream, a file, or in all the files in a
|
||||
directory?](http://mywiki.wooledge.org/BashFAQ/021) - BashFAQ
|
||||
- <http://wolfram.schneider.org/bsd/7thEdManVol2/edtut/edtut.pdf> -
|
||||
Old but still relevant ed tutorial.
|
359
docs/howto/getopts_tutorial.md
Normal file
@ -0,0 +1,359 @@
|
||||
# Small getopts tutorial
|
||||
|
||||
![](keywords>bash shell scripting arguments positional parameters options getopt getopts)
|
||||
|
||||
## Description
|
||||
|
||||
**Note that** `getopts` is neither able to parse GNU-style long options
|
||||
(`--myoption`) nor XF86-style long options (`-myoption`). So, when you
|
||||
want to parse command line arguments in a professional ;-) way,
|
||||
`getopts` may or may not work for you. Unlike its older brother `getopt`
|
||||
(note the missing *s*!), it\'s a shell builtin command. The advantages
|
||||
are:
|
||||
|
||||
- No need to pass the positional parameters through to an external
|
||||
program.
|
||||
- Being a builtin, `getopts` can set shell variables to use for
|
||||
parsing (impossible for an *external* process!)
|
||||
- There\'s no need to argue with several `getopt` implementations
|
||||
which had buggy concepts in the past (whitespace, \...)
|
||||
- `getopts` is defined in POSIX(r).
|
||||
|
||||
------------------------------------------------------------------------
|
||||
|
||||
Some other methods to parse positional parameters - using neither
|
||||
**getopt** nor **getopts** - are described in: [How to handle positional
|
||||
parameters](/scripting/posparams).
|
||||
|
||||
### Terminology
|
||||
|
||||
It\'s useful to know what we\'re talking about here, so let\'s see\...
|
||||
Consider the following command line:
|
||||
|
||||
mybackup -x -f /etc/mybackup.conf -r ./foo.txt ./bar.txt
|
||||
|
||||
These are all positional parameters, but they can be divided into
|
||||
several logical groups:
|
||||
|
||||
- `-x` is an **option** (aka **flag** or **switch**). It consists of a
|
||||
dash (`-`) followed by **one** character.
|
||||
- `-f` is also an option, but this option has an associated **option
|
||||
argument** (an argument to the option `-f`): `/etc/mybackup.conf`.
|
||||
The option argument is usually the argument following the option
|
||||
itself, but that isn\'t mandatory. Joining the option and option
|
||||
argument into a single argument `-f/etc/mybackup.conf` is valid.
|
||||
- `-r` depends on the configuration. In this example, `-r` doesn\'t
|
||||
take arguments so it\'s a standalone option like `-x`.
|
||||
- `./foo.txt` and `./bar.txt` are remaining arguments without any
|
||||
associated options. These are often used as **mass-arguments**. For
|
||||
example, the filenames specified for `cp(1)`, or arguments that
|
||||
don\'t need an option to be recognized because of the intended
|
||||
behavior of the program. POSIX(r) calls them **operands**.
|
||||
|
||||
To give you an idea about why `getopts` is useful: the above command
line is equivalent to:
|
||||
|
||||
mybackup -xrf /etc/mybackup.conf ./foo.txt ./bar.txt
|
||||
|
||||
which is complex to parse without the help of `getopts`.
|
||||
|
||||
The option flags can be **upper- and lowercase** characters, or
|
||||
**digits**. It may recognize other characters, but that\'s not
|
||||
recommended (usability and maybe problems with special characters).
|
||||
|
||||
### How it works
|
||||
|
||||
In general you need to call `getopts` several times. Each time it will
|
||||
use the next positional parameter and a possible argument, if parsable,
|
||||
and provide it to you. `getopts` will not change the set of positional
|
||||
parameters. If you want to shift them, it must be done manually:
|
||||
|
||||
shift $((OPTIND-1))
|
||||
# now do something with $@
|
||||
|
||||
Since `getopts` sets an exit status of *FALSE* when there\'s nothing
|
||||
left to parse, it\'s easy to use in a while-loop:
|
||||
|
||||
while getopts ...; do
|
||||
...
|
||||
done
|
||||
|
||||
`getopts` will parse options and their possible arguments. It will stop
|
||||
parsing on the first non-option argument (a string that doesn\'t begin
|
||||
with a hyphen (`-`) that isn\'t an argument for any option in front of
|
||||
it). It will also stop parsing when it sees the `--` (double-hyphen),
|
||||
which means [end of options](/dict/terms/end_of_options).
|
||||
|
||||
### Used variables
|
||||
|
||||
| variable | description |
|----------|-------------|
| [OPTIND](/syntax/shellvars#OPTIND) | Holds the index to the next argument to be processed. This is how `getopts` "remembers" its own status between invocations. Also useful to shift the positional parameters after processing with `getopts`. `OPTIND` is initially set to 1, and **needs to be re-set to 1 if you want to parse anything again with getopts**. |
| [OPTARG](/syntax/shellvars#OPTARG) | This variable is set to any argument for an option found by `getopts`. It also contains the option flag of an unknown option. |
| [OPTERR](/syntax/shellvars#OPTERR) | (Values 0 or 1) Indicates if Bash should display error messages generated by the `getopts` builtin. The value is initialized to **1** on every shell startup - so be sure to always set it to **0** if you don't want to see annoying messages! **`OPTERR` is not specified by POSIX for the `getopts` builtin utility --- only for the C `getopt()` function in `unistd.h` (`opterr`).** `OPTERR` is bash-specific and not supported by shells such as ksh93, mksh, zsh, or dash. |
|
||||
|
||||
`getopts` also uses these variables for error reporting (they're set to
value-combinations which aren't possible in normal operation).
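
For example, if you want to parse a second set of options later in the
same script (or inside a function), reset `OPTIND` first (a small sketch):

    OPTIND=1
    while getopts ":a" opt; do
      ...
    done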
|
||||
|
||||
### Specify what you want
|
||||
|
||||
The base-syntax for `getopts` is:
|
||||
|
||||
getopts OPTSTRING VARNAME [ARGS...]
|
||||
|
||||
where:
|
||||
|
||||
- `OPTSTRING` tells `getopts` which options to expect and where to expect arguments (see below)
- `VARNAME` tells `getopts` which shell-variable to use for option reporting
- `ARGS` tells `getopts` to parse these optional words instead of the positional parameters
|
||||
|
||||
#### The option-string
|
||||
|
||||
The option-string tells `getopts` which options to expect and which of
them must have an argument. The syntax is very simple --- every option
character is simply named as is; this example string would tell
`getopts` to look for `-f`, `-A` and `-x`:
|
||||
|
||||
getopts fAx VARNAME
|
||||
|
||||
When you want `getopts` to expect an argument for an option, just place
|
||||
a `:` (colon) after the proper option flag. If you want `-A` to expect
|
||||
an argument (i.e. to become `-A SOMETHING`) just do:
|
||||
|
||||
getopts fA:x VARNAME
|
||||
|
||||
If the **very first character** of the option-string is a `:` (colon),
|
||||
which would normally be nonsense because there\'s no option letter
|
||||
preceding it, `getopts` switches to \"**silent error reporting mode**\".
|
||||
In production scripts, this is usually what you want because it allows
|
||||
you to handle errors yourself without being disturbed by annoying
|
||||
messages.
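
Putting both together (a sketch): an option string for silent mode where
`-A` expects an argument would look like this:

    getopts :fA:x VARNAME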
|
||||
|
||||
#### Custom arguments to parse
|
||||
|
||||
The `getopts` utility parses the [positional
|
||||
parameters](/scripting/posparams) of the current shell or function by
|
||||
default (which means it parses `"$@"`).
|
||||
|
||||
You can give your own set of arguments to the utility to parse. Whenever
|
||||
additional arguments are given after the `VARNAME` parameter, `getopts`
|
||||
doesn\'t try to parse the positional parameters, but these given words.
|
||||
|
||||
This way, you are able to parse any option set you like, here for
|
||||
example from an array:
|
||||
|
||||
while getopts :f:h opt "${MY_OWN_SET[@]}"; do
|
||||
...
|
||||
done
|
||||
|
||||
A call to `getopts` **without** these additional arguments is
|
||||
**equivalent** to explicitly calling it with `"$@"`:
|
||||
|
||||
getopts ... "$@"
|
||||
|
||||
### Error Reporting
|
||||
|
||||
Regarding error-reporting, there are two modes `getopts` can run in:
|
||||
|
||||
- verbose mode
|
||||
- silent mode
|
||||
|
||||
For production scripts I recommend using the silent mode, since
everything looks more professional when you don't see annoying
standard messages. It's also easier to handle, since the failure cases
are indicated in a simpler way.
|
||||
|
||||
#### Verbose Mode
|
||||
|
||||
- **invalid option:** `VARNAME` is set to `?` (question-mark) and `OPTARG` is unset
- **required argument not found:** `VARNAME` is set to `?` (question-mark), `OPTARG` is unset and an *error message is printed*
|
||||
|
||||
#### Silent Mode
|
||||
|
||||
- **invalid option:** `VARNAME` is set to `?` (question-mark) and `OPTARG` is set to the (invalid) option character
- **required argument not found:** `VARNAME` is set to `:` (colon) and `OPTARG` contains the option-character in question
|
||||
|
||||
## Using it
|
||||
|
||||
### A first example
|
||||
|
||||
Enough said - action!
|
||||
|
||||
Let\'s play with a very simple case: only one option (`-a`) expected,
|
||||
without any arguments. Also we disable the *verbose error handling* by
|
||||
preceding the whole option string with a colon (`:`):
|
||||
|
||||
``` bash
|
||||
#!/bin/bash
|
||||
|
||||
while getopts ":a" opt; do
|
||||
case $opt in
|
||||
a)
|
||||
echo "-a was triggered!" >&2
|
||||
;;
|
||||
\?)
|
||||
echo "Invalid option: -$OPTARG" >&2
|
||||
;;
|
||||
esac
|
||||
done
|
||||
```
|
||||
|
||||
I put that into a file named `go_test.sh`, which is the name you\'ll see
|
||||
below in the examples.
|
||||
|
||||
Let\'s do some tests:
|
||||
|
||||
#### Calling it without any arguments
|
||||
|
||||
$ ./go_test.sh
|
||||
$
|
||||
|
||||
Nothing happened? Right. `getopts` didn\'t see any valid or invalid
|
||||
options (letters preceded by a dash), so it wasn\'t triggered.
|
||||
|
||||
#### Calling it with non-option arguments
|
||||
|
||||
$ ./go_test.sh /etc/passwd
|
||||
$
|
||||
|
||||
Again \-\-- nothing happened. The **very same** case: `getopts` didn\'t
|
||||
see any valid or invalid options (letters preceded by a dash), so it
|
||||
wasn\'t triggered.
|
||||
|
||||
The arguments given to your script are of course accessible as `$1` -
|
||||
`${N}`.
|
||||
|
||||
#### Calling it with option-arguments
|
||||
|
||||
Now let\'s trigger `getopts`: Provide options.
|
||||
|
||||
First, an **invalid** one:
|
||||
|
||||
$ ./go_test.sh -b
|
||||
Invalid option: -b
|
||||
$
|
||||
|
||||
As expected, `getopts` didn\'t accept this option and acted like told
|
||||
above: It placed `?` into `$opt` and the invalid option character (`b`)
|
||||
into `$OPTARG`. With our `case` statement, we were able to detect this.
|
||||
|
||||
Now, a **valid** one (`-a`):
|
||||
|
||||
$ ./go_test.sh -a
|
||||
-a was triggered!
|
||||
$
|
||||
|
||||
You see, the detection works perfectly. The `a` was put into the
|
||||
variable `$opt` for our case statement.
|
||||
|
||||
Of course it\'s possible to **mix valid and invalid** options when
|
||||
calling:
|
||||
|
||||
$ ./go_test.sh -a -x -b -c
|
||||
-a was triggered!
|
||||
Invalid option: -x
|
||||
Invalid option: -b
|
||||
Invalid option: -c
|
||||
$
|
||||
|
||||
Finally, it's of course possible to give our option **multiple
times**:
|
||||
|
||||
$ ./go_test.sh -a -a -a -a
|
||||
-a was triggered!
|
||||
-a was triggered!
|
||||
-a was triggered!
|
||||
-a was triggered!
|
||||
$
|
||||
|
||||
The last examples lead us to some points you may consider:
|
||||
|
||||
- **invalid options don\'t stop the processing**: If you want to stop
|
||||
the script, you have to do it yourself (`exit` in the right place)
|
||||
- **multiple identical options are possible**: If you want to disallow
|
||||
these, you have to check manually (e.g. by setting a variable or so)
|
||||
|
||||
### An option with argument
|
||||
|
||||
Let\'s extend our example from above. Just a little bit:
|
||||
|
||||
- `-a` now takes an argument
|
||||
- on an error, the parsing exits with `exit 1`
|
||||
|
||||
``` bash
|
||||
#!/bin/bash
|
||||
|
||||
while getopts ":a:" opt; do
|
||||
case $opt in
|
||||
a)
|
||||
echo "-a was triggered, Parameter: $OPTARG" >&2
|
||||
;;
|
||||
\?)
|
||||
echo "Invalid option: -$OPTARG" >&2
|
||||
exit 1
|
||||
;;
|
||||
:)
|
||||
echo "Option -$OPTARG requires an argument." >&2
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
done
|
||||
```
|
||||
|
||||
Let\'s do the very same tests we did in the last example:
|
||||
|
||||
#### Calling it without any arguments
|
||||
|
||||
$ ./go_test.sh
|
||||
$
|
||||
|
||||
As above, nothing happened. It wasn\'t triggered.
|
||||
|
||||
#### Calling it with non-option arguments
|
||||
|
||||
$ ./go_test.sh /etc/passwd
|
||||
$
|
||||
|
||||
The **very same** case: It wasn\'t triggered.
|
||||
|
||||
#### Calling it with option-arguments
|
||||
|
||||
**Invalid** option:
|
||||
|
||||
$ ./go_test.sh -b
|
||||
Invalid option: -b
|
||||
$
|
||||
|
||||
As expected, as above, `getopts` didn\'t accept this option and acted
|
||||
like programmed.
|
||||
|
||||
**Valid** option, but without the mandatory **argument**:
|
||||
|
||||
$ ./go_test.sh -a
|
||||
Option -a requires an argument.
|
||||
$
|
||||
|
||||
The option was okay, but there is an argument missing.
|
||||
|
||||
Let\'s provide **the argument**:
|
||||
|
||||
$ ./go_test.sh -a /etc/passwd
|
||||
-a was triggered, Parameter: /etc/passwd
|
||||
$
|
||||
|
||||
## See also
|
||||
|
||||
- Internal: [posparams](/scripting/posparams)
|
||||
- Internal: [case](/syntax/ccmd/case)
|
||||
- Internal: [while_loop](/syntax/ccmd/while_loop)
|
||||
- POSIX
|
||||
[getopts(1)](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html#tag_20_54)
|
||||
and
|
||||
[getopt(3)](http://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html)
|
||||
- [parse CLI
|
||||
ARGV](https://stackoverflow.com/questions/192249/how-do-i-parse-command-line-arguments-in-bash)
|
||||
- [handle command-line arguments (options) to a
|
||||
script](http://mywiki.wooledge.org/BashFAQ/035)
|
207
docs/howto/mutex.md
Normal file
@ -0,0 +1,207 @@
|
||||
# Lock your script (against parallel execution)
|
||||
|
||||
![](keywords>bash shell scripting mutex locking run-control)
|
||||
|
||||
## Why lock?
|
||||
|
||||
Sometimes there's a need to ensure only one copy of a script runs, i.e.
|
||||
prevent two or more copies running simultaneously. Imagine an important
|
||||
cronjob doing something very important, which will fail or corrupt data
|
||||
if two copies of the called program were to run at the same time. To
|
||||
prevent this, a form of `MUTEX` (**mutual exclusion**) lock is needed.
|
||||
|
||||
The basic procedure is simple: The script checks if a specific condition
(locking) is present at startup; if yes, it's locked - the script
doesn't start.
|
||||
|
||||
This article describes locking with common UNIX(r) tools. There are
other special locking tools available, but they're not standardized, or
worse yet, you can't be sure they're present when you want to run your
scripts. **A tool designed specifically for this purpose does the
job much better than general purpose code.**
|
||||
|
||||
### Other, special locking tools
|
||||
|
||||
As told above, a special tool for locking is the preferred solution.
|
||||
Race conditions are avoided, as is the need to work around specific
|
||||
limits.
|
||||
|
||||
- `flock`: <http://www.kernel.org/pub/software/utils/script/flock/>
|
||||
- `solo`: <http://timkay.com/solo/>
|
||||
|
||||
## Choose the locking method
|
||||
|
||||
The best way to set a global lock condition is the UNIX(r) filesystem.
|
||||
Variables aren\'t enough, as each process has its own private variable
|
||||
space, but the filesystem is global to all processes (yes, I know about
|
||||
chroots, namespaces, \... special case). You can \"set\" several things
|
||||
in the filesystem that can be used as locking indicator:
|
||||
|
||||
- create files
|
||||
- update file timestamps
|
||||
- create directories
|
||||
|
||||
To create a file or set a file timestamp, usually the command touch is
used. The following problem is implied: A locking mechanism checks for
the existence of the lockfile; if no lockfile exists, it creates one and
continues. Those are **two separate steps**! That means it's **not an
atomic operation**. There's a small amount of time between checking and
creating, where another instance of the same script could perform
locking (because when it checked, the lockfile wasn't there)! In that
case you would have 2 instances of the script running, both thinking
they are successfully locked, and can operate without colliding. Setting
the timestamp is similar: One step to check the timestamp, a second step
to set the timestamp.
|
||||
|
||||
\<WRAP center round tip 60%\> [**Conclusion:**]{.underline} We need an
|
||||
operation that does the check and the locking in one step. \</WRAP\>
|
||||
|
||||
A simple way to get that is to create a **lock directory** - with the
|
||||
mkdir command. It will:
|
||||
|
||||
- create a given directory only if it does not exist, and set a successful exit code
- set an unsuccessful exit code if an error occurs - for example, if the directory specified already exists
|
||||
|
||||
With mkdir it seems, we have our two steps in one simple operation. A
|
||||
(very!) simple locking code might look like this:
|
||||
|
||||
``` bash
|
||||
if mkdir /var/lock/mylock; then
|
||||
echo "Locking succeeded" >&2
|
||||
else
|
||||
echo "Lock failed - exit" >&2
|
||||
exit 1
|
||||
fi
|
||||
```
|
||||
|
||||
In case `mkdir` reports an error, the script will exit at this point -
|
||||
**the MUTEX did its job!**
|
||||
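
A slightly extended sketch (the directory name is only an example) that also removes the lock when the script terminates, so the lock does not outlive the script:

``` bash
#!/bin/bash
LOCKDIR="/var/lock/mylock"    # example path

if mkdir "$LOCKDIR" 2>/dev/null; then
    # remove the lock again on any exit of the script
    trap 'rmdir "$LOCKDIR"' EXIT
    echo "Locking succeeded" >&2
else
    echo "Lock failed - exit" >&2
    exit 1
fi

# ... the real work of the script goes here ...
```
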
|
||||
*If the directory is removed after setting a successful lock, while the script is still running, the lock is lost. Doing chmod -w for the parent directory containing the lock directory can be done, but it is not atomic. Maybe a while loop checking continuously for the existence of the lock in the background and sending a signal such as USR1, if the directory is not found, can be done. The signal would need to be trapped. I am sure there is a better solution than this suggestion* \-\-- *[sn18](sunny_delhi18@yahoo.com) 2009/12/19 08:24*
|
||||
|
||||
**Note:** While perusing the Internet, I found some people asking if the
|
||||
`mkdir` method works \"on all filesystems\". Well, let\'s say it should.
|
||||
The syscall under `mkdir` is guaranteed to work atomically in all cases,
|
||||
at least on Unices. Two examples of problems are NFS filesystems and
|
||||
filesystems on cluster servers. With those two scenarios, dependencies
|
||||
exist related to the mount options and implementation. However, I
|
||||
successfully use this simple method on an Oracle OCFS2 filesystem in a
|
||||
4-node cluster environment. So let\'s just say \"it should work under
|
||||
normal conditions\".
|
||||
|
||||
Another atomic method is setting the `noclobber` shell option (`set -C`). It causes a `>` redirection to fail if the file it points to already exists (the underlying `open()` call is made in a way that fails when the file is already there). A short example:
|
||||
|
||||
``` bash
|
||||
|
||||
# "$lockfile" has to be set beforehand, e.g.: lockfile="/var/lock/mylock"
if ( set -o noclobber; echo "locked" > "$lockfile") 2> /dev/null; then
    # remove the lockfile again when the script exits, for any reason
    trap 'rm -f "$lockfile"; exit $?' INT TERM EXIT
    echo "Locking succeeded" >&2
    # ... critical section ...
    rm -f "$lockfile"
    trap - INT TERM EXIT
else
    echo "Lock failed - exit" >&2
    exit 1
fi
|
||||
|
||||
```
|
||||
|
||||
Another explanation of this basic pattern using `set -C` can be found
|
||||
[here](http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html#tag_23_02_07).
|
||||
|
||||
## An example
|
||||
|
||||
This code was taken from a production grade script that controls PISG to
|
||||
create statistical pages from my IRC logfiles. There are some
|
||||
differences compared to the very simple example above:
|
||||
|
||||
- the locking stores the process ID of the locked instance
|
||||
- if a lock fails, the script tries to find out if the locked instance
|
||||
still is active (unreliable!)
|
||||
- traps are created to automatically remove the lock when the script
|
||||
terminates, or is killed
|
||||
|
||||
Details on how the script is killed aren\'t given, only code relevant to
|
||||
the locking process is shown:
|
||||
|
||||
``` bash
|
||||
#!/bin/bash
|
||||
|
||||
# lock dirs/files
|
||||
LOCKDIR="/tmp/statsgen-lock"
|
||||
PIDFILE="${LOCKDIR}/PID"
|
||||
|
||||
# exit codes and text
|
||||
ENO_SUCCESS=0; ETXT[0]="ENO_SUCCESS"
|
||||
ENO_GENERAL=1; ETXT[1]="ENO_GENERAL"
|
||||
ENO_LOCKFAIL=2; ETXT[2]="ENO_LOCKFAIL"
|
||||
ENO_RECVSIG=3; ETXT[3]="ENO_RECVSIG"
|
||||
|
||||
###
|
||||
### start locking attempt
|
||||
###
|
||||
|
||||
trap 'ECODE=$?; echo "[statsgen] Exit: ${ETXT[ECODE]}($ECODE)" >&2' 0
|
||||
echo -n "[statsgen] Locking: " >&2
|
||||
|
||||
if mkdir "${LOCKDIR}" &>/dev/null; then
|
||||
|
||||
# lock succeeded, install signal handlers before storing the PID just in case
|
||||
# storing the PID fails
|
||||
trap 'ECODE=$?;
|
||||
echo "[statsgen] Removing lock. Exit: ${ETXT[ECODE]}($ECODE)" >&2
|
||||
rm -rf "${LOCKDIR}"' 0
|
||||
echo "$$" >"${PIDFILE}"
|
||||
# the following handler will exit the script upon receiving these signals
|
||||
# the trap on "0" (EXIT) from above will be triggered by this trap's "exit" command!
|
||||
trap 'echo "[statsgen] Killed by a signal." >&2
|
||||
exit ${ENO_RECVSIG}' 1 2 3 15
|
||||
echo "success, installed signal handlers"
|
||||
|
||||
else
|
||||
|
||||
# lock failed, check if the other PID is alive
|
||||
OTHERPID="$(cat "${PIDFILE}")"
|
||||
|
||||
# if cat isn't able to read the file, another instance is probably
|
||||
# about to remove the lock -- exit, we're *still* locked
|
||||
# Thanks to Grzegorz Wierzowiecki for pointing out this race condition on
|
||||
# http://wiki.grzegorz.wierzowiecki.pl/code:mutex-in-bash
|
||||
if [ $? != 0 ]; then
|
||||
echo "lock failed, PID ${OTHERPID} is active" >&2
|
||||
exit ${ENO_LOCKFAIL}
|
||||
fi
|
||||
|
||||
if ! kill -0 $OTHERPID &>/dev/null; then
|
||||
# lock is stale, remove it and restart
|
||||
echo "removing stale lock of nonexistant PID ${OTHERPID}" >&2
|
||||
rm -rf "${LOCKDIR}"
|
||||
echo "[statsgen] restarting myself" >&2
|
||||
exec "$0" "$@"
|
||||
else
|
||||
# lock is valid and OTHERPID is active - exit, we're locked!
|
||||
echo "lock failed, PID ${OTHERPID} is active" >&2
|
||||
exit ${ENO_LOCKFAIL}
|
||||
fi
|
||||
|
||||
fi
|
||||
```
|
||||
|
||||
## Related links
|
||||
|
||||
- [BashFAQ/045](http://mywiki.wooledge.org/BashFAQ/045)
|
||||
- [Implementation of a shell locking
|
||||
utility](http://wiki.grzegorz.wierzowiecki.pl/code:mutex-in-bash)
|
||||
- [Wikipedia article on File
|
||||
Locking](http://en.wikipedia.org/wiki/File_locking), including a
|
||||
discussion of potential
|
||||
[problems](http://en.wikipedia.org/wiki/File_locking#Problems) with
|
||||
flock and certain versions of NFS.
|
353
docs/howto/pax.md
Normal file
353
docs/howto/pax.md
Normal file
@ -0,0 +1,353 @@
|
||||
# pax - the POSIX archiver
|
||||
|
||||
![](keywords>bash shell scripting POSIX archive tar packing zip)
|
||||
|
||||
pax can do a lot of fancy stuff, feel free to contribute more awesome
|
||||
pax tricks!
|
||||
|
||||
## Introduction
|
||||
|
||||
The POSIX archiver, `pax`, is an attempt at a standardized archiver with
|
||||
the best features of `tar` and `cpio`, able to handle all common archive
|
||||
types.
|
||||
|
||||
However, this is **not a manpage**, it will **not** list all possible options, and it will **not** give you detailed information about `pax`. It\'s only an introduction.
|
||||
|
||||
This article is based on the debianized Berkeley implementation of
|
||||
`pax`, but implementation-specific things should be tagged as such.
|
||||
Unfortunately, the Debian package doesn\'t seem to be maintained
|
||||
anymore.
|
||||
|
||||
## Overview
|
||||
|
||||
### Operation modes
|
||||
|
||||
There are four basic operation modes to *list*, *read*, *write* and
|
||||
*copy* archives. They\'re switched with combinations of `-r` and `-w`
|
||||
command line options:
|
||||
|
||||
Mode RW-Options
|
||||
------- -----------------
|
||||
List *no RW-options*
|
||||
Read `-r`
|
||||
Write `-w`
|
||||
Copy `-r -w`
|
||||
|
||||
#### List
|
||||
|
||||
In *list mode*, `pax` writes the list of archive members to standard
|
||||
output (a table of contents). If a pattern match is specified on the
|
||||
command line, only matching filenames are printed.
|
||||
|
||||
#### Read
|
||||
|
||||
*Read* an archive. `pax` will read archive data and extract the members
|
||||
to the current directory. If a pattern match is specified on the command
|
||||
line, only matching filenames are extracted.
|
||||
|
||||
When reading an archive, the archive type is determined from the archive
|
||||
data.
|
||||
|
||||
#### Write
|
||||
|
||||
*Write* an archive, which means create a new one or append to an
|
||||
existing one. All files and directories specified on the command line
|
||||
are inserted into the archive. The archive is written to standard output
|
||||
by default.
|
||||
|
||||
If no files are specified on the command line, filenames are read from
|
||||
`STDIN`.
|
||||
|
||||
The write mode is the only mode where you need to specify the archive
|
||||
type with `-x <TYPE>`, e.g. `-x ustar`.
|
||||
|
||||
#### Copy
|
||||
|
||||
*Copy* mode is similar to `cpio` passthrough mode. It provides a way to
|
||||
replicate a complete or partial file hierarchy (with all the `pax`
|
||||
options, e.g. rewriting groups) to another location.
|
||||
|
||||
### Archive data
|
||||
|
||||
When you don\'t specify anything special, `pax` will attempt to read
|
||||
archive data from standard input (read/list modes) and write archive
|
||||
data to standard output (write mode). This ensures `pax` can be easily
|
||||
used as part of a shell pipe construct, e.g. to read a compressed
|
||||
archive that\'s decompressed in the pipe.
|
||||
|
||||
The option to specify the pathname of the archive file is `-f`. This file will be used as input or output, depending on the operation (read/write/list).
|
||||
|
||||
When pax reads an archive, it tries to guess the archive type. However,
|
||||
in *write* mode, you must specify which type of archive to append using
|
||||
the `-x <TYPE>` switch. If you omit this switch, a default archive will
|
||||
be created (POSIX says it\'s implementation defined, Berkeley `pax`
|
||||
creates `ustar` if no options are specified).
|
||||
|
||||
The following archive formats are supported (Berkeley implementation):
|
||||
|
||||
  Format      Description
  ----------- ----------------------------
  ustar       POSIX TAR format (default)
  cpio        POSIX CPIO format
  tar         classic BSD TAR format
  bcpio       old binary CPIO format
  sv4cpio     SVR4 CPIO format
  sv4crc      SVR4 CPIO format with CRC
|
||||
|
||||
Berkeley `pax` supports options `-z` and `-j`, similar to GNU `tar`, to
|
||||
filter archive files through GZIP/BZIP2.
|
||||
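
For example (archive and directory names are only placeholders), a gzip-compressed archive can be created either with the extension or, portably, with a pipe:

    # Berkeley/GNU-style, using the -z extension
    pax -wz -x ustar -f backup.tar.gz mydir/

    # POSIX tools only, same result
    pax -w -x ustar mydir/ | gzip > backup.tar.gz
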
|
||||
### Matching archive members
|
||||
|
||||
In *read* and *list* modes, you can specify patterns to determine which
|
||||
files to list or extract.
|
||||
|
||||
- the pattern notation is the one known by a POSIX-shell, i.e. the one
|
||||
known by Bash without `extglob`
|
||||
- if the specified pattern matches a complete directory, it affects
|
||||
all files and subdirectories of the specified directory
|
||||
- if you specify the `-c` option, `pax` will invert the matches, i.e.
|
||||
it matches all filenames **except** those matching the specified
|
||||
patterns
|
||||
- if no patterns are given, `pax` will \"match\" (list or extract) all
|
||||
files from the archive
|
||||
- **To avoid conflicts with shell pathname expansion, it\'s wise to
|
||||
quote patterns!**
|
||||
|
||||
#### Some assorted examples of patterns
|
||||
|
||||
pax -r <myarchive.tar 'data/sales/*.txt' 'data/products/*.png'
|
||||
|
||||
pax -r <myarchive.tar 'data/sales/year_200[135].txt'
|
||||
# should be equivalent to
|
||||
pax -r <myarchive.tar 'data/sales/year_2001.txt' 'data/sales/year_2003.txt' 'data/sales/year_2005.txt'
|
||||
|
||||
## Using pax
|
||||
|
||||
This is a brief description of using `pax` as a normal archiver system,
|
||||
like you would use `tar`.
|
||||
|
||||
### Creating an archive
|
||||
|
||||
This task is done with basic syntax
|
||||
|
||||
# archive contents to stdout
|
||||
pax -w >archive.tar README.txt *.png data/
|
||||
|
||||
    # equivalent, write the archive directly to a file
|
||||
pax -w -x ustar -f archive.tar README.txt *.png data/
|
||||
|
||||
`pax` is in *write* mode, the given filenames are packed into an
|
||||
archive:
|
||||
|
||||
- `README.txt` is a normal file, it will be packed
|
||||
- `*.png` is a pathname glob **for your shell**, the shell will
|
||||
substitute all matching filenames **before** `pax` is executed. The
|
||||
result is a list of filenames that will be packed like the
|
||||
`README.txt` example above
|
||||
- `data/` is a directory. **Everything** in this directory will be
|
||||
packed into the archive, i.e. not just an empty directory
|
||||
|
||||
When you specify the `-v` option, `pax` will write the pathnames of the
|
||||
files inserted into the archive to `STDERR`.
|
||||
|
||||
When, and only when, no filename arguments are specified, `pax` attempts
|
||||
to read filenames from `STDIN`, separated by newlines. This way you can
|
||||
easily combine `find` with `pax`:
|
||||
|
||||
find . -name '*.txt' | pax -wf textfiles.tar -x ustar
|
||||
|
||||
### Listing archive contents
|
||||
|
||||
The standard output format for listing archive members is simply to print each filename on a separate line. But the output format can be
|
||||
customized to include permissions, timestamps, etc. with the
|
||||
`-o listopt=<FORMAT>` specification. The syntax of the format
|
||||
specification is strongly derived from the `printf(3)` format
|
||||
specification.
|
||||
|
||||
**Unfortunately** the `pax` utility delivered with Debian doesn\'t seem
|
||||
to support these extended listing formats.
|
||||
|
||||
However, `pax` lists archive members in a `ls -l`-like format, when you
|
||||
give the `-v` option:
|
||||
|
||||
pax -v <myarchive.tar
|
||||
# or, of course
|
||||
pax -vf myarchive.tar
|
||||
|
||||
### Extracting from an archive
|
||||
|
||||
You can extract all files, or files (not) matching specific patterns
|
||||
from an archive using constructs like:
|
||||
|
||||
# "normal" extraction
|
||||
pax -rf myarchive.tar '*.txt'
|
||||
|
||||
# with inverted pattern
|
||||
pax -rf myarchive.tar -c '*.txt'
|
||||
|
||||
### Copying files
|
||||
|
||||
To copy directory contents to another directory, similar to a `cp -a`
|
||||
command, use:
|
||||
|
||||
mkdir destdir
|
||||
pax -rw dir destdir #creates a copy of dir in destdir/, i.e. destdir/dir
|
||||
|
||||
### Copying files via ssh
|
||||
|
||||
To copy directory contents to another directory on a remote system, use:
|
||||
|
||||
pax -w localdir | ssh user@host "cd distantdest && pax -r -v"
|
||||
pax -w localdir | gzip | ssh user@host "cd distantdir && gunzip | pax -r -v" #compress the sent data
|
||||
|
||||
These commands create a copy of localdir in distantdir (i.e. distantdir/localdir) on the remote machine.
|
||||
|
||||
## Advanced usage
|
||||
|
||||
### Backup your daily work
|
||||
|
||||
[**Note:**]{.underline} `-T` is an extension and is not defined by
|
||||
POSIX.
|
||||
|
||||
Say you have write-access to a fileserver mounted on your filesystem
|
||||
tree. In *copy* mode, you can tell `pax` to copy only files that were
|
||||
modified today:
|
||||
|
||||
mkdir /n/mybackups/$(date +%A)/
|
||||
pax -rw -T 0000 data/ /n/mybackups/$(date +%A)/
|
||||
|
||||
This is done using the `-T` switch, which normally allows you to specify a time window, but in this case only the start time is given, which means \"today at midnight\".
|
||||
|
||||
When you execute this \"very simple backup\" after your daily work, you
|
||||
will have a copy of the modified files.
|
||||
|
||||
[**Note:**]{.underline} The `%A` format from `date` expands to the name
|
||||
of the current day, localized, e.g. \"Friday\" (en) or \"Mittwoch\"
|
||||
(de).
|
||||
|
||||
The same, but with an archive, can be accomplished by:
|
||||
|
||||
    pax -w -T 0000 -f /n/mybackups/$(date +%A) data/
|
||||
|
||||
In this case, the day-name is an archive-file (you don\'t need a
|
||||
filename extension like `.tar` but you can add one, if desired).
|
||||
|
||||
### Changing filenames while archiving
|
||||
|
||||
`pax` is able to rewrite filenames while archiving or while extracting
|
||||
from an archive. This example creates a tar archive containing the
|
||||
`holiday_2007/` directory, but the directory name inside the archive
|
||||
will be `holiday_pics/`:
|
||||
|
||||
pax -x ustar -w -f holiday_pictures.tar -s '/^holiday_2007/holiday_pics/' holiday_2007/
|
||||
|
||||
The option responsible for the string manipulation is the
|
||||
`-s <REWRITE-SPECIFICATION>`. It takes the string rewrite specification
|
||||
as an argument, in the form `/OLD/NEW/[gp]`, where `OLD` is an `ed(1)`-style basic regular expression (BRE); in general it behaves like the popular sed construct `s/from/to/`. Any non-null character can be used as a delimiter, so to mangle pathnames (containing slashes), you could use `#/old/path#/new/path#`.
|
||||
|
||||
The optional `g` and `p` flags are used to apply substitution
|
||||
**(g)**lobally to the line or to **(p)**rint the original and rewritten
|
||||
strings to `STDERR`.
|
||||
|
||||
Multiple `-s` options can be specified on the command line. They are applied to the pathname strings of the files or archive members in the order they are specified, and rewriting stops at the first expression that successfully substitutes.
|
||||
|
||||
### Excluding files from an archive
|
||||
|
||||
The `-s` option seen above can be used to exclude a file: the substitution simply has to result in a null string. For example, let\'s say that you want to exclude all the CVS directories to create a source code archive. We are going to replace the names containing `/CVS/` with nothing. Note the `.*`: they are needed because we need to match the entire pathname.
|
||||
|
||||
pax -w -x ustar -f release.tar -s',.*/CVS/.*,,' myapplication
|
||||
|
||||
You can use several `-s` options; for instance, let\'s say you also want to remove files ending in `~`:
|
||||
|
||||
    pax -w -x ustar -f release.tar -s',.*/CVS/.*,,' -s'/.*~$//' myapplication
|
||||
|
||||
This can also be done while reading an archive, for instance, suppose
|
||||
you have an archive containing a \"usr\" and a \"etc\" directory but
|
||||
that you want to extract only the \"usr\" directory:
|
||||
|
||||
pax -r -f archive.tar -s',^etc/.*,,' #the etc/ dir is not extracted
|
||||
|
||||
### Getting archive filenames from STDIN
|
||||
|
||||
Like `cpio`, pax can read filenames from standard input (`stdin`). This
|
||||
provides great flexibility - for example, a `find(1)` command may select
|
||||
files/directories in ways pax can\'t do itself. In **write** mode
|
||||
(creating an archive) or **copy** mode, when no filenames are given, pax
|
||||
expects to read filenames from standard input. For example:
|
||||
|
||||
# Back up config files changed less than 3 days ago
|
||||
find /etc -type f -mtime -3 | pax -x ustar -w -f /backups/etc.tar
|
||||
|
||||
# Copy only the directories, not the files
|
||||
mkdir /target
|
||||
find . -type d -print | pax -r -w -d /target
|
||||
|
||||
# Back up anything that changed since the last backup
|
||||
find . -newer /var/run/mylastbackup -print0 |
|
||||
pax -0 -x ustar -w -d -f /backups/mybackup.tar
|
||||
touch /var/run/mylastbackup
|
||||
|
||||
The `-d` option tells pax **not** to recurse into directories it reads
|
||||
(`cpio`-style). Without `-d`, pax recurses into all directories
|
||||
(`tar`-style).
|
||||
|
||||
**Note**: the `-0` option is not standard, but is present in some
|
||||
implementations.
|
||||
|
||||
## From tar to pax
|
||||
|
||||
`pax` can handle the `tar` archive format; if you want to switch to the standard tool, an alias like:
|
||||
|
||||
alias tar='echo USE PAX, idiot. pax is the standard archiver!; # '
|
||||
|
||||
in your `~/.bashrc` can be useful :-D.
|
||||
|
||||
Here is a quick table comparing (GNU) `tar` and `pax` to help you to
|
||||
make the switch:
|
||||
|
||||
TAR PAX Notes
|
||||
------------------------------------- ------------------------------------------ -----------------------------------------------------------------------
|
||||
`tar xzvf file.tar.gz` `pax -rvz -f file.tar.gz` `-z` is an extension, POSIXly: `gunzip <file.tar.gz | pax -rv`
|
||||
`tar czvf archive.tar.gz path ...` `pax -wvz -f archive.tar.gz path ...` `-z` is an extension, POSIXly: `pax -wv path | gzip > archive.tar.gz`
|
||||
`tar xjvf file.tar.bz2` `bunzip2 <file.tar.bz2 | pax -rv`
|
||||
`tar cjvf archive.tar.bz2 path ...` `pax -wv path | bzip2 > archive.tar.bz2`
|
||||
`tar tzvf file.tar.gz` `pax -vz -f file.tar.gz` `-z` is an extension, POSIXly: `gunzip <file.tar.gz | pax -v`
|
||||
|
||||
`pax` might not create ustar (`tar`) archives by default but its own pax format; add `-x ustar` if you want to ensure pax creates tar archives!
|
||||
|
||||
## Implementations
|
||||
|
||||
- [AT&T AST toolkit](http://www2.research.att.com/sw/download/) \|
|
||||
[manpage](http://www2.research.att.com/~gsf/man/man1/pax.html)
|
||||
- [Heirloom toolchest](http://heirloom.sourceforge.net/index.html) \|
|
||||
[manpage](http://heirloom.sourceforge.net/man/pax.1.html)
|
||||
- [OpenBSD pax](http://www.openbsd.org/cgi-bin/cvsweb/src/bin/pax/) \|
|
||||
[manpage](http://www.openbsd.org/cgi-bin/man.cgi?query=pax&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html)
|
||||
- [MirBSD pax](https://launchpad.net/paxmirabilis) \|
|
||||
[manpage](https://www.mirbsd.org/htman/i386/man1/pax.htm) - Debian
|
||||
bases their package upon this.
|
||||
- [SUS pax
|
||||
specification](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html)
|
697
docs/howto/redirection_tutorial.md
Normal file
697
docs/howto/redirection_tutorial.md
Normal file
@ -0,0 +1,697 @@
|
||||
# Illustrated Redirection Tutorial
|
||||
|
||||
![](keywords>bash shell scripting tutorial redirection redirect file descriptor)
|
||||
|
||||
This tutorial is not a complete guide to redirection, it will not cover
|
||||
here docs, here strings, named pipes etc\... I just hope it\'ll help you
|
||||
to understand what things like `3>&2`, `2>&1` or `1>&3-` do.
|
||||
|
||||
# stdin, stdout, stderr
|
||||
|
||||
When Bash starts, normally three file descriptors are open: `0`, `1` and `2`, also known as standard input (`stdin`), standard output (`stdout`) and standard error (`stderr`).
|
||||
|
||||
For example, with Bash running in a Linux terminal emulator, you\'ll
|
||||
see:
|
||||
|
||||
# lsof +f g -ap $BASHPID -d 0,1,2
|
||||
COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME
|
||||
bash 12135 root 0u CHR RW,LG 136,13 0t0 16 /dev/pts/5
|
||||
bash 12135 root 1u CHR RW,LG 136,13 0t0 16 /dev/pts/5
|
||||
bash 12135 root 2u CHR RW,LG 136,13 0t0 16 /dev/pts/5
|
||||
|
||||
This `/dev/pts/5` is a pseudo terminal used to emulate a real terminal.
|
||||
Bash reads (`stdin`) from this terminal and prints via `stdout` and
|
||||
`stderr` to this terminal.
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
When a command, a compound command, a subshell etc. is executed, it
|
||||
inherits these file descriptors. For instance `echo foo` will send the
|
||||
text `foo` to the file descriptor `1` inherited from the shell, which is
|
||||
connected to `/dev/pts/5`.
|
||||
|
||||
# Simple Redirections
|
||||
|
||||
## Output Redirection \"n\> file\"
|
||||
|
||||
`>` is probably the simplest redirection.
|
||||
|
||||
`echo foo > file`
|
||||
|
||||
the `> file` after the command alters the file descriptors belonging to
|
||||
the command `echo`. It changes the file descriptor `1` (`> file` is the
|
||||
same as `1>file`) so that it points to the file `file`. They will look
|
||||
like:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| file |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
Now characters written by our command, `echo`, that are sent to the
|
||||
standard output, i.e., the file descriptor `1`, end up in the file named
|
||||
`file`.
|
||||
|
||||
In the same way, command `2> file` will change the standard error and
|
||||
will make it point to `file`. Standard error is used by applications to
|
||||
print errors.
|
||||
|
||||
What will `command 3> file` do? It will open a new file descriptor
|
||||
pointing to `file`. The command will then start with:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
new descriptor ( 3 ) ---->| file |
|
||||
--- +-----------------------+
|
||||
|
||||
What will the command do with this descriptor? It depends. Often
|
||||
nothing. We will see later why we might want other file descriptors.
|
||||
|
||||
## Input Redirection \"n\< file\"
|
||||
|
||||
When you run a command using `command < file`, it changes the file
|
||||
descriptor `0` so that it looks like:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) <----| file |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
If the command reads from `stdin`, it now will read from `file` and not
|
||||
from the console.
|
||||
|
||||
As with `>`, `<` can be used to open a new file descriptor for reading,
|
||||
`command 3<file`. Later we will see how this can be useful.
|
||||
|
||||
## Pipes \|
|
||||
|
||||
What does this `|` do? Among other things, it connects the standard
|
||||
output of the command on the left to the standard input of the command
|
||||
on the right. That is, it creates a special file, a pipe, which is
|
||||
opened as a write destination for the left command, and as a read source
|
||||
for the right command.
|
||||
|
||||
echo foo | cat
|
||||
|
||||
--- +--------------+ --- +--------------+
|
||||
( 0 ) ---->| /dev/pts/5 | ------> ( 0 ) ---->|pipe (read) |
|
||||
--- +--------------+ / --- +--------------+
|
||||
/
|
||||
--- +--------------+ / --- +--------------+
|
||||
( 1 ) ---->| pipe (write) | / ( 1 ) ---->| /dev/pts |
|
||||
--- +--------------+ --- +--------------+
|
||||
|
||||
--- +--------------+ --- +--------------+
|
||||
( 2 ) ---->| /dev/pts/5 | ( 2 ) ---->| /dev/pts/ |
|
||||
--- +--------------+ --- +--------------+
|
||||
|
||||
This is possible because the redirections are set up by the shell
|
||||
**before** the commands are executed, and the commands inherit the file
|
||||
descriptors.
|
||||
|
||||
# More On File Descriptors
|
||||
|
||||
## Duplicating File Descriptor 2\>&1
|
||||
|
||||
We have seen how to open (or redirect) file descriptors. Let us see how
|
||||
to duplicate them, starting with the classic `2>&1`. What does this
|
||||
mean? That something written on the file descriptor `2` will go where
|
||||
file descriptor `1` goes. In a shell `command 2>&1` is not a very
|
||||
interesting example so we will use `ls /tmp/ doesnotexist 2>&1 | less`
|
||||
|
||||
ls /tmp/ doesnotexist 2>&1 | less
|
||||
|
||||
--- +--------------+ --- +--------------+
|
||||
( 0 ) ---->| /dev/pts/5 | ------> ( 0 ) ---->|from the pipe |
|
||||
--- +--------------+ / ---> --- +--------------+
|
||||
/ /
|
||||
--- +--------------+ / / --- +--------------+
|
||||
( 1 ) ---->| to the pipe | / / ( 1 ) ---->| /dev/pts |
|
||||
--- +--------------+ / --- +--------------+
|
||||
/
|
||||
--- +--------------+ / --- +--------------+
|
||||
( 2 ) ---->| to the pipe | / ( 2 ) ---->| /dev/pts/ |
|
||||
--- +--------------+ --- +--------------+
|
||||
|
||||
Why is it called *duplicating*? Because after `2>&1`, we have 2 file
|
||||
descriptors pointing to the same file. Take care not to call this \"File
|
||||
Descriptor Aliasing\"; if we redirect `stdout` after `2>&1` to a file
|
||||
`B`, file descriptor `2` will still be opened on the file `A` where it
|
||||
was. This is often misunderstood by people wanting to redirect both
|
||||
standard error and standard output to the file. Continue reading for
|
||||
more on this.
|
||||
|
||||
So if you have two file descriptors `s` and `t` like:
|
||||
|
||||
--- +-----------------------+
|
||||
a descriptor ( s ) ---->| /some/file |
|
||||
--- +-----------------------+
|
||||
--- +-----------------------+
|
||||
a descriptor ( t ) ---->| /another/file |
|
||||
--- +-----------------------+
|
||||
|
||||
Using `t>&s` (where `t` and `s` are numbers) means:
|
||||
|
||||
> Copy whatever file descriptor `s` contains into file descriptor `t`
|
||||
|
||||
So you got a copy of this descriptor:
|
||||
|
||||
--- +-----------------------+
|
||||
a descriptor ( s ) ---->| /some/file |
|
||||
--- +-----------------------+
|
||||
--- +-----------------------+
|
||||
a descriptor ( t ) ---->| /some/file |
|
||||
--- +-----------------------+
|
||||
|
||||
Internally, each of these is a file descriptor referring to an open file description maintained by the operating system (created with the `open()` system call), which records, among other things, whether the file was opened for reading (`stdin`, file descriptor `0`) or for writing (`stdout`/`stderr`).
|
||||
|
||||
Note that the two file descriptors share the file reading or writing position. If you have already read a line of `s`, then after `t>&s`, if you read a line from `t`, you will get the second line of the file.
|
||||
|
||||
Similarly for output file descriptors, writing a line to file descriptor
|
||||
`s` will append a line to a file as will writing a line to file
|
||||
descriptor `t`.
|
||||
|
||||
**Tip:** The syntax is somewhat confusing in that you would think that the arrow would point in the direction of the copy, but it\'s reversed. So it\'s `target>&source`, effectively.
|
||||
|
||||
So, a simple example (albeit slightly contrived) is the following:
|
||||
|
||||
exec 3>&1 # Copy 1 into 3
|
||||
exec 1> logfile # Make 1 opened to write to logfile
|
||||
lotsa_stdout # Outputs to fd 1, which writes to logfile
|
||||
exec 1>&3 # Copy 3 back into 1
|
||||
echo Done # Output to original stdout
|
||||
|
||||
## Order Of Redirection, i.e., \"\> file 2\>&1\" vs. \"2\>&1 \>file\"
|
||||
|
||||
While it doesn\'t matter where the redirections appear on the command
|
||||
line, their order does matter. They are set up from left to right.
|
||||
|
||||
- `2>&1 >file`
|
||||
|
||||
A common error is to do `command 2>&1 > file` to redirect both `stderr`
|
||||
and `stdout` to `file`. Let\'s see what\'s going on. First we type the
|
||||
command in our terminal, the descriptors look like this:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
Then our shell, Bash, sees `2>&1`, so it makes `2` a copy of `1`, and the file descriptors look like this:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
That\'s right, nothing has changed, 2 was already pointing to the same
|
||||
place as 1. Now Bash sees `> file` and thus changes `stdout`:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| file |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
And that\'s not what we want.
|
||||
|
||||
- `>file 2>&1`
|
||||
|
||||
Now let\'s look at the correct `command >file 2>&1`. We start as in the
|
||||
previous example, and Bash sees `> file`:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| file |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
Then it sees our duplication `2>&1`:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| file |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| file |
|
||||
--- +-----------------------+
|
||||
|
||||
And voila, both `1` and `2` are redirected to file.
|
||||
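
You can see the difference with a quick test in a terminal (the exact error message depends on your system; `/tmp/doesnotexist` is assumed not to exist):

    $ ls /tmp/doesnotexist 2>&1 >/dev/null     # wrong order: the error still reaches the terminal
    ls: cannot access '/tmp/doesnotexist': No such file or directory
    $ ls /tmp/doesnotexist >/dev/null 2>&1     # right order: nothing is printed
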
|
||||
## Why sed \'s/foo/bar/\' file \>file Doesn\'t Work
|
||||
|
||||
This is a common error: we want to modify a file using something that reads from a file and writes the result to `stdout`. To do this, we redirect stdout to the file we want to modify. The problem here is that, as we have seen, the redirections are set up before the command is actually executed.
|
||||
|
||||
So **BEFORE** sed starts, standard output has already been redirected,
|
||||
with the additional side effect that, because we used \>, \"file\" gets
|
||||
truncated. When `sed` starts to read the file, it contains nothing.
|
||||
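
A common workaround (a sketch; the temporary file name is only an example) is to write to a temporary file first and replace the original only afterwards:

    sed 's/foo/bar/' file > file.tmp && mv file.tmp file
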
|
||||
## exec
|
||||
|
||||
In Bash the `exec` built-in replaces the shell with the specified
|
||||
program. So what does this have to do with redirection? `exec` also
|
||||
allows us to manipulate the file descriptors. If you don\'t specify a
|
||||
program, the redirection after `exec` modifies the file descriptors of
|
||||
the current shell.
|
||||
|
||||
For example, all the commands after `exec 2>file` will have file
|
||||
descriptors like:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| file |
|
||||
--- +-----------------------+
|
||||
|
||||
All the errors sent to `stderr` by the commands after the
|
||||
`exec 2>file` will go to the file, just as if you had the command in a
|
||||
script and ran `myscript 2>file`.
|
||||
|
||||
`exec` can be used if, for instance, you want to log the errors the commands in your script produce: just add `exec 2>myscript.errors` at the beginning of your script.
|
||||
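
A minimal sketch (file names are only examples):

    #!/bin/bash
    exec 2>myscript.errors      # from now on, all errors of this script go to this file

    ls /does/not/exist          # this error message ends up in myscript.errors
    echo "done"                 # normal output still goes to stdout
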
|
||||
Let\'s see another use case. We want to read a file line by line, this
|
||||
is easy, we just do:
|
||||
|
||||
while read -r line;do echo "$line";done < file
|
||||
|
||||
Now, we want, after printing each line, to do a pause, waiting for the
|
||||
user to press a key:
|
||||
|
||||
while read -r line;do echo "$line"; read -p "Press any key" -n 1;done < file
|
||||
|
||||
And, surprise, this doesn\'t work. Why? Because the file descriptors of the while loop look like:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| file |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
and our read inherits these descriptors, and our command
|
||||
(`read -p "Press any key" -n 1`) inherits them, and thus reads from file
|
||||
and not from our terminal.
|
||||
|
||||
A quick look at `help read` tells us that we can specify a file
|
||||
descriptor from which `read` should read. Cool. Now let\'s use `exec` to
|
||||
get another descriptor:
|
||||
|
||||
exec 3<file
|
||||
while read -u 3 line;do echo "$line"; read -p "Press any key" -n 1;done
|
||||
|
||||
Now the file descriptors look like:
|
||||
|
||||
--- +-----------------------+
|
||||
standard input ( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard output ( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
standard error ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-----------------------+
|
||||
|
||||
--- +-----------------------+
|
||||
new descriptor ( 3 ) ---->| file |
|
||||
--- +-----------------------+
|
||||
|
||||
and it works.
|
||||
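
When the loop is done, it is good practice to close the descriptor again (more on this in the next section):

    exec 3<&-
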
|
||||
## Closing The File Descriptors
|
||||
|
||||
Closing a file through a file descriptor is easy: just make it a duplicate of `-`. For instance, let\'s close `stdin` (`<&-`) and `stderr` (`2>&-`):
|
||||
|
||||
bash -c '{ lsof -a -p $$ -d0,1,2 ;} <&- 2>&-'
|
||||
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
|
||||
bash 10668 pgas 1u CHR 136,2 4 /dev/pts/2
|
||||
|
||||
We see that inside the `{}` only `1` is still open.
|
||||
|
||||
Though the OS will probably clean up the mess, it is perhaps a good idea
|
||||
to close the file descriptors you open. For instance, if you open a file
|
||||
descriptor with `exec 3>file`, all the commands afterwards will inherit
|
||||
it. It\'s probably better to do something like:
|
||||
|
||||
exec 3>file
|
||||
.....
|
||||
    #commands that use 3
|
||||
.....
|
||||
exec 3>&-
|
||||
|
||||
#we don't need 3 any more
|
||||
|
||||
I\'ve seen some people using this as a way to discard, say stderr, using something like `command 2>&-`. Though it might work, I\'m not sure if you can expect all applications to behave correctly with a closed stderr.
|
||||
|
||||
When in doubt, I use `2>/dev/null`.
|
||||
|
||||
# An Example
|
||||
|
||||
This example comes from [this post
|
||||
(ffe4c2e382034ed9)](http://groups.google.com/group/comp.unix.shell/browse_thread/thread/64206d154894a4ef/ffe4c2e382034ed9#ffe4c2e382034ed9)
|
||||
on the comp.unix.shell group:
|
||||
|
||||
{
|
||||
{
|
||||
cmd1 3>&- |
|
||||
cmd2 2>&3 3>&-
|
||||
} 2>&1 >&4 4>&- |
|
||||
cmd3 3>&- 4>&-
|
||||
|
||||
} 3>&2 4>&1
|
||||
|
||||
The redirections are processed from left to right, but as the file
|
||||
descriptors are inherited we will also have to work from the outer to
|
||||
the inner contexts. We will assume that we run this command in a
|
||||
terminal. Let\'s start with the outer `{ } 3>&2 4>&1`.
|
||||
|
||||
--- +-------------+ --- +-------------+
|
||||
( 0 ) ---->| /dev/pts/5 | ( 3 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+ --- +-------------+
|
||||
|
||||
--- +-------------+ --- +-------------+
|
||||
( 1 ) ---->| /dev/pts/5 | ( 4 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+ --- +-------------+
|
||||
|
||||
--- +-------------+
|
||||
( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+
|
||||
|
||||
We only made 2 copies of `stderr` and `stdout`. `3>&1 4>&1` would have
|
||||
produced the same result here because we ran the command in a terminal
|
||||
and thus `1` and `2` go to the terminal. As an exercise, you can start
|
||||
with `1` pointing to `file.stdout` and 2 pointing to `file.stderr`, you
|
||||
will see why these redirections are very nice.
|
||||
|
||||
Let\'s continue with the right part of the second pipe:
|
||||
`| cmd3 3>&- 4>&-`
|
||||
|
||||
--- +-------------+
|
||||
( 0 ) ---->| 2nd pipe |
|
||||
--- +-------------+
|
||||
|
||||
--- +-------------+
|
||||
( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+
|
||||
|
||||
--- +-------------+
|
||||
( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+
|
||||
|
||||
It inherits the previous file descriptors, closes 3 and 4 and sets up a
|
||||
pipe for reading. Now for the left part of the second pipe
|
||||
`{...} 2>&1 >&4 4>&- |`
|
||||
|
||||
--- +-------------+ --- +-------------+
|
||||
( 0 ) ---->| /dev/pts/5 | ( 3 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+ --- +-------------+
|
||||
|
||||
--- +-------------+
|
||||
( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+
|
||||
|
||||
--- +-------------+
|
||||
( 2 ) ---->| 2nd pipe |
|
||||
--- +-------------+
|
||||
|
||||
First, the file descriptor `1` is connected to the pipe (`|`), then `2`
|
||||
is made a copy of `1` and thus is made an fd to the pipe (`2>&1`), then
|
||||
`1` is made a copy of `4` (`>&4`), then `4` is closed. These are the
|
||||
file descriptors of the inner `{}`. Let\'s go inside and have a look at
|
||||
the right part of the first pipe: `| cmd2 2>&3 3>&-`
|
||||
|
||||
--- +-------------+
|
||||
( 0 ) ---->| 1st pipe |
|
||||
--- +-------------+
|
||||
|
||||
--- +-------------+
|
||||
( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+
|
||||
|
||||
--- +-------------+
|
||||
( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+
|
||||
|
||||
It inherits the previous file descriptors, connects 0 to the 1st pipe,
|
||||
the file descriptor 2 is made a copy of 3, and 3 is closed. Finally, for
|
||||
the left part of the pipe:
|
||||
|
||||
--- +-------------+
|
||||
( 0 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+
|
||||
|
||||
--- +-------------+
|
||||
( 1 ) ---->| 1st pipe |
|
||||
--- +-------------+
|
||||
|
||||
--- +-------------+
|
||||
( 2 ) ---->| 2nd pipe |
|
||||
--- +-------------+
|
||||
|
||||
It also inherits the file descriptor of the left part of the 2nd pipe,
|
||||
file descriptor `1` is connected to the first pipe, `3` is closed.
|
||||
|
||||
The purpose of all this becomes clear if we take only the commands:
|
||||
|
||||
cmd2
|
||||
|
||||
--- +-------------+
|
||||
-->( 0 ) ---->| 1st pipe |
|
||||
/ --- +-------------+
|
||||
/
|
||||
/ --- +-------------+
|
||||
cmd 1 / ( 1 ) ---->| /dev/pts/5 |
|
||||
/ --- +-------------+
|
||||
/
|
||||
--- +-------------+ / --- +-------------+
|
||||
( 0 ) ---->| /dev/pts/5 | / ( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+ / --- +-------------+
|
||||
/
|
||||
--- +-------------+ / cmd3
|
||||
( 1 ) ---->| 1st pipe | /
|
||||
--- +-------------+ --- +-------------+
|
||||
------------>( 0 ) ---->| 2nd pipe |
|
||||
--- +-------------+ / --- +-------------+
|
||||
( 2 ) ---->| 2nd pipe |/
|
||||
--- +-------------+ --- +-------------+
|
||||
( 1 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+
|
||||
|
||||
--- +-------------+
|
||||
( 2 ) ---->| /dev/pts/5 |
|
||||
--- +-------------+
|
||||
|
||||
As said previously, as an exercise, you can start with `1` open on a
|
||||
file and `2` open on another file to see how the `stdout` from `cmd2` and `cmd3` goes to the original `stdout` and how the `stderr` goes to the
|
||||
original `stderr`.
|
||||
|
||||
# Syntax
|
||||
|
||||
I used to have trouble choosing between `0&<3` `3&>1` `3>&1` `->2`
|
||||
`-<&0` `&-<0` `0<&-` etc\... (I think probably because the syntax is
|
||||
more representative of the result, i.e., the redirection, than what is
|
||||
done, i.e., opening, closing, or duplicating file descriptors).
|
||||
|
||||
If this fits your situation, then maybe the following \"rules\" will help you. A redirection is always like the following:
|
||||
|
||||
lhs op rhs
|
||||
|
||||
- `lhs` is always a file descriptor, i.e., a number:
|
||||
- Either we want to open, duplicate, move or we want to close. If
|
||||
the op is `<` then there is an implicit 0, if it\'s `>` or `>>`,
|
||||
there is an implicit 1.
|
||||
|
||||
```{=html}
|
||||
<!-- -->
|
||||
```
|
||||
- `op` is `<`, `>`, `>>`, `>|`, or `<>`:
|
||||
    - `<` if the file descriptor in `lhs` will be read, `>` if it will
|
||||
be written, `>>` if data is to be appended to the file, `>|` to
|
||||
overwrite an existing file or `<>` if it will be both read and
|
||||
written.
|
||||
|
||||
```{=html}
|
||||
<!-- -->
|
||||
```
|
||||
- `rhs` is the thing that the file descriptor will describe:
|
||||
- It can be the name of a file, the place where another descriptor
|
||||
goes (`&1`), or, `&-`, which will close the file descriptor.
|
||||
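
Read with these rules, a few common redirections break down like this:

    2>file    # lhs=2 (explicit), op=>, rhs=the file "file"
    <input    # lhs=0 (implicit), op=<, rhs=the file "input"
    2>&1      # lhs=2, op=>, rhs=&1 (wherever fd 1 currently goes)
    3>&-      # lhs=3, op=>, rhs=&- (close fd 3)
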
|
||||
You might not like this description, and find it a bit incomplete or
|
||||
inexact, but I think it really helps to easily find that, say `&->0` is
|
||||
incorrect.
|
||||
|
||||
### A note on style
|
||||
|
||||
The shell is pretty loose about what it considers a valid redirect.
|
||||
While opinions probably differ, this author has some (strong)
|
||||
recommendations:
|
||||
|
||||
- **Always** keep redirections \"tightly grouped\" \-- that is, **do
|
||||
not** include whitespace anywhere within the redirection syntax
|
||||
except within quotes if required on the RHS (e.g. a filename that
|
||||
contains a space). Since shells fundamentally use whitespace to
|
||||
delimit fields in general, it is visually much clearer for each
|
||||
redirection to be separated by whitespace, but grouped in chunks
|
||||
that contain no unnecessary whitespace.
|
||||
|
||||
```{=html}
|
||||
<!-- -->
|
||||
```
|
||||
- **Do** always put a space between each redirection, and between the
|
||||
argument list and the first redirect.
|
||||
|
||||
```{=html}
|
||||
<!-- -->
|
||||
```
|
||||
- **Always** place redirections together at the very end of a command
|
||||
after all arguments. Never precede a command with a redirect. Never
|
||||
put a redirect in the middle of the arguments.
|
||||
|
||||
```{=html}
|
||||
<!-- -->
|
||||
```
|
||||
- **Never** use the Csh `&>foo` and `>&foo` shorthand redirects. Use
|
||||
the long form `>foo 2>&1`. (see: [obsolete](obsolete))
|
||||
|
||||
```{=html}
|
||||
<!-- -->
|
||||
```
|
||||
    # Good! This is clearly a simple command with two arguments and 4 redirections
|
||||
cmd arg1 arg2 <myFile 3<&1 2>/dev/null >&2
|
||||
|
||||
# Good!
|
||||
{ cmd1 <<<'my input'; cmd2; } >someFile
|
||||
|
||||
# Bad. Is the "1" a file descriptor or an argument to cmd? (answer: it's the FD). Is the space after the herestring part of the input data? (answer: No).
|
||||
# The redirects are also not delimited in any obvious way.
|
||||
cmd 2>& 1 <<< stuff
|
||||
|
||||
# Hideously Bad. It's difficult to tell where the redirects are and whether they're even valid redirects.
|
||||
# This is in fact one command with one argument, an assignment, and three redirects.
|
||||
foo=bar<baz bork<<< blarg>bleh
|
||||
|
||||
# Conclusion
|
||||
|
||||
I hope this tutorial worked for you.
|
||||
|
||||
I lied, I did not explain `1>&3-`, go check the manual ;-)
|
||||
|
||||
Thanks to Stéphane Chazelas from whom I stole both the intro and the
|
||||
example\....
|
||||
|
||||
The intro is inspired by this introduction, you\'ll find a nice exercise
|
||||
there too:
|
||||
|
||||
- [A Detailed Introduction to I/O and I/O
|
||||
Redirection](http://tldp.org/LDP/abs/html/ioredirintro.html)
|
||||
|
||||
The last example comes from this post:
|
||||
|
||||
- [comp.unix.shell: piping stdout and stderr to different
|
||||
processes](http://groups.google.com/group/comp.unix.shell/browse_thread/thread/64206d154894a4ef/ffe4c2e382034ed9#ffe4c2e382034ed9)
|
||||
|
||||
# See also
|
||||
|
||||
- Internal: [Redirection syntax overview](/syntax/redirection)
|
94
docs/howto/testing-your-scripts.md
Normal file
94
docs/howto/testing-your-scripts.md
Normal file
@ -0,0 +1,94 @@
|
||||
One of the simplest ways to check your bash/sh scripts is to run them and check their output, or to run them and check the result. This tutorial shows how to use the [bashtest](https://github.com/pahaz/bashtest) tool for testing your scripts.
|
||||
|
||||
### Write a simple utility
|
||||
|
||||
We have a simple **stat.sh** script:
|
||||
|
||||
#!/usr/bin/env bash
|
||||
|
||||
if [ -z "$1" ]
|
||||
then
|
||||
DIR=./
|
||||
else
|
||||
DIR=$1
|
||||
fi
|
||||
|
||||
echo "Evaluate *.py statistics"
|
||||
    FILES=$(find "$DIR" -name '*.py' | wc -l)
|
||||
    LINES=$( (find "$DIR" -name '*.py' -print0 | xargs -0 cat) | wc -l)
|
||||
echo "PYTHON FILES: $FILES"
|
||||
echo "PYTHON LINES: $LINES"
|
||||
|
||||
This script counts the number of Python files and the number of lines of Python code in the files. We can use it like **./stat.sh \<dir\>**.
|
||||
|
||||
### Create a test suite
|
||||
|
||||
Then we make a test suite for **stat.sh**: we create a directory **testsuit** which contains test Python files.
|
||||
|
||||
**testsuit/main.py**
|
||||
|
||||
import foo
|
||||
print(foo)
|
||||
|
||||
**testsuit/foo.py**
|
||||
|
||||
BAR = 1
|
||||
BUZ = BAR + 2
|
||||
|
||||
Ok! Our test suite is ready! We have 2 Python files which contain 4 lines of code.
|
||||
|
||||
### Write bashtests
|
||||
|
||||
Let\'s write tests. We just write a shell command for testing our work.
|
||||
|
||||
Create file **tests.bashtest**:
|
||||
|
||||
$ ./stat.sh testsuit/
|
||||
Evaluate *.py statistics
|
||||
PYTHON FILES: 2
|
||||
PYTHON LINES: 4
|
||||
|
||||
This is our test! It\'s simple. Try to run it.
|
||||
|
||||
# install bashtest if required!
|
||||
$ pip install bashtest
|
||||
|
||||
# run tests
|
||||
$ bashtest *.bashtest
|
||||
1 items passed all tests:
|
||||
1 tests in tests.bashtest
|
||||
1 tests in 1 items.
|
||||
1 passed and 0 failed.
|
||||
Test passed.
|
||||
|
||||
That\'s all. We wrote one test. You can write more tests if you want.
|
||||
|
||||
$ ls testsuit/
|
||||
foo.py main.py
|
||||
|
||||
$ ./stat.sh testsuit/
|
||||
Evaluate *.py statistics
|
||||
PYTHON FILES: 2
|
||||
PYTHON LINES: 4
|
||||
|
||||
And run tests again:
|
||||
|
||||
$ bashtest *.bashtest
|
||||
1 items passed all tests:
|
||||
2 tests in tests.bashtest
|
||||
2 tests in 1 items.
|
||||
2 passed and 0 failed.
|
||||
Test passed.
|
||||
|
||||
You can find more **.bashtest** examples in the [bashtest GitHub repo](https://github.com/pahaz/bashtest). You can also ask a question or report a bug [here](https://github.com/pahaz/bashtest/issues).
|
||||
|
||||
Happy testing!
|