From 2d52763c7554e0914dd70bce8db7edeedd8bdf22 Mon Sep 17 00:00:00 2001 From: flokoe Date: Wed, 5 Jul 2023 11:10:03 +0200 Subject: [PATCH] Convert howto pages to Markdown --- docs/howto/calculate-dc.md | 281 ++++++++++++ docs/howto/collapsing_functions.md | 98 ++++ docs/howto/conffile.md | 109 +++++ docs/howto/dissectabadoneliner.md | 125 ++++++ docs/howto/edit-ed.md | 389 ++++++++++++++++ docs/howto/getopts_tutorial.md | 359 +++++++++++++++ docs/howto/mutex.md | 207 +++++++++ docs/howto/pax.md | 353 +++++++++++++++ docs/howto/redirection_tutorial.md | 697 +++++++++++++++++++++++++++++ docs/howto/testing-your-scripts.md | 94 ++++ 10 files changed, 2712 insertions(+) create mode 100644 docs/howto/calculate-dc.md create mode 100644 docs/howto/collapsing_functions.md create mode 100644 docs/howto/conffile.md create mode 100644 docs/howto/dissectabadoneliner.md create mode 100644 docs/howto/edit-ed.md create mode 100644 docs/howto/getopts_tutorial.md create mode 100644 docs/howto/mutex.md create mode 100644 docs/howto/pax.md create mode 100644 docs/howto/redirection_tutorial.md create mode 100644 docs/howto/testing-your-scripts.md diff --git a/docs/howto/calculate-dc.md b/docs/howto/calculate-dc.md new file mode 100644 index 0000000..29b85e3 --- /dev/null +++ b/docs/howto/calculate-dc.md @@ -0,0 +1,281 @@ +# Calculating with dc + +![](keywords>bash shell scripting arithmetic calculate) + +## Introduction + +dc(1) is a non standard, but commonly found, reverse-polish Desk +Calculator. According to Ken Thompson, \"dc is the oldest language on +Unix; it was written on the PDP-7 and ported to the PDP-11 before Unix +\[itself\] was ported\". + +Historically the standard bc(1) has been implemented as a *front-end to +dc*. + +## Simple calculation + +In brief, the *reverse polish notation* means the numbers are put on the +stack first, then an operation is applied to them. Instead of writing +`1+1`, you write `1 1+`. + +By default `dc`, unlike `bc`, doesn\'t print anything, the result is +pushed on the stack. You have to use the \"p\" command to print the +element at the top of the stack. Thus a simple operation looks like: + + $ dc <<< '1 1+pq' + 2 + +I used a \"here string\" present in bash 3.x, ksh93 and zsh. if your +shell doesn\'t support this, you can use `echo '1 1+p' | dc` or if you +have GNU `dc`, you can use `dc -e '1 1 +p`\'. + +Of course, you can also just run `dc` and enter the commands. + +The classic operations are: + +- addition: `+` +- subtraction: `-` +- division: `/` +- multiplication: `*` +- remainder (modulo): `%` +- exponentiation: `^` +- square root: `v` + +GNU `dc` adds a couple more. + +To input a negative number you need to use the `_` (underscore) +character: + + $ dc <<< '1_1-p' + 2 + +You can use the *digits* `0` to `9` and the *letters* `A` to `F` as +numbers, and a dot (`.`) as a decimal point. The `A` to `F` **must** be +capital letters in order not to be confused with the commands specified +with lower case characters. A number with a letter is considered +hexadecimal: + + dc <<< 'Ap' + 10 + +The **output** is converted to **base 10** by default + +## Scale And Base + +`dc` is a calulator with abitrary precision, by default this precision +is 0. thus `dc <<< "5 4/p"` prints \"1\". + +We can increase the precision using the `k` command. 
It pops the value +at the top of the stack and uses it as the precision argument: + + dc <<< '2k5 4/p' # prints 1.25 + dc <<< '4k5 4/p' # prints 1.2500 + dc <<< '100k 2vp' + 1.4142135623730950488016887242096980785696718753769480731766797379907\ + 324784621070388503875343276415727 + +dc supports *large* precision arguments. + +You can change the base used to output (*print*) the numbers with `o` +and the base used to input (*type*) the numbers with `i`: + + dc << EOF + 20 p# prints 20, output is in base 10 + 16o # the output is now in base 2 16 + 20p # prints 14, in hex + 16i # the output is now in hex + p # prints 14 this doesn't modify the number in the stack + 10p # prints 10 the output is done in base 16 + EOF + +Note: when the input value is modified, the base is modified for all +commands, including `i`: + + dc << EOF + 16i 16o # base is 16 for input and output + 10p # prints 10 + 10i # ! set the base to 10 i.e. to 16 decimal + 17p # prints 17 + EOF + +This code prints 17 while we might think that `10i` reverts the base +back to 10 and thus the number should be converted to hex and printed as +11. The problem is 10 was typed while the input base 16, thus the base +was set to 10 hexadecimal, i.e. 16 decimal. + + dc << EOF + 16o16o10p #prints 10 + Ai # set the base to A in hex i.e. 10 + 17p # prints 11 in base 16 + EOF + +## Stack + +There are two basic commands to manipulate the stack: + +- `d` duplicates the top of the stack +- `c` clears the stack + +```{=html} + +``` + $ dc << EOF + 2 # put 2 on the stack + d # duplicate i.e. put another 2 on the stack + *p # multiply and print + c p # clear and print + EOF + 4 + dc: stack empty + +`c p` results in an error, as we would expect, as c removes everything +on the stack. *Note: we can use `#` to put comments in the script.* + +If you are lost, you can inspect (i.e. print) the stack using the +command `f`. The stack remains unchanged: + + dc <<< '1 2 d 4+f' + 6 + 2 + 1 + +Note how the first element that will be popped from the stack is printed +first, if you are used to an HP calculator, it\'s the reverse. + +Don\'t hesitate to put `f` in the examples of this tutorial, it doesn\'t +change the result, and it\'s a good way to see what\'s going on. + +## Registers + +The GNU `dc` manual says that dc has at least **256 registers** +depending on the range of unsigned char. I\'m not sure how you are +supposed to use the NUL byte. Using a register is easy: + + dc <a` will execute the +macro stored in the register `a`, if the top of the stack is *greater +than* the second element of the stack. Note: the top of the stack +contains the last entry. When written, it appears as the reverse of what +we are used to reading: + + dc << EOF + [[Hello World]p] sR # store in 'R' a macro that prints Hello World + 2 1 >R # do nothing 1 is at the top 2 is the second element + 1 2 >R # prints Hello World + EOF + +Some `dc` have `>R R 1 2 >R f"` doesn\'t print anything) + +Have you noticed how we can *include* a macro (string) in a macro? and +as `dc` relies on a stack we can, in fact, use the macro recursively +(have your favorite control-c key combo ready ;)) : + + dc << EOF + [ [Hello World] p # our macro starts by printing Hello World + lRx ] # and then executes the macro in R + sR # we store it in the register R + lRx # and finally executes it. 
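    # note: the macro above calls itself unconditionally, so it loops
    # forever -- this is where that Ctrl-C comes in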
+ EOF + +We have recursivity, we have test, we have loops: + + dc << EOF + [ li # put our index i on the stack + p # print it, to see what's going on + 1 - # we decrement the index by one + si # store decremented index (i=i-1) + 0 li >L # if i > 0 then execute L + ] sL # store our macro with the name L + + 10 si # let's give to our index the value 10 + lLx # and start our loop + EOF + +Of course code written this way is far too easy to read! Make sure to +remove all those extra spaces newlines and comments: + + dc <<< '[lip1-si0li>L]sL10silLx' + dc <<< '[p1-d0 + +And more example, as well as a dc implementation in python here: + +- +- + +The manual for the 1971 dc from Bell Labs: + +- (dead link) diff --git a/docs/howto/collapsing_functions.md b/docs/howto/collapsing_functions.md new file mode 100644 index 0000000..7c7fd89 --- /dev/null +++ b/docs/howto/collapsing_functions.md @@ -0,0 +1,98 @@ +# Collapsing Functions + +![](keywords>bash shell scripting example function collapse) + +## What is a \"Collapsing Function\"? + +A collapsing function is a function whose behavior changes depending +upon the circumstances under which it\'s run. Function collapsing is +useful when you find yourself repeatedly checking a variable whose value +never changes. + +## How do I make a function collapse? + +Function collapsing requires some static feature in the environment. A +common example is a script that gives the user the option of having +\"verbose\" output. + + #!/bin/bash + + [[ $1 = -v || $1 = --verbose ]] && verbose=1 + + chatter() { + if [[ $verbose ]]; then + chatter() { + echo "$@" + } + chatter "$@" + else + chatter() { + : + } + fi + } + + echo "Waiting for 10 seconds." + for i in {1..10}; do + chatter "$i" + sleep 1 + done + +## How does it work? + +The first time you run chatter(), the function redefines itself based on +the value of verbose. Thereafter, chatter doesn\'t check \$verbose, it +simply is. Further calls to the function reflect its collapsed nature. +If verbose is unset, chatter will echo nothing, with no extra effort +from the developer. + +## More examples + +FIXME Add more examples! + + # Somewhat more portable find -executable + # FIXME/UNTESTED (I don't have access to all of the different versions of find.) + # Usage: find PATH ARGS -- use find like normal, except use -executable instead of + # various versions of -perm /+ blah blah and hacks + find() { + hash find || { echo 'find not found!'; exit 1; } + # We can be pretty sure "$0" should be executable. 
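        # Each probe below runs `command find` on this script itself;
        # non-empty output means the tested predicate is supported by
        # this find implementation (errors are discarded).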
+ if [[ $(command find "$0" -executable 2> /dev/null) ]]; then + unset -f find # We can just use the command find + elif [[ $(command find "$0" -perm /u+x 2> /dev/null) ]]; then + find() { + typeset arg args + for arg do + [[ $arg = -executable ]] && args+=(-perm /u+x) || args+=("$arg") + done + command find "${args[@]}" + } + elif [[ $(command find "$0" -perm +u+x 2> /dev/null) ]]; then + find() { + typeset arg args + for arg do + [[ $arg = -executable ]] && args+=(-perm +u+x) || args+=("$arg") + done + command find "${args[@]}" + } + else # Last resort + find() { + typeset arg args + for arg do + [[ $arg = -executable ]] && args+=(-exec test -x {} \; -print) || args+=("$arg") + done + command find "${args[@]}" + } + fi + find "$@" + } + + #!/bin/bash + # Using collapsing functions to turn debug messages on/off + + [ "--debug" = "$1" ] && dbg=echo || dbg=: + + + # From now on if you use $dbg instead of echo, you can select if messages will be shown + + $dbg "This message will only be displayed if --debug is specified at the command line diff --git a/docs/howto/conffile.md b/docs/howto/conffile.md new file mode 100644 index 0000000..2371752 --- /dev/null +++ b/docs/howto/conffile.md @@ -0,0 +1,109 @@ +# Config files for your script + +![](keywords>bash shell scripting config files include configuration) + +## General + +For this task, you don\'t have to write large parser routines (unless +you want it 100% secure or you want a special file syntax) - you can use +the Bash source command. The file to be sourced should be formated in +key=\"value\" format, otherwise bash will try to interpret commands: + + #!/bin/bash + echo "Reading config...." >&2 + source /etc/cool.cfg + echo "Config for the username: $cool_username" >&2 + echo "Config for the target host: $cool_host" >&2 + +So, where do these variables come from? If everything works fine, they +are defined in /etc/cool.cfg which is a file that\'s sourced into the +current script or shell. Note: this is **not** the same as executing +this file as a script! The sourced file most likely contains something +like: + + cool_username="guest" + cool_host="foo.example.com" + +These are normal statements understood by Bash, nothing special. Of +course (and, a big disadvantage under normal circumstances) the sourced +file can contain **everything** that Bash understands, including +malicious code! + +The `source` command also is available under the name `.` (dot). The +usage of the dot is identical: + + #!/bin/bash + echo "Reading config...." >&2 + . /etc/cool.cfg #note the space between the dot and the leading slash of /etc.cfg + echo "Config for the username: $cool_username" >&2 + echo "Config for the target host: $cool_host" >&2 + +## Per-user configs + +There\'s also a way to provide a system-wide config file in /etc and a +custom config in \~/(user\'s home) to override system-wide defaults. In +the following example, the if/then construct is used to check for the +existance of a user-specific config: + + #!/bin/bash + echo "Reading system-wide config...." >&2 + . /etc/cool.cfg + if [ -r ~/.coolrc ]; then + echo "Reading user config...." >&2 + . ~/.coolrc + fi + +## Secure it + +As mentioned earlier, the sourced file can contain anything a Bash +script can. Essentially, it **is** an included Bash script. That creates +security issues. A malicicios person can \"execute\" arbitrary code when +your script is sourcing its config file. 
You might want to allow only +constructs in the form `NAME=VALUE` in that file (variable assignment +syntax) and maybe comments (though technically, comments are +unimportant). Imagine the following \"config file\", containing some +malicious code: + + # cool config file for my even cooler script + username=god_only_knows + hostname=www.example.com + password=secret ; echo rm -rf ~/* + parameter=foobar && echo "You've bene pwned!"; + # hey look, weird code follows... + echo "I am the skull virus..." + echo rm -fr ~/* + mailto=netadmin@example.com + +You don\'t want these `echo`-commands (which could be any other +commands!) to be executed. One way to be a bit safer is to filter only +the constructs you want, write the filtered results to a new file and +source the new file. We also need to be sure something nefarious hasn\'t +been added to the end of one of our name=value parameters, perhaps using +; or && command separators. In those cases, perhaps it is simplest to +just ignore the line entirely. Egrep (`grep -E`) will help us here, it +filters by description: + + #!/bin/bash + configfile='/etc/cool.cfg' + configfile_secured='/tmp/cool.cfg' + + # check if the file contains something we don't want + if egrep -q -v '^#|^[^ ]*=[^;]*' "$configfile"; then + echo "Config file is unclean, cleaning it..." >&2 + # filter the original to a new file + egrep '^#|^[^ ]*=[^;&]*' "$configfile" > "$configfile_secured" + configfile="$configfile_secured" + fi + + # now source it, either the original or the filtered variant + source "$configfile" + +**[To make clear what it does:]{.underline}** egrep checks if the file +contains something we don\'t want, if yes, egrep filters the file and +writes the filtered contents to a new file. If done, the original file +name is changed to the name stored in the variable `configfile`. The +file named by that variable is sourced, as if it were the original file. + +This filter allows only `NAME=VALUE` and comments in the file, but it +doesn\'t prevent all methods of code execution. I will address that +later. diff --git a/docs/howto/dissectabadoneliner.md b/docs/howto/dissectabadoneliner.md new file mode 100644 index 0000000..f5cce7a --- /dev/null +++ b/docs/howto/dissectabadoneliner.md @@ -0,0 +1,125 @@ +# Dissect a bad oneliner + +``` bash +$ ls *.zip | while read i; do j=`echo $i | sed 's/.zip//g'`; mkdir $j; cd $j; unzip ../$i; cd ..; done +``` + +This is an actual one-liner someone asked about in `#bash`. **There are +several things wrong with it. Let\'s break it down!** + +``` bash +$ ls *.zip | while read i; do ...; done +``` + +(Please read .) This command +executes `ls` on the expansion of `*.zip`. Assuming there are filenames +in the current directory that end in \'.zip\', ls will give a +human-readable list of those names. The output of ls is not for parsing. +But in sh and bash alike, we can loop safely over the glob itself: + +``` bash +$ for i in *.zip; do j=`echo $i | sed 's/.zip//g'`; mkdir $j; cd $j; unzip ../$i; cd ..; done +``` + +Let\'s break it down some more! + +``` bash +j=`echo $i | sed 's/.zip//g'` # where $i is some name ending in '.zip' +``` + +The goal here seems to be get the filename without its `.zip` extension. +In fact, there is a POSIX(r)-compliant command to do this: `basename` +The implementation here is suboptimal in several ways, but the only +thing that\'s genuinely error-prone with this is \"`echo $i`\". 
Echoing +an *unquoted* variable means +[wordsplitting](/syntax/expansion/wordsplit) will take place, so any +whitespace in `$i` will essentially be normalized. In `sh` it is +necessary to use an external command and a subshell to achieve the goal, +but we can eliminate the pipe (subshells, external commands, and pipes +carry extra overhead when they launch, so they can really hurt +performance in a loop). Just for good measure, let\'s use the more +readable, [modern](/syntax/expansion/cmdsubst) `$()` construct instead +of the old style backticks: + +``` bash +sh $ for i in *.zip; do j=$(basename "$i" ".zip"); mkdir $j; cd $j; unzip ../$i; cd ..; done +``` + +In Bash we don\'t need the subshell or the external basename command. +See [Substring removal with parameter +expansion](/syntax/pe#substring_removal): + +``` bash +bash $ for i in *.zip; do j="${i%.zip}"; mkdir $j; cd $j; unzip ../$i; cd ..; done +``` + +Let\'s keep going: + +``` bash +$ mkdir $j; cd $j; ...; cd .. +``` + +As a programmer, you **never** know the situation under which your +program will run. Even if you do, the following best practice will never +hurt: When a following command depends on the success of a previous +command(s), check for success! You can do this with the \"`&&`\" +conjunction, that way, if the previous command fails, bash will not try +to execute the following command(s). It\'s fully POSIX(r). Oh, and +remember what I said about [wordsplitting](/syntax/expansion/wordsplit) +in the previous step? Well, if you don\'t quote `$j`, wordsplitting can +happen again. + +``` bash +$ mkdir "$j" && cd "$j" && ... && cd .. +``` + +That\'s almost right, but there\'s one problem \-- what happens if `$j` +contains a slash? Then `cd ..` will not return to the original +directory. That\'s wrong! `cd -` causes cd to return to the previous +working directory, so it\'s a much better choice: + +``` bash +$ mkdir "$j" && cd "$j" && ... && cd - +``` + +(If it occurred to you that I forgot to check for success after cd -, +good job! You could do this with `{ cd - || break; }`, but I\'m going to +leave that out because it\'s verbose and I think it\'s likely that we +will be able to get back to our original working directory without a +problem.) + +So now we have: + +``` bash +sh $ for i in *.zip; do j=$(basename "$i" ".zip"); mkdir "$j" && cd "$j" && unzip ../$i && cd -; done +``` + +``` bash +bash $ for i in *.zip; do j="${i%.zip}"; mkdir "$j" && cd "$j" && unzip ../$i && cd -; done +``` + +Let\'s throw the `unzip` command back in the mix: + +``` bash +mkdir "$j" && cd "$j" && unzip ../$i && cd - +``` + +Well, besides word splitting, there\'s nothing terribly wrong with this. +Still, did it occur to you that unzip might already be able to target a +directory? There isn\'t a standard for the `unzip` command, but all the +implementations I\'ve seen can do it with the -d flag. So we can drop +the cd commands entirely: + +``` bash +$ mkdir "$j" && unzip -d "$j" "$i" +``` + +``` bash +sh $ for i in *.zip; do j=$(basename "$i" ".zip"); mkdir "$j" && unzip -d "$j" "$i"; done +``` + +``` bash +bash $ for i in *.zip; do j="${i%.zip}"; mkdir "$j" && unzip -d "$j" "$i"; done +``` + +There! That\'s as good as it gets. diff --git a/docs/howto/edit-ed.md b/docs/howto/edit-ed.md new file mode 100644 index 0000000..fac0f4c --- /dev/null +++ b/docs/howto/edit-ed.md @@ -0,0 +1,389 @@ +# Editing files via scripts with ed + +![](keywords>bash shell scripting arguments file editor edit ed sed) + +## Why ed? + +Like `sed`, `ed` is a line editor. 
However, if you try to change file +contents with `sed`, and the file is open elsewhere and read by some +process, you will find out that GNU `sed` and its `-i` option will not +allow you to edit the file. There are circumstances where you may need +that, e.g. editing active and open files, the lack of GNU, or other +`sed`, with \"in-place\" option available. + +Why `ed`? + +- maybe your `sed` doesn\'t support in-place edit +- maybe you need to be as portable as possible +- maybe you need to really edit in-file (and not create a new file + like GNU `sed`) +- last but not least: standard `ed` has very good editing and + addressing possibilities, compared to standard `sed` + +Don\'t get me wrong, this is **not** meant as anti-`sed` article! It\'s +just meant to show you another way to do the job. + +## Commanding ed + +Since `ed` is an interactive text editor, it reads and executes commands +that come from `stdin`. There are several ways to feed our commands to +ed: + +**[Pipelines]{.underline}** + + echo '' | ed + +To inject the needed newlines, etc. it may be easier to use the builtin +command, `printf` (\"help printf\"). Shown here as an example Bash +function to prefix text to file content: + + + # insertHead "$text" "$file" + + insertHead() { + printf '%s\n' H 1i "$1" . w | ed -s "$2" + } + +**[Here-strings]{.underline}** + + ed <<< '' + +**[Here-documents]{.underline}** + + ed < + EOF + +Which one you prefer is your choice. I will use the here-strings, since +it looks best here IMHO. + +There are other ways to provide input to `ed`. For example, process +substitution. But these should be enough for daily needs. + +Since `ed` wants commands separated by newlines, I\'ll use a special +Bash quoting method, the C-like strings `$'TEXT'`, as it can interpret a +set of various escape sequences and special characters. I\'ll use the +`-s` option to make it less verbose. + +## The basic interface + +Check the `ed` manpage for details + +Similar to `vi` or `vim`, `ed` has a \"command mode\" and an +\"interactive mode\". For non-interactive use, the command mode is the +usual choice. + +Commands to `ed` have a simple and regular structure: zero, one, or two +addresses followed by a single-character command, possibly followed by +parameters to that command. These addresses specify one or more lines in +the text buffer. Every command that requires addresses has default +addresses, so the addresses can often be omitted. + +The line addressing is relative to the *current line*. If the edit +buffer is not empty, the initial value for the *current line* shall be +the last line in the edit buffer, otherwise zero. Generally, the +*current line* is the last line affected by a command. All addresses can +only address single lines, not blocks of lines! + +Line addresses or commands using *regular expressions* interpret POSIX +Basic Regular Expressions (BRE). A null BRE is used to reference the +most recently used BRE. Since `ed` addressing is only for single lines, +no RE can ever match a newline. + +## Debugging your ed scripts + +By default, `ed` is not very talkative and will simply print a \"?\" +when an error occurs. Interactively you can use the `h` command to get a +short message explaining the last error. You can also turn on a mode +that makes `ed` automatically print this message with the `H` command. +It is a good idea to always add this command at the beginning of your ed +scripts: + + bash > ed -s file <<< $'H\n,df' + ? 
    script, line 2: Invalid command suffix

While working on your script, you might make errors and destroy your
file, so you might be tempted to test your script with something like:

    # Works, but there is better

    # copy my original file
    cp file file.test

    # try my script on the file
    ed -s file.test <<< $'H\n<ed commands>\nw'

    # see the results
    cat file.test

There is a much better way, though: you can use the ed command `p` to
print the file, so your testing would look like:

    ed -s file <<< $'H\n<ed commands>\n,p'

The `,` (comma) in front of the `p` command is a shortcut for `1,$`,
which defines an address range from the first to the last line; `,p`
thus means print the whole file, after it has been modified. When your
script runs successfully, you only have to replace the `,p` by a `w`.

Of course, even though the `p` command doesn\'t modify the file, **it\'s
always a good idea to have a backup copy!**

## Editing your files

Most of these things can be done with `sed`. But there are also things
that can\'t be done in `sed` or can only be done with very complex code.

### Simple word substitutions

Like `sed`, `ed` also knows the common `s/FROM/TO/` command, and it can
also take line-addresses. **If no substitution is made on the addressed
lines, it\'s considered an error.**

#### Substitutions through the whole file

    ed -s test.txt <<< $',s/Windows(R)-compatible/POSIX-conform/g\nw'

[Note:]{.underline} The comma as single address operator is an alias for
`1,$` (\"all lines\").

#### Substitutions in specific lines

On a line containing `fruits`, do the substitution:

    ed -s test.txt <<< $'/fruits/s/apple/banana/g\nw'

On the 5th line after the line containing `fruits`, do the substitution:

    ed -s test.txt <<< $'/fruits/+5s/apple/banana/g\nw'

### Block operations

#### Delete a block of text

The simple one is a well-known (by position) block of text:

    # delete lines number 2 to 4 (2, 3, 4)
    ed -s test.txt <<< $'2,4d\nw'

This deletes all lines matching a specific regular expression:

    # delete all lines matching foobar
    ed -s test.txt <<< $'g/foobar/d\nw'

`g/regexp/` applies the command following it to all the lines matching
the regexp.

#### Move a block of text

\...using the `m` command: `ADDR1,ADDR2mADDR3`, which moves the lines
from `ADDR1` through `ADDR2` to after line `ADDR3`.

This is definitely something that can\'t be done easily with sed.

    # moving lines 5-9 to the end of the file
    ed -s test.txt <<< $'5,9m$\nw'

    # moving lines 5-9 to line 3
    ed -s test.txt <<< $'5,9m3\nw'

#### Copy a block of text

\...using the `t` command: `ADDR1,ADDR2
t ` + +You use the `t` command just like you use the `m` (move) command. + + # make a copy of lines 5-9 and place it at the end of the file + ed -s test.txt <<< $'5,9t$\nw' + + # make a copy of lines 5-9 and place it at line 3 + ed -s test.txt <<< $'5,9t3\nw' + +#### Join all lines + +\...but leave the final newline intact. This is done by an extra +command: `j` (join). + + ed -s file <<< $'1,$j\nw' + +Compared with two other methods (using `tr` or `sed`), you don\'t have +to delete all newlines and manually add one at the end. + +### File operations + +#### Insert another file + +How do you insert another file? As with `sed`, you use the `r` (read) +command. That inserts another file at the line before the last line (and +prints the result to stdout - `,p`): + + ed -s FILE1 <<< $'$-1 r FILE2\n,p' + +To compare, here\'s a possible `sed` solution which must use Bash +arithmetic and the external program `wc`: + + sed "$(($(wc -l < FILE1)-1))r FILE2" FILE1 + + # UPDATE here's one which uses GNU sed's "e" parameter for the s-command + # it executes the commands found in pattern space. I'll take that as a + # security risk, but well, sometimes GNU > security, you know... + sed '${h;s/.*/cat FILE2/e;G}' FILE1 + +Another approach, in two invocations of sed, that avoids the use of +external commands completely: + + sed $'${s/$/\\n-||-/;r FILE2\n}' FILE1 | sed '0,/-||-/{//!h;N;//D};$G' + +## Pitfalls + +### ed is not sed + +ed and sed might look similar, but the same command(s) might act +differently: + +**\_\_ /foo/d \_\_** + +In sed /foo/d will delete all lines matching foo, in ed the commands are +not repeated on each line so this command will search the next line +matching foo and delete it. If you want to delete all lines matching +foo, or do a subsitution on all lines matching foo you have to tell ed +about it with the g (global) command: + + echo $'1\n1\n3' > file + + #replace all lines matching 1 by "replacement" + ed -s file <<< $'g/1/s/1/replacement/\n,p' + + #replace the first line matching 1 by "replacement" + #(because it starts searching from the last line) + ed -s file <<< $'s/1/replacement/\n,p' + +**\_\_ an error stops the script \_\_** + +You might think that it\'s not a problem and that the same thing happens +with sed and you\'re right, with the exception that if ed does not find +a pattern it\'s an error, while sed just continues with the next line. +For instance, let\'s say that you want to change foo to bar on the first +line of the file and add something after the next line, ed will stop if +it cannot find foo on the first line, sed will continue. + + #Gnu sed version + sed -e '1s/foo/bar/' -e '$a\something' file + + #First ed version, does nothing if foo is not found on the first line: + ed -s file <<< $'H\n1s/foo/bar/\na\nsomething\n.\nw' + +If you want the same behaviour you can use g/foo/ to trick ed. g/foo/ +will apply the command on all lines matching foo, thus the substitution +will succeed and ed will not produce an error when foo is not found: + + #Second version will add the line with "something" even if foo is not found + ed -s file <<< $'H\n1g/foo/s/foo/bar/\na\nsomething\n.\nw' + +In fact, even a substitution that fails after a g/ / command does not +seem to cause an error, i.e. you can use a trick like g/./s/foo/bar/ to +attempt the substitution on all non blank lines + +### here documents + +**\_\_ shell parameters are expanded \_\_** + +If you don\'t quote the delimiter, \$ has a special meaning. 
This sounds +obvious but it\'s easy to forget this fact when you use addresses like +\$-1 or commands like \$a. Either quote the \$ or the delimiter: + + #fails + ed -s file << EOF + $a + last line + . + w + EOF + + #ok + ed -s file << EOF + \$a + last line + . + w + EOF + + #ok again + ed -s file << 'EOF' + $a + last line + . + w + EOF + +**\_\_ \".\" is not a command \_\_** + +The . used to terminate the command \"a\" must be the only thing on the +line. take care if you indent the commands: + + #ed doesn't care about the spaces before the commands, but the . must be the only thing on the line: + ed -s file << EOF + a + my content + . + w + EOF + +## Simulate other commands + +Keep in mind that in all the examples below, the entire file will be +read into memory. + +### A simple grep + + ed -s file <<< 'g/foo/p' + + # equivalent + ed -s file <<< 'g/foo/' + +The name `grep` is derived from the notaion `g/RE/p` (global =\> regular +expression =\> print). ref + + +### wc -l + +Since the default for the `ed` \"print line number\" command is the last +line, a simple `=` (equal sign) will print this line number and thus the +number of lines of the file: + + ed -s file <<< '=' + +### cat + +Yea, it\'s a joke\... + + ed -s file <<< $',p' + +\...but a similar thing to `cat` showing line-endings and escapes can be +done with the `list` command (l): + + ed -s file <<< $',l' + +FIXME to be continued + +## Links + +Reference: + +- [Gnu ed](http://www.gnu.org/software/ed/manual/ed_manual.html) - if + we had to guess, you\'re probably using this one. +- POSIX + [ed](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html#tag_20_38), + [ex](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ex.html#tag_20_40), + and + [vi](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/vi.html#tag_20_152) +- - ed cheatsheet on + sdf.org + +Misc info / tutorials: + +- [How can I replace a string with another string in a variable, a + stream, a file, or in all the files in a + directory?](http://mywiki.wooledge.org/BashFAQ/021) - BashFAQ +- - + Old but still relevant ed tutorial. diff --git a/docs/howto/getopts_tutorial.md b/docs/howto/getopts_tutorial.md new file mode 100644 index 0000000..73c5e20 --- /dev/null +++ b/docs/howto/getopts_tutorial.md @@ -0,0 +1,359 @@ +# Small getopts tutorial + +![](keywords>bash shell scripting arguments positional parameters options getopt getopts) + +## Description + +**Note that** `getopts` is neither able to parse GNU-style long options +(`--myoption`) nor XF86-style long options (`-myoption`). So, when you +want to parse command line arguments in a professional ;-) way, +`getopts` may or may not work for you. Unlike its older brother `getopt` +(note the missing *s*!), it\'s a shell builtin command. The advantages +are: + +- No need to pass the positional parameters through to an external + program. +- Being a builtin, `getopts` can set shell variables to use for + parsing (impossible for an *external* process!) +- There\'s no need to argue with several `getopt` implementations + which had buggy concepts in the past (whitespace, \...) +- `getopts` is defined in POSIX(r). + +------------------------------------------------------------------------ + +Some other methods to parse positional parameters - using neither +**getopt** nor **getopts** - are described in: [How to handle positional +parameters](/scripting/posparams). + +### Terminology + +It\'s useful to know what we\'re talking about here, so let\'s see\... 
+Consider the following command line: + + mybackup -x -f /etc/mybackup.conf -r ./foo.txt ./bar.txt + +These are all positional parameters, but they can be divided into +several logical groups: + +- `-x` is an **option** (aka **flag** or **switch**). It consists of a + dash (`-`) followed by **one** character. +- `-f` is also an option, but this option has an associated **option + argument** (an argument to the option `-f`): `/etc/mybackup.conf`. + The option argument is usually the argument following the option + itself, but that isn\'t mandatory. Joining the option and option + argument into a single argument `-f/etc/mybackup.conf` is valid. +- `-r` depends on the configuration. In this example, `-r` doesn\'t + take arguments so it\'s a standalone option like `-x`. +- `./foo.txt` and `./bar.txt` are remaining arguments without any + associated options. These are often used as **mass-arguments**. For + example, the filenames specified for `cp(1)`, or arguments that + don\'t need an option to be recognized because of the intended + behavior of the program. POSIX(r) calls them **operands**. + +To give you an idea about why `getopts` is useful, The above command +line is equivalent to: + + mybackup -xrf /etc/mybackup.conf ./foo.txt ./bar.txt + +which is complex to parse without the help of `getopts`. + +The option flags can be **upper- and lowercase** characters, or +**digits**. It may recognize other characters, but that\'s not +recommended (usability and maybe problems with special characters). + +### How it works + +In general you need to call `getopts` several times. Each time it will +use the next positional parameter and a possible argument, if parsable, +and provide it to you. `getopts` will not change the set of positional +parameters. If you want to shift them, it must be done manually: + + shift $((OPTIND-1)) + # now do something with $@ + +Since `getopts` sets an exit status of *FALSE* when there\'s nothing +left to parse, it\'s easy to use in a while-loop: + + while getopts ...; do + ... + done + +`getopts` will parse options and their possible arguments. It will stop +parsing on the first non-option argument (a string that doesn\'t begin +with a hyphen (`-`) that isn\'t an argument for any option in front of +it). It will also stop parsing when it sees the `--` (double-hyphen), +which means [end of options](/dict/terms/end_of_options). + +### Used variables + + variable description + ------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + [OPTIND](/syntax/shellvars#OPTIND) Holds the index to the next argument to be processed. This is how `getopts` \"remembers\" its own status between invocations. Also useful to shift the positional parameters after processing with `getopts`. `OPTIND` is initially set to 1, and **needs to be re-set to 1 if you want to parse anything again with getopts** + [OPTARG](/syntax/shellvars#OPTARG) This variable is set to any argument for an option found by `getopts`. It also contains the option flag of an unknown option. 
+ [OPTERR](/syntax/shellvars#OPTERR) (Values 0 or 1) Indicates if Bash should display error messages generated by the `getopts` builtin. The value is initialized to **1** on every shell startup - so be sure to always set it to **0** if you don\'t want to see annoying messages! **`OPTERR` is not specified by POSIX for the `getopts` builtin utility \-\-- only for the C `getopt()` function in `unistd.h` (`opterr`).** `OPTERR` is bash-specific and not supported by shells such as ksh93, mksh, zsh, or dash. + +`getopts` also uses these variables for error reporting (they\'re set to +value-combinations which arent possible in normal operation). + +### Specify what you want + +The base-syntax for `getopts` is: + + getopts OPTSTRING VARNAME [ARGS...] + +where: + + `OPTSTRING` tells `getopts` which options to expect and where to expect arguments (see below) + ------------- ------------------------------------------------------------------------------------ + `VARNAME` tells `getopts` which shell-variable to use for option reporting + `ARGS` tells `getopts` to parse these optional words instead of the positional parameters + +#### The option-string + +The option-string tells `getopts` which options to expect and which of +them must have an argument. The syntax is very simple \-\-- every option +character is simply named as is, this example-string would tell +`getopts` to look for `-f`, `-A` and `-x`: + + getopts fAx VARNAME + +When you want `getopts` to expect an argument for an option, just place +a `:` (colon) after the proper option flag. If you want `-A` to expect +an argument (i.e. to become `-A SOMETHING`) just do: + + getopts fA:x VARNAME + +If the **very first character** of the option-string is a `:` (colon), +which would normally be nonsense because there\'s no option letter +preceding it, `getopts` switches to \"**silent error reporting mode**\". +In productive scripts, this is usually what you want because it allows +you to handle errors yourself without being disturbed by annoying +messages. + +#### Custom arguments to parse + +The `getopts` utility parses the [positional +parameters](/scripting/posparams) of the current shell or function by +default (which means it parses `"$@"`). + +You can give your own set of arguments to the utility to parse. Whenever +additional arguments are given after the `VARNAME` parameter, `getopts` +doesn\'t try to parse the positional parameters, but these given words. + +This way, you are able to parse any option set you like, here for +example from an array: + + while getopts :f:h opt "${MY_OWN_SET[@]}"; do + ... + done + +A call to `getopts` **without** these additional arguments is +**equivalent** to explicitly calling it with `"$@"`: + + getopts ... "$@" + +### Error Reporting + +Regarding error-reporting, there are two modes `getopts` can run in: + +- verbose mode +- silent mode + +For productive scripts I recommend to use the silent mode, since +everything looks more professional, when you don\'t see annoying +standard messages. Also it\'s easier to handle, since the failure cases +are indicated in an easier way. 
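Which mode you get is decided by the very first character of the
option-string, as described above; a minimal sketch (the option letters
are just examples):

    getopts "a:x" opt    # verbose mode: getopts prints error messages itself
    getopts ":a:x" opt   # silent mode: the leading colon suppresses them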
+ +#### Verbose Mode + + invalid option `VARNAME` is set to `?` (question-mark) and `OPTARG` is unset + ----------------------------- ---------------------------------------------------------------------------------------------- + required argument not found `VARNAME` is set to `?` (question-mark), `OPTARG` is unset and an *error message is printed* + +#### Silent Mode + + invalid option `VARNAME` is set to `?` (question-mark) and `OPTARG` is set to the (invalid) option character + ----------------------------- ----------------------------------------------------------------------------------------------- + required argument not found `VARNAME` is set to `:` (colon) and `OPTARG` contains the option-character in question + +## Using it + +### A first example + +Enough said - action! + +Let\'s play with a very simple case: only one option (`-a`) expected, +without any arguments. Also we disable the *verbose error handling* by +preceding the whole option string with a colon (`:`): + +``` bash +#!/bin/bash + +while getopts ":a" opt; do + case $opt in + a) + echo "-a was triggered!" >&2 + ;; + \?) + echo "Invalid option: -$OPTARG" >&2 + ;; + esac +done +``` + +I put that into a file named `go_test.sh`, which is the name you\'ll see +below in the examples. + +Let\'s do some tests: + +#### Calling it without any arguments + + $ ./go_test.sh + $ + +Nothing happened? Right. `getopts` didn\'t see any valid or invalid +options (letters preceded by a dash), so it wasn\'t triggered. + +#### Calling it with non-option arguments + + $ ./go_test.sh /etc/passwd + $ + +Again \-\-- nothing happened. The **very same** case: `getopts` didn\'t +see any valid or invalid options (letters preceded by a dash), so it +wasn\'t triggered. + +The arguments given to your script are of course accessible as `$1` - +`${N}`. + +#### Calling it with option-arguments + +Now let\'s trigger `getopts`: Provide options. + +First, an **invalid** one: + + $ ./go_test.sh -b + Invalid option: -b + $ + +As expected, `getopts` didn\'t accept this option and acted like told +above: It placed `?` into `$opt` and the invalid option character (`b`) +into `$OPTARG`. With our `case` statement, we were able to detect this. + +Now, a **valid** one (`-a`): + + $ ./go_test.sh -a + -a was triggered! + $ + +You see, the detection works perfectly. The `a` was put into the +variable `$opt` for our case statement. + +Of course it\'s possible to **mix valid and invalid** options when +calling: + + $ ./go_test.sh -a -x -b -c + -a was triggered! + Invalid option: -x + Invalid option: -b + Invalid option: -c + $ + +Finally, it\'s of course possible, to give our option **multiple +times**: + + $ ./go_test.sh -a -a -a -a + -a was triggered! + -a was triggered! + -a was triggered! + -a was triggered! + $ + +The last examples lead us to some points you may consider: + +- **invalid options don\'t stop the processing**: If you want to stop + the script, you have to do it yourself (`exit` in the right place) +- **multiple identical options are possible**: If you want to disallow + these, you have to check manually (e.g. by setting a variable or so) + +### An option with argument + +Let\'s extend our example from above. Just a little bit: + +- `-a` now takes an argument +- on an error, the parsing exits with `exit 1` + +``` bash +#!/bin/bash + +while getopts ":a:" opt; do + case $opt in + a) + echo "-a was triggered, Parameter: $OPTARG" >&2 + ;; + \?) + echo "Invalid option: -$OPTARG" >&2 + exit 1 + ;; + :) + echo "Option -$OPTARG requires an argument." 
>&2 + exit 1 + ;; + esac +done +``` + +Let\'s do the very same tests we did in the last example: + +#### Calling it without any arguments + + $ ./go_test.sh + $ + +As above, nothing happened. It wasn\'t triggered. + +#### Calling it with non-option arguments + + $ ./go_test.sh /etc/passwd + $ + +The **very same** case: It wasn\'t triggered. + +#### Calling it with option-arguments + +**Invalid** option: + + $ ./go_test.sh -b + Invalid option: -b + $ + +As expected, as above, `getopts` didn\'t accept this option and acted +like programmed. + +**Valid** option, but without the mandatory **argument**: + + $ ./go_test.sh -a + Option -a requires an argument. + $ + +The option was okay, but there is an argument missing. + +Let\'s provide **the argument**: + + $ ./go_test.sh -a /etc/passwd + -a was triggered, Parameter: /etc/passwd + $ + +## See also + +- Internal: [posparams](/scripting/posparams) +- Internal: [case](/syntax/ccmd/case) +- Internal: [while_loop](/syntax/ccmd/while_loop) +- POSIX + [getopts(1)](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html#tag_20_54) + and + [getopt(3)](http://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html) +- [parse CLI + ARGV](https://stackoverflow.com/questions/192249/how-do-i-parse-command-line-arguments-in-bash) +- [handle command-line arguments (options) to a + script](http://mywiki.wooledge.org/BashFAQ/035) diff --git a/docs/howto/mutex.md b/docs/howto/mutex.md new file mode 100644 index 0000000..b9ebed7 --- /dev/null +++ b/docs/howto/mutex.md @@ -0,0 +1,207 @@ +# Lock your script (against parallel execution) + +![](keywords>bash shell scripting mutex locking run-control) + +## Why lock? + +Sometimes there\'s a need to ensure only one copy of a script runs, i.e +prevent two or more copies running simultaneously. Imagine an important +cronjob doing something very important, which will fail or corrupt data +if two copies of the called program were to run at the same time. To +prevent this, a form of `MUTEX` (**mutual exclusion**) lock is needed. + +The basic procedure is simple: The script checks if a specific condition +(locking) is present at startup, if yes, it\'s locked - the scipt +doesn\'t start. + +This article describes locking with common UNIX(r) tools. There are +other special locking tools available, But they\'re not standardized, or +worse yet, you can\'t be sure they\'re present when you want to run your +scripts. **A tool designed for specifically for this purpose does the +job much better than general purpose code.** + +### Other, special locking tools + +As told above, a special tool for locking is the preferred solution. +Race conditions are avoided, as is the need to work around specific +limits. + +- `flock`: +- `solo`: + +## Choose the locking method + +The best way to set a global lock condition is the UNIX(r) filesystem. +Variables aren\'t enough, as each process has its own private variable +space, but the filesystem is global to all processes (yes, I know about +chroots, namespaces, \... special case). You can \"set\" several things +in the filesystem that can be used as locking indicator: + +- create files +- update file timestamps +- create directories + +To create a file or set a file timestamp, usually the command touch is +used. The following problem is implied: A locking mechanism checks for +the existance of the lockfile, if no lockfile exists, it creates one and +continues. Those are **two separate steps**! That means it\'s **not an +atomic operation**. 
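A sketch of that naive, racy check-then-create pattern (the lockfile
path is only an example):

``` bash
# two separate steps -- NOT atomic:
if [ ! -e /var/lock/mylock ]; then   # step 1: check for the lockfile
    touch /var/lock/mylock           # step 2: create it
fi
```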
There\'s a small amount of time between checking and +creating, where another instance of the same script could perform +locking (because when it checked, the lockfile wasn\'t there)! In that +case you would have 2 instances of the script running, both thinking +they are succesfully locked, and can operate without colliding. Setting +the timestamp is similar: One step to check the timespamp, a second step +to set the timestamp. + +\ [**Conclusion:**]{.underline} We need an +operation that does the check and the locking in one step. \ + +A simple way to get that is to create a **lock directory** - with the +mkdir command. It will: + + * create a given directory only if it does not exist, and set a successful exit code + * it will set an unsuccesful exit code if an error occours - for example, if the directory specified already exists + +With mkdir it seems, we have our two steps in one simple operation. A +(very!) simple locking code might look like this: + +``` bash +if mkdir /var/lock/mylock; then + echo "Locking succeeded" >&2 +else + echo "Lock failed - exit" >&2 + exit 1 +fi +``` + +In case `mkdir` reports an error, the script will exit at this point - +**the MUTEX did its job!** + +*If the directory is removed after setting a successful lock, while the +script is still running, the lock is lost. Doing chmod -w for the parent +directory containing the lock directory can be done, but it is not +atomic. Maybe a while loop checking continously for the existence of the +lock in the background and sending a signal such as USR1, if the +directory is not found, can be done. The signal would need to be +trapped. I am sure there there is a better solution than this +suggestion* \-\-- *[sn18](sunny_delhi18@yahoo.com) 2009/12/19 08:24* + +**Note:** While perusing the Internet, I found some people asking if the +`mkdir` method works \"on all filesystems\". Well, let\'s say it should. +The syscall under `mkdir` is guarenteed to work atomicly in all cases, +at least on Unices. Two examples of problems are NFS filesystems and +filesystems on cluster servers. With those two scenarios, dependencies +exist related to the mount options and implementation. However, I +successfully use this simple method on an Oracle OCFS2 filesystem in a +4-node cluster environment. So let\'s just say \"it should work under +normal conditions\". + +Another atomic method is setting the `noclobber` shell option +(`set -C`). That will cause redirection to fail, if the file the +redirection points to already exists (using diverse `open()` methods). +Need to write a code example here. + +``` bash + +if ( set -o noclobber; echo "locked" > "$lockfile") 2> /dev/null; then + trap 'rm -f "$lockfile"; exit $?' INT TERM EXIT + echo "Locking succeeded" >&2 + rm -f "$lockfile" +else + echo "Lock failed - exit" >&2 + exit 1 +fi + +``` + +Another explanation of this basic pattern using `set -C` can be found +[here](http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html#tag_23_02_07). + +## An example + +This code was taken from a production grade script that controls PISG to +create statistical pages from my IRC logfiles. There are some +differences compared to the very simple example above: + +- the locking stores the process ID of the locked instance +- if a lock fails, the script tries to find out if the locked instance + still is active (unreliable!) 
+- traps are created to automatically remove the lock when the script + terminates, or is killed + +Details on how the script is killed aren\'t given, only code relevant to +the locking process is shown: + +``` bash +#!/bin/bash + +# lock dirs/files +LOCKDIR="/tmp/statsgen-lock" +PIDFILE="${LOCKDIR}/PID" + +# exit codes and text +ENO_SUCCESS=0; ETXT[0]="ENO_SUCCESS" +ENO_GENERAL=1; ETXT[1]="ENO_GENERAL" +ENO_LOCKFAIL=2; ETXT[2]="ENO_LOCKFAIL" +ENO_RECVSIG=3; ETXT[3]="ENO_RECVSIG" + +### +### start locking attempt +### + +trap 'ECODE=$?; echo "[statsgen] Exit: ${ETXT[ECODE]}($ECODE)" >&2' 0 +echo -n "[statsgen] Locking: " >&2 + +if mkdir "${LOCKDIR}" &>/dev/null; then + + # lock succeeded, install signal handlers before storing the PID just in case + # storing the PID fails + trap 'ECODE=$?; + echo "[statsgen] Removing lock. Exit: ${ETXT[ECODE]}($ECODE)" >&2 + rm -rf "${LOCKDIR}"' 0 + echo "$$" >"${PIDFILE}" + # the following handler will exit the script upon receiving these signals + # the trap on "0" (EXIT) from above will be triggered by this trap's "exit" command! + trap 'echo "[statsgen] Killed by a signal." >&2 + exit ${ENO_RECVSIG}' 1 2 3 15 + echo "success, installed signal handlers" + +else + + # lock failed, check if the other PID is alive + OTHERPID="$(cat "${PIDFILE}")" + + # if cat isn't able to read the file, another instance is probably + # about to remove the lock -- exit, we're *still* locked + # Thanks to Grzegorz Wierzowiecki for pointing out this race condition on + # http://wiki.grzegorz.wierzowiecki.pl/code:mutex-in-bash + if [ $? != 0 ]; then + echo "lock failed, PID ${OTHERPID} is active" >&2 + exit ${ENO_LOCKFAIL} + fi + + if ! kill -0 $OTHERPID &>/dev/null; then + # lock is stale, remove it and restart + echo "removing stale lock of nonexistant PID ${OTHERPID}" >&2 + rm -rf "${LOCKDIR}" + echo "[statsgen] restarting myself" >&2 + exec "$0" "$@" + else + # lock is valid and OTHERPID is active - exit, we're locked! + echo "lock failed, PID ${OTHERPID} is active" >&2 + exit ${ENO_LOCKFAIL} + fi + +fi +``` + +## Related links + +- [BashFAQ/045](http://mywiki.wooledge.org/BashFAQ/045) +- [Implementation of a shell locking + utility](http://wiki.grzegorz.wierzowiecki.pl/code:mutex-in-bash) +- [Wikipedia article on File + Locking](http://en.wikipedia.org/wiki/File_locking), including a + discussion of potential + [problems](http://en.wikipedia.org/wiki/File_locking#Problems) with + flock and certain versions of NFS. diff --git a/docs/howto/pax.md b/docs/howto/pax.md new file mode 100644 index 0000000..37af076 --- /dev/null +++ b/docs/howto/pax.md @@ -0,0 +1,353 @@ +# pax - the POSIX archiver + +![](keywords>bash shell scripting POSIX archive tar packing zip) + +pax can do a lot of fancy stuff, feel free to contribute more awesome +pax tricks! + +## Introduction + +The POSIX archiver, `pax`, is an attempt at a standardized archiver with +the best features of `tar` and `cpio`, able to handle all common archive +types. + +However, this is **not a manpage**, it will **not** list all possible +options, it will **not** you detailed information about `pax`. It\'s +only an introduction. + +This article is based on the debianized Berkeley implementation of +`pax`, but implementation-specific things should be tagged as such. +Unfortunately, the Debian package doesn\'t seem to be maintained +anymore. + +## Overview + +### Operation modes + +There are four basic operation modes to *list*, *read*, *write* and +*copy* archives. 
They\'re switched with combinations of `-r` and `-w` +command line options: + + Mode RW-Options + ------- ----------------- + List *no RW-options* + Read `-r` + Write `-w` + Copy `-r -w` + +#### List + +In *list mode*, `pax` writes the list of archive members to standard +output (a table of contents). If a pattern match is specified on the +command line, only matching filenames are printed. + +#### Read + +*Read* an archive. `pax` will read archive data and extract the members +to the current directory. If a pattern match is specified on the command +line, only matching filenames are extracted. + +When reading an archive, the archive type is determined from the archive +data. + +#### Write + +*Write* an archive, which means create a new one or append to an +existing one. All files and directories specified on the command line +are inserted into the archive. The archive is written to standard output +by default. + +If no files are specified on the command line, filenames are read from +`STDIN`. + +The write mode is the only mode where you need to specify the archive +type with `-x `, e.g. `-x ustar`. + +#### Copy + +*Copy* mode is similar to `cpio` passthrough mode. It provides a way to +replicate a complete or partial file hierarchy (with all the `pax` +options, e.g. rewriting groups) to another location. + +### Archive data + +When you don\'t specify anything special, `pax` will attempt to read +archive data from standard input (read/list modes) and write archive +data to standard output (write mode). This ensures `pax` can be easily +used as part of a shell pipe construct, e.g. to read a compressed +archive that\'s decompressed in the pipe. + +The option to specify the pathname of a file to be archived is `-f` This +file will be used as input or output, depending on the operation +(read/write/list). + +When pax reads an archive, it tries to guess the archive type. However, +in *write* mode, you must specify which type of archive to append using +the `-x ` switch. If you omit this switch, a default archive will +be created (POSIX says it\'s implementation defined, Berkeley `pax` +creates `ustar` if no options are specified). + +The following archive formats are supported (Berkeley implementation): + + --------- ---------------------------- + ustar POSIX TAR format (default) + cpio POSIX CPIO format + tar classic BSD TAR format + bcpio old binary CPIO format + sv4cpio SVR4 CPIO format + sv4crc SVR4 CPIO format with CRC + --------- ---------------------------- + +Berkeley `pax` supports options `-z` and `-j`, similar to GNU `tar`, to +filter archive files through GZIP/BZIP2. + +### Matching archive members + +In *read* and *list* modes, you can specify patterns to determine which +files to list or extract. + +- the pattern notation is the one known by a POSIX-shell, i.e. the one + known by Bash without `extglob` +- if the specified pattern matches a complete directory, it affects + all files and subdirectories of the specified directory +- if you specify the `-c` option, `pax` will invert the matches, i.e. 
+ it matches all filenames **except** those matching the specified + patterns +- if no patterns are given, `pax` will \"match\" (list or extract) all + files from the archive +- **To avoid conflicts with shell pathname expansion, it\'s wise to + quote patterns!** + +#### Some assorted examples of patterns + + pax -r archive.tar README.txt *.png data/ + + # equivalent, extract archive contents directly to a file + pax -w -x ustar -f archive.tar README.txt *.png data/ + +`pax` is in *write* mode, the given filenames are packed into an +archive: + +- `README.txt` is a normal file, it will be packed +- `*.png` is a pathname glob **for your shell**, the shell will + substitute all matching filenames **before** `pax` is executed. The + result is a list of filenames that will be packed like the + `README.txt` example above +- `data/` is a directory. **Everything** in this directory will be + packed into the archive, i.e. not just an empty directory + +When you specify the `-v` option, `pax` will write the pathnames of the +files inserted into the archive to `STDERR`. + +When, and only when, no filename arguments are specified, `pax` attempts +to read filenames from `STDIN`, separated by newlines. This way you can +easily combine `find` with `pax`: + + find . -name '*.txt' | pax -wf textfiles.tar -x ustar + +### Listing archive contents + +The standard output format to list archive members simply is to print +each filename to a separate line. But the output format can be +customized to include permissions, timestamps, etc. with the +`-o listopt=` specification. The syntax of the format +specification is strongly derived from the `printf(3)` format +specification. + +**Unfortunately** the `pax` utility delivered with Debian doesn\'t seem +to support these extended listing formats. + +However, `pax` lists archive members in a `ls -l`-like format, when you +give the `-v` option: + + pax -v `. It takes the string rewrite specification +as an argument, in the form `/OLD/NEW/[gp]`, which is an `ed(1)`-like +regular expression (BRE) for `old` and generally can be used like the +popular sed construct `s/from/to/`. Any non-null character can be used +as a delimiter, so to mangle pathnames (containing slashes), you could +use `#/old/path#/new/path#`. + +The optional `g` and `p` flags are used to apply substitution +**(g)**lobally to the line or to **(p)**rint the original and rewritten +strings to `STDERR`. + +Multiple `-s` options can be specified on the command line. They are +applied to the pathname strings of the files or archive members. This +happens in the order they are specified. + +### Excluding files from an archive + +The -s command seen above can be used to exclude a file. The +substitution must result in a null string: For example, let\'s say that +you want to exclude all the CVS directories to create a source code +archive. We are going to replace the names containing /CVS/ with +nothing, note the .\* they are needed because we need to match the +entire pathname. 
### Excluding files from an archive

The `-s` option seen above can also be used to exclude a file: if the
substitution results in a null string, the file is skipped. For example,
let's say you want to exclude all the CVS directories to create a source
code archive. We replace every name containing `/CVS/` with nothing;
note the `.*` on both sides, which are needed to match the entire
pathname:

    pax -w -x ustar -f release.tar -s',.*/CVS/.*,,' myapplication

You can use several `-s` options; for instance, let's say you also want
to remove the files ending in `~`:

    pax -w -x ustar -f release.tar -s',.*/CVS/.*,,' -s'/.*~$//' myapplication

This can also be done while reading an archive. For instance, suppose
you have an archive containing a `usr` and an `etc` directory, but you
want to extract only the `usr` directory:

    pax -r -f archive.tar -s',^etc/.*,,' # the etc/ dir is not extracted

### Getting archive filenames from STDIN

Like `cpio`, pax can read filenames from standard input (`stdin`). This
provides great flexibility: for example, a `find(1)` command may select
files/directories in ways pax can't do itself. In **write** mode
(creating an archive) or **copy** mode, when no filenames are given, pax
expects to read filenames from standard input. For example:

    # Back up config files changed less than 3 days ago
    find /etc -type f -mtime -3 | pax -x ustar -w -f /backups/etc.tar

    # Copy only the directories, not the files
    mkdir /target
    find . -type d -print | pax -r -w -d /target

    # Back up anything that changed since the last backup
    find . -newer /var/run/mylastbackup -print0 |
        pax -0 -x ustar -w -d -f /backups/mybackup.tar
    touch /var/run/mylastbackup

The `-d` option tells pax **not** to recurse into the directories it
reads (`cpio`-style). Without `-d`, pax recurses into all directories
(`tar`-style).

**Note**: the `-0` option is not standard, but is present in some
implementations.

## From tar to pax

`pax` can handle the `tar` archive format. If you want to switch to the
standard tool, an alias like:

    alias tar='echo USE PAX, idiot. pax is the standard archiver!; # '

in your `~/.bashrc` can be useful :-D.

Here is a quick table comparing (GNU) `tar` and `pax` to help you make
the switch:

  TAR                       PAX                                    Notes
  ------------------------- -------------------------------------- ------------------------------------------------------------------
  `tar xzvf file.tar.gz`    `pax -rvz -f file.tar.gz`              `-z` is an extension, POSIXly: `gunzip <archive.tar.gz | pax -rv`
  `tar xjvf file.tar.bz2`   `bunzip2 <archive.tar.bz2 | pax -rv`   
  `tar tzvf file.tar.gz`    `pax -vz -f file.tar.gz`               `-z` is an extension, POSIXly: `gunzip <archive.tar.gz | pax -v`

diff --git a/docs/howto/redirection_tutorial.md b/docs/howto/redirection_tutorial.md
new file mode 100644
--- /dev/null
+++ b/docs/howto/redirection_tutorial.md
@@ -0,0 +1,697 @@
# Illustrated Redirection Tutorial

![](keywords>bash shell scripting tutorial redirection redirect file descriptor)

This tutorial is not a complete guide to redirection; it will not cover
here documents, here strings, named pipes, etc. I just hope it'll help
you to understand what things like `3>&2`, `2>&1` or `1>&3-` do.

# stdin, stdout, stderr

When Bash starts, three file descriptors are normally open: `0`, `1` and
`2`, also known as standard input (`stdin`), standard output (`stdout`)
and standard error (`stderr`).

For example, with Bash running in a Linux terminal emulator, you'll
see:

    # lsof +f g -ap $BASHPID -d 0,1,2
    COMMAND   PID USER   FD   TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME
    bash    12135 root    0u   CHR     RW,LG 136,13      0t0   16 /dev/pts/5
    bash    12135 root    1u   CHR     RW,LG 136,13      0t0   16 /dev/pts/5
    bash    12135 root    2u   CHR     RW,LG 136,13      0t0   16 /dev/pts/5

This `/dev/pts/5` is a pseudo terminal used to emulate a real terminal.
Bash reads (`stdin`) from this terminal and prints via `stdout` and
`stderr` to this terminal.

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+
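On Linux you can inspect these descriptors without `lsof`, through
`/proc`. The listing below is only a sketch of typical output: owner,
dates and the pts number will differ on your system, and `ls` will
additionally show a descriptor it opens itself to read the directory:

    $ ls -l /proc/self/fd
    lrwx------ 1 user user 64 Jul  5 11:10 0 -> /dev/pts/5
    lrwx------ 1 user user 64 Jul  5 11:10 1 -> /dev/pts/5
    lrwx------ 1 user user 64 Jul  5 11:10 2 -> /dev/pts/5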
When a command, a compound command, a subshell etc. is executed, it
inherits these file descriptors. For instance `echo foo` will send the
text `foo` to the file descriptor `1` inherited from the shell, which is
connected to `/dev/pts/5`.

# Simple Redirections

## Output Redirection "n> file"

`>` is probably the simplest redirection.

`echo foo > file`

The `> file` after the command alters the file descriptors belonging to
the command `echo`. It changes the file descriptor `1` (`> file` is the
same as `1>file`) so that it points to the file `file`. The descriptors
will look like:

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| file                  |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

Now characters written by our command, `echo`, that are sent to the
standard output, i.e., the file descriptor `1`, end up in the file named
`file`.

In the same way, `command 2> file` will change the standard error and
make it point to `file`. Standard error is used by applications to print
errors.

What will `command 3> file` do? It will open a new file descriptor
pointing to `file`. The command will then start with:

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    new descriptor   ( 3 ) ---->| file                  |
                      ---       +-----------------------+

What will the command do with this descriptor? It depends. Often
nothing. We will see later why we might want other file descriptors.

## Input Redirection "n< file"

When you run a command using `command < file`, it changes the file
descriptor `0` so that it looks like:

                      ---       +-----------------------+
    standard input   ( 0 ) <----| file                  |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

If the command reads from `stdin`, it will now read from `file` and not
from the console.

As with `>`, `<` can be used to open a new file descriptor for reading,
e.g. `command 3<file`.

## Pipes

What does a pipe do? It connects the standard output of the command on
the left to the standard input of the command on the right:

    cmd1 | cmd2

The file descriptors of the two commands then look like this:

        cmd1                                 cmd2
     ---       +--------------+          ---       +--------------+
    ( 0 ) ---->| /dev/pts/5   |     +-->( 0 ) ---->| pipe (read)  |
     ---       +--------------+     |    ---       +--------------+
                                    |
     ---       +--------------+     |    ---       +--------------+
    ( 1 ) ---->| pipe (write) | ----+   ( 1 ) ---->| /dev/pts/5   |
     ---       +--------------+          ---       +--------------+

     ---       +--------------+          ---       +--------------+
    ( 2 ) ---->| /dev/pts/5   |         ( 2 ) ---->| /dev/pts/5   |
     ---       +--------------+          ---       +--------------+

This is possible because the redirections are set up by the shell
**before** the commands are executed, and the commands inherit the file
descriptors.
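A quick way to convince yourself that only `stdout` enters the pipe is a
test like the following (a sketch: the error message wording and your
`/tmp` contents will differ). The error from `ls` bypasses `tr` and
reaches the terminal unconverted:

    $ ls /tmp/ doesnotexist | tr a-z A-Z
    ls: cannot access 'doesnotexist': No such file or directory
    SOMEFILE.TXT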
# More On File Descriptors

## Duplicating File Descriptor 2>&1

We have seen how to open (or redirect) file descriptors. Let us see how
to duplicate them, starting with the classic `2>&1`. What does this
mean? It means that anything written to file descriptor `2` will go
where file descriptor `1` goes. `command 2>&1` on its own is not a very
interesting example, so we will use `ls /tmp/ doesnotexist 2>&1 | less`:

    ls /tmp/ doesnotexist 2>&1 | less

        ls                                   less
     ---       +---------------+         ---       +---------------+
    ( 0 ) ---->| /dev/pts/5    |    +-->( 0 ) ---->| from the pipe |
     ---       +---------------+    |    ---       +---------------+
                                    |
     ---       +---------------+    |    ---       +---------------+
    ( 1 ) ---->| to the pipe   | ---+   ( 1 ) ---->| /dev/pts/5    |
     ---       +---------------+    |    ---       +---------------+
                                    |
     ---       +---------------+    |    ---       +---------------+
    ( 2 ) ---->| to the pipe   | ---+   ( 2 ) ---->| /dev/pts/5    |
     ---       +---------------+         ---       +---------------+

Why is it called *duplicating*? Because after `2>&1`, we have two file
descriptors pointing to the same file. Take care not to call this "File
Descriptor Aliasing": if we redirect `stdout` after `2>&1` to a file
`B`, file descriptor `2` will still point to the file `A` it was opened
on. This is often misunderstood by people wanting to redirect both
standard output and standard error to a file. Continue reading for more
on this.

So if you have two file descriptors `s` and `t` like:

                      ---       +-----------------------+
    a descriptor     ( s ) ---->| /some/file            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    a descriptor     ( t ) ---->| /another/file         |
                      ---       +-----------------------+

then using `t>&s` (where `t` and `s` are numbers) means:

> Copy whatever file descriptor `s` contains into file descriptor `t`.

So you get a copy of this descriptor:

                      ---       +-----------------------+
    a descriptor     ( s ) ---->| /some/file            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    a descriptor     ( t ) ---->| /some/file            |
                      ---       +-----------------------+

Internally, each of these is represented by a file descriptor obtained
from the operating system's `open(2)` call, and is essentially a
reference to the open file, whether it was opened for reading (`stdin`,
file descriptor `0`) or for writing (`stdout`/`stderr`).

Note that the file reading or writing position is then shared. If you
have already read a line via `s`, then after `t>&s`, if you read a line
from `t`, you will get the second line of the file.

Similarly for output file descriptors: writing a line to file descriptor
`s` will append a line to the file, as will writing a line to file
descriptor `t`.

The syntax is somewhat confusing in that you would think that the arrow
would point in the direction of the copy, but it's reversed. So it's
effectively `target>&source`.

As a simple example (albeit slightly contrived), here is how to save and
restore `stdout`:

    exec 3>&1          # Copy 1 into 3
    exec 1> logfile    # Make 1 opened to write to logfile
    lotsa_stdout       # Outputs to fd 1, which writes to logfile
    exec 1>&3          # Copy 3 back into 1
    echo Done          # Output to original stdout
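Because duplicated descriptors share their read/write offset, you can
watch the sharing happen. A small sketch, assuming a regular file
`somefile` with at least two lines:

    exec 3<somefile      # open somefile for reading on fd 3
    exec 4<&3            # fd 4 is a duplicate of fd 3
    read -r first <&3    # reads the first line via fd 3
    read -r second <&4   # reads the SECOND line: the shared offset moved
    exec 3<&- 4<&-       # close both (closing is covered below)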
## Order Of Redirection, i.e., "> file 2>&1" vs. "2>&1 >file"

While it doesn't matter where the redirections appear on the command
line, their order does matter. They are set up from left to right.

- `2>&1 >file`

A common error is to do `command 2>&1 > file` to redirect both `stderr`
and `stdout` to `file`. Let's see what's going on. First we type the
command in our terminal; the descriptors look like this:

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

Then our shell, Bash, sees `2>&1`, so it duplicates `1` into `2`, and
the file descriptors look like this:

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

That's right: nothing has changed, `2` was already pointing to the same
place as `1`. Now Bash sees `> file` and thus changes `stdout`:

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| file                  |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

And that's not what we want.

- `>file 2>&1`

Now let's look at the correct `command >file 2>&1`. We start as in the
previous example, and Bash sees `> file`:

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| file                  |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

Then it sees our duplication `2>&1`:

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| file                  |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| file                  |
                      ---       +-----------------------+

And voila, both `1` and `2` are redirected to `file`.

## Why sed 's/foo/bar/' file >file Doesn't Work

This is a common error: we want to modify a file using something that
reads from a file and writes the result to `stdout`, so we redirect
stdout to the file we want to modify. The problem is that, as we have
seen, the redirections are set up before the command is actually
executed.

So **BEFORE** `sed` starts, standard output has already been redirected,
with the additional side effect that, because we used `>`, `file` gets
truncated. When `sed` starts to read the file, it contains nothing.
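The usual workaround is to write somewhere else and rename afterwards, a
sketch (some `sed` implementations also provide a non-standard `-i`
option that does this for you):

    sed 's/foo/bar/' file >file.tmp && mv file.tmp file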
## exec

In Bash the `exec` built-in replaces the shell with the specified
program. So what does this have to do with redirection? `exec` also
allows us to manipulate the file descriptors: if you don't specify a
program, the redirections after `exec` modify the file descriptors of
the current shell.

For example, all the commands after `exec 2>file` will have file
descriptors like:

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| file                  |
                      ---       +-----------------------+

All the errors sent to `stderr` by the commands after the `exec 2>file`
will go to the file, just as if you had the command in a script and ran
`myscript 2>file`.

`exec` can be used, for instance, if you want to log the errors the
commands in your script produce: just add `exec 2>myscript.errors` at
the beginning of your script.

Let's see another use case. We want to read a file line by line; this
is easy, we just do:

    while read -r line;do echo "$line";done < file

Now we want, after printing each line, to pause, waiting for the user to
press a key:

    while read -r line;do echo "$line"; read -p "Press any key" -n 1;done < file

And, surprise, this doesn't work. Why? Because the shell descriptors of
the while loop look like:

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| file                  |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

and our command (`read -p "Press any key" -n 1`) inherits them, and thus
reads from `file` and not from our terminal.

A quick look at `help read` tells us that we can specify a file
descriptor from which `read` should read. Cool. Now let's use `exec` to
get another descriptor:

    exec 3<file
    while read -r -u 3 line;do echo "$line"; read -p "Press any key" -n 1;done

The file descriptors now look like:

                      ---       +-----------------------+
    standard input   ( 0 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard output  ( 1 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    standard error   ( 2 ) ---->| /dev/pts/5            |
                      ---       +-----------------------+

                      ---       +-----------------------+
    new descriptor   ( 3 ) ---->| file                  |
                      ---       +-----------------------+

and it works.

## Closing The File Descriptors

Closing a file descriptor is easy: just make it a duplicate of `-`. For
instance, let's close `stdin` (`<&-`) and `stderr` (`2>&-`):

    bash -c '{ lsof -a -p $$ -d0,1,2 ;} <&- 2>&-'
    COMMAND   PID USER   FD   TYPE DEVICE SIZE NODE NAME
    bash    10668 pgas    1u   CHR  136,2         4 /dev/pts/2

We see that inside the `{ }` only `1` is still open.

Though the OS will probably clean up the mess, it is perhaps a good idea
to close the file descriptors you open. For instance, if you open a file
descriptor with `exec 3>file`, all the commands afterwards will inherit
it. It's probably better to do something like:

    exec 3>file
    .....
    # commands that use 3
    .....
    exec 3>&-

    # we don't need 3 any more

I've seen some people use this as a way to discard, say, stderr, with
something like `command 2>&-`. Though it might work, I'm not sure if you
can expect all applications to behave correctly with a closed stderr.

When in doubt, I use `2>/dev/null`.
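Putting this together with the section on ordering, the usual idiom to
discard both streams of a command is:

    command >/dev/null 2>&1    # 1 goes to /dev/null first, then 2 duplicates 1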
# An Example

This example comes from [this post
(ffe4c2e382034ed9)](http://groups.google.com/group/comp.unix.shell/browse_thread/thread/64206d154894a4ef/ffe4c2e382034ed9#ffe4c2e382034ed9)
on the comp.unix.shell group:

    {
      {
        cmd1 3>&- |
          cmd2 2>&3 3>&-
      } 2>&1 >&4 4>&- |
        cmd3 3>&- 4>&-

    } 3>&2 4>&1

The redirections are processed from left to right, but as the file
descriptors are inherited we will also have to work from the outer to
the inner contexts. We will assume that we run this command in a
terminal. Let's start with the outer `{ } 3>&2 4>&1`:

     ---       +-------------+          ---       +-------------+
    ( 0 ) ---->| /dev/pts/5  |         ( 3 ) ---->| /dev/pts/5  |
     ---       +-------------+          ---       +-------------+

     ---       +-------------+          ---       +-------------+
    ( 1 ) ---->| /dev/pts/5  |         ( 4 ) ---->| /dev/pts/5  |
     ---       +-------------+          ---       +-------------+

     ---       +-------------+
    ( 2 ) ---->| /dev/pts/5  |
     ---       +-------------+

We have simply made two copies: `3` of `stderr` and `4` of `stdout`.
`3>&1 4>&1` would have produced the same result here because we ran the
command in a terminal, so `1` and `2` both go to the terminal. As an
exercise, you can start with `1` pointing to `file.stdout` and `2`
pointing to `file.stderr`; you will then see why these redirections are
very nice.

Let's continue with the right part of the second pipe:
`| cmd3 3>&- 4>&-`

     ---       +-------------+
    ( 0 ) ---->| 2nd pipe    |
     ---       +-------------+

     ---       +-------------+
    ( 1 ) ---->| /dev/pts/5  |
     ---       +-------------+

     ---       +-------------+
    ( 2 ) ---->| /dev/pts/5  |
     ---       +-------------+

It inherits the previous file descriptors, closes `3` and `4`, and sets
up a pipe for reading. Now for the left part of the second pipe,
`{...} 2>&1 >&4 4>&- |`:

     ---       +-------------+          ---       +-------------+
    ( 0 ) ---->| /dev/pts/5  |         ( 3 ) ---->| /dev/pts/5  |
     ---       +-------------+          ---       +-------------+

     ---       +-------------+
    ( 1 ) ---->| /dev/pts/5  |
     ---       +-------------+

     ---       +-------------+
    ( 2 ) ---->| 2nd pipe    |
     ---       +-------------+

First, the file descriptor `1` is connected to the pipe (`|`), then `2`
is made a copy of `1` and thus is made an fd to the pipe (`2>&1`), then
`1` is made a copy of `4` (`>&4`), then `4` is closed. These are the
file descriptors of the inner `{}`. Let's go inside and have a look at
the right part of the first pipe: `| cmd2 2>&3 3>&-`

     ---       +-------------+
    ( 0 ) ---->| 1st pipe    |
     ---       +-------------+

     ---       +-------------+
    ( 1 ) ---->| /dev/pts/5  |
     ---       +-------------+

     ---       +-------------+
    ( 2 ) ---->| /dev/pts/5  |
     ---       +-------------+

It inherits the previous file descriptors, connects `0` to the 1st pipe,
makes file descriptor `2` a copy of `3`, and closes `3`. Finally, the
left part of the first pipe, `cmd1 3>&- |`:

     ---       +-------------+
    ( 0 ) ---->| /dev/pts/5  |
     ---       +-------------+

     ---       +-------------+
    ( 1 ) ---->| 1st pipe    |
     ---       +-------------+

     ---       +-------------+
    ( 2 ) ---->| 2nd pipe    |
     ---       +-------------+

It also inherits the file descriptors of the left part of the 2nd pipe;
file descriptor `1` is connected to the first pipe, and `3` is closed.

The purpose of all this becomes clear if we look at just the commands:

                                             cmd2
                                          ---       +-------------+
                                     +-->( 0 ) ---->| 1st pipe    |
                                     |    ---       +-------------+
                                     |
                                     |    ---       +-------------+
        cmd1                         |   ( 1 ) ---->| /dev/pts/5  |
                                     |    ---       +-------------+
     ---       +-------------+       |
    ( 0 ) ---->| /dev/pts/5  |       |    ---       +-------------+
     ---       +-------------+       |   ( 2 ) ---->| /dev/pts/5  |
                                     |    ---       +-------------+
     ---       +-------------+       |
    ( 1 ) ---->| 1st pipe    | ------+       cmd3
     ---       +-------------+
                                          ---       +-------------+
     ---       +-------------+       +-->( 0 ) ---->| 2nd pipe    |
    ( 2 ) ---->| 2nd pipe    | ------+    ---       +-------------+
     ---       +-------------+
                                          ---       +-------------+
                                         ( 1 ) ---->| /dev/pts/5  |
                                          ---       +-------------+

                                          ---       +-------------+
                                         ( 2 ) ---->| /dev/pts/5  |
                                          ---       +-------------+

As said previously, as an exercise, you can start with `1` open on a
file and `2` open on another file to see how the `stdout` of `cmd2` and
`cmd3` goes to the original `stdout` and how the `stderr` goes to the
original `stderr`.
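This kind of descriptor juggling is also what makes the classic "pipe
only stderr" idiom work. A sketch, with `cmd1` and `cmd2` as
placeholders: `cmd1`'s `stdout` keeps going to the terminal, while its
`stderr` is filtered through `cmd2`:

    { cmd1 2>&1 1>&3 3>&- | cmd2 3>&-; } 3>&1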
# Syntax

I used to have trouble choosing between `0&<3`, `3&>1`, `3>&1`, `->2`,
`-<&0`, `&-<0`, `0<&-`, etc... (I think probably because the syntax is
more representative of the result, i.e., the redirection, than of what
is done, i.e., opening, closing, or duplicating file descriptors).

If this fits your situation, then maybe the following "rules" will
help you. A redirection always has the following form:

    lhs op rhs

- `lhs` is always a file descriptor, i.e., a number:
  - Either we want to open, duplicate, or move it, or we want to close
    it. If the op is `<` then there is an implicit `0`; if it's `>` or
    `>>`, there is an implicit `1`.
- `op` is `<`, `>`, `>>`, `>|`, or `<>`:
  - `<` if the file descriptor in `lhs` will be read, `>` if it will
    be written, `>>` if data is to be appended to the file, `>|` to
    overwrite an existing file even with `noclobber` set, or `<>` if
    it will be both read and written.
- `rhs` is the thing that the file descriptor will describe:
  - It can be the name of a file, the place where another descriptor
    goes (`&1`), or `&-`, which will close the file descriptor.

You might not like this description, and find it a bit incomplete or
inexact, but I think it really helps to easily see that, say, `&->0` is
incorrect.

### A note on style

The shell is pretty loose about what it considers a valid redirect.
While opinions probably differ, this author has some (strong)
recommendations:

- **Always** keep redirections "tightly grouped" -- that is, **do
  not** include whitespace anywhere within the redirection syntax
  except within quotes if required on the RHS (e.g. a filename that
  contains a space). Since shells fundamentally use whitespace to
  delimit fields in general, it is visually much clearer for each
  redirection to be separated by whitespace, but grouped in chunks
  that contain no unnecessary whitespace.
- **Do** always put a space between each redirection, and between the
  argument list and the first redirect.
- **Always** place redirections together at the very end of a command
  after all arguments. Never precede a command with a redirect. Never
  put a redirect in the middle of the arguments.
- **Never** use the Csh `&>foo` and `>&foo` shorthand redirects. Use
  the long form `>foo 2>&1`. (see: [obsolete](obsolete))
    # Good! This is clearly a simple command with two arguments and 4 redirections
    cmd arg1 arg2 <myFile 3<&1 2>/dev/null >&2

    # Good!
    { cmd1 <<<'my input'; cmd2; } >someFile

    # Bad. Is the "1" a file descriptor or an argument to cmd? (answer: it's the FD).
    # Is the space after the herestring part of the input data? (answer: No).
    # The redirects are also not delimited in any obvious way.
    cmd 2>& 1 <<< stuff

    # Hideously Bad. It's difficult to tell where the redirects are and whether
    # they're even valid redirects. This is in fact one command with one
    # argument, an assignment, and three redirects.
    foo=bar<in bleh>out 2>&1 arg

# Conclusion

I hope this tutorial worked for you.

I lied, I did not explain `1>&3-`; go check the manual ;-)

Thanks to Stéphane Chazelas, from whom I stole both the intro and the
example....

The intro is inspired by this introduction; you'll find a nice exercise
there too:

- [A Detailed Introduction to I/O and I/O
  Redirection](http://tldp.org/LDP/abs/html/ioredirintro.html)

The last example comes from this post:

- [comp.unix.shell: piping stdout and stderr to different
  processes](http://groups.google.com/group/comp.unix.shell/browse_thread/thread/64206d154894a4ef/ffe4c2e382034ed9#ffe4c2e382034ed9)

# See also

- Internal: [Redirection syntax overview](/syntax/redirection)

diff --git a/docs/howto/testing-your-scripts.md b/docs/howto/testing-your-scripts.md
new file mode 100644
index 0000000..00a4774
--- /dev/null
+++ b/docs/howto/testing-your-scripts.md
@@ -0,0 +1,94 @@
One of the simplest ways to check your bash/sh scripts is to run them
and check the output or the result. This tutorial shows how to use the
[bashtest](https://github.com/pahaz/bashtest) tool for testing your
scripts.

### Write a simple utility

We have a simple **stat.sh** script:

    #!/usr/bin/env bash

    if [ -z "$1" ]
    then
        DIR=./
    else
        DIR=$1
    fi

    echo "Evaluate *.py statistics"
    FILES=$(find "$DIR" -name '*.py' | wc -l)
    LINES=$( (find "$DIR" -name '*.py' -print0 | xargs -0 cat) | wc -l)
    echo "PYTHON FILES: $FILES"
    echo "PYTHON LINES: $LINES"

This script counts the Python files and the lines of Python code under a
directory. We can use it like **./stat.sh <dir>**.

### Create a test suite

Now let's make a test suite for **stat.sh**: a directory **testsuit**
containing some Python test files.

**testsuit/main.py**

    import foo
    print(foo)

**testsuit/foo.py**

    BAR = 1
    BUZ = BAR + 2

Ok! Our test suite is ready! We have 2 Python files which contain 4
lines of code.

### Write bashtests

Let's write the tests. We just write a shell command for testing our
work.

Create the file **tests.bashtest**:

    $ ./stat.sh testsuit/
    Evaluate *.py statistics
    PYTHON FILES: 2
    PYTHON LINES: 4

This is our test! It's that simple. Try to run it:

    # install bashtest if required!
    $ pip install bashtest

    # run tests
    $ bashtest *.bashtest
    1 items passed all tests:
       1 tests in tests.bashtest
    1 tests in 1 items.
    1 passed and 0 failed.
    Test passed.

That's all. We wrote one test. You can write more tests if you want:

    $ ls testsuit/
    foo.py  main.py

    $ ./stat.sh testsuit/
    Evaluate *.py statistics
    PYTHON FILES: 2
    PYTHON LINES: 4

And run the tests again:

    $ bashtest *.bashtest
    1 items passed all tests:
       2 tests in tests.bashtest
    2 tests in 1 items.
    2 passed and 0 failed.
    Test passed.

You can find more **.bashtest** examples in the [bashtest github
repo](https://github.com/pahaz/bashtest).
You can also ask questions or report bugs
[here](https://github.com/pahaz/bashtest/issues).

Happy testing!