# Scripting with style

These are some coding guidelines that have helped me read and understand
my own code over the years. They will also help you produce code that
is a bit more robust than "if something breaks, I know how to fix it".

This is not a bible, of course. But I have seen so much ugly and
terrible code (not only in shell) over the years that I'm 100%
convinced there needs to be *some* code layout and style. No matter
which one you use, use it throughout your code (at least don't change
it within the same shell script file); don't change your code layout
with your mood.

A good code layout helps you read your own code after a while. And
of course it helps others read the code.

## Indentation guidelines

Indentation does not technically influence a script; it's only
for us humans.

I'm used to seeing/using indentation of *two space characters* (though
many prefer 4 spaces):

- it's easy and fast to type
- it's not a hard tab that's displayed differently in different
  environments
- it's wide enough to give a visual break and small enough to not
  waste too much space on the line

Speaking of hard tabs: avoid them if possible. They only cause trouble. I
can imagine one case where they're useful: indenting
[here-documents](../syntax/redirection.md#here_documents), because the
`<<-` redirection operator strips leading tabs (but not spaces).

### Breaking up lines

Whenever you need to break long lines of code, you should follow one of
these two rules:

[**Indentation using command width:**]{.underline}

    activate some_very_long_option \
             some_other_option

[**Indentation using two spaces:**]{.underline}

    activate some_very_long_option \
      some_other_option

Personally, with some exceptions, I prefer the first form because it
supports the visual impression of "these belong together".

### Breaking compound commands

[Compound commands](../syntax/ccmd/intro.md) form the structures that make a
shell script different from a mere enumeration of commands. Usually
they contain a kind of "head" and a "body" that contains command
lists. This type of compound command is relatively easy to indent.

This is what I'm used to (not all points apply to all compound commands,
just pick up the basic idea):

- put the introducing keyword and the initial command list or
  parameters on one line ("head")
- put the "body-introducing" keyword on the same line
- put the command list of the "body" on separate lines, indented by two
  spaces
- put the closing keyword on a separate line, indented like the
  initial introducing keyword

What?! Well, here again:

##### Symbolic

    HEAD_KEYWORD parameters; BODY_BEGIN
      BODY_COMMANDS
    BODY_END

##### if/then/elif/else

This construct is a bit special, because it has keywords (`elif`,
`else`) "in the middle". The visually appealing way is to indent them
like this:

    if ...; then
      ...
    elif ...; then
      ...
    else
      ...
    fi

##### for

    for f in /etc/*; do
      ...
    done

##### while/until

    while [[ $answer != [YyNn] ]]; do
      ...
    done

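The heading also names `until`, which is laid out exactly like `while`;
only the sense of the condition is inverted. A minimal sketch (the
counter name `attempts` is just an illustration, not taken from any
real script):

```shell
#!/bin/bash

attempts=0

# 'until' repeats the body as long as the condition is false
until ((attempts >= 3)); do
  ((attempts += 1))
  printf 'attempt %d\n' "$attempts"
done
```
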
##### The case construct

The `case` construct might need a bit more discussion here, since its
structure is a bit more complex.

In general, every new "layer" gets a new indentation level:

    case $input in
      hello)
        echo "You said hello"
      ;;
      bye)
        echo "You said bye"
        if foo; then
          bar
        fi
      ;;
      *)
        echo "You said something weird..."
      ;;
    esac

Some notes:

- if not 100% needed, the optional left parenthesis on the pattern is
  not used
- the patterns (`hello)`) and the corresponding action terminator
  (`;;`) are indented at the same level
- the action command lists are indented one more level (and continue
  to have their own indentation, if needed)
- though optional, the very last action terminator is given

## Syntax and coding guidelines

### Cryptic constructs

Cryptic constructs, we all know them, we all love them. If they are not
100% needed, avoid them, since nobody except you may be able to decipher
them.

Just like in C, it's a matter of finding the middle ground between
smart, efficient and readable.

If you need to use a cryptic construct, include a comment that explains
what your "monster" does.

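As a small illustration of such a comment, here is a sketch using two
parameter expansions that are compact but not exactly self-explanatory.
The path and the variable names are made up for the example:

```shell
#!/bin/bash

path="/var/log/messages.1.gz"

# ${path##*/} removes the longest prefix matching '*/', i.e. everything
# up to and including the last slash (similar to 'basename')
name="${path##*/}"

# ${name%%.*} removes the longest suffix matching '.*', i.e. everything
# from the first dot onwards
stem="${name%%.*}"

printf '%s\n' "$name" "$stem"
```

Without the two comments, a reader has to look up both operators before
trusting the result.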
### Variable names

Since all reserved variables are `UPPERCASE`, the safest way is to only
use `lowercase` variable names. This applies to variables holding user
input, loop counters, and so on (in the example below: `file`).

- prefer `lowercase` variables
- if you use `UPPERCASE` names, **do not use reserved variable names**
  (see
  [SUS](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08)
  for an incomplete list)
- if you use `UPPERCASE` names, prepend the name with a unique prefix
  (`MY_` in the example below)

```{=html}
<!-- -->
```
    #!/bin/bash

    # the prefix 'MY_'; no trailing slash, the loop below adds one
    MY_LOG_DIRECTORY=/var/adm

    for file in "$MY_LOG_DIRECTORY"/*; do
      echo "Found Logfile: $file"
    done

### Variable initialization

As in C, it's always a good idea to initialize your variables, even
though the shell initializes fresh variables itself (more precisely:
unset variables generally behave like variables containing a null
string).

Passing an **environment variable** to a script is trivial. If
you blindly assume that all variables you use for the first time are
**empty**, anybody can **inject** content into a variable by passing it
via the environment.

The solution is simple and effective: **Initialize them**

    my_input=""
    my_array=()
    my_number=0

If you do that for every variable you use, then you also have some
in-code documentation for them.

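The injection is easy to reproduce. In this sketch, `bash -c` with a
tiny inline script stands in for a real script file, and the names
`leaky` and `safe` are only used to capture the two results:

```shell
#!/bin/bash

# the inline script uses $my_input without initializing it,
# so the value passed via the environment leaks in
leaky=$(my_input="injected" bash -c 'printf "%s" "$my_input"')

# the inline script initializes the variable first,
# so it starts from a known state
safe=$(my_input="injected" bash -c 'my_input=""; printf "%s" "$my_input"')

printf 'uninitialized: "%s"\n' "$leaky"
printf 'initialized:   "%s"\n' "$safe"
```
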
### Parameter expansion

Unless you are really sure what you're doing, **quote every parameter
expansion**.

There are some cases where this isn't needed from a technical point of
view, e.g.

- inside `[[ ... ]]` (other than the RHS of the `==`, `!=`, and `=~`
  operators)
- the parameter (`WORD`) in `case $WORD in ....`
- variable assignment: `VAR=$WORD`

But quoting these is never a mistake. If you quote every parameter
expansion, you'll be safe.

If you need to parse a parameter as a list of words, you can't quote it,
of course, e.g.

    list="one two three"

    # you MUST NOT quote $list here
    for word in $list; do
      ...
    done

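What quoting actually changes is easiest to see by counting arguments.
The following is only a sketch: `count_args` is a hypothetical helper,
and the file names are invented. It also shows that an array is often a
cleaner way to hold a list, since its expansion can stay quoted:

```shell
#!/bin/bash

# hypothetical helper: prints how many arguments it received
count_args() {
  printf '%d\n' "$#"
}

file="my holiday photo.jpg"

unquoted=$(count_args $file)    # word splitting yields three arguments
quoted=$(count_args "$file")    # the value stays one argument

# each array element stays intact when expanded as "${files[@]}"
files=("my holiday photo.jpg" "notes.txt")
from_array=$(count_args "${files[@]}")

printf 'unquoted=%s quoted=%s array=%s\n' "$unquoted" "$quoted" "$from_array"
```
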
### Function names

Function names should be all `lowercase`, meaningful, and human
readable. A function named `f1` may be easy and
quick to write down, but for debugging and especially for other people,
it reveals nothing. Good names help document your code without using
extra comments.

**Do not use command names for your functions.** For example, a script or
function named `test` will collide with the UNIX `test` command.

Unless absolutely necessary, only use alphanumeric characters and the
underscore for function names. `/bin/ls` is a valid function name in
Bash, but is not a good idea.

### Command substitution

As noted in [the article about command
substitution](../syntax/expansion/cmdsubst.md), you should use the `$( ... )`
form.

If portability to very old, pre-POSIX Bourne shells is a concern, use
the backquoted form `` ` ... ` ``.

In any case, if other expansions and word splitting are not wanted, you
should quote the command substitution!

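A short sketch of the difference quoting makes here. `emit_text` is a
hypothetical stand-in for any command whose output contains runs of
whitespace; the `|` in the format string only marks where each argument
ends:

```shell
#!/bin/bash

# hypothetical command that prints text with irregular spacing
emit_text() {
  printf 'a  b   c\n'
}

# quoted: the output is passed on as one word, spacing preserved
kept=$(printf '%s|' "$(emit_text)")

# unquoted: the output is split into three separate words
split=$(printf '%s|' $(emit_text))

printf 'quoted:   %s\n' "$kept"
printf 'unquoted: %s\n' "$split"
```
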
### Eval

Well, like Greg says: **"If eval is the answer, surely you are asking
the wrong question."**

Avoid it, unless absolutely necessary:

- `eval` can seriously backfire, since it executes whatever text it is
  given
- there are most likely other ways to achieve what you want
- if it seems you can't avoid `eval` with your current method, re-think
  the way your script works, if possible
- if you really, really, have to use it, then take care, and be sure
  about what you're doing

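One common reason people reach for `eval` is reading a variable whose
name is only known at runtime. In Bash, indirect expansion (`${!name}`)
is one of the "other ways". This is a sketch under assumptions: the
`color_*` variables and the `lookup_color` helper are hypothetical, and
the name check is there because the looked-up name should still be
validated:

```shell
#!/bin/bash

color_red="#ff0000"
color_green="#00ff00"

# hypothetical helper: prints the value of the variable color_<name>
lookup_color() {
  # only accept plain lowercase names; anything else is rejected
  [[ $1 =~ ^[a-z_]+$ ]] || return 1

  local var_name="color_$1"
  # ${!var_name} expands to the value of the variable named by var_name
  printf '%s\n' "${!var_name}"
}

lookup_color red
```
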
## Basic structure

The basic structure of a script simply reads:

    #!SHEBANG

    CONFIGURATION_VARIABLES

    FUNCTION_DEFINITIONS

    MAIN_CODE

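Filled in with something concrete, that skeleton might look like the
following sketch. The variable `greeting` and the function `greet` are
placeholders invented for the illustration:

```shell
#!/bin/bash

# --- configuration variables ---
greeting="Hello"

# --- function definitions ---
greet() {
  printf '%s, %s!\n' "$greeting" "$1"
}

# --- main code ---
greet "world"
```
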
### The shebang

If possible (I know it's not always possible!), use a
[shebang](../dict/interpreter_directive.md).

Be careful with `/bin/sh`: The argument that "on Linux `/bin/sh` is
Bash" **is a lie** (and technically irrelevant). On Debian and Ubuntu,
for example, `/bin/sh` is `dash`.

The shebang serves two purposes for me:

- it specifies the interpreter to be used when the script file is
  called directly: If you code for Bash, specify `bash`!
- it documents the desired interpreter (so: use `bash` when you write
  a Bash script, use `sh` when you write a general Bourne/POSIX
  script, ...)

### Configuration variables

I call variables that are meant to be changed by the user
"configuration variables" here.

Make them easy to find (directly at the top of the script), give them
meaningful names and maybe a short comment. As noted above, use
`UPPERCASE` for them only when you're sure about what you're doing.
`lowercase` will be the safest.

### Function definitions

Unless there are reasons not to, all function definitions should be
declared before the main script code runs. This gives a far better
overview and ensures that all function names are known before they are
used.

Since the commands inside a function body are only looked up when the
function is executed, you usually don't have to ensure the definitions
are in a specific order.

The portable form of the function definition should be used, without the
`function` keyword (here using the [grouping compound
command](../syntax/ccmd/grouping_plain.md)):

    getargs() {
      ...
    }

Speaking about the command grouping in function definitions using
`{ ...; }`: If you don't have a good reason to use another compound
command directly, you should always use this one.

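To illustrate the point about definition order: in this sketch, `outer`
mentions `inner` before `inner` is defined, and it still works because
both definitions exist by the time the main code calls `outer`. The
function names are hypothetical:

```shell
#!/bin/bash

# 'inner' does not exist yet when this definition is read;
# the name is only resolved when 'outer' actually runs
outer() {
  inner "called from outer"
}

inner() {
  printf 'inner: %s\n' "$1"
}

# main code: both functions are defined by now
outer
```
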
## Behaviour and robustness

### Fail early

**Fail early**: this sounds bad, but usually is good. Failing early
means erroring out as soon as a check indicates an error or an
unmet condition, that is, **before** your script
begins its work in a potentially broken state.

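A sketch of what such up-front checks could look like.
`check_preconditions` and its two parameters are hypothetical; a real
script would check whatever its own work depends on:

```shell
#!/bin/bash

# hypothetical helper: returns non-zero if the source file cannot be
# read or the target directory cannot be written to
check_preconditions() {
  local source_file=$1 target_dir=$2

  if [[ ! -r $source_file ]]; then
    printf 'Cannot read source file: %s\n' "$source_file" >&2
    return 1
  fi

  if [[ ! -d $target_dir || ! -w $target_dir ]]; then
    printf 'Target directory is not writable: %s\n' "$target_dir" >&2
    return 1
  fi
}

# In a real script, this would run before any work is done:
#   check_preconditions "$1" "$2" || exit 1
```
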
### Availability of commands

If you use external commands that may not be present on the path, or not
installed, check for their availability, then tell the user they're
missing.

Example:

    my_needed_commands="sed awk lsof who"

    # count every required command that cannot be found
    missing_counter=0
    for needed_command in $my_needed_commands; do
      if ! hash "$needed_command" >/dev/null 2>&1; then
        printf "Command not found in PATH: %s\n" "$needed_command" >&2
        ((missing_counter++))
      fi
    done

    # abort before doing any work if something is missing
    if ((missing_counter > 0)); then
      printf "%d required command(s) missing in PATH, aborting\n" "$missing_counter" >&2
      exit 1
    fi

### Exit meaningfully

The [exit code](../dict/exit_status.md) is your only way to directly
communicate with the calling process without any special provisions.

If your script exits, provide a meaningful exit code. That minimally
means:

- `exit 0` (zero) if everything is okay
- `exit 1` - in general non-zero - if there was an error

This, **and only this**, will enable the calling component to check the
operation status of your script.

You know: **"One of the main causes of the fall of the Roman Empire was
that, lacking zero, they had no way to indicate successful termination
of their C programs."** *-- Robert Firth*

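Seen from the calling side, this looks roughly like the sketch below.
`bash -c` with a one-line inline script stands in for "your script",
and the exit code 3 is an arbitrary example value:

```shell
#!/bin/bash

status_ok=0
bash -c 'exit 0' || status_ok=$?

status_failed=0
bash -c 'exit 3' || status_failed=$?

# the caller can branch on the exit code directly
if bash -c 'exit 3'; then
  result="success"
else
  result="failure"
fi

printf 'ok=%d failed=%d result=%s\n' "$status_ok" "$status_failed" "$result"
```
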
## Misc

### Output and appearance

- if the script is interactive, if it works for you and if you think
  this is a nice feature, you can try to [save the terminal content
  and restore it](../snipplets/screen_saverestore.md) after execution
- output clean and understandable screen messages
- if applicable, you can use colors or specific prefixes to tag error
  and warning messages
    - make it easy for the user to identify those messages
- write normal output to `STDOUT`. Write error, warning and diagnostic
  messages to `STDERR`
    - enables message filtering
    - keeps the script from mixing output data with diagnostic or
      error messages
    - if the script gives syntax help (`-?` or `-h` or `--help`
      arguments), it should go to `STDOUT`
- if applicable, write error/diagnostic messages to a logfile
    - avoids screen clutter
    - messages are available for diagnostic use

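The `STDOUT`/`STDERR` split and the message prefixes can be wrapped in
two small helpers. This is only a sketch; the function names `warn` and
`error` and the prefixes are my own choice:

```shell
#!/bin/bash

# hypothetical helpers: prefix the message and send it to STDERR
warn() {
  printf 'WARNING: %s\n' "$*" >&2
}

error() {
  printf 'ERROR: %s\n' "$*" >&2
}

printf 'regular result\n'          # normal output goes to STDOUT
warn "disk space is getting low"   # diagnostics go to STDERR
```

A caller can now separate the two streams, for example by redirecting
`STDERR` to a logfile with `2>>` while piping `STDOUT` onwards.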
### Input

- never blindly assume anything. If you want the user to input a
  number, **check for numeric input, leading zeros**, etc. If you have
  specific format or content needs, **always validate the input!**

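Leading zeros matter because Bash arithmetic treats a number with a
leading zero as octal: `$((010))` is 8, and `$((08))` is an error. A
minimal validation sketch, assuming only non-negative decimal integers
are acceptable (`is_decimal_number` is a hypothetical helper name):

```shell
#!/bin/bash

# hypothetical helper: succeeds for "0" or a digit string without
# leading zeros, fails for everything else (including empty input)
is_decimal_number() {
  [[ $1 =~ ^(0|[1-9][0-9]*)$ ]]
}

for candidate in 42 0 042 4x2; do
  if is_decimal_number "$candidate"; then
    printf '%s: accepted\n' "$candidate"
  else
    printf '%s: rejected\n' "$candidate"
  fi
done
```
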
### Tooling

- some of these guidelines, such as indentation, positioning of
  "body-introducing" keywords, and portable function declarations,
  can be enforced by [shfmt](https://github.com/mvdan/sh)