---
tags:
  - bash
  - shell
  - scripting
  - processes
  - pipes
  - variables
  - environment
---

# Bash and the process tree

## The process tree

The processes in UNIX® are, unlike those of other systems, **organized
as a tree**. Every process has a parent process that started it or is
responsible for it. Every process also has its own **context memory**:
not the memory where the process stores its data, but the memory where
data is stored that doesn't directly belong to the process but is
needed to run it, i.e. <u>**the environment**</u>.

Every process has its **own** environment space.

The environment stores, among other things, data that's useful to us:
the **environment variables**. These are strings in the common
`NAME=VALUE` form, but they are not the same thing as shell variables.
A variable named `LANG`, for example, is used by every program that
looks it up in its environment to determine the current locale.

<u>**Attention:**</u> A variable that is set, for example with
`MYVAR=Hello`, is **not** automatically part of the environment. You
need to put it into the environment with the Bash builtin command
`export`:

```bash
export MYVAR
```
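
To see the difference, you can let a child process (here simply another `bash`) report what it finds in its environment. This is a minimal sketch: `MYVAR` is the example name from above, and `${MYVAR-}` just expands to nothing when the variable isn't set.

```bash
#!/usr/bin/env bash
unset MYVAR    # start clean, in case MYVAR is already set

MYVAR=Hello    # a shell variable only, not yet part of the environment
bash -c 'echo "child sees: [${MYVAR-}]"'   # child sees: []

export MYVAR   # from now on it is copied into every child's environment
bash -c 'echo "child sees: [${MYVAR-}]"'   # child sees: [Hello]
```
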

Common system variables like [PATH](../syntax/shellvars.md#PATH) or
[HOME](../syntax/shellvars.md#HOME) are usually part of the environment
(as set by login scripts or programs).

## Executing programs

All the diagrams of the process tree use names like "`xterm`" or
"`bash`", but that's just to make it easier to understand what's going
on; it doesn't mean that exactly those processes are running on your
system.

Let's take a short look at what happens when you "execute a program"
from the Bash prompt, a program like `ls`:

```
$ ls
```

Bash will now perform **two steps**:

- It will make a copy of itself
- The copy will replace itself with the `ls` program

The copy of Bash inherits the environment from the "main Bash"
process: all environment variables are copied to the new process as
well. This step is called **forking**.

For a short moment, you have a process tree that might look like
this...

```
xterm ----- bash ----- bash(copy)
```

...and after the "second Bash" (the copy) replaces itself with the
`ls` program (the copy *execs* it), it might look like this:

```
xterm ----- bash ----- ls
```

If everything went well, the two steps resulted in one program being
run. The copy of the environment from the first step (forking) becomes
the environment of the final running program (in this case, `ls`).
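
You can imitate the two steps by hand in a script. In the sketch below, the command substitution `$( ... )` makes Bash fork a copy of itself, and the `exec` builtin then replaces that copy with another program (`sh`, chosen arbitrarily for illustration). `$BASHPID` is the PID of the Bash process it is expanded in, while `$$` always reports the main shell. Because `exec` doesn't create a new process, the PID before and after it should be the same.

```bash
#!/usr/bin/env bash
echo "main shell:  PID $$"

report=$(
  # Step 1: we are now running in a forked copy of the shell.
  echo "forked copy: PID $BASHPID"
  # Step 2: the copy replaces itself with another program. exec does
  # not start a new process, so the PID stays the same.
  exec sh -c 'echo "after exec:  PID $$"'
)
echo "$report"
```
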

<u>**What is so important about it?**</u> In our example, whatever the
program `ls` does inside its own environment, it can't affect the
environment of its parent process (in this case, `bash`). The
environment was copied when `ls` was executed. Nothing is "copied back"
to the parent environment when `ls` terminates.
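
A quick way to check this yourself is to let a child process change an exported variable and then look at the variable in the parent. `GREETING` is just a made-up name for this sketch.

```bash
#!/usr/bin/env bash
export GREETING=hello

# The child bash gets a copy of the environment and changes only that copy.
bash -c 'GREETING=changed; echo "child:  GREETING=$GREETING"'

# Nothing is copied back, so the parent still has the old value.
echo "parent: GREETING=$GREETING"
```
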

## Bash playing with pipes

Pipes are a very powerful tool. You can connect the output of one
process to the input of another process. We won't delve into piping at
this point; we just want to see how it looks in the process tree.
Again, we execute some commands, this time `ls` and `grep`:

```
$ ls | grep myfile
```

It results in a tree like this:

```
                   +-- ls
xterm ----- bash --|
                   +-- grep
```

Note once again: `ls` can't influence the `grep` environment, `grep`
can't influence the `ls` environment, and neither `grep` nor `ls` can
influence the `bash` environment.
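
The same holds when both sides of a pipe are Bash code. In this sketch, `x` is a placeholder variable. Both assignments happen in subprocesses, so the main shell keeps its original value. This assumes Bash's default behavior; the `lastpipe` shell option, which is off by default, would run the last part of the pipeline in the current shell instead.

```bash
#!/usr/bin/env bash
x=original

# Each side of the pipe runs in its own subprocess, with its own copy of x.
{ x=left; echo "left side:  PID $BASHPID"; } |
  { x=right; cat; echo "right side: PID $BASHPID"; }

echo "main shell: PID $BASHPID, x=$x"   # x is still "original"
```
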

<u>**How is that related to shell programming?!?**</u>

Well, imagine some Bash code that reads data from a pipe. For example,
the builtin command `read`, which reads data from *stdin* and puts it
into a variable. We run it in a loop here to count input lines:

```bash
counter=0

cat /etc/passwd | while read -r; do ((counter++)); done
echo "Lines: $counter"
```

What? It's 0? Yes! The number of lines might not be 0, but the variable
`$counter` is still 0. Why? Remember the diagram from above? Rewriting
it a bit, we have:

```
                   +-- cat /etc/passwd
xterm ----- bash --|
                   +-- bash (while read; do ((counter++)); done)
```

See the relationship? The forked Bash process will count the lines like
a charm. It will also set the variable `counter` as directed. But when
everything ends, this extra process terminates, and **your "counter"
variable is gone.** You see a 0 because in the main shell it was 0, and
it wasn't changed by the child process!
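
You can reproduce this without depending on the contents of `/etc/passwd`. The sketch below feeds three fixed lines (`one`, `two`, `three`, arbitrary sample data) into the same kind of loop and prints the counter both inside the subshell and afterwards in the main shell:

```bash
#!/usr/bin/env bash
counter=0

# The { ...; } group on the right side of the pipe runs in a subshell.
printf '%s\n' one two three | {
  while read -r; do ((counter++)); done
  echo "inside the subshell: $counter"   # prints 3
}

echo "in the main shell:   $counter"     # prints 0
```
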

<u>**So, how do we count the lines?**</u> Easy: **Avoid the
subshell.** The details don't matter; the important thing is that the
shell that sets the counter must be the "main shell". For example:

```bash
counter=0

while read -r; do ((counter++)); done </etc/passwd
echo "Lines: $counter"
```

It's nearly self-explanatory. The `while` loop runs in the **current
shell**, the counter is incremented in the **current shell**, and
everything vital happens in the **current shell**. The `read` command
also sets the variable `REPLY` (the default if no variable name is
given), though we don't use it here.
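
Here is the same idea as a self-contained sketch. A temporary file with three sample lines stands in for `/etc/passwd`. The second loop uses process substitution, `< <(...)`, another common way to keep the loop in the current shell: only the producing command (`printf` here) runs in a child process.

```bash
#!/usr/bin/env bash
# Sample input: a temporary file with three lines (stand-in for /etc/passwd).
tmpfile=$(mktemp)
printf '%s\n' one two three >"$tmpfile"

# Variant 1: redirect the file into the loop.
counter=0
while read -r; do ((counter++)); done <"$tmpfile"
echo "redirection:          $counter"   # prints 3

# Variant 2: process substitution. printf runs in a child process,
# but the loop (and thus the counter) stays in the current shell.
counter2=0
while read -r; do ((counter2++)); done < <(printf '%s\n' one two three)
echo "process substitution: $counter2"   # prints 3

rm -f -- "$tmpfile"
```
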

## Actions that create a subshell

Bash creates **subshells** or **subprocesses** on various actions it
performs:

### Executing commands

As shown above, Bash creates a subprocess every time it executes an
external command (builtin commands and shell functions run in the
current shell). That's nothing new.

But if the command you run (for example, another script) sets variables
you want to use in your main script, that won't work: the variables are
set in the child's copy of the environment and disappear when it exits.

For exactly this purpose, there's the `source` command (also: the *dot*
`.` command). `source` doesn't execute the script as a new process; it
reads the other script's commands and runs them in the current shell:

```bash
source ./myvariables.sh
# equivalent to:
. ./myvariables.sh
```
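
The difference is easy to see with a throwaway file. The sketch below writes a hypothetical `myvariables.sh` (containing just `MYVAR=Hello`) to a temporary directory. It first executes the file with `bash` and then sources it:

```bash
#!/usr/bin/env bash
# Create a throwaway stand-in for myvariables.sh (hypothetical content).
dir=$(mktemp -d)
printf '%s\n' 'MYVAR=Hello' >"$dir/myvariables.sh"

unset MYVAR

# Executing it: a child bash sets MYVAR, then exits. Nothing comes back.
bash "$dir/myvariables.sh"
echo "after executing: [${MYVAR-}]"   # after executing: []

# Sourcing it: the commands run in the current shell.
source "$dir/myvariables.sh"
echo "after sourcing:  [${MYVAR-}]"   # after sourcing:  [Hello]

rm -r -- "$dir"
```
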

### Pipes

The last big section was about pipes, so no example here. In short:
each command of a pipeline runs in its own subprocess.

### Explicit subshell

If you group commands by enclosing them in parentheses, these commands
are run inside a subshell:

```bash
(echo PASSWD follows; cat /etc/passwd; echo GROUP follows; cat /etc/group) >output.txt
```
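
Because the group runs in a subshell, variable assignments inside the parentheses don't survive it. The sketch below also prints `$BASH_SUBSHELL`, which Bash increments for each level of subshell. It contrasts the parentheses with curly braces `{ ...; }`, which group commands without creating a subshell. `answer` is a placeholder name.

```bash
#!/usr/bin/env bash
answer=outside

(
  answer=inside
  echo "in ( ):    answer=$answer, BASH_SUBSHELL=$BASH_SUBSHELL"   # 1
)
after_parens=$answer
echo "after ( ): answer=$answer"   # still "outside"

{ answer=braces; }                 # no subshell here
echo "after { }: answer=$answer"   # now "braces"
```
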

### Command substitution

With [command substitution](../syntax/expansion/cmdsubst.md) you re-use
the output of another command as text in your command line, for example
to set a variable. The other command is run in a subshell:

```bash
number_of_users=$(cat /etc/passwd | wc -l)
```

Note that, in this example, the pipe inside the command substitution
creates two further subprocesses:

```
                                            +-- cat /etc/passwd
xterm ----- bash ----- bash (cmd. subst.) --|
                                            +-- wc -l
```
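
As with every other subshell, assignments made inside a command substitution stay inside it; only the output comes back. A short sketch (`count` is a placeholder name):

```bash
#!/usr/bin/env bash
count=0

# The assignment happens in the subshell; only the echoed text is captured.
result=$(count=99; echo "inside: count=$count, BASH_SUBSHELL=$BASH_SUBSHELL")

echo "$result"                 # inside: count=99, BASH_SUBSHELL=1
echo "outside: count=$count"   # outside: count=0
```
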

!!! warning "FIXME"
    to be continued