---
tags:
- bash
- shell
- scripting
- processes
- pipes
- variables
- environment
---
# Bash and the process tree
## The process tree
Processes in UNIX® are, unlike in some other systems, **organized as a
tree**. Every process has a parent process that started it or is
responsible for it. Every process also has its own **context memory**: not
the memory where the process stores its data, but the memory that holds
data which doesn't directly belong to the process yet is needed to run
it, i.e. <u>**the environment**</u>.

Every process has its **own** environment space.

The environment stores, among other things, data that's useful to us:
the **environment variables**. These are strings in the common `NAME=VALUE`
form, but they are not the same thing as shell variables. A variable named
`LANG`, for example, is used by every program that looks it up in its
environment to determine the current locale.

<u>**Attention:**</u> A variable that is merely set, for example with
`MYVAR=Hello`, is **not** automatically part of the environment. You
need to put it into the environment with the Bash builtin command
`export`:

    export MYVAR

Common system variables like [PATH](../syntax/shellvars.md#PATH) or
[HOME](../syntax/shellvars.md#HOME) are usually part of the environment (as
set by login scripts or programs).
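
The difference between a plain shell variable and an exported one can be checked with a small experiment. This is only a sketch: it reuses the `MYVAR` name from above, and the helper names `before` and `after` are made up for illustration. A child `sh` process reports whether it can see the variable.

```shell
# Start from a clean state so an inherited MYVAR cannot skew the result.
unset MYVAR

# MYVAR is only a shell variable at this point; a child process can't see it.
MYVAR=Hello
before=$(sh -c 'echo "${MYVAR:-not set}"')

# After export, MYVAR is copied into the environment of every new child.
export MYVAR
after=$(sh -c 'echo "${MYVAR:-not set}"')

echo "before export: $before"   # before export: not set
echo "after export:  $after"    # after export:  Hello
```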
## Executing programs
All the diagrams of the process tree use names like "`xterm`" or
"`bash`", but that's just to make it easier to understand what's
going on; it doesn't mean those processes are actually executed.

Let's take a short look at what happens when you "execute a program"
from the Bash prompt, a program like `ls`:

    $ ls

Bash will now perform **two steps**:
- It will make a copy of itself
- The copy will replace itself with the "ls" program

The copy of Bash will inherit the environment from the "main Bash"
process: all environment variables will also be copied to the new
process. This step is called **forking**.

For a short moment, you have a process tree that might look like
this...

    xterm ----- bash ----- bash(copy)

...and after the "second Bash" (the copy) replaces itself with the
`ls` program (the copy execs it), it might look like

    xterm ----- bash ----- ls

If everything was okay, the two steps resulted in one program being run.
The copy of the environment from the first step (forking) becomes the
environment for the final running program (in this case, `ls`).
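
The two steps can be imitated by hand, which may help to make them visible. The following is a rough sketch, not what Bash literally does internally: parentheses force a copy of the shell, and the `exec` builtin replaces that copy with another program. The variable names `result` and `main_alive` are only illustrative.

```shell
# Step 1 (fork): the inner parentheses make the shell copy itself.
# Step 2 (exec): inside the copy, exec replaces the shell with another
# program, so the echo after it is never reached.
result=$( (exec sh -c 'echo "I replaced the copy"'; echo "never printed") )
echo "$result"   # I replaced the copy

# The main shell was not replaced and keeps running.
main_alive=yes
echo "main shell still running: $main_alive"
```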

<u>**What is so important about it?**</u> In our example, whatever
the program `ls` does inside its own environment can't affect the
environment of its parent process (in this case, `bash`). The
environment was copied when `ls` was executed. Nothing is "copied back"
to the parent environment when `ls` terminates.
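
The one-way nature of this copy can be observed directly. In this small sketch, a child `sh` stands in for `ls`, and `child_view` is a name invented for the example. The child changes its own copy of `MYVAR`; the parent's value stays as it was.

```shell
export MYVAR=parent

# The child gets a copy of MYVAR, changes its copy, prints it, and exits.
child_view=$(sh -c 'MYVAR=child; echo "$MYVAR"')

echo "child saw:  $child_view"   # child saw:  child
echo "parent has: $MYVAR"        # parent has: parent
```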
## Bash playing with pipes
Pipes are a very powerful tool. You can connect the output of one
process to the input of another process. We won't delve into piping at
this point; we just want to see how it looks in the process tree. Again,
we execute some commands. This time, we'll run `ls` and `grep`:

    $ ls | grep myfile

It results in a tree like this:

                       +-- ls
    xterm ----- bash --|
                       +-- grep

Note once again, `ls` can't influence the `grep` environment, `grep`
can't influence the `ls` environment, and neither `grep` nor `ls` can
influence the `bash` environment.
<u>**How is that related to shell programming?!?**</u>
Well, imagine some Bash code that reads data from a pipe. For example,
the internal command `read`, which reads data from *stdin* and puts it
into a variable. We run it in a loop here to count input lines:

    counter=0
    cat /etc/passwd | while read; do ((counter++)); done
    echo "Lines: $counter"

What? It's 0? Yes! The number of lines might not be 0, but the variable
`$counter` still is 0. Why? Remember the diagram from above? Rewriting
it a bit, we have:

                       +-- cat /etc/passwd
    xterm ----- bash --|
                       +-- bash (while read; do ((counter++)); done)

See the relationship? The forked Bash process will count the lines like
a charm. It will also set the variable `counter` as directed. But when
the pipeline ends, this extra process is terminated, and **its
"counter" variable is gone.** You see a 0 because in the main shell it
was 0, and the child process couldn't change it!
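
To watch the forked copy do its work, the value can be printed while still inside the pipeline. The sketch below assumes Bash with default settings and uses a fixed three-line input from `printf` instead of `/etc/passwd` so the result is predictable. The braces only group the loop and the `echo` into the same pipeline element.

```shell
counter=0
printf 'one\ntwo\nthree\n' | {
    while read -r line; do counter=$((counter + 1)); done
    # Still inside the forked copy: the count is correct here.
    echo "inside the pipeline: $counter"   # inside the pipeline: 3
}
# Back in the main shell: its own counter was never touched.
echo "after the pipeline:  $counter"       # after the pipeline:  0
```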

<u>**So, how do we count the lines?**</u> Easy: **Avoid the
subshell.** The details don't matter; the important thing is that the shell
that sets the counter must be the "main shell". For example:

    counter=0
    while read; do ((counter++)); done </etc/passwd
    echo "Lines: $counter"

It's nearly self-explanatory. The `while` loop runs in the **current
shell**, the counter is incremented in the **current shell**, and everything
vital happens in the **current shell**. The `read` command also sets the
variable `REPLY` (the default if no variable name is given), though we don't use
it here.
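
Redirecting from a file only helps when the data already lives in a file. When the input comes from a command, one possible way to keep the loop in the current shell is Bash's process substitution, `<(...)`. This is a Bash-specific sketch with a fixed `printf` input chosen for the example; it is one option among several, not the only fix.

```shell
counter=0
# <(...) runs printf in a subprocess, but the while loop itself stays in the
# current shell because it is not part of a pipeline.
while read -r line; do
    counter=$((counter + 1))
done < <(printf 'one\ntwo\nthree\n')
echo "Lines: $counter"   # Lines: 3
```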
## Actions that create a subshell
Bash creates **subshells** or **subprocesses** on various actions it
performs:
### Executing commands
As shown above, Bash will create a subprocess every time it executes
an external command. That's nothing new.

But if your command is a subprocess that sets variables you want to use
in your main script, that won't work.

For exactly this purpose, there's the `source` command (also: the *dot*
`.` command). `source` doesn't run the script in a new process; it reads the other
script's code and executes it in the current shell:

    source ./myvariables.sh
    # equivalent to:
    . ./myvariables.sh

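
The difference between executing and sourcing can be tried out with a throwaway file. In this sketch the temporary file and the variable name `greeting` are hypothetical stand-ins for `myvariables.sh` and its contents.

```shell
# Hypothetical settings file, created here only to keep the example self-contained.
settings=$(mktemp)
echo 'greeting=Hello' > "$settings"

# Executing the file runs it in a child process: greeting is set there and lost.
unset greeting
bash "$settings"
echo "after executing: ${greeting:-not set}"   # after executing: not set

# Sourcing runs the same code in the current shell: greeting survives.
. "$settings"
echo "after sourcing:  ${greeting:-not set}"   # after sourcing:  Hello

rm -f "$settings"
```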
### Pipes
The last big section was about pipes, so no example here.
### Explicit subshell
If you group commands by enclosing them in parentheses, these commands
are run inside a subshell:

    (echo PASSWD follows; cat /etc/passwd; echo GROUP follows; cat /etc/group) >output.txt

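
As with pipes, variable changes made inside the parentheses stay inside. A minimal sketch, with `value` as an arbitrary example variable:

```shell
value=outer
(
    # This assignment only changes the subshell's copy.
    value=inner
    echo "inside the parentheses: $value"   # inside the parentheses: inner
)
echo "after the parentheses:  $value"       # after the parentheses:  outer
```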
### Command substitution
With [command substitution](../syntax/expansion/cmdsubst.md) you re-use the
output of another command as text in your command line, for example to
set a variable. The other command is run in a subshell:

    number_of_users=$(cat /etc/passwd | wc -l)

Note that, in this example, a second subshell was created by using a
pipe in the command substitution:

                                                +-- cat /etc/passwd
    xterm ----- bash ----- bash (cmd. subst.) --|
                                                +-- wc -l

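
Only the output of a command substitution travels back to the main shell; assignments made inside it do not. The following sketch uses the made-up names `changed` and `output` to show this:

```shell
changed=no

# The assignment happens in the subshell; only the echoed text is returned.
output=$(changed=yes; echo "changed inside: $changed")

echo "$output"                    # changed inside: yes
echo "changed outside: $changed"  # changed outside: no
```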
!!! warning "FIXME"
    to be continued