---
tags:
  - bash
  - shell
  - scripting
  - token
  - words
  - split
  - splitting
  - recognition
---

# Words...

!!! warning "FIXME"
    This article needs a review: it covers two topics (command line
    splitting and word splitting) and mixes them a bit too much. In
    general, it's still usable to help understand this behaviour; it's
    "wrong but not wrong".

One fundamental principle of Bash is recognizing words, both in what you
enter at the command prompt and in other circumstances such as variable
expansion.

## Splitting the commandline

Bash scans the command line and splits it into words, usually so that the
parameters you enter for a command end up in the right place in memory
(the C `argv` vector) when the command is called. Words are recognized by
splitting the command line at each **Space** or **Tab** character (the
manual calls them **blanks**). Take the `echo` program as an example: it
displays all of its parameters, separated by a single space. When you
enter an `echo` command at the Bash prompt, Bash looks for these special
characters and uses them to separate the parameters.

You don't know what I'm talking about? I'm talking about this:

    $ echo Hello little world
    Hello little world

In other words, this is something you (and Bash) do every day. The
characters at which Bash splits the command line (Space and Tab, i.e.
blanks) are recognized as delimiters. No null argument is generated when
the command line contains two or more blanks in a row: **a sequence of
several blank characters is treated as a single blank.** Here's an
example:

    $ echo Hello        little        world
    Hello little world

Bash splits the command line into words at the blanks, then calls `echo`
with **each word as an argument**. In this example, `echo` is called with
three arguments: "`Hello`", "`little`" and "`world`".

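A quick way to check how many words Bash hands to a command is a tiny helper function that reports its argument count. The sketch below is only an illustration; the name `count_args` is made up for this example and is not part of Bash:

```shell
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

# Single blanks between the words: three arguments.
count_args Hello little world

# Runs of several blanks still produce exactly three arguments.
count_args Hello        little        world
```

Both calls print `3`, because the extra blanks only separate words and never become arguments themselves.
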
<u>Does that mean we can't echo more than one Space?</u> Of
course not! Bash treats blanks as special characters, but there are two
ways to tell Bash not to treat them specially: **escaping** and
**quoting**.

Escaping a character means **taking away its special meaning**. Bash
uses an escaped character as plain text, even if it is a special one.
Escaping is done by preceding the character with a backslash:

    $ echo Hello\ \ \ \ \ \ \ \ little\ \ \ \ \ \ \ \ world
    Hello        little        world

None of the escaped spaces is used to split the command line into words.
Thus, `echo` is called with one argument: "`Hello        little        world`".

Bash has a mechanism to "escape" an entire string: **quoting**. In the
context of command line splitting, which this section is about, it
doesn't matter which kind of quoting you use: weak quoting (double
quotes) and strong quoting (single quotes) both cause Bash to not treat
spaces as special characters:

    $ echo "Hello        little        world"
    Hello        little        world

    $ echo 'Hello        little        world'
    Hello        little        world

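To see that escaping and both kinds of quoting lead to the same single argument, the following sketch reports the argument count together with the first argument. The helper name `describe_args` is invented for this illustration:

```shell
# Hypothetical helper: report the argument count and the first argument.
describe_args() {
    printf '%s argument(s), first: <%s>\n' "$#" "$1"
}

describe_args Hello\ \ \ little\ \ \ world    # every blank escaped
describe_args "Hello   little   world"       # weak (double) quoting
describe_args 'Hello   little   world'       # strong (single) quoting
```

Each of the three calls prints `1 argument(s), first: <Hello   little   world>`; the blanks are part of the argument instead of acting as delimiters.
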
<u>Why does all of this matter?</u> Imagine a program that expects a
filename as an argument, like `cat`. Filenames can contain spaces:

    $ ls -l
    total 4
    -rw-r--r-- 1 bonsai bonsai 5 Apr 18 18:16 test file

    $ cat test file
    cat: test: No such file or directory
    cat: file: No such file or directory

    $ cat test\ file
    m00!

    $ cat "test file"
    m00!

If you enter the filename on the command line using Tab completion, the
completion takes care of the spaces for you. But Bash also performs
another type of splitting.

## Word splitting

For a more technical description, please read the [article about word
splitting](../syntax/expansion/wordsplit.md)!

The first kind of splitting parses the command line into separate
tokens. This is what was described above; it is pure **command line
parsing**.

After the command line has been split into words, Bash performs
expansions where needed: for example, variables that occur in the command
line are expanded (substituted by their values). This is where the second
type of splitting comes in: the results of several expansions undergo
**word splitting** (but others do not).

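As a rough illustration of "several expansions, but not others": unquoted parameter expansion and command substitution are split, while the result of tilde expansion is not. The helper `count_args`, the variable `text` and the path `/tmp/my home` are made up for this sketch, and the directory does not need to exist:

```shell
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

text="Hello little world"

# Unquoted parameter expansion: the result is split into three words.
count_args $text

# Unquoted command substitution: the result is split as well.
count_args $(printf '%s' "Hello little world")

# Tilde expansion: the result stays one word, even with a space in it.
# The subshell keeps the changed HOME from leaking into the rest of the script.
(
    HOME="/tmp/my home"
    count_args ~
)
```

With the default `IFS`, the first two calls print `3` and the last one prints `1`.
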
Imagine you have a filename stored in a variable:

    MYFILE="test file"

When this variable is used, its occurrence is replaced by its content:

    $ cat $MYFILE
    cat: test: No such file or directory
    cat: file: No such file or directory

This is another step where spaces make things difficult, and once again
**quoting** is the way around the difficulty. Quotes also affect word
splitting: the result of a double-quoted expansion is not split.

    $ cat "$MYFILE"
    m00!

## Example

Let's follow an unquoted command through these steps, assuming that the
following variable is set:

    MYFILE="THE FILE.TXT"

and that the command line to examine is:

    echo The file is named $MYFILE

The parser scans for blanks and marks the relevant words ("splitting
the command line"):

| Initial command line splitting: |        |        |        |         |           |
|---------------------------------|--------|--------|--------|---------|-----------|
| Word 1                          | Word 2 | Word 3 | Word 4 | Word 5  | Word 6    |
| `echo`                          | `The`  | `file` | `is`   | `named` | `$MYFILE` |

A [parameter/variable expansion](../syntax/pe.md) is part of that command
line, so Bash performs the substitution and then the [word
splitting](../syntax/expansion/wordsplit.md) on the result:

| Word splitting after substitution: |        |        |        |         |        |            |
|------------------------------------|--------|--------|--------|---------|--------|------------|
| Word 1                             | Word 2 | Word 3 | Word 4 | Word 5  | Word 6 | Word 7     |
| `echo`                             | `The`  | `file` | `is`   | `named` | `THE`  | `FILE.TXT` |

Now let's imagine we quoted `$MYFILE`, so that the command line looks
like this:

    echo The file is named "$MYFILE"

| Word splitting after substitution (quoted!): |        |        |        |         |                |
|----------------------------------------------|--------|--------|--------|---------|----------------|
| Word 1                                       | Word 2 | Word 3 | Word 4 | Word 5  | Word 6         |
| `echo`                                       | `The`  | `file` | `is`   | `named` | `THE FILE.TXT` |

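The word tables above can be reproduced with a short script. This is a sketch under one assumption: `list_words` is a hypothetical helper that numbers its arguments, and the word `echo` is passed to it as an ordinary argument so that the numbering lines up with the tables:

```shell
# Hypothetical helper: print every argument together with its position.
list_words() {
    i=1
    for word in "$@"; do
        printf 'Word %d: <%s>\n' "$i" "$word"
        i=$((i + 1))
    done
}

MYFILE="THE FILE.TXT"

# Unquoted: seven words, the last two being <THE> and <FILE.TXT>.
list_words echo The file is named $MYFILE

# Quoted: six words, the last one being <THE FILE.TXT>.
list_words echo The file is named "$MYFILE"
```

The helper avoids `local` so that it also runs in plain POSIX shells; in a larger script the variables `i` and `word` would be better kept local to the function.
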
## See also

- Internal: [Quoting and character escaping](../syntax/quoting.md)
- Internal: [Word splitting](../syntax/expansion/wordsplit.md)
- Internal: [Introduction to expansions and
  substitutions](../syntax/expansion/intro.md)
- External: [Grymoire:
  Shell quoting](http://www.grymoire.com/Unix/Quote.html)