---
tags:
- bash
- shell
- scripting
- token
- words
- split
- splitting
- recognition
---
# Words...

!!! warning "FIXME"
    This article needs a review: it covers two topics (command line
    splitting and word splitting) and mixes them a bit too much. In
    general, though, it is still useful for understanding this
    behaviour; it is "wrong but not wrong".

A fundamental part of how Bash works is recognizing words, both in what
you enter at the command prompt and in other situations, such as the
result of a variable expansion.

## Splitting the command line

Bash scans the command line and splits it into words, usually so that
the arguments you enter for a command end up in the right C memory
structure (the `argv` vector) when the command is called. These words
are recognized by splitting the command line wherever one of two special
characters occurs: **Space** or **Tab** (the manual calls them
**blanks**). Take the `echo` program as an example: it displays all of
its arguments, separated by a single space. When you enter an `echo`
command at the Bash prompt, Bash looks for these special characters and
uses them to separate the arguments.

You don't know what I'm talking about? I'm talking about this:

```
$ echo Hello little world
Hello little world
```
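
That example only uses spaces, but a Tab is a blank too. As a minimal
sketch, the snippet below builds a command line containing one literal
Tab and one literal Space and lets a fresh `bash -c` parse it. The
variable names `tab` and `cmdline` are illustrative only, and `$'\t'`
(ANSI-C quoting) is used just to get a real Tab character into the
string; it assumes `bash` is on your `PATH`.

```bash
#!/usr/bin/env bash
# Build a command line that contains a literal Tab (between "Hello" and
# "little") and a literal Space (between "little" and "world").
tab=$'\t'
cmdline="printf '<%s>\n' Hello${tab}little world"

# Let a fresh Bash parse it. Neither blank is quoted inside that command
# line, so both act as word delimiters. printf reuses its format once per
# argument, so each word is printed on its own line:
#   <Hello>
#   <little>
#   <world>
bash -c "$cmdline"
```
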

In other words, this is something you (and Bash) do every day. The
characters where Bash splits the command line (Space and Tab, i.e.
blanks) are recognized as delimiters. No null (empty) argument is
generated when there are two or more blanks in a row: **a sequence of
several blank characters is treated as a single delimiter.** Here's an
example:

```
$ echo Hello    little     world
Hello little world
```

Bash splits the command line into words at the blanks, then calls `echo`
with **each word as an argument**. In this example, `echo` is called
with three arguments: "`Hello`", "`little`" and "`world`"!
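
To make the argument count visible instead of inferring it from `echo`'s
output, here is a small sketch using a hypothetical helper function,
`show_args`, that prints how many arguments it received and then each
argument between angle brackets.

```bash
#!/usr/bin/env bash
# show_args is a hypothetical helper for this demo: it prints the number
# of arguments it received, then each argument between angle brackets.
show_args() {
  printf 'argc=%d\n' "$#"
  printf '<%s>\n' "$@"
}

# Runs of unquoted blanks collapse into single delimiters, and no empty
# (null) arguments are produced. Prints:
#   argc=3
#   <Hello>
#   <little>
#   <world>
show_args Hello    little     world
```
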

<u>Does that mean we can't echo more than one Space?</u> Of course not!
Bash treats blanks as special characters, but there are two ways to
tell Bash not to treat them specially: **escaping** and **quoting**.

Escaping a character means **taking away its special meaning**. Bash
uses an escaped character as literal text, even if it is normally a
special one.

Escaping is done by preceding the character with a backslash:

```
$ echo Hello\ \ \ \ little\ \ \ \ \ world
Hello    little     world
```

None of the escaped spaces is used to split the command line. Thus,
`echo` is called with one argument: "`Hello    little     world`".
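
The same hypothetical `show_args` helper (redefined here so the snippet
stands alone) confirms the count. This is a minimal sketch; the number of
escaped spaces is arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the argument count, then each argument in <>.
show_args() {
  printf 'argc=%d\n' "$#"
  printf '<%s>\n' "$@"
}

# Every space is escaped, so none of them splits the word and show_args
# receives exactly one argument. Prints:
#   argc=1
#   <Hello   little   world>
show_args Hello\ \ \ little\ \ \ world
```
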

Bash also has a mechanism to "escape" an entire string: **quoting**. In
the context of command line splitting, which this section is about, it
doesn't matter which kind of quoting you use, weak quoting (double
quotes) or strong quoting (single quotes): both stop Bash from treating
spaces as special characters:

```
$ echo "Hello    little     world"
Hello    little     world
$ echo 'Hello    little     world'
Hello    little     world
```
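
A short sketch with the hypothetical `show_args` helper shows that both
quote styles produce a single argument. The last call also illustrates
that quotes need not surround the whole word: only the quoted spaces
lose their special meaning, and the pieces join into one word.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the argument count, then each argument in <>.
show_args() {
  printf 'argc=%d\n' "$#"
  printf '<%s>\n' "$@"
}

show_args "Hello   little   world"   # weak (double) quotes: one argument
show_args 'Hello   little   world'   # strong (single) quotes: one argument

# Quoting only the spaces: "Hello", "   little   " and "world" are glued
# together into the same single word.
show_args Hello"   little   "world
```

Each of the three calls prints `argc=1` followed by
`<Hello   little   world>`.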

<u>What is it all about now?</u> Well, imagine a program that expects a
filename as an argument, like `cat`. Filenames can contain spaces:

```
$ ls -l
total 4
-rw-r--r-- 1 bonsai bonsai 5 Apr 18 18:16 test file
$ cat test file
cat: test: No such file or directory
cat: file: No such file or directory
$ cat test\ file
m00!
$ cat "test file"
m00!
```
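
A self-contained version of that session, as a minimal sketch: it
creates the 5-byte `test file` inside a temporary directory from
`mktemp -d`, so no real files are touched, and shows that only the
escaped and quoted forms reach `cat` as one argument. The function name
`demo_filename_with_space` is hypothetical, and the error messages of
the failing `cat` are discarded to keep the output predictable.

```bash
#!/usr/bin/env bash
# The body is a subshell ( ... ), so the cd and the EXIT trap stay local.
demo_filename_with_space() (
  dir=$(mktemp -d) || exit 1
  trap 'rm -rf -- "$dir"' EXIT   # clean up when the subshell exits
  cd "$dir" || exit 1

  printf 'm00!\n' > "test file"  # 5 bytes, name contains a space

  # Unquoted: cat receives two arguments, "test" and "file", and fails.
  cat test file 2>/dev/null || echo "cat test file: failed"
  cat test\ file                 # escaped space: one argument
  cat "test file"                # quoted: one argument
)

demo_filename_with_space
```

It prints `cat test file: failed`, then `m00!` twice.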

If you type the filename using Tab completion, Bash takes care of
escaping the spaces for you. But Bash also performs another type of
splitting.

## Word splitting

For a more technical description, please read the [article about word
splitting](../syntax/expansion/wordsplit.md)!

The first kind of splitting, described above, parses the command line
into separate tokens. It is pure **command line parsing**.

After the command line has been split into words, Bash performs
expansions where needed: for example, variables that occur in the
command line are expanded (replaced by their values). This is where the
second type of splitting comes in: the results of some expansions
undergo **word splitting**, while others do not.
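
According to the Bash manual, the results of parameter expansion,
command substitution and arithmetic expansion that did not occur within
double quotes are scanned for word splitting, while the value in a
variable assignment is not split. A minimal sketch of that difference,
assuming the default `IFS` (space, tab, newline); `count_args` and
`copy` are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the number of arguments received.
count_args() { printf '%d\n' "$#"; }

# The unquoted result of a command substitution is split into words.
count_args $(echo "test file")   # prints 2

# A variable assignment is not word split, so the space survives.
copy=$(echo "test file")
printf '<%s>\n' "$copy"          # prints <test file>
```
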

Imagine you have a filename stored in a variable:

```
MYFILE="test file"
```

When this variable is used, each occurrence is replaced by its content,
and the unquoted result is split into words again:

```
$ cat $MYFILE
cat: test: No such file or directory
cat: file: No such file or directory
```

This is another step where spaces make things difficult, and again
**quoting** is the way around it, because quotes also suppress word
splitting:

```
$ cat "$MYFILE"
m00!
```
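
The hypothetical `show_args` helper makes the difference between the two
`cat` calls explicit. This is a minimal sketch that only counts and
shows the words; it does not touch any files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the argument count, then each argument in <>.
show_args() {
  printf 'argc=%d\n' "$#"
  printf '<%s>\n' "$@"
}

MYFILE="test file"
show_args $MYFILE     # unquoted: argc=2, then <test> and <file>
show_args "$MYFILE"   # quoted:   argc=1, then <test file>
```
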

## Example

Let's follow an unquoted command through these steps, assuming that the
variable is set:

```
MYFILE="THE FILE.TXT"
```

and the command line is:

```
echo The file is named $MYFILE
```

The parser scans for blanks and marks the relevant words ("splitting the
command line"):

Initial command line splitting:

| Word 1 | Word 2 | Word 3 | Word 4 | Word 5  | Word 6    |
|--------|--------|--------|--------|---------|-----------|
| `echo` | `The`  | `file` | `is`   | `named` | `$MYFILE` |

A [parameter/variable expansion](../syntax/pe.md) is part of that
command line, so Bash performs the substitution and then [word
splitting](../syntax/expansion/wordsplit.md) on the result:

Word splitting after substitution:

| Word 1 | Word 2 | Word 3 | Word 4 | Word 5  | Word 6 | Word 7     |
|--------|--------|--------|--------|---------|--------|------------|
| `echo` | `The`  | `file` | `is`   | `named` | `THE`  | `FILE.TXT` |

Now imagine we quote `$MYFILE`. The command line then looks like this:

```
echo The file is named "$MYFILE"
```

Word splitting after substitution (quoted!):

| Word 1 | Word 2 | Word 3 | Word 4 | Word 5  | Word 6         |
|--------|--------|--------|--------|---------|----------------|
| `echo` | `The`  | `file` | `is`   | `named` | `THE FILE.TXT` |
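
To reproduce the word tables above in a real shell, the sketch below
numbers every word it receives. `show_words` is a hypothetical helper;
because a function's arguments do not include its own name, the literal
word `echo` is passed as its first argument so that the numbering lines
up with the tables.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number and print every word it receives.
show_words() {
  local i=1 w
  for w in "$@"; do
    printf 'Word %d: <%s>\n' "$i" "$w"
    i=$((i + 1))
  done
}

MYFILE="THE FILE.TXT"
show_words echo The file is named $MYFILE     # 7 words: <THE> and <FILE.TXT>
echo ---
show_words echo The file is named "$MYFILE"   # 6 words: <THE FILE.TXT>
```
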

## See also

- Internal: [Quoting and character escaping](../syntax/quoting.md)
- Internal: [Word splitting](../syntax/expansion/wordsplit.md)
- Internal: [Introduction to expansions and
  substitutions](../syntax/expansion/intro.md)
- External: [Grymoire:
  Shellquoting](http://www.grymoire.com/Unix/Quote.html)