2024-04-02 21:19:20 +02:00
---
tags:
  - bash
  - shell
  - scripting
  - mutex
  - locking
  - run-control
---

# Lock your script (against parallel execution)

## Why lock?

Sometimes there's a need to ensure only one copy of a script runs, i.e.
to prevent two or more copies from running simultaneously. Imagine an
important cronjob doing something very important, which will fail or
corrupt data if two copies of the called program were to run at the same
time. To prevent this, a form of `MUTEX` (**mutual exclusion**) lock is
needed.

The basic procedure is simple: at startup, the script checks whether a
specific condition (the lock) is present. If it is, the script is locked
out and doesn't start.

This article describes locking with common UNIX(r) tools. There are
other, special locking tools available, but they're not standardized, or
worse yet, you can't be sure they're present when you want to run your
scripts. **A tool designed specifically for this purpose does the job
much better than general-purpose code.**

### Other, special locking tools

As mentioned above, a special tool for locking is the preferred
solution. Race conditions are avoided, as is the need to work around
specific limits.

- `flock`: <http://www.kernel.org/pub/software/utils/script/flock/>
- `solo`: <http://timkay.com/solo/>

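For comparison, here is a minimal sketch of how the `flock` tool listed
above might be used, assuming the util-linux `flock(1)` command is
installed. The lock file path, the variable names and the choice of file
descriptor 9 are arbitrary assumptions made for illustration.

``` bash
#!/bin/bash

# Hypothetical lock file path; any location writable by the script works.
lockfile="${TMPDIR:-/tmp}/myscript.lock"

# Open the lock file on file descriptor 9 (an arbitrary free descriptor).
exec 9>"$lockfile"

# -n: fail immediately instead of waiting if another process holds the lock.
if flock -n 9; then
    echo "Locking succeeded" >&2
    lock_state="acquired"
else
    echo "Lock failed - exit" >&2
    exit 1
fi

# ... critical section ...
# The lock is released when descriptor 9 is closed, at the latest when the
# script exits, so no stale lock is left behind after a crash.
```

Because the lock is tied to the open file descriptor rather than to the
mere existence of a file, the lock file itself can stay in place between
runs.
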
## Choose the locking method

The best way to set a global lock condition is the UNIX(r) filesystem.
Variables aren't enough, as each process has its own private variable
space, but the filesystem is global to all processes (yes, I know about
chroots, namespaces, ...; those are special cases). You can "set"
several things in the filesystem that can be used as a locking indicator:

- create files
- update file timestamps
- create directories

To create a file or set a file timestamp, the `touch` command is usually
used. This implies the following problem: a locking mechanism checks for
the existence of the lockfile, and if no lockfile exists, it creates one
and continues. Those are **two separate steps**! That means it's **not
an atomic operation**. There's a small amount of time between checking
and creating, in which another instance of the same script could perform
the locking (because when it checked, the lockfile wasn't there)! In
that case you would have 2 instances of the script running, both
thinking they are successfully locked and can operate without colliding.
Setting the timestamp is similar: one step to check the timestamp, a
second step to set the timestamp.

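The problem can be made visible by replaying such an unlucky interleaving
by hand. The following is only a sketch: the lock file path and the
helper functions `check_lock` and `create_lock` are hypothetical names
introduced for this demonstration, and the two "instances" A and B are
simulated sequentially in one script.

``` bash
#!/bin/bash

# Hypothetical lock file, used only for this demonstration.
lockfile="${TMPDIR:-/tmp}/racy-demo.$$.lock"
rm -f "$lockfile"

check_lock()  { [ ! -e "$lockfile" ]; }   # step 1: succeeds if no lock file exists
create_lock() { touch "$lockfile"; }      # step 2: succeeds even if the file already exists

# Replay an unlucky interleaving of two instances, A and B:
a_locked=no; b_locked=no
check_lock && a_checked=yes    # A checks: no lock file yet
check_lock && b_checked=yes    # B checks before A has created the file
[ "$a_checked" = yes ] && create_lock && a_locked=yes
[ "$b_checked" = yes ] && create_lock && b_locked=yes

echo "A holds lock: $a_locked, B holds lock: $b_locked"
rm -f "$lockfile"
```

Both instances end up believing they hold the lock, because nothing ties
the check to the creation.
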
<WRAP center round tip 60%> <u>**Conclusion:**</u> We need an operation that does the check and the locking in one step. </WRAP>

A simple way to get that is to create a **lock directory** using the
`mkdir` command. It will:

- create the given directory only if it does not exist, and set a successful exit code
- set an unsuccessful exit code if an error occurs, for example if the specified directory already exists

With `mkdir`, it seems we have our two steps in one simple operation. A
(very!) simple locking code might look like this:

``` bash
if mkdir /var/lock/mylock; then
    echo "Locking succeeded" >&2
else
    echo "Lock failed - exit" >&2
    exit 1
fi
```

In case `mkdir` reports an error, the script will exit at this point -
**the MUTEX did its job!**

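To see the single-step behaviour in action, several competitors can be
started at nearly the same time. This is a sketch under assumptions: it
uses `mktemp -d` for a private scratch directory on a local filesystem,
and the variable names are made up for the demonstration.

``` bash
#!/bin/bash

# Private scratch directory for the demonstration (assumes mktemp is available).
workdir="$(mktemp -d)"
lockdir="$workdir/mylock"
results="$workdir/results"
: > "$results"

# Start several competitors at (nearly) the same time.
for i in 1 2 3 4 5; do
    (
        if mkdir "$lockdir" 2>/dev/null; then
            echo "instance $i got the lock"
        fi
    ) >> "$results" &
done
wait

# Count how many competitors believe they hold the lock.
winners=$(grep -c 'got the lock' "$results")
cat "$results"
rm -rf "$workdir"
```

Which instance wins varies from run to run, but only one `mkdir` call can
succeed, so `winners` should always be 1.
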

*If the directory is removed after setting a successful lock, while the
script is still running, the lock is lost. Doing `chmod -w` on the parent
directory containing the lock directory is possible, but it is not
atomic. Maybe a while loop that continuously checks for the existence of
the lock in the background, and sends a signal such as USR1 if the
directory is not found, could be used. The signal would need to be
trapped. I am sure there is a better solution than this suggestion.*
--- **sn18** 2009/12/19 08:24


**Note:** While perusing the Internet, I found some people asking if the
`mkdir` method works "on all filesystems". Well, let's say it should.
The syscall under `mkdir` is guaranteed to work atomically, at least on
Unices. Two scenarios where problems can arise are NFS filesystems and
filesystems on cluster servers. In those two scenarios, the behaviour
depends on the mount options and the implementation. However, I
successfully use this simple method on an Oracle OCFS2 filesystem in a
4-node cluster environment. So let's just say "it should work under
normal conditions".


Another atomic method is setting the `noclobber` shell option
(`set -C`). That will cause redirection to fail if the file the
redirection points to already exists (the shell achieves this with
suitable `open()` flags).

``` bash
lockfile="/tmp/mylock.lock"   # example path (an assumption), adjust as needed

if ( set -o noclobber; echo "locked" > "$lockfile" ) 2>/dev/null; then
    # remove the lock file when the script exits or is interrupted
    trap 'rm -f "$lockfile"' EXIT
    trap 'exit 1' INT TERM
    echo "Locking succeeded" >&2

    # ... critical section ...
else
    echo "Lock failed - exit" >&2
    exit 1
fi
```

Another explanation of this basic pattern using `set -C` can be found in
the [POSIX rationale for the shell command language](http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html#tag_23_02_07).

## An example

This code was taken from a production-grade script that controls PISG to
create statistical pages from my IRC logfiles. There are some
differences compared to the very simple example above:

- the locking stores the process ID of the locked instance
- if a lock fails, the script tries to find out if the locked instance
  is still active (unreliable!)
- traps are created to automatically remove the lock when the script
  terminates, or is killed

Details on how the script is killed aren't given; only code relevant to
the locking process is shown:

``` bash
#!/bin/bash

# lock dirs/files
LOCKDIR="/tmp/statsgen-lock"
PIDFILE="${LOCKDIR}/PID"

# exit codes and text
ENO_SUCCESS=0; ETXT[0]="ENO_SUCCESS"
ENO_GENERAL=1; ETXT[1]="ENO_GENERAL"
ENO_LOCKFAIL=2; ETXT[2]="ENO_LOCKFAIL"
ENO_RECVSIG=3; ETXT[3]="ENO_RECVSIG"

###
### start locking attempt
###

trap 'ECODE=$?; echo "[statsgen] Exit: ${ETXT[ECODE]}($ECODE)" >&2' 0
echo -n "[statsgen] Locking: " >&2

if mkdir "${LOCKDIR}" &>/dev/null; then

    # lock succeeded, install signal handlers before storing the PID just in case
    # storing the PID fails
    trap 'ECODE=$?;
          echo "[statsgen] Removing lock. Exit: ${ETXT[ECODE]}($ECODE)" >&2
          rm -rf "${LOCKDIR}"' 0
    echo "$$" >"${PIDFILE}"
    # the following handler will exit the script upon receiving these signals
    # the trap on "0" (EXIT) from above will be triggered by this trap's "exit" command!
    trap 'echo "[statsgen] Killed by a signal." >&2
          exit ${ENO_RECVSIG}' 1 2 3 15
    echo "success, installed signal handlers" >&2

else

    # lock failed, check if the other PID is alive
    OTHERPID="$(cat "${PIDFILE}" 2>/dev/null)"

    # if cat isn't able to read the file (or the PID has not been written
    # yet), another instance is probably about to remove or set up the
    # lock -- exit, we're *still* locked
    # Thanks to Grzegorz Wierzowiecki for pointing out this race condition on
    # http://wiki.grzegorz.wierzowiecki.pl/code:mutex-in-bash
    if [ $? != 0 ] || [ -z "${OTHERPID}" ]; then
        echo "lock failed, could not read the PID of the lock holder" >&2
        exit ${ENO_LOCKFAIL}
    fi

    if ! kill -0 "${OTHERPID}" &>/dev/null; then
        # lock is stale, remove it and restart
        echo "removing stale lock of nonexistent PID ${OTHERPID}" >&2
        rm -rf "${LOCKDIR}"
        echo "[statsgen] restarting myself" >&2
        exec "$0" "$@"
    else
        # lock is valid and OTHERPID is active - exit, we're locked!
        echo "lock failed, PID ${OTHERPID} is active" >&2
        exit ${ENO_LOCKFAIL}
    fi

fi
```

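The liveness test used for stale locks can be tried in isolation. The
sketch below wraps `kill -0` in a helper called `pid_alive`, a
hypothetical name that is not part of the script above. It also hints at
why the check is flagged as unreliable: a PID may have been reused by an
unrelated process, and `kill -0` also fails for a live process that the
caller has no permission to signal.

``` bash
#!/bin/bash

# Hypothetical helper: succeeds if a signal could be sent to the given PID.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# The current shell is certainly alive.
if pid_alive "$$"; then self=alive; else self=dead; fi

# A child that has already finished and been reaped is gone.
sleep 0 &
child=$!
wait "$child"
if pid_alive "$child"; then gone=alive; else gone=dead; fi

echo "self: $self, finished child: $gone"
```

Signal 0 delivers nothing; it only asks whether sending a signal would be
possible, which is why it is a cheap but imperfect existence test.
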
## Related links

- [BashFAQ/045](http://mywiki.wooledge.org/BashFAQ/045)
- [Implementation of a shell locking utility](http://wiki.grzegorz.wierzowiecki.pl/code:mutex-in-bash)
- [Wikipedia article on File Locking](http://en.wikipedia.org/wiki/File_locking), including a
  discussion of potential [problems](http://en.wikipedia.org/wiki/File_locking#Problems) with
  flock and certain versions of NFS.