mirror of
https://github.com/rawiriblundell/wiki.bash-hackers.org
synced 2024-12-25 06:00:41 +01:00
310 lines
13 KiB
Plaintext
====== pax - the POSIX archiver ======

{{keywords>bash shell scripting POSIX archive tar packing zip}}

pax can do a lot of fancy stuff, feel free to contribute more awesome pax tricks!

===== Introduction =====

The POSIX archiver, ''pax'', is an attempt at a standardized archiver with
the best features of ''tar'' and ''cpio'', able to handle all common archive types.

However, this is **not a manpage**, it will **not** list all possible options,
it will **not** give you detailed information about ''pax''. It's only an introduction.

This article is based on the debianized Berkeley implementation of ''pax'', but implementation-specific things should be tagged as such. Unfortunately, the Debian package doesn't seem to be maintained anymore.

===== Overview =====

==== Operation modes ====

There are four basic operation modes to //list//, //read//, //write// and
//copy// archives. They're switched with combinations of the ''-r'' and ''-w''
command line options:

^ Mode ^ RW-Options ^
| List | //no RW-options// |
| Read | ''-r'' |
| Write | ''-w'' |
| Copy | ''-r -w'' |

=== List ===

In //list mode//, ''pax'' writes the list of archive members to standard
output (a table of contents). If a pattern match is specified on the
command line, only matching filenames are printed.

=== Read ===

//Read// an archive. ''pax'' will read archive data and extract the members to the
current directory. If a pattern match is specified on the command line, only matching
filenames are extracted.

When reading an archive, the archive type is determined from the archive data.

=== Write ===

//Write// an archive, which means create a new one or append to an
existing one. All files and directories specified on the command line are inserted into
the archive. The archive is written to standard output by default.

If no files are specified on the command line, filenames are read from ''STDIN''.

Write mode is the only mode in which you specify the archive type
with ''-x <TYPE>'', e.g. ''-x ustar''.

=== Copy ===

//Copy// mode is similar to ''cpio'' passthrough mode. It provides a way to replicate a complete or partial file hierarchy
(with all the ''pax'' options, e.g. rewriting groups) to another location.
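
The four modes can be tried side by side in a scratch directory. This is only a minimal sketch: the names ''demo'', ''extracted'' and ''copy'' are made up for the illustration, and it assumes ''mktemp -d'' is available on your system.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
mkdir -p demo copy extracted
printf 'hello\n' > demo/a.txt

# Write (-w): pack demo/ into a ustar archive
pax -w -x ustar -f demo.tar demo

# List (neither -r nor -w): print the table of contents
listing=$(pax -f demo.tar)
printf '%s\n' "$listing"

# Read (-r): extract into the current directory (here: extracted/)
(cd extracted && pax -r -f ../demo.tar)

# Copy (-r -w): replicate demo/ below copy/, no archive file involved
pax -r -w demo copy
```

After this runs, ''extracted/demo/a.txt'' and ''copy/demo/a.txt'' should both exist.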

==== Archive data ====

When you don't specify anything special, ''pax'' will attempt to read archive
data from standard input (read/list modes) and write archive data to
standard output (write mode). This ensures ''pax'' can be easily
used as part of a shell pipe construct, e.g. to read a compressed
archive that's decompressed in the pipe.

The ''-f'' option specifies the pathname of the archive file.
This file will be used as input or output, depending on the operation
(read/write/list).
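
As a small sketch of both variants (the file names are hypothetical, and ''mktemp -d'' is assumed to exist), the same archive can be listed once through a pipe and once via ''-f'':

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
mkdir -p data && printf 'x\n' > data/file.txt

# Write mode without -f: the archive goes to stdout, here straight into gzip
pax -w -x ustar data | gzip > data.tar.gz

# List mode without -f: the archive comes from stdin, decompressed in the pipe
from_pipe=$(gzip -dc data.tar.gz | pax)

# The same listing, this time with -f naming an uncompressed archive file
gzip -dc data.tar.gz > data.tar
from_file=$(pax -f data.tar)
```

Both variables should hold the same table of contents.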

When ''pax'' reads an archive, it tries to guess the archive type.
However, in //write// mode, you should specify which type of archive
to create using the ''-x <TYPE>'' switch. If you omit this switch,
a default archive type is used (POSIX says it's implementation-defined;
Berkeley ''pax'' creates ''ustar'' if no options are specified).

The following archive formats are supported (Berkeley implementation):
|ustar |POSIX TAR format (default)|
|cpio |POSIX CPIO format |
|tar |classic BSD TAR format |
|bcpio |old binary CPIO format |
|sv4cpio |SVR4 CPIO format |
|sv4crc |SVR4 CPIO format with CRC |

Berkeley ''pax'' supports options ''-z'' and ''-j'', similar to GNU ''tar'', to filter archive files through GZIP/BZIP2.
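
To illustrate that the format only has to be named when writing, here is a hedged sketch using the two formats POSIX also lists (''cpio'' and ''ustar''). The directory and file names are invented for the example.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
mkdir -p docs && printf 'note\n' > docs/note.txt

# The archive format is chosen explicitly only when writing
pax -w -x cpio  -f docs.cpio docs
pax -w -x ustar -f docs.tar  docs

# No -x when listing: pax detects the format from the archive data
cpio_list=$(pax -f docs.cpio)
ustar_list=$(pax -f docs.tar)
```

Both listings should contain ''docs/note.txt'', even though the two files use different on-disk formats.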

==== Matching archive members ====

In //read// and //list// modes, you can specify patterns to determine which files to list or extract.

  * the pattern notation is the one known by a POSIX-shell, i.e. the one known by Bash without ''extglob''
  * if the specified pattern matches a complete directory, it affects all files and subdirectories of the specified directory
  * if you specify the ''-c'' option, ''pax'' will invert the matches, i.e. it matches all filenames **except** those matching the specified patterns
  * if no patterns are given, ''pax'' will "match" (list or extract) all files from the archive
  * **To avoid conflicts with shell pathname expansion, it's wise to quote patterns!**

=== Some assorted examples of patterns ===

<code>
pax -r <myarchive.tar 'data/sales/*.txt' 'data/products/*.png'
</code>

<code>
pax -r <myarchive.tar 'data/sales/year_200[135].txt'
# should be equivalent to
pax -r <myarchive.tar 'data/sales/year_2001.txt' 'data/sales/year_2003.txt' 'data/sales/year_2005.txt'
</code>
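
The same pattern rules apply in list mode, which makes it a safe way to try a pattern before extracting anything. The following sketch builds a throwaway archive (all names are hypothetical) and exercises a bracket expression, the ''-c'' inversion and a directory match. Exact matching details may differ slightly between implementations.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
mkdir -p data/sales data/products
printf '1\n' > data/sales/year_2001.txt
printf '2\n' > data/sales/year_2002.txt
printf '3\n' > data/products/logo.png
pax -w -x ustar -f myarchive.tar data

# Quoted pattern: pax, not the shell, evaluates the bracket expression
only_2001=$(pax -f myarchive.tar 'data/sales/year_200[135].txt')

# -c inverts the match: everything except the files below data/sales/
not_sales=$(pax -c -f myarchive.tar 'data/sales/*')

# A pattern naming a directory also selects what lies below it
products=$(pax -f myarchive.tar 'data/products')
```

Here ''only_2001'' should name just the 2001 file, ''not_sales'' should still list the PNG, and ''products'' should include ''data/products/logo.png''.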

===== Using pax =====

This is a brief description of using ''pax'' as a normal archiver
system, like you would use ''tar''.

==== Creating an archive ====

This task is done with the basic syntax:
<code>
# archive contents to stdout
pax -w >archive.tar README.txt *.png data/

# equivalent, write archive contents directly to a file
pax -w -x ustar -f archive.tar README.txt *.png data/
</code>

''pax'' is in //write// mode, the given filenames are packed into an
archive:
  * ''README.txt'' is a normal file, it will be packed
  * ''*.png'' is a pathname glob **for your shell**, the shell will substitute all matching filenames **before** ''pax'' is executed. The result is a list of filenames that will be packed like the ''README.txt'' example above
  * ''data/'' is a directory. **Everything** in this directory will be packed into the archive, i.e. not just an empty directory

When you specify the ''-v'' option, ''pax'' will write the pathnames of the
files inserted into the archive to ''STDERR''.

When, and only when, no filename arguments are specified, ''pax'' attempts to
read filenames from ''STDIN'', separated by newlines.
This way you can easily combine ''find'' with ''pax'':
<code>
find . -name '*.txt' | pax -wf textfiles.tar -x ustar
</code>

==== Listing archive contents ====

By default, list mode prints each archive member's filename on a separate
line. The output format can be customized to include
permissions, timestamps, etc. with the ''-o listopt=<FORMAT>'' specification.
The syntax of the format specification is closely modeled on the
''printf(3)'' format specification.

**Unfortunately** the ''pax'' utility delivered with Debian doesn't seem to
support these extended listing formats.

However, ''pax'' lists archive members in a ''ls -l''-like format, when you
give the ''-v'' option:
<code>
pax -v <myarchive.tar
# or, of course
pax -vf myarchive.tar
</code>

==== Extracting from an archive ====

You can extract all files, or files (not) matching specific patterns from an
archive using constructs like:
<code>
# "normal" extraction
pax -rf myarchive.tar '*.txt'
</code>

<code>
# with inverted pattern
pax -rf myarchive.tar -c '*.txt'
</code>

==== Copying files ====

To copy directory contents to another directory, similar to a
''cp -a'' command, use:
<code>
mkdir destdir
pax -rw dir destdir #creates a copy of dir in destdir/, i.e. destdir/dir
</code>

==== Copying files via ssh ====

To copy directory contents to another directory on a remote system, use:
<code>
pax -w localdir | ssh user@host "cd distantdir && pax -r -v"
pax -w localdir | gzip | ssh user@host "cd distantdir && gunzip | pax -r -v" #compress the sent data
</code>
These commands create a copy of ''localdir'' in ''distantdir'' (i.e. ''distantdir/localdir'') on the remote machine.

===== Advanced usage =====

==== Backup your daily work ====

__**Note:**__ ''-T'' is an extension and is not defined by POSIX.

Say you have write-access to a fileserver mounted on your filesystem tree.
In //copy// mode, you can tell ''pax'' to copy only files that were
modified today:
<code>
mkdir /n/mybackups/$(date +%A)/
pax -rw -T 0000 data/ /n/mybackups/$(date +%A)/
</code>
This is done using the ''-T'' switch, which normally allows you to specify a
time window. Here, only the start time is given, which means "today at midnight".

When you execute this "very simple backup" after your daily work,
you will have a copy of the modified files.

__**Note:**__ The ''%A'' format from ''date'' expands to the name of the
current day, localized, e.g. "Friday" (en) or "Mittwoch" (de).

The same, but with an archive, can be accomplished by:
<code>
pax -w -T 0000 -f /n/mybackups/$(date +%A) data/
</code>
In this case, the day-name is an archive-file (you don't need a filename
extension like ''.tar'' but you can add one, if desired).

==== Changing filenames while archiving ====

''pax'' is able to rewrite filenames while archiving or while extracting from an archive. This example creates a tar archive containing the ''holiday_2007/'' directory, but the directory name inside the archive will be ''holiday_pics/'':
<code>
pax -x ustar -w -f holiday_pictures.tar -s '/^holiday_2007/holiday_pics/' holiday_2007/
</code>

The option responsible for the string manipulation is ''-s <REWRITE-SPECIFICATION>''. It takes the string rewrite specification as an argument, in the form ''/OLD/NEW/[gp]'', where ''OLD'' is an ''ed(1)''-like regular expression (BRE). It can generally be used like the popular sed construct ''s/from/to/''. Any non-null character can be used as a delimiter, so to mangle pathnames (containing slashes), you could use ''#/old/path#/new/path#''.

The optional ''g'' and ''p'' flags are used to apply substitution **(g)**lobally to the line or to **(p)**rint the original and rewritten strings to ''STDERR''.

Multiple ''-s'' options can be specified on the command line. They are applied to the pathname strings of the files or archive members in the order they are specified, stopping at the first substitution that succeeds.
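
A hedged sketch of how several ''-s'' options interact: POSIX describes the expressions as being tried in order, ending with the first one that substitutes successfully, so the more specific rule goes first here. The ''raw'' subdirectory and all file names are invented for the example.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
mkdir -p holiday_2007/raw
printf 'jpg\n' > holiday_2007/beach.jpg
printf 'raw\n' > holiday_2007/raw/beach.dng

# Rule 1 (delimiter ','): rewrite everything under raw/ to the empty string,
#         which drops those members from the archive.
# Rule 2 (delimiter '#'): rename the top-level directory; the p flag
#         prints each rewrite to stderr.
# A pathname that rule 1 rewrote is not passed on to rule 2.
pax -w -x ustar -f pics.tar \
    -s ',^holiday_2007/raw.*,,' \
    -s '#^holiday_2007#holiday_pics#p' \
    holiday_2007

renamed=$(pax -f pics.tar)
printf '%s\n' "$renamed"
```

The listing should show ''holiday_pics/beach.jpg'' and nothing from ''raw''.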

==== Excluding files from an archive ====

The ''-s'' option seen above can also be used to exclude files: the substitution must result in a null string.
For example, let's say that you want to exclude all the CVS directories to create a source code archive.
We are going to replace the names containing ''/CVS/'' with nothing. Note the ''.*'': they are needed because we need to match the entire pathname.
<code>
pax -w -x ustar -f release.tar -s ',.*/CVS/.*,,' myapplication
</code>
You can use several ''-s'' options. For instance, let's say you also want to remove files ending in ''~'':
<code>
pax -w -x ustar -f release.tar -s ',.*/CVS/.*,,' -s '/.*~//' myapplication
</code>

This can also be done while reading an archive. For instance, suppose you have an archive containing a ''usr'' and an ''etc'' directory, but you want to extract only the ''usr'' directory:
<code>
pax -r -f archive.tar -s ',^etc/.*,,' #the etc/ dir is not extracted
</code>

==== Getting archive filenames from STDIN ====

Like ''cpio'', ''pax'' can read filenames from standard input (''stdin''). This provides great flexibility - for example, a ''find(1)'' command may select files/directories in ways ''pax'' can't do itself. In **write** mode (creating an archive) or **copy** mode, when no filenames are given, ''pax'' expects to read filenames from standard input. For example:

<code>
# Back up config files changed less than 3 days ago
find /etc -type f -mtime -3 | pax -x ustar -w -f /backups/etc.tar

# Copy only the directories, not the files
mkdir /target
find . -type d -print | pax -r -w -d /target

# Back up anything that changed since the last backup
find . -newer /var/run/mylastbackup -print0 |
    pax -0 -x ustar -w -d -f /backups/mybackup.tar
touch /var/run/mylastbackup
</code>

The ''-d'' option tells ''pax'' //not// to recurse into directories it reads (''cpio''-style). Without ''-d'', ''pax'' recurses into all directories (''tar''-style).

**Note**: the ''-0'' option is not standard, but is present in some implementations.
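
The effect of ''-d'' is easiest to see by feeding the same single directory name to ''pax'' twice. This is a minimal sketch with made-up names (''project'', ''recursive.tar'', ''flat.tar''):

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
mkdir -p project/src
printf 'int main(void) { return 0; }\n' > project/src/main.c

# Without -d: naming a directory archives the whole tree below it (tar-style)
printf 'project\n' | pax -w -x ustar -f recursive.tar
with_tree=$(pax -f recursive.tar)

# With -d: only the named entry itself is archived (cpio-style)
printf 'project\n' | pax -w -d -x ustar -f flat.tar
dir_only=$(pax -f flat.tar)
```

''with_tree'' should list ''project/src/main.c'', while ''dir_only'' should contain only the ''project'' directory entry. That is why ''-d'' pairs well with ''find'', which already emits every pathname it wants archived.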

===== From tar to pax =====

''pax'' can handle the ''tar'' archive format. If you want to switch to the standard tool, an alias like:
<code>
alias tar='echo USE PAX, idiot. pax is the standard archiver!; # '
</code>
in your ''~/.bashrc'' can be useful :-D.

Here is a quick table comparing (GNU) ''tar'' and ''pax'' to help you make the switch:
^ TAR ^ PAX ^ Notes ^
| ''tar xzvf file.tar.gz'' | ''pax -rvz -f file.tar.gz'' | ''-z'' is an extension, POSIXly: ''gunzip <file.tar.gz %%|%% pax -rv'' |
| ''tar czvf archive.tar.gz path ...'' | ''pax -wvz -f archive.tar.gz path ...''| ''-z'' is an extension, POSIXly: ''pax -wv path %%|%% gzip > archive.tar.gz'' |
| ''tar xjvf file.tar.bz2'' | ''bunzip2 <file.tar.bz2 %%|%% pax -rv'' | |
| ''tar cjvf archive.tar.bz2 path ...'' | ''pax -wv path %%|%% bzip2 > archive.tar.bz2'' | |
| ''tar tzvf file.tar.gz'' | ''pax -vz -f file.tar.gz'' | ''-z'' is an extension, POSIXly: ''gunzip <file.tar.gz %%|%% pax -v'' |

''pax'' might not create ustar (''tar'') archives by default, but its own pax format instead. Add ''-x ustar'' if you want to ensure ''pax'' creates tar archives!

===== Implementations =====

  * [[http://www2.research.att.com/sw/download/ | AT&T AST toolkit]] | [[http://www2.research.att.com/~gsf/man/man1/pax.html | manpage]]
  * [[http://heirloom.sourceforge.net/index.html | Heirloom toolchest]] | [[http://heirloom.sourceforge.net/man/pax.1.html | manpage]]
  * [[http://www.openbsd.org/cgi-bin/cvsweb/src/bin/pax/ | OpenBSD pax]] | [[http://www.openbsd.org/cgi-bin/man.cgi?query=pax&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html | manpage]]
  * [[https://launchpad.net/paxmirabilis | MirBSD pax]] | [[https://www.mirbsd.org/htman/i386/man1/pax.htm | manpage]] - Debian bases their package upon this.