tar

The tar command stands for tape archive. It was originally intended for creating backup tapes on a magnetic tape drive.

Creating a tar file

Let’s look at some text files which we would like to tar:

[linux@localhost ~]$ ls -l *.txt
-rw-r--r-- 1 linux linux 17632 Jul  1 10:34 amendments.txt
-rw-r--r-- 1 linux linux  4261 Jul  1 10:35 bill_of_rights.txt
-rw-r--r-- 1 linux linux 27217 Jul  1 10:34 constitution.txt
-rw-r--r-- 1 linux linux  9241 Jul  1 10:31 declaration.txt

You give the tar command these options:

-c    create a new tar file
-v    verbose; list each file as it is processed
-f    a file name follows

These options are followed by the name of the output file, and then the names of the files you want to put into the tar file:

[linux@localhost ~]$ tar -c -v -f documents.tar *.txt
amendments.txt
bill_of_rights.txt
constitution.txt
declaration.txt

Note: as with most Linux commands, you could combine the options into a single group: tar -cvf documents.tar *.txt

Let’s take a look at the resulting file:

[linux@localhost ~]$ ls -l documents.tar
-rw-r--r-- 1 linux linux 71680 Jul  1 10:41 documents.tar

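If you add up the sizes of the individual files, you will see that they total only 58,351 bytes. The documents.tar file is larger because tar stores a header for each file (its name, size, owner, permissions, modification time, and so forth) so that it can extract the files later, and it pads the data to fixed-size blocks. With GNU tar’s default settings the whole archive is padded to a multiple of 10,240 bytes, which is why documents.tar is exactly 71,680 bytes (7 × 10,240). As you can see, this overhead is significant when combining a few small files.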
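If you would like to see this overhead for yourself, here is a minimal sketch that builds a tiny archive in a scratch directory and compares the sizes. The file names (one.txt, two.txt, three.txt, sample.tar) and their contents are made up for illustration, and the exact archive size depends on your version of tar.

```shell
#!/bin/sh
# Sketch: compare the total size of some small files with the size of
# the tar file that holds them. All file names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create three small sample files.
printf 'first file\n'  > one.txt
printf 'second file\n' > two.txt
printf 'third file\n'  > three.txt

# Put them in an uncompressed tar file.
tar -cf sample.tar one.txt two.txt three.txt

# Total size of the individual files, in bytes.
total=$(( $(cat one.txt two.txt three.txt | wc -c) ))
# Size of the tar file, in bytes.
archive=$(( $(wc -c < sample.tar) ))

echo "files:   $total bytes"
echo "archive: $archive bytes"
```

The tar file should come out many times larger than the few dozen bytes of text it contains, because every file gets its own header and the data is padded out to whole blocks.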

tar also lets you compress the resulting file with either compress, gzip, or bzip2. You could do this with several commands or with a pipe, but it is easy to do by adding an option.

To use...    Use option    File name ends with
compress     -Z            .tar.Z
gzip         -z            .tar.gz or .tgz
bzip2        -j            .tar.bz2

Here is the result of using the three forms of compression, and the resulting file sizes. To avoid repetitious output, we did not use the -v option.

[linux@localhost ~]$ tar -cZf documents.tar.Z *.txt
[linux@localhost ~]$ tar -czf documents.tar.gz *.txt
[linux@localhost ~]$ tar -cjf documents.tar.bz2 *.txt
[linux@localhost ~]$ ls -l documents.*
-rw-r--r-- 1 linux linux 71680 Jul  1 10:41 documents.tar
-rw-r--r-- 1 linux linux 15753 Jul  1 11:02 documents.tar.bz2
-rw-r--r-- 1 linux linux 18420 Jul  1 11:01 documents.tar.gz
-rw-r--r-- 1 linux linux 24685 Jul  1 11:01 documents.tar.Z
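For comparison, here is a rough sketch of the pipe approach mentioned earlier, next to the one-step -z form. It assumes gzip is installed; the file names (notes.txt, todo.txt, direct.tar.gz, piped.tar.gz) are made up for illustration. The "-f -" tells tar to write the tar file to standard output so that it can be piped into gzip.

```shell
#!/bin/sh
# Sketch: two ways to make a gzip-compressed tar file.
# All file names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'some sample text\n' > notes.txt
printf 'more sample text\n' > todo.txt

# One step: let tar run gzip for you with the -z option.
tar -czf direct.tar.gz notes.txt todo.txt

# With a pipe: "-f -" sends the tar file to standard output,
# and gzip compresses whatever it reads from standard input.
tar -cf - notes.txt todo.txt | gzip > piped.tar.gz

# Both tar files should list the same contents.
tar -tzf direct.tar.gz
tar -tzf piped.tar.gz
```

The two compressed files may differ by a few bytes, but they hold the same files, so the -z option is simply the more convenient spelling.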

Inspecting a tar file

Use the -t option to see what is inside a tar file without having to extract the files:

[linux@localhost ~]$ tar -tvzf documents.tar.gz
-rw-r--r-- linux/linux   17632 2008-07-01 10:34 amendments.txt
-rw-r--r-- linux/linux    4261 2008-07-01 10:35 bill_of_rights.txt
-rw-r--r-- linux/linux   27217 2008-07-01 10:34 constitution.txt
-rw-r--r-- linux/linux    9241 2008-07-01 10:31 declaration.txt
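Because -t without -v prints just one file name per line, you can feed the listing to other commands. Here is a small sketch that checks whether a particular file is inside a tar file before you bother extracting anything. The names sample.tar.gz, a.txt, and b.txt are made up for illustration.

```shell
#!/bin/sh
# Sketch: check whether a tar file contains a given file.
# All file names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'a\n' > a.txt
printf 'b\n' > b.txt
tar -czf sample.tar.gz a.txt b.txt

# -t lists the file names, one per line. grep -x matches whole lines
# only, -F treats the name as plain text, and -q keeps grep quiet.
if tar -tzf sample.tar.gz | grep -qxF 'b.txt'; then
    found=yes
else
    found=no
fi

echo "b.txt in archive: $found"
```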

Extracting Files from a tar file

Use the -x option to extract the files. Beware: tar will overwrite existing files without asking you.

Let’s take another look at documents.tar.gz; the .gz at the end tells you that it was compressed using gzip. To unpack the files, you would use this command:

[linux@localhost ~]$ tar -xvzf documents.tar.gz

The options are: x (extract), v (verbose; give me lots of output), z (use gunzip), f (a file name follows).

What if someone sent you a file named reports.tar.bz2? The bz2 would tell you that the file had been compressed using bzip2, so you would have to extract the files with a command like this:

[linux@localhost ~]$ tar -xvjf reports.tar.bz2
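Since tar overwrites existing files without asking, one cautious habit is to extract into a separate directory with the -C option, which makes tar change to that directory before unpacking. This is only a sketch: the names reports.tar.gz, report.txt, and unpacked are made up for illustration.

```shell
#!/bin/sh
# Sketch: extract into a separate directory so that files in the
# current directory are not overwritten. All names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'archived version\n' > report.txt
tar -czf reports.tar.gz report.txt

# Edit the local copy after the tar file was made.
printf 'edited version\n' > report.txt

# -C tells tar to change to the given directory before extracting.
# The directory must already exist.
mkdir unpacked
tar -xzf reports.tar.gz -C unpacked

cat report.txt            # the edited copy, left untouched
cat unpacked/report.txt   # the copy that was stored in the tar file
```

You can then compare the two copies and decide for yourself which one to keep.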

Advanced tar: Keeping Files

You can use the -k option to keep existing files when extracting. If tar sees a file that already exists, it will not overwrite it. Here is what happens if you use that option:

[linux@localhost ~]$ tar -xvkzf documents.tar.gz
amendments.txt
tar: amendments.txt: Cannot open: File exists
tar: Skipping to next header
bill_of_rights.txt
tar: bill_of_rights.txt: Cannot open: File exists
tar: Skipping to next header
etc

The -k option is “all or nothing.” Let’s say you have this scenario:

You make a tar of files a.txt, b.txt, and c.txt:

tar -cvzf collection.tgz a.txt b.txt c.txt

You add some new material to b.txt, and then later decide you want to un-tar collection.tgz. If you don’t use any option, you will overwrite the newer b.txt; if you use -k, none of the existing files will be overwritten. The solution to this problem is the --keep-newer-files option, which overwrites existing files unless they are newer than the ones in the tar file. Here it will overwrite a.txt and c.txt, but it will not overwrite b.txt, because it is newer than the copy in collection.tgz.
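Here is a sketch of that scenario that you can run in a scratch directory. It assumes GNU tar, and the file contents and dates are made up for illustration. To make the effect visible, a.txt is also changed but given an older date than the copy in the tar file, so tar should replace it while leaving the newer b.txt alone.

```shell
#!/bin/sh
# Sketch: --keep-newer-files protects files that are newer than the
# copies in the tar file. Assumes GNU tar; contents are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\n' > a.txt
printf 'beta\n'  > b.txt
printf 'gamma\n' > c.txt

# Give all three files a fixed, old modification time, then tar them.
touch -t 202001010000 a.txt b.txt c.txt
tar -czf collection.tgz a.txt b.txt c.txt

# Add new material to b.txt; its modification time is now "right now".
printf 'beta plus new material\n' > b.txt

# Change a.txt as well, but stamp it OLDER than the archived copy.
printf 'stale\n' > a.txt
touch -t 201901010000 a.txt

# Extract. tar reports the files it skips on standard error,
# which is hidden here to keep the output short.
tar --keep-newer-files -xzf collection.tgz 2>/dev/null

cat a.txt   # replaced by the copy from the tar file
cat b.txt   # the newer local copy is kept
```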