How to Compress and Decompress Files Using tar in Linux


Tar is more than just an archiving utility: it comes with some great built-in features that let you compress and decompress files while archiving them. Learn all about it in this article!

What is tar and how do I install it?

According to the tar manual (which you can read by typing man tar once it's installed), tar is an archiving utility. It supports many features, including compressing and decompressing files on the fly while archiving them. Let's start by installing tar:

To install tar on your Debian / Apt based Linux distribution (like Ubuntu and Mint), run the following command in your terminal:

sudo apt install tar

To install tar on your Red Hat / Yum based Linux distribution (like RHEL, CentOS and Fedora), run the following command in your terminal:

sudo yum install tar
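tar ships with most Linux distributions, so it may already be installed. A quick check, as a sketch assuming GNU tar (whose --version output names the program on its first line):

```shell
#!/usr/bin/env bash
# Look for a tar binary on the PATH before installing anything
if command -v tar >/dev/null 2>&1; then
  # GNU tar prints its name and version on the first line
  tar --version | head -n 1
else
  echo "tar is not installed; use the apt or yum command above"
fi
```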

Next, we’ll create some sample data:

mkdir test; cd test
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

Setting up the sample data to compress

Here we have created a directory named test and created six empty files in it using the touch command. We have also added numbers to files a, e, and b; file b in particular contains repetitive data, which will compress well.
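To double-check the sample data, the sketch below recreates it in a fresh scratch directory (made with mktemp -d, our own addition so the example runs on its own) and prints each file's size with wc -c. Note the space before each >: written as echo 1> a, bash would read 1> as a redirection of file descriptor 1 and write only an empty line to a.

```shell
#!/usr/bin/env bash
# Work in a fresh scratch directory so nothing else gets archived
cd "$(mktemp -d)" || exit 1

# Same sample data as above
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

# Byte count per file; wc adds a final "total" line
wc -c a b c d e f
```

With these contents the files add up to 25 bytes: 2 each for a and e (one digit plus a newline), 21 for b, and 0 for the three empty files.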

If you want to know more about how compression works, you can check out our How does file compression work? article.

Creating an uncompressed archive

Creating a simple uncompressed tar archive

tar -hcf all_files.tar *
ls -l | grep -v total | awk '{print $5 "\tbytes for: " $9}' | sort -n

Here we have created an uncompressed archive using the tar -hcf all_files.tar * command. Let’s take a look at the options used in this command.

First, we have -h. While it is not required in this particular case, I strongly recommend that you always include it in your tar commands. This option means dereference: it dereferences (follows) symbolic links, archiving and dumping the files they point to.

Then we have the -c and -f options. Note that they simply share the - of -h: instead of writing a separate dash for each option, we just append them to the other short options. Quick and easy.

The -c option means create a new archive. Note that by default, directories are archived recursively unless the --no-recursion option is also used. The -f option lets us specify the name of the archive, so it should come last in our option chain (because it requires an argument), allowing us to put the archive file name directly after it. Using tar -fch test.tar * will not work:

Short options that require an argument cannot be placed at the front
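To make the ordering rule concrete, here is a sketch, assuming GNU tar (which also accepts the long option names shown), that builds the same archive three ways and then tries the failing -fch form. The archive names bundled.tar, long.tar, separate.tar and wrong.tar are only illustrative.

```shell
#!/usr/bin/env bash
# Fresh scratch directory with the sample files
cd "$(mktemp -d)" || exit 1
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

# Bundled short options: -f comes last, so the archive name follows it directly
tar -hcf bundled.tar a b c d e f

# The same archive spelled out with GNU tar's long options
tar --dereference --create --file=long.tar a b c d e f

# -f can also stand on its own anywhere, as long as its argument comes right after it
tar -f separate.tar -hc a b c d e f

# Wrong: in -fch, tar takes "ch" as the archive name, sees no -c, and exits with an error
tar -fch wrong.tar a b c d e f || echo "tar -fch failed, as expected"
```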

Once the tar archive is created, we use a modified ls output that clearly shows the number of bytes per file. As you can see, the tar file is much larger than all of our files combined. The files are simply archived, and tar's own header and padding overhead is added on top.
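Where does the extra size come from? GNU tar stores a 512-byte header in front of each file, rounds each file's data up to whole 512-byte blocks, ends the archive with two zero-filled blocks and, by default, pads the result to a full 10,240-byte record (a blocking factor of 20). The sketch below shows this using GNU coreutils stat instead of the ls | awk pipeline.

```shell
#!/usr/bin/env bash
# Fresh scratch directory with the sample files
cd "$(mktemp -d)" || exit 1
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b
tar -hcf all_files.tar a b c d e f

# Print the archive size in the same "bytes for:" style as above
stat -c '%s bytes for: %n' all_files.tar

# Rough accounting of those bytes:
#   6 files x 512-byte header                    = 3072
#   3 non-empty files x one 512-byte data block  = 1536
#   2 zero-filled 512-byte end-of-archive blocks = 1024
# That is 5632 bytes, which tar pads up to one full 10240-byte record.
```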

As an interesting side note, we can also see which types of files we are working with by simply running the file command at the command prompt:

file c
file b
file all_files.tar

Using file to see file type

Creating a compressed archive

A very common compression algorithm is gzip. Let's add its option (-z) to our short option string and see how that affects the file size:

tar -zhcf all_files.tar.gz [a-f]
ls -l | grep -v total | awk '{print $5 "\tbytes for: " $9}' | sort -n

Comparing the size of a compressed archive with an uncompressed archive

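This time, we've specified the pattern [a-f] to only include the files named a through f, which prevents tar from including the all_files.tar file in the new all_files.tar.gz archive. (Strictly speaking, [a-f] is a shell wildcard pattern that the shell expands before tar runs, although its syntax matches a regular-expression character class.)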

See How do you actually use Regex? and Edit text using regular expressions using sed if you want to learn more about regular expressions.
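The difference between * and [a-f] is easy to see with echo, because the shell expands both patterns before the command even runs. A small sketch in a scratch directory:

```shell
#!/usr/bin/env bash
# Scratch directory holding the sample names plus an existing archive
cd "$(mktemp -d)" || exit 1
touch a b c d e f all_files.tar

# * matches every (non-hidden) name, including the archive itself
echo *

# [a-f] matches only single-character names from a to f
echo [a-f]
```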

We have also included the -z option, which filters the archive through gzip as it is written. It's great to see that we end up with a 186-byte file, which tells us that, in this case, the roughly 10 KB of tar header and padding overhead compresses very well.
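Conceptually, -z just sends the tar stream through gzip. The sketch below compares tar's built-in filter with a hand-made pipeline (tar -f - writes the archive to standard output) and uses gzip -l to show how many uncompressed bytes each .gz file holds. The name manual.tar.gz is hypothetical.

```shell
#!/usr/bin/env bash
# Fresh scratch directory with the sample files
cd "$(mktemp -d)" || exit 1
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

# Built-in gzip filter
tar -zhcf all_files.tar.gz [a-f]

# Roughly the same by hand: archive to stdout, then compress with gzip
tar -hcf - [a-f] | gzip > manual.tar.gz

# gzip -l lists compressed size, uncompressed size and ratio for each file
gzip -l all_files.tar.gz manual.tar.gz
```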

The compressed archive is 7.44 times larger than the combined size of the files (186 bytes versus 25 bytes), but that doesn't matter: this contrived example is not representative of compressing larger files, where gains rather than losses are almost always seen, unless the data was already compressed or is in a format that does not compress easily with common algorithms. Moreover, one algorithm (such as gzip) may do better than another (such as bzip2) on one dataset, and vice versa on another.

Save more bytes using a higher compression level

Can we make the file even smaller? Yes. We can set gzip's maximum compression level by using tar's -I option, which lets us specify the compression program to use (with thanks to Stack Overflow user ideasman42):

tar -I 'gzip -9' -hcf all_files.tar.gz [a-f]
ls -l | grep -v total | awk '{print $5 "\tbytes for: " $9}' | sort -n

Using the -I option for tar to specify a compression program

Here we have specified -I 'gzip -9' as the compression program to use, and removed the -z option (because we are now specifying a custom program instead of using tar's built-in gzip setting). The result is a file that is 12 bytes smaller, thanks to gzip's more thorough (but generally slower) compression at level -9.

In general, the faster the compression (a lower compression level, such as -1), the larger the resulting file; and the slower the compression (a higher level, such as -9), the smaller the file. You can choose your own trade-off by varying the compression level from -1 (fastest) to -9 (slowest); gzip's default is -6.
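To try the trade-off yourself, a loop like the one below builds one archive per gzip level and prints its size. It is a sketch assuming a GNU tar recent enough to accept a command with arguments in -I; on tiny inputs like ours the differences may be only a few bytes.

```shell
#!/usr/bin/env bash
# Fresh scratch directory with the sample files
cd "$(mktemp -d)" || exit 1
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

# One archive per compression level (fast, middle, slow)
for level in 1 5 9; do
  tar -I "gzip -$level" -hcf "level$level.tar.gz" [a-f]
  stat -c '%s bytes for: %n' "level$level.tar.gz"
done
```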

Other compression programs

There are two other common compression algorithms worth exploring and testing (different algorithms give different size results and may have additional compression options): bzip2, which can be used by specifying the -j option to tar, and xz, which can be used by specifying the -J option.
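Using the two built-in switches looks like this. The sketch assumes the bzip2 and xz programs are installed, since tar only calls them.

```shell
#!/usr/bin/env bash
# Fresh scratch directory with the sample files
cd "$(mktemp -d)" || exit 1
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

tar -jhcf all_files.tar.bz2 [a-f]   # -j: compress with bzip2
tar -Jhcf all_files.tar.xz [a-f]    # -J: compress with xz

# Compare the two results
ls -l all_files.tar.bz2 all_files.tar.xz
```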

You can also use the -I option to set the maximum compression level for bzip2 (-9):

Example of bzip2 -9 as the compression program

And -9e for xz:

xz -9e compression program example

As you can see, the results in this case are poorer than with the more or less standard gzip algorithm. Still, bzip2 and xz may well do better on other datasets.
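The bzip2 and xz commands in the two screenshots above appear only as images. Written out, they would most likely look like the sketch below; the output file names are our assumption, and the last line reuses the article's ls | awk byte listing.

```shell
#!/usr/bin/env bash
# Fresh scratch directory with the sample files
cd "$(mktemp -d)" || exit 1
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

# Maximum bzip2 compression (-9), passed through tar's -I option
tar -I 'bzip2 -9' -hcf all_files.tar.bz2 [a-f]

# Maximum xz preset (-9) plus xz's slower "extreme" mode (-e)
tar -I 'xz -9e' -hcf all_files.tar.xz [a-f]

# Same per-file byte listing as earlier in the article
ls -l | grep -v total | awk '{print $5 "\tbytes for: " $9}' | sort -n
```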

Decompressing a file

Decompressing a file is very easy, regardless of how it was originally compressed, provided that the matching compression program is present on your computer. For example, if the archive was compressed with bzip2 (indicated by a .bz2 extension on the tar file name), make sure you have run sudo apt install bzip2 (or sudo yum install bzip2) on the target machine, after which tar will be able to decompress the file.
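Before extracting on another machine, you can check which of the three compressors discussed here are available. A small sketch; check_decompressors is a hypothetical helper name, and package names can vary between distributions (on Debian and Ubuntu, for example, the xz program comes from the xz-utils package).

```shell
#!/usr/bin/env bash
# Print, for each compressor covered in this article, whether it is installed
check_decompressors() {
  local prog
  for prog in gzip bzip2 xz; do
    if command -v "$prog" >/dev/null 2>&1; then
      echo "$prog: available"
    else
      echo "$prog: missing"
    fi
  done
}

check_decompressors
```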

rm a b c d e f
tar -xf all_files.tar.gz
ls

Decompression of a compressed (or uncompressed) tar archive

We simply specify -x to extract (and, where needed, decompress) our all_files.tar.gz file, and indicate the file name by again using the -f option as before.
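A few variations on extraction, as a sketch assuming GNU tar: -t lists an archive without extracting it, -C extracts into another (already existing) directory, and -v prints each file name as it is processed. GNU tar recognizes gzip, bzip2 and xz compression on its own when reading from a file, so no -z is needed. The directory name restored is just an example.

```shell
#!/usr/bin/env bash
# Fresh scratch directory with the sample files and a gzip archive of them
cd "$(mktemp -d)" || exit 1
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b
tar -zhcf all_files.tar.gz [a-f]

# List the contents without extracting anything
tar -tf all_files.tar.gz

# Extract verbosely into a separate, pre-created directory
mkdir restored
tar -xvf all_files.tar.gz -C restored

# Confirm the restored copy of b matches the original byte for byte
cmp b restored/b && echo "b restored intact"
```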

Compressing files can help you save a lot of space on your storage devices, and knowing how to use tar together with its compression options will help you do that. When the archive needs to be extracted again, that is easy too, provided the appropriate decompression software is available on the computer used to extract the data from your archive. Enjoy!
