Squeeze Your Data - A New Compression Strategy

by rm

Computer hard drives have certainly grown tremendously in size in the last few years. I mean, I now have a 500 GB external hard drive that I use for backing up my data. I remember how excited we were at the office when we received a computer that could hold 1 GB of data! Of course, with all the room we now have available, we tend to save a lot more stuff than we used to. However, I still don't like to waste room on my disk.

One way that is frequently used to recover some disk space is to compress the data with some form of compression software. There are several different ways to do this, each with its own advantages and disadvantages. Let's say, for example, that you have, as I do, a large directory tree that you want to compress: it contains data that you hardly ever use, but that you want easy access to from time to time. In my case, that directory tree contains the RAW image files that come from my DSLR camera. Each of those files is about 10 MB. The total size of that directory tree is about 45 GB, and it is constantly growing.

Note: I store my finished, "processed" images in a different directory tree. They are stored as JPEG files, so they are already compressed.

How would you go about using compression to reclaim some disk space in a situation like this one?

There are tools that would allow me to compress the whole directory into one huge compressed file, but that is not practical for several reasons. For example, adding new directories to the archive is slow and complicated. What if I wanted to uncompress all the RAW files from a certain shooting session to apply a new post-processing technique I just learned? Again, doing so would be slow and complicated. And let's not even talk about the memory requirements for such operations. Even backing up such a huge archive can present challenges, since certain file systems can't deal with files that large. So, is there another way?

Well, one way I came up with was to write my own tool to do the job. I created a program called 7sqz (7Squeeze) that takes care of this task with ease. It is a Python script that navigates through a directory tree, compressing only its contents, not the directories themselves. As it enters each directory in the tree, it packs all of the files in that directory into an archive stored in that same directory, named after the directory itself. If it finds that a directory already has an archive file with the correct name, it leaves it alone and moves on to the next directory, unless it also finds uncompressed files there. When that happens, it simply moves them into the existing archive, updating any files that were already stored inside it.
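To make the idea more concrete, here is a minimal sketch in Python of what "squeezing" a tree looks like. This is not the actual 7sqz source, just an illustration under simplified assumptions: it always uses the 7z format, does no logging, and only removes the originals when 7z reports success.

#!/usr/bin/env python
# Minimal sketch of the "squeeze" idea -- NOT the real 7sqz script.
# Walks a directory tree and, in each directory, packs the plain files
# into an archive named after that directory, using the 7z command
# from p7zip.
import os
import subprocess
import sys

def squeeze(root):
    for dirpath, dirnames, filenames in os.walk(root):
        archive = os.path.basename(os.path.normpath(dirpath)) + ".7z"
        # Never try to add the archive to itself.
        files = [f for f in filenames if f != archive]
        if not files:
            continue  # nothing left to squeeze in this directory
        # "7z a" adds new files to the archive and updates existing ones.
        status = subprocess.call(["7z", "a", archive] + files, cwd=dirpath)
        if status == 0:
            # Only remove the originals if 7z finished without errors.
            for f in files:
                os.remove(os.path.join(dirpath, f))

if __name__ == "__main__":
    squeeze(sys.argv[1])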

I also created 7usqz, the counterpart of 7sqz. It simply goes through a specified directory tree looking for archive files named after the directory that holds them, and uncompresses them, essentially leaving the directory as it was before being squeezed. Both 7sqz and 7usqz use p7zip for the actual compression, so you need to have p7zip installed.
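If you are curious about what happens under the hood, the heavy lifting is done by p7zip's 7z command. The exact options my scripts pass may differ, but inside each directory the operations are roughly equivalent to:

7z a some_directory.7z *

to squeeze the files into an archive, and

7z x some_directory.7z

to extract them again.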

You can obtain 7sqz from here:
http://rmcorrespond.googlepages.com/7sqz

And you can get 7usqz from here:
http://rmcorrespond.googlepages.com/7usqz

After downloading them, save them in a place like /usr/bin and make sure they are executable. To make them executable, right-click on each file, choose Properties from the drop-down menu, then check the box next to "is Executable." Click OK and you are ready to use them.
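If you are more comfortable in a terminal, you can accomplish the same thing with chmod (run as root), assuming you saved the scripts in /usr/bin:

chmod +x /usr/bin/7sqz /usr/bin/7usqz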

To use 7sqz you will need to open a terminal:

"Main Menu -> System -> Terminals -> Konsole"

Next, you could just give it a target directory as a parameter, like this:

7sqz /home/some_directory

By default it will use the 7z format (which generally gives better compression than zip), but you can use the zip format if you prefer by using the -m option, like this:

7sqz -m zip /home/some_directory

By default it will use Normal as the level of compression, but you can use Extra or Max if you prefer by using the -l option, like this:

7sqz -l Extra /home/some_directory

By default it will just skip any file that causes an error during compression and log the error, but you can tell it to "Halt on Error" with the -e option, like this:

7sqz -e /home/some_directory

And of course, you can combine options as you please like this:

7sqz -m zip -l Max -e /home/some_directory

As I said, 7usqz is the counterpart of 7sqz. To use it, you just give it a target directory as a parameter, like this:

7usqz /home/some_directory

By default it will just skip any file that causes an error during decompression and log the error, but you can tell it to "Halt on Error" with the -e option, like this:

7usqz -e /home/some_directory

Please do a few tests, or better yet a lot of tests, before using it on a directory that you cannot afford to lose. I believe it has all the necessary safety precautions to protect your data, but I can't guarantee it. All I can say is that I have never lost any data with it and that it works great for me. But of course, let me know if you have any trouble with it. I know it is hard to really grasp what it does from this description, but I think that if you give it a try you will see that squeezing your data is a great way to reclaim some disk space, and to save quite a bit of time doing so.
