How to Gzip 100GB+ Files Faster with High Compression

Gzip is one of the most popular compression tools for Linux, and it comes preinstalled on all major Linux distributions. We can easily compress and decompress files with it.

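One limitation of gzip is that it compresses each file on its own. It cannot bundle several files or a directory into a single archive (that is usually done with tar first), and if you pass it a directory, gzip skips it with a warning. Apart from this limitation, gzip does a single job and does it well.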

Gzip libraries such as zlib are commonly used to let applications read compressed files directly without extracting them first. That’s why we usually don’t need to decompress sequencing output before feeding it to popular bioinformatics applications.
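Command-line tools work the same way: gzip can stream the decompressed content straight into another program without writing an extracted copy to disk. Below is a minimal sketch, assuming GNU/Linux with gzip 1.6 or newer (needed for -k). The file name reads.txt is hypothetical and only stands in for real sequencing output.

```shell
# Hypothetical sample file standing in for sequencing output
printf 'read1\nread2\nread3\n' > reads.txt

# -k keeps reads.txt, -f overwrites an existing reads.txt.gz
gzip -kf reads.txt

# -d decompresses, -c writes to standard output (same as zcat on Linux);
# no extracted file is written to disk
gzip -dc reads.txt.gz | head -n 2

# Count the lines inside the compressed file without extracting it
gzip -dc reads.txt.gz | wc -l
```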

Compressing large files

Compressing a file with gzip is simple. You can compress even a single 100GB+ file with the command below:

$ gzip example.txt
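By default, gzip replaces example.txt with example.txt.gz and uses compression level 6. Since the goal here is high compression, the sketch below shows a few standard gzip options. It works on a hypothetical sample.txt generated with seq so the example is self-contained; substitute your own file.

```shell
# Hypothetical sample input: one million numbered lines (about 6.9 MB)
seq 1 1000000 > sample.txt

# -k keeps sample.txt, -f overwrites an existing sample.txt.gz,
# -9 selects the best (slowest) compression instead of the default level 6
gzip -k -f -9 sample.txt

# -t tests the integrity of the compressed file (exit status 0 means OK)
gzip -t sample.txt.gz && echo "sample.txt.gz passed the integrity test"

# -l prints the compressed size, uncompressed size and compression ratio
gzip -l sample.txt.gz
```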

However, gzip uses only one processor core, so it cannot split the job, and compression time grows with the size of the file. There are now gzip alternatives that work on multiple cores and split the job between them.
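To see how much of the machine a single gzip run leaves idle, you can compare the number of available cores with a baseline timing. This is a rough sketch: it assumes GNU coreutils (nproc) and a date that supports +%s, it only measures whole seconds, and baseline.txt is a hypothetical test file.

```shell
# Number of processing units available (GNU coreutils); gzip uses only one of them
echo "available cores: $(nproc)"

# Hypothetical sample input (about 6.9 MB) for a baseline measurement
seq 1 1000000 > baseline.txt

# Time a single-threaded gzip run with whole-second resolution
start=$(date +%s)
gzip -k -f baseline.txt
end=$(date +%s)
echo "gzip finished in about $((end - start)) s on one core"
```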

Pigz is one of the most popular of these alternatives; MiGz, a multithreaded gzip-compatible library from LinkedIn, is another. Both split the work across multiple cores to reduce compression time.

Below we explain how to compress a large file with pigz, a multithreaded implementation of gzip. It is available in the repositories of almost all major Linux distributions.

Installation

To install pigz system-wide, you need a user account with sudo privileges. Then you can install pigz with the command for your Linux distribution:

For Debian and Ubuntu users:

$ sudo apt install pigz

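For CentOS and other RPM-based distributions:

$ sudo yum install pigz

On older CentOS releases, pigz may only be available after enabling the EPEL repository.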

You can also install pigz without sudo through a user-space package manager such as Miniconda:

$ conda install -y pigz
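Whichever method you use, it is worth checking that pigz is actually on your PATH before relying on it. A minimal sketch using the POSIX command -v builtin and pigz’s own --version option:

```shell
# Check whether pigz is available on PATH after installation
if command -v pigz >/dev/null 2>&1; then
  # Print the installed pigz version
  pigz --version
else
  echo "pigz not found; install it with one of the commands above" >&2
fi
```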

After installing, we can compress any single large file with pigz. By default, pigz uses all available processor cores; to compress the large text file example.txt with exactly 8 cores, set the number of threads with -p:

$ pigz -p 8 example.txt
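For high compression, pigz accepts the same -1 to -9 levels and the -k/-f options as gzip. The sketch below assumes GNU coreutils (seq, nproc) and uses a hypothetical sample.txt; it falls back to a message if pigz is missing.

```shell
# Hypothetical sample input (about 6.9 MB); replace it with your own large file
seq 1 1000000 > sample.txt

if command -v pigz >/dev/null 2>&1; then
  # -p: number of compression threads (here all cores reported by GNU nproc)
  # -9: best compression level; -k: keep sample.txt; -f: overwrite old output
  pigz -p "$(nproc)" -9 -k -f sample.txt

  # pigz writes the standard gzip format, so plain gzip can verify it
  gzip -t sample.txt.gz && echo "sample.txt.gz is a valid gzip file"
else
  echo "pigz is not installed" >&2
fi
```

To compare speed with gzip on your own data, run both tools at the same level and prefix each command with the time keyword of bash or zsh, for example time gzip -9 -k example.txt versus time pigz -9 -k example.txt.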

Compressing example.txt with pigz creates a compressed file named example.txt.gz and, like gzip, deletes the original example.txt. To decompress it, use the command below:

$ pigz -d example.txt.gz
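pigz and gzip read and write the same format, so a file compressed by one can be decompressed by the other. According to the pigz documentation, decompression cannot be parallelised the way compression is: pigz decompresses on a single thread and uses extra threads only for reading, writing and checksum calculation, so expect a smaller speed-up than for compression. A minimal round-trip sketch, with demo.txt as a hypothetical file:

```shell
# Hypothetical compressed input, created with plain gzip so the example is self-contained
seq 1 1000 > demo.txt
gzip -kf demo.txt

if command -v pigz >/dev/null 2>&1; then
  # -d decompresses; -c writes to stdout, so demo.txt.gz is left in place
  pigz -dc demo.txt.gz > demo_restored.txt
else
  # Fallback: gzip reads exactly the same format
  gzip -dc demo.txt.gz > demo_restored.txt
fi

# Confirm the restored data is byte-for-byte identical to the original
cmp demo.txt demo_restored.txt && echo "files match"
```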


Conclusion

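The shell can run several processes at once, so in principle compression can be spread across multiple cores without a dedicated program, for example by compressing chunks of a file separately. In practice this is cumbersome for very large files and compression can still be a bottleneck, which is where a multithreaded tool such as pigz helps.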
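As a sketch of the do-it-yourself approach: the gzip format allows several compressed members to be concatenated into one valid file, so a file can be split into chunks, the chunks compressed in parallel, and the results joined. This assumes GNU coreutils (split -n) and an xargs that supports -P; big.txt and the chunk_ prefix are hypothetical names.

```shell
# Hypothetical sample input (about 6.9 MB)
seq 1 1000000 > big.txt

# Split into 4 chunks without breaking lines (GNU split: -n l/4),
# producing chunk_aa, chunk_ab, chunk_ac and chunk_ad
split -n l/4 big.txt chunk_

# Compress the chunks in parallel, running up to 4 gzip processes at once
printf '%s\n' chunk_?? | xargs -P 4 -n 1 gzip -f

# Concatenated gzip members form one valid gzip file; the glob keeps chunk order
cat chunk_??.gz > big.txt.gz
rm -f chunk_??.gz

# Decompressing the joined file reproduces the original exactly
gzip -dc big.txt.gz | cmp - big.txt && echo "parallel gzip round trip OK"
```

Two caveats: split writes an uncompressed copy of the data, so it needs as much free disk space as the input, and the result is usually slightly larger than a single gzip stream because each chunk starts without a shared dictionary. pigz avoids most of that overhead by priming each block with the end of the previous one.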

That’s all for gzipping 100GB+ files faster with high compression. If you get stuck somewhere, feel free to leave a comment, and if you liked the article or think I missed something, please let me know so I can make it even better.
