Convert PDF to Images From the Linux Command Line


Converting a PDF file to an image can be done easily on the Linux command line using a single command. Find out how to install the utility, how to use it, and how to automate your setup.

What is poppler-utils?

To convert PDF files to images, we need to install a small set of utilities named poppler-utils.

The poppler-utils set contains command-line tools for working with PDF files, including pdftoppm, which renders the pages of a PDF as images.

Installing poppler-utils


To install poppler-utils on your Debian / Apt based Linux distribution (like Ubuntu and Mint), do:

sudo apt install poppler-utils

To install poppler-utils on your Red Hat / Yum based Linux distribution (like Red Hat Enterprise Linux and Fedora), do:

sudo yum install poppler-utils
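
After installing, it can be handy to confirm that pdftoppm is actually on the PATH before relying on it in a script. The snippet below is a minimal sketch; the helper name have_cmd is our own and not part of poppler-utils.

```shell
# Report whether a command is available on the PATH.
# have_cmd is a hypothetical helper name used only for this sketch.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdftoppm; then
    echo "pdftoppm is installed"
else
    echo "pdftoppm not found; install poppler-utils first"
fi
```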

Convert PDF to Images

The required command is simple and straightforward:

pdftoppm -png test.pdf test

With the pdftoppm command, we can convert PDFs to images. We specify that we want a PNG file for the output format (using -png) and that our input file is test.pdf.

We specify the output file name as test. pdftoppm will automatically add a page number suffix (like -1) and an extension (based on the -png option passed earlier).

The name of the output file will therefore be test-1.png, as we can verify next:

ls test-1.png
eog test-1.png


All subsequent pages will be named test-2.png and so on. The eog command (if eog is installed) will open the file so you can review the output, although you can use any other image viewer you like.

Batch processing of PDF files to images

We can create a single-line command to batch process all the PDF files with a given name into images. We could then add this line to a small .sh script file and automate it further, or just use it from the command line whenever we need to convert a large number of PDF files to images.

ls --color=never test*.pdf | sed 's|\.pdf$||' | xargs -I{} pdftoppm {}.pdf -png {}

In this command, we first get a directory listing for all the PDF files whose name starts with test and ends with .pdf, using ls --color=never test*.pdf.

The --color=never is important, because the shell color coding symbols (if they are active) can sometimes confuse xargs.

Next, we use a simple sed replace command to replace a literal period followed by pdf with nothing. In other words, we are removing the .pdf file extension.

This lets us add the extension back only where it is needed: when specifying the input file for pdftoppm, but not when specifying the output file name for that same command, just like in our earlier example.
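
To see the sed step in isolation, we can feed it a couple of file names by hand. This is a small sketch: the helper name strip_pdf_ext is hypothetical, and the pattern escapes the period and anchors it to the end of the line so that only a trailing .pdf is removed.

```shell
# Strip a trailing .pdf extension from each line of input.
# strip_pdf_ext is a hypothetical helper name used only for this sketch.
strip_pdf_ext() {
    # \. matches a literal period; $ anchors the match to the end of the line
    sed 's|\.pdf$||'
}

# Feed two sample file names through the filter, one per line.
printf '%s\n' test1.pdf test2.pdf | strip_pdf_ext
```

Each name comes back without its extension, ready to be handed to xargs.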

Finally, we use xargs to send each PDF file name (minus the .pdf) to pdftoppm one by one. We use the -I option for xargs, which lets us refer to each input received (i.e. the shortened PDF file names) simply by using {} in the command that follows.

As you can see, our pdftoppm command now looks a lot like the first example: the input is each PDF file name with .pdf appended again, and the output name is the same file name without .pdf.

Let’s run it:


It worked well: the three PDF files, each one page long, were converted to three individual .png files (one image per page), all correctly named and suffixed.
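
As an alternative sketch, the same batch conversion can be written as a plain shell loop. This avoids parsing the output of ls, so unusual characters in file names (such as quotes) are passed to pdftoppm unchanged. The function name convert_all and the CONVERT variable are our own inventions for this illustration; CONVERT only exists so that the command can be swapped out for a dry run.

```shell
# Convert every test*.pdf in the current directory to PNG, one image per page.
# convert_all and CONVERT are hypothetical names used only for this sketch.
convert_all() {
    for f in test*.pdf; do
        # If the glob matched nothing, $f is the literal pattern: skip it.
        [ -e "$f" ] || continue
        # ${f%.pdf} removes the trailing .pdf to form the output name.
        "${CONVERT:-pdftoppm}" -png "$f" "${f%.pdf}"
    done
}

# Usage (run in the directory holding the PDF files):
#   convert_all                 # perform the conversion
#   CONVERT=echo convert_all    # dry run: print the arguments instead
```

The dry run is a cheap way to check which files would be picked up before any images are written.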

As an alternative to the -png option, one can also use -jpeg to generate JPEG files instead. Use pdftoppm --help or man pdftoppm to see a full list of options.
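
If a script needs to switch between the two formats, one possible approach is a small lookup that maps a format name to the matching pdftoppm option. This is only a sketch; format_flag is a hypothetical helper name, and it covers just the two options mentioned in this article.

```shell
# Map an output format name to the matching pdftoppm option.
# format_flag is a hypothetical helper name used only for this sketch.
format_flag() {
    case "$1" in
        png)      echo "-png" ;;
        jpg|jpeg) echo "-jpeg" ;;
        *)        echo "unsupported format: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   pdftoppm "$(format_flag jpeg)" test.pdf test
```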

Wrapping up

In this article, we have seen how simple and straightforward it can be to convert PDF files to image files, right from the Linux command line! We also looked at an easy way to automate this process. Enjoy!
