
splitting scanned book images in a pdf

September 3, 2008

I had a PDF of a scanned book. Unfortunately, each PDF page held two scanned book pages, and each page needed to be rotated too.

First, I used pdfimages to extract the scanned images, though you can probably do the same with Ghostscript, Netpbm, or one of the other tools out there.

# had to install poppler for pdfimages
sudo port install poppler
pdfimages input.pdf out # generates a series of pbm files, out-000.pbm etc
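
The steps in this post lean on four external tools: pdfimages (from poppler), convert (from ImageMagick), tiff2pdf (from libtiff), and gs (Ghostscript). As a rough preflight check, the sketch below reports which of them are not on the PATH yet. The helper name missing_tools is made up for this example and is not part of any of those packages.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper, not part of any package named above.
# It prints the name of every argument that cannot be found as a command.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# List whichever of the four tools used in this post still need installing.
missing_tools pdfimages convert tiff2pdf gs
```

If nothing is printed, all four commands were found.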

# rotate each page and split it in half; both halves are written as pages of one tiff file
# ${i/pbm/tiff} is the file name with 'pbm' replaced by 'tiff'
for i in out*pbm; do convert "$i" -rotate 90 -crop 50%x100% "${i/pbm/tiff}"; done
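
The ${i/pbm/tiff} form is a bash expansion that replaces the first occurrence of pbm wherever it appears in the name. That is fine for names like out-000.pbm, but a name such as pbm-scan-000.pbm would become tiff-scan-000.pbm. As a sketch of a stricter alternative, the suffix-stripping form below only touches the extension at the end and also works in a plain POSIX sh. The function name swap_ext is hypothetical.

```shell
#!/bin/sh
# swap_ext is a hypothetical helper: swap_ext FILE NEW_EXT
# ${1%.*} drops the shortest trailing match of ".*", i.e. the last extension,
# and NEW_EXT is appended in its place.
swap_ext() {
    printf '%s.%s\n' "${1%.*}" "$2"
}

swap_ext out-000.pbm tiff        # prints out-000.tiff
swap_ext pbm-scan-000.pbm tiff   # prints pbm-scan-000.tiff
```

Inside the loop this would be used as "$(swap_ext "$i" tiff)" in place of the ${i/pbm/tiff} expansion.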

# convert each tiff into a pdf
for i in out*tiff; do tiff2pdf -o "${i/tiff/pdf}" "$i"; done

#merge the pdfs
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf out*pdf
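
The out*pdf glob hands the files to gs in lexical order, which matches page order only while every number has the same width. Assuming pdfimages pads to three digits, as the out-000.pbm names suggest, a name like out-1000.pdf would sort ahead of out-101.pdf. The sketch below sorts on the number after the hyphen instead. The helper name pages_in_order is made up, and it assumes file names of the form out-N.pdf with no spaces or extra hyphens.

```shell
#!/bin/sh
# pages_in_order is a hypothetical helper. It lists out-*.pdf in the given
# directory, sorted numerically by the number after the hyphen.
# Assumes names of the form out-N.pdf with no spaces or additional hyphens.
pages_in_order() {
    ( cd "$1" && ls out-*.pdf | sort -t- -k2n )
}

# Example use, run from the directory that holds the pdfs:
# gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf $(pages_in_order .)
```

For a book with fewer than a thousand extracted images the plain glob gives the same order, so this only matters for very long scans.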

That’s the command line approach to solving that problem.
