Removing "Apago PDF Enhancer" watermark (and my first intro to PDFs)

2014-10-16, Ashkan Kiani

I had a PDF, acquired through various means, that was nearly perfect except for the giant "APAGO PDF ENHANCER" watermark in the center of every page.

I suspected this would be an easy problem to deal with, so I set off to find a solution, and I learned a few things about PDFs along the way. I thought PDFs had a complicated binary format, but unless they are encrypted, they are mostly plain text once their streams are decompressed, and the way PDF objects are declared is not hard to follow. I learned a lot of this from Dennis's great guide on Super User; this post is basically his approach with a small addition.
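Whether a given file is in that easy case can be checked roughly from the shell. The sketch below (the helper name `pdf_quick_check` is hypothetical, and grep is only a heuristic, not a PDF parser) reports `encrypted` if an `/Encrypt` entry appears anywhere in the file, and otherwise counts the lines mentioning `/FlateDecode`, the usual compression filter for streams. A count of zero suggests the content streams are already readable.

```sh
# pdf_quick_check FILE: rough grep-based heuristic, not a real PDF parser.
# Hypothetical helper name.
pdf_quick_check() {
    if grep -a -q '/Encrypt' "$1"; then
        # An /Encrypt entry (normally in the trailer) marks an encrypted document
        echo "encrypted"
    else
        # -a treats binary data as text; -c counts matching lines, not streams
        echo "FlateDecode lines: $(grep -a -c '/FlateDecode' "$1")"
    fi
}
```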

Here is a snippet of the PDF after decompressing it with pdftk (`pdftk original.pdf output uncompressed.pdf uncompress`):

```
6 0 obj
<<
/Parent 9 0 R
/Kids [5 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R]
/Count 8
/Type /Pages
>>
endobj
8 0 obj
<<
/Length 144
>>
stream
q Q q 0 0 252 319 re W n q 1 0 0 1 28.8 152.75 cm BT 18 0 0 18 0 0 Tm /TT1
1 Tf (Apago PDF Enhancer) Tj ET Q q 252 0 0 319.32 0 0 cm /Im1 Do Q Q
endstream
endobj
```
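In this snippet, the stream of object 8 draws the watermark: the text sits between the BT (begin text) and ET (end text) operators. To see every text object in a decompressed file at once, here is a minimal perl-based sketch (the helper `list_text_objects` is hypothetical) that prints each BT ... ET block on its own line. It does not parse the PDF, so binary image data can occasionally produce a false match.

```sh
# list_text_objects FILE: print every BT ... ET text object in FILE, one per
# line, with runs of whitespace collapsed to a single space.
# Hypothetical helper; assumes FILE was decompressed with `pdftk ... uncompress`.
list_text_objects() {
    perl -0777 -ne '
        # -0777 slurps the whole file; /s lets . match newlines inside a block
        while (/\bBT\b(.*?)\bET\b/gs) {
            (my $body = $1) =~ s/\s+/ /g;
            print "BT${body}ET\n";
        }
    ' "$1"
}
```

For example, `list_text_objects uncompressed.pdf | grep Apago` would show the watermark's text object together with the font and text matrix it uses.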

After opening the decompressed PDF in vim, I searched for Apago and found it, but that alone wasn't much help: I still had to learn how to remove it without breaking the PDF. So I set off to learn how to manipulate PDFs (and a bit about their content streams, whose syntax is derived from PostScript).

The first link I found was immensely helpful, as it meant I didn't have to comb through the PDF specification (perhaps another day). Basically, the watermark was drawn inside a text object, which opens with the BT (begin text) operator and closes with ET (end text), so I could safely delete everything from BT through ET. It's possible I could delete more, but I haven't checked; deleting just the text object worked fine (and shrank the PDF by 2MB too!).
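The deletion boils down to a single regular-expression substitution, so here is a minimal perl sketch of it (perl instead of the vim macro the script uses; `strip_watermark` is a hypothetical name). It edits the file in place and removes only the BT ... ET text objects that contain the string `(Apago PDF Enhancer)`, leaving other text objects alone. Like editing in vim, it does not update the stream's /Length entry; the sketch assumes, as the vim approach does, that the final `pdftk ... compress` pass produces a consistent file.

```sh
# strip_watermark FILE: delete, in place, every BT ... ET text object in FILE
# that contains the literal PDF string (Apago PDF Enhancer).
# Hypothetical helper; FILE must already be decompressed.
strip_watermark() {
    perl -0777 -pi -e '
        # (?:(?!\bBT\b|\bET\b).)*? never crosses another BT or ET, so only the
        # text object that actually contains the watermark is deleted.
        s/\bBT\b(?:(?!\bBT\b|\bET\b).)*?\(Apago PDF Enhancer\)(?:(?!\bBT\b|\bET\b).)*?\bET\b//gs;
    ' "$1"
}
```

Running `strip_watermark` on the decompressed file could stand in for the vim step of the script.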

I then set out to write a script that could do the same for anyone else. So, assuming you have pdftk and vim installed, you can run the script below. I tried doing it with sed and perl, but honestly, using vim was not much worse. So until I write a better script, this works for now (and if anyone writes a better one, feel free to post it and I'll link to you).

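```sh
#!/bin/sh
# Usage: ./remove-apago.sh input.pdf [output.pdf]
set -e

test -z "$1" && {
    echo "Usage: $0 <input.pdf> [output.pdf, default: <input.pdf>.new.pdf]"
    exit 1
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
tmp="$tmpdir/uncompressed.pdf"

# Decompress every stream so the page content is plain text
pdftk "$1" output "$tmp" uncompress

# Earlier sed attempt, kept for reference (only matches this exact font/matrix):
# sed -i -e ':a' -e 'N' -e '$!ba' -e 's/BT 18 0 0 18 0 0 Tm \/TT1\n1 Tf (Apago PDF Enhancer) Tj ET//g' "$tmp"

# Store a macro in register r: search back to BT, delete up to ET, then "xx"
# deletes ET itself (\r is the Enter key). Run it on every line containing the
# watermark string, then save. -b stops vim converting encodings/line endings.
vim -b +'let @r="?BT\rd/ET\rxx"' +'g/(Apago PDF Enhancer)/norm @r' +'wq' "$tmp"

# Recompress into the output file
pdftk "$tmp" output "${2:-$1.new.pdf}" compress
```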
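To confirm the result, a small helper (the name `count_watermarks` is hypothetical) can count the lines of a decompressed PDF that still contain the watermark string. pdftk accepts `-` as the output filename to write to stdout, so the check can be piped without a temporary file.

```sh
# count_watermarks [TEXT]: read a decompressed PDF on stdin and print how many
# lines contain the literal PDF string (TEXT). Hypothetical helper name.
count_watermarks() {
    # -a: treat binary data as text; -F: fixed string, no regex; -c: count lines
    grep -a -F -c "(${1:-Apago PDF Enhancer})"
}

# Example (hypothetical file name); a cleaned file should print 0:
# pdftk book.pdf.new.pdf output - uncompress | count_watermarks
```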
edit (2019-07-09)

Haha, I can't believe I used vim for this. That's pretty genius.

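Additionally, this could be used to remove basically any text-based watermark: just go into the script and swap out (Apago PDF Enhancer) for (TEXT I WANT TO REMOVE). The parentheses are important, because PDF literal strings are wrapped in parentheses, so keeping them makes the pattern match the string that is actually drawn. This only works when the watermark is stored as one plain literal string; text split across a TJ array or stored as a hex string won't match.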
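If the replacement text contains characters that are special in vim patterns (such as `.`, `*`, `[` or `/`), pasting it in verbatim can match the wrong thing or break the `:g` command. One way around that, sketched here with a hypothetical `vim_pattern` helper, is to prefix the pattern with `\V` ("very nomagic"), after which only the backslash and the `/` delimiter need escaping. It assumes the text appears in the PDF as a plain literal string with no escaped characters.

```sh
# vim_pattern TEXT: print a very-nomagic (\V) vim pattern that matches the
# literal PDF string (TEXT). Hypothetical helper name.
vim_pattern() {
    # Escape backslashes first, then the / delimiter used by :g/.../
    printf '\\V(%s)\n' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's|/|\\/|g')"
}

# Example: plug it into the vim step in place of the fixed pattern.
# vim -b +'let @r="?BT\rd/ET\rxx"' +"g/$(vim_pattern 'DRAFT COPY')/norm @r" +'wq' uncompressed.pdf
```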


Published in dev and tagged pdf, hacking, scripting and vim