Web Site Optimization: How to Optimize PDF Files for Web Sites


By Andy King

Portable Document Format (PDF) is the de facto file format for presenting device-independent documents on and off the Web. While PDFs have become quite popular on the Web, many PDFs used on web sites are designed for high-quality print output and are not optimized for the Web. Even PDFs designed for Web use can have a wait problem, weighed down with excess fonts, change histories, and unoptimized images and forms. Optimizing PDF files for the Web can significantly shrink their size and boost display speed, saving bandwidth and user frustration. (For the full “Optimize PDF Files for the Web” article, see http://www.websiteoptimization.com/speed/tweak/pdf/)

Creating Small PDFs

The main factors in creating small PDFs are image resolution, image type (bitmap or vector), the number of fonts used and how they are embedded, PDF version, and the level of compression. In general, the higher the PDF version number, the smaller the file, because newer versions add more efficient compression schemes. Acrobat 5 (PDF version 1.4) added JBIG2 compression, which is superior to the CCITT or Zip algorithms when compressing scanned monochrome pages.

JBIG2 (Joint Bi-level Image Experts Group) encoding compresses monochrome (1 bit per pixel) image data at ratios of 20:1 to 50:1 for pages full of text. Like other dictionary-based algorithms (such as LZW and ZIP), JBIG2 builds a table of unique symbols; when a subsequent symbol matches one in the table, it substitutes a token pointing to that table entry. JBIG2 then compresses the table itself as well.
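
The symbol-table idea can be sketched in a few lines of Python. This is a simplified illustration, not JBIG2 itself: it treats each glyph as a small bitmap (a tuple of '0'/'1' row strings, a representation chosen here only for readability), stores each distinct bitmap once, and replaces every occurrence with its table index. Real JBIG2 encoders also match near-identical glyphs and arithmetic-code the table and the tokens, which this sketch omits. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical glyph representation: one '0'/'1' string per pixel row.
Bitmap = Tuple[str, ...]


def build_symbol_table(glyphs: List[Bitmap]) -> Tuple[List[Bitmap], List[int]]:
    """Store each distinct glyph once and emit an index token per occurrence."""
    table: List[Bitmap] = []          # unique symbols, in first-seen order
    index_of: Dict[Bitmap, int] = {}  # symbol -> position in table
    tokens: List[int] = []            # one table index per glyph on the page
    for glyph in glyphs:
        if glyph not in index_of:     # first occurrence: add it to the table
            index_of[glyph] = len(table)
            table.append(glyph)
        tokens.append(index_of[glyph])  # every occurrence becomes a token
    return table, tokens


A = ("010", "101", "111", "101")
B = ("110", "111", "101", "110")
table, tokens = build_symbol_table([A, B, A, A])
print(len(table), tokens)  # 2 [0, 1, 0, 0]
```

On a page of text the same few dozen letter shapes repeat many times, which is why this kind of substitution pays off on text-heavy scans.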

Acrobat 6 (PDF version 1.5) added the ability to compress the entire file (in the Clean Up Settings dialog). Older versions of Acrobat will usually open a file with a newer PDF version number (with a warning), but new compression schemes will produce an error. At the time of this writing, Adobe says that over 90% of Acrobat users have version 5.0 or later: 50% use version 5 and 40% use version 6. Because half of all users are still on version 5, PDF 1.4 is the safer choice; a file that relies on PDF 1.5 compression opens cleanly only for the 40% on version 6.
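
To check which version a file targets before posting it, you can read the version number from the "%PDF-x.y" header at the start of the file. The Python sketch below does that; the function names are hypothetical, and as a lenient check it searches the first 1024 bytes rather than requiring the header at byte 0. One caveat: from PDF 1.4 on, the document catalog may carry a /Version entry that overrides the header, and this sketch does not parse the catalog.

```python
import re


def pdf_header_version(data: bytes) -> str:
    """Return the 'major.minor' version from a %PDF-x.y header near the start."""
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        raise ValueError("no %PDF- header in the first 1024 bytes")
    return f"{int(match.group(1))}.{int(match.group(2))}"


def newer_than_pdf_1_4(version: str) -> bool:
    """True if the version is above 1.4, the version Acrobat 5 introduced."""
    major, minor = (int(part) for part in version.split("."))
    return (major, minor) > (1, 4)


sample = b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n"  # first bytes of a PDF 1.5 file
version = pdf_header_version(sample)
print(version, newer_than_pdf_1_4(version))  # 1.5 True
```

For a real file, pass the first kilobyte, for example `open(path, "rb").read(1024)`. A 1.5 header alone does not prove the file uses the new compression, but it flags files worth re-saving as PDF 1.4.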

To create the smallest possible PDF files for the Web, minimize the number of fonts and bitmapped images, substituting vector-based graphics where you can. Keep the number and complexity of forms in your PDF document to a minimum, and avoid multimedia.

There are several ways to create PDFs, including outputting to PostScript and Distilling, printing through GDI, one-click "Direct to PDF" export, and dynamic generation on the server side. However you create a PDF, the techniques and tools below can help you enhance and optimize it for the Web.
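
As an illustration of the server-side route, the Python sketch below writes a one-page PDF 1.4 file by hand using only the standard library. It assumes plain ASCII text, uses Helvetica (one of the standard fonts PDF viewers supply, so nothing is embedded), and Flate-compresses the page's content stream. The function name `build_pdf` is hypothetical, and a production system would normally use an established PDF library rather than hand-built output.

```python
import zlib


def build_pdf(text: str) -> bytes:
    """Return a one-page PDF 1.4 file showing `text` in 24-point Helvetica."""
    # Escape the characters that are special inside PDF literal strings.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("ascii")
    stream = zlib.compress(content)  # decoded by the viewer via /FlateDecode

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(stream)
        + stream + b"\nendstream",
        # Standard Type 1 font: referenced by name, not embedded.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")  # header + binary marker
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


pdf = build_pdf("Hello, Web!")
print(pdf[:8])  # b'%PDF-1.4'
```

On a web server you would return these bytes with a `Content-Type` of `application/pdf`.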

Avoid Refried Graphics

For graphics that must be inserted as bitmaps, prepare them for maximum compressibility and minimum dimensions. Start with the best-quality images you can, sized for the output resolution of the PDF. Inserting already-compressed JPEGs into PDFs and Distilling them may recompress ("refry") the JPEGs, which can create noticeable artifacts. Use black-and-white images and text instead of color to allow the use of the newer JBIG2 standard, which excels at monochrome compression. Be sure to turn off thumbnails when saving PDFs for the Web, since embedded page thumbnails add to the file size.
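
A quick calculation shows why output resolution and bit depth matter even before compression. The Python helper below (its name is hypothetical) computes the uncompressed size of an image placed at a given physical size, padding each row to a whole byte as PDF image data requires. The 300 dpi print and 72 dpi screen resolutions in the example are common assumptions, not figures from this article.

```python
def raw_image_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of an image placed at width_in x height_in inches."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # each row starts on a byte boundary
    return row_bytes * height_px


# A 6 x 4 inch figure:
print(raw_image_bytes(6, 4, 300, 24))  # 6480000  (24-bit color, print resolution)
print(raw_image_bytes(6, 4, 72, 24))   # 373248   (24-bit color, screen resolution)
print(raw_image_bytes(6, 4, 72, 1))    # 15552    (1-bit black and white, screen resolution)
```

At the same resolution, dropping from 24-bit color to 1-bit monochrome cuts the raw data by a factor of 24, before JBIG2 or any other compression is applied.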

Use Vector Graphics

Use vector-based graphics wherever possible for images that would normally be made into GIFs. Vector images scale perfectly, look marvelous, and their mathematical descriptions usually take up less space than bitmapped graphics that describe every pixel (although in some cases a bitmap is actually smaller than the equivalent vector graphic). You can also compress vector image data using ZIP compression, which is built into the PDF format. Acrobat Reader versions 5 and 6 also support the SVG standard.
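
To make the size difference concrete, the Python sketch below describes a square grid as PDF path operators (`m` moves to a point, `l` adds a line segment, `S` strokes the path) and compresses that content stream with `zlib.compress`, which produces the zlib/Deflate data that PDF's /FlateDecode filter expects. It then compares the result with the raw size of the same area as a 1-bit bitmap at an assumed 300 dpi. The grid dimensions and function name are arbitrary choices for illustration.

```python
import zlib


def grid_content_stream(size_pt: int, step_pt: int) -> bytes:
    """PDF path operators for a square grid of size_pt points, one line every step_pt."""
    ops = []
    for pos in range(0, size_pt + 1, step_pt):
        ops.append(f"{pos} 0 m {pos} {size_pt} l")  # vertical line
        ops.append(f"0 {pos} m {size_pt} {pos} l")  # horizontal line
    ops.append("S")                                  # stroke all subpaths at once
    return "\n".join(ops).encode("ascii")


vector = grid_content_stream(432, 36)   # 6-inch square (72 points per inch), half-inch grid
flate = zlib.compress(vector, 9)        # what a /FlateDecode stream would hold

side_px = 432 * 300 // 72               # 1800 pixels per side at 300 dpi
bitmap = (side_px + 7) // 8 * side_px   # 1 bit per pixel, rows padded to whole bytes

print(len(vector), len(flate), bitmap)
```

The vector description stays a few hundred bytes however large the grid is printed, while even a 1-bit bitmap of the same area needs 405,000 bytes before compression.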

Minimize Fonts

How you use fonts, especially in smaller PDFs, can have a significant impact on file size, so keep the number of fonts in your documents to a minimum. Each additional fully embedded font can easily add 40 KB to the file, which is why most authors embed "subsetted" fonts that include only the glyphs actually used.
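
Subsetting starts from a simple question: which characters does each font actually draw? The Python sketch below answers it for a list of (font, text) runs. The font names and function name are hypothetical, and treating each character as one glyph is a simplification (ligatures and other shaping can map several characters to one glyph); a real subsetter would then copy only those glyphs' outlines into the embedded font.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def glyphs_per_font(runs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collect the distinct characters each font must supply (one glyph per character assumed)."""
    used: Dict[str, Set[str]] = defaultdict(set)
    for font, text in runs:
        used[font].update(text)  # a set keeps each character once
    return dict(used)


runs = [
    ("BodyFont", "Optimize PDF files for the Web."),
    ("HeadingFont", "Fix Fat Forms"),
    ("BodyFont", "Minimize fonts."),
]
subsets = glyphs_per_font(runs)
print(len(subsets["HeadingFont"]))  # 10
```

A complete font typically contains far more glyphs than this; the subset for the heading font needs only the ten characters that appear in it.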

Fix Fat Forms

Acrobat forms can take up a lot of space in your PDFs. You can use PDF Enhancer from Apago to reduce the size of forms by 50% by removing information that is present in the file but never actually used. You can also combine a refried PDF (one printed to PostScript and re-Distilled) with the original form pages to create a hybrid PDF in Acrobat.

Optimizing Existing PDFs

In many cases you won't have access to the original document, just the resulting PDF file. Many PDFs we've seen are not fully optimized for the Web, using conservative settings more appropriate to high-resolution printers. For PDFs viewed on computer monitors, you don't need high-resolution images or exact reproduction of font faces; you just want to convey your information efficiently. Using the techniques outlined below, you can shrink your PDFs while still keeping the textual data for search engines and reasonable quality for print output. Some webmasters offer two versions of their PDFs, one for fast web display and one for printing.

Save As...

Once you're done making changes to your PDF document, choose File -> Save As and overwrite your existing PDF file. By default, Save As rewrites the whole file: it merges in the changes that the Save command appends to the end of a PDF, linearizes the file for fast web viewing, and removes unused objects.
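
Removing unused objects amounts to a mark-and-sweep pass over the file's object graph: anything not reachable from the document catalog can be dropped. The Python sketch below shows that idea on a hypothetical, already-parsed graph that maps each object number to the object numbers it references. It is not how Acrobat implements Save As; a real tool would also have to parse the file's "N 0 R" references and keep objects reachable from other trailer entries such as /Info.

```python
from typing import Dict, List, Set


def reachable(refs: Dict[int, List[int]], root: int) -> Set[int]:
    """Mark phase: every object number reachable from root by following references."""
    seen: Set[int] = set()
    stack = [root]
    while stack:
        number = stack.pop()
        if number in seen or number not in refs:  # already visited, or dangling reference
            continue
        seen.add(number)
        stack.extend(refs[number])
    return seen


def unused(refs: Dict[int, List[int]], root: int) -> Set[int]:
    """Sweep phase: objects present in the file but never reached."""
    return set(refs) - reachable(refs, root)


# 1 = catalog, 2 = page tree, 3 = page, 4-5 = page resources,
# 6-7 = leftovers from an earlier edit that nothing points to any more.
graph = {1: [2], 2: [3], 3: [4, 5], 4: [], 5: [], 6: [7], 7: []}
print(sorted(unused(graph, root=1)))  # [6, 7]
```

The explicit stack avoids recursion limits on large files, and the `seen` set keeps cycles (such as a page pointing back to its parent) from looping forever.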

The result is a compact, linearized PDF that displays the first page (or an arbitrary page) quickly while the rest of the file downloads in the background. Linearized PDFs are slightly larger, but they increase perceived speed. Note that optimizing a signed document will invalidate its signature.
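
You can roughly check whether a file is still linearized. In a linearized PDF, a linearization dictionary (containing /Linearized) appears within the first 1024 bytes, and its /L entry records the total file length; if bytes are later appended, for example by an incremental Save, that length no longer matches. The heuristic Python sketch below checks both conditions by scanning raw bytes rather than parsing the PDF, so treat its answer as a hint. The function name is hypothetical.

```python
import re


def linearization_status(data: bytes) -> str:
    """Heuristic check of a PDF's linearization dictionary against the file length."""
    head = data[:1024]  # the linearization dictionary sits in the first 1024 bytes
    if b"/Linearized" not in head:
        return "not linearized"
    match = re.search(rb"/L\s+(\d+)", head)  # /L = total file length in bytes
    if match and int(match.group(1)) == len(data):
        return "linearized"
    return "linearization invalid (file length changed since it was linearized)"


# Typical use (hypothetical file name): linearization_status(open("report.pdf", "rb").read())
```

A result of "linearization invalid" suggests the file has been edited since it was optimized, which is a cue to run Save As again.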

http://www.websiteoptimization.com