Re: Acrobat PDF & Accessibility from David Woolley on 2001-12-20 (w3c-wai-ig@w3.org from October to December 2001)

From: David Woolley <david@djwhome.demon.co.uk>
Date: Thu, 20 Dec 2001 22:54:08 +0000 (GMT)
To: w3c-wai-ig@w3.org
Message-Id: <200112202254.fBKMs8K21354@djwhome.demon.co.uk>
> maybe I'm wrong but isn't pdf a "photograph" of the page rather than
> actual letter codeing, sort of like a fax vs teletype machine??

You are wrong.  PDF is essentially PostScript without the programmability
(not that many people realise, these days, that PostScript is a
programming language - it integrated near pixel perfection with a
scripting language long before (presentational pseudo-) HTML and
JavaScript!).  It does have additional navigation features, which have
been extended to support accessibility.  (Instead of having graphics
primitives with variable parameters, control flow and assignments, PDF
is the result of running all the scripting and outputting just the
primitives with the final computed numeric and string parameters.)
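
As a rough illustration of that last point, here is a minimal Python
sketch of "running the script and keeping only the primitives".  The
function name, the page coordinates and the labels are all made up for
the example; only the Tm and Tj operators are real PDF.  A real content
stream would also need BT/ET around the text and escaping of any
parentheses inside the strings.

```python
# Hypothetical illustration: the loop below plays the role of a PostScript
# program; the list it returns is the kind of flat, already-computed
# operator sequence that ends up in a PDF page.

def flatten_labels(labels, x=72, y_start=720, leading=14):
    """Run the loop once and return only the resulting primitives."""
    ops = []
    y = y_start
    for text in labels:
        # "a b c d e f Tm" sets the text matrix; e and f are the position.
        ops.append(f"1 0 0 1 {x} {y} Tm ({text}) Tj")
        y -= leading  # the variable is consumed here and never reaches the output
    return ops


if __name__ == "__main__":
    for op in flatten_labels(["Summary", "Details", "Costs"]):
        print(op)
```

The loop, the variable y and the subtraction are gone from the output;
only fixed numbers and strings remain.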

It's somewhat similar to static SVG, except that static SVG doesn't,
I believe, support multi-page documents in a single file, and requires
separate files for significant bitmaps.

PDF is capable of coding text in a way that is easy to recover, but is
usually used to postprocess Word documents these days.  Word plus the
Windows PostScript drivers does extensive microspacing, which both
bloats the PDF file and breaks words up, with the word spaces not
being physically present.  That is a fault of the original, generally
non-Adobe, authoring tools; I expect exactly the same fault in SVG
documents produced with the same tools.

PDF can be used with legacy documents to take a scanned image and make
it look, at first *sight*, as though it is the same as a document
prepared from the machine readable source.  In my experience, some
design consultancies don't understand PostScript and PDF and output
bitmaps of brochures and then code them to PDF, but that's a wetware
problem.

For real scanned images, I believe that Adobe sell a tool that will
underlay the scanned image with an OCRed version of the text, so that
you can cut and paste the text (and presumably screen read it) but still
have the accurate rendition of the legacy documentation, visually.

> LYNX or at least my version of it chokes on pdf

That's because you don't have Acrobat reader or ghostscript in your
mailcap file.  That those tools may not be good for the blind is a
different issue.

Note that, by default, PDF is compressed, which is why a plain text viewer
will see gibberish, but PDF is actually, like SVG, a basically textual
format, and one can have valid PDF documents that are purely textual.
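
A minimal sketch of why a plain text viewer shows gibberish, assuming
the stream uses the Flate (zlib) filter; PDF allows other filters, such
as LZW, which this does not cover.  The sample content stream and the
function name are invented for the example.

```python
import zlib

# Hypothetical one-line content stream, written as plain text.
CONTENT = b"BT /R8 10 Tf 72 720 Td (1. Summary) Tj ET"


def roundtrip(content):
    """Compress as a Flate-filtered stream would be, then recover the text."""
    compressed = zlib.compress(content)
    restored = zlib.decompress(compressed)
    return compressed, restored


if __name__ == "__main__":
    compressed, restored = roundtrip(CONTENT)
    print(compressed)  # binary data, which is what a text viewer is shown
    print(restored)    # the textual operators again
```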

This is a fragment of PDF, prepared with groff, which doesn't go
overboard with microspacing, although some kerning does break the
flow, and with an early version of ghostscript, which doesn't compress
(later versions can be forced not to compress).  The actual text is
in the parentheses.  The kerned words are "facings" and "garages".

-300.456 -36 Td(1. Summary)Tj
/R8 10 Tf
1.318 Tw
25 -15.6 Td(Replace or repair as necessary unsound materials in the f)Tj
1.317 Tw
239.201 0 Td(acings to the g)Tj
3.817 Tw
61.9509 0 Td(arages. Replace)Tj
1.317 Tw
70.3839 0 Td(or repair)Tj
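
A rough Python sketch of recovering the text from the fragment above.
It is only a heuristic written for this one fragment, not a PDF parser:
it assumes every string is shown as "tx ty Td(string)Tj", treats a
non-zero vertical move as a new line, and ignores font metrics entirely.
The names FRAGMENT, SHOW and extract_text are invented for the example.

```python
import re

FRAGMENT = """\
-300.456 -36 Td(1. Summary)Tj
/R8 10 Tf
1.318 Tw
25 -15.6 Td(Replace or repair as necessary unsound materials in the f)Tj
1.317 Tw
239.201 0 Td(acings to the g)Tj
3.817 Tw
61.9509 0 Td(arages. Replace)Tj
1.317 Tw
70.3839 0 Td(or repair)Tj
"""

# "tx ty Td(string)Tj"; the string part tolerates backslash escapes.
SHOW = re.compile(r"(-?[\d.]+) (-?[\d.]+) Td\(((?:\\.|[^\\()])*)\)Tj")


def extract_text(stream):
    """Join the shown strings, starting a new line on any vertical move."""
    parts = []
    for _tx, ty, text in SHOW.findall(stream):
        if float(ty) != 0 and parts:
            parts.append("\n")
        parts.append(text)
    return "".join(parts)


if __name__ == "__main__":
    print(extract_text(FRAGMENT))
```

Joining the pieces directly puts "facings" and "garages" back together,
but the last two pieces come out as "Replaceor repair": the word space
there exists only as a horizontal move, and recovering it would need
the glyph widths of the font.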
Received on Thursday, 20 December 2001 17:55:11 UTC