Re: Acrobat PDF & Accessibility from Vadim Plessky on 2001-12-21 (w3c-wai-ig@w3.org from October to December 2001)

From: Vadim Plessky <lucy-ples@mtu-net.ru>
Date: Fri, 21 Dec 2001 17:26:42 +0000
To: David Woolley <david@djwhome.demon.co.uk>, w3c-wai-ig@w3.org
Message-Id: <200112211432.fBLEWjH22384@post.cnt.ru>
On Thursday 20 December 2001 22:54, David Woolley wrote:
|   > maybe I'm wrong but isn't pdf a "photograph" of the page rather than
|   > actual letter coding, sort of like a fax vs teletype machine??
|
|   You are wrong.  PDF is essentially PostScript without the programmability
|   (not that many people realise that PostScript is a programming language,
|   these days - it integrates near pixel perfection and a scripting
|   language long before (presentational pseudo-) HTML and JavaScript!).

Yes, that's correct.
You can solve equations in PostScript and print the results on the rendered
page. :-)
Besides, you can run some *PostScript programs* and get no page output at
all: they either do some internal work (re-encoding fonts, conversion, etc.),
or perform export/import operations (a good example is the ps2pdf utility in
Ghostscript), or both.
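
To make the "solve equations" point concrete, here is a rough sketch in
Python (not PostScript itself) of a toy evaluator for a handful of
PostScript-style stack operators. The function name run_ps_arithmetic is
made up for this illustration, every number is treated as a float (real
PostScript distinguishes integers from reals), and nothing here is a real
interpreter: there are no procedures, dictionaries, or page output.

```python
import math


def run_ps_arithmetic(program):
    """Evaluate a whitespace-separated postfix program and return the stack.

    Toy subset only: add, sub, mul, div, neg, sqrt, dup, exch.
    Any other token is assumed to be a number.
    """
    stack = []
    binary = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,
    }
    for token in program.split():
        if token in binary:
            b = stack.pop()  # top of stack is the second operand
            a = stack.pop()
            stack.append(binary[token](a, b))
        elif token == "neg":
            stack.append(-stack.pop())
        elif token == "sqrt":
            stack.append(math.sqrt(stack.pop()))
        elif token == "dup":
            stack.append(stack[-1])
        elif token == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            stack.append(float(token))
    return stack


# Solve 2x + 6 = 10, i.e. x = (10 - 6) / 2
linear_root = run_ps_arithmetic("10 6 sub 2 div")

# Larger root of x^2 - 3x + 2 = 0, i.e. (3 + sqrt(3*3 - 4*2)) / 2
quadratic_root = run_ps_arithmetic("3 3 3 mul 4 2 mul sub sqrt add 2 div")
```

Both programs leave the single value 2.0 on the stack. In real PostScript
the same operator sequences would do the same arithmetic, and a further
`show` would be needed to put the result on the page.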
 
[...]
|
|   PDF is capable of coding text in a way that is easy to recover, but is
|   usually used to postprocess Word documents these days.  Word plus the
|   Windows PostScript drivers does extensive microspacing, resulting both
|   in bloat of the PDF file and words being broken up and word spaces not
|   being physically present.  That is a fault of the original, generally
|   non-Adobe authoring tools; I expect exactly the same fault in SVG
|   documents produced with the same tools.

Can someone explain to me what you mean by *accessible* PDF?
Is it PDF without "microspacing" and "words being broken up"?
// I apologize in advance that I don't have time to read the numerous Adobe
specs, so a simple explanation in 2-3 sentences would be enough.
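
For reference, the "words being broken up" problem quoted above can be
sketched roughly as follows, in Python. A text extractor sees positioned
fragments rather than words, and has to guess from the horizontal gaps
where the word spaces were. The function name rebuild_text, the fragment
tuples, and the 1.5-unit threshold are all assumptions made for this
illustration; real tools use font metrics and are considerably more
careful.

```python
def rebuild_text(fragments, gap_threshold=1.5):
    """Rebuild one line of text from positioned fragments.

    fragments: iterable of (x_start, x_end, text) tuples on the same
    baseline, in any order. A space is inserted only where the gap
    between two neighbouring fragments exceeds gap_threshold
    (a hypothetical value, in the same units as the coordinates).
    """
    pieces = []
    prev_end = None
    for x_start, x_end, text in sorted(fragments):
        if prev_end is not None and x_start - prev_end > gap_threshold:
            pieces.append(" ")  # gap is wide enough to be a word space
        pieces.append(text)
        prev_end = x_end
    return "".join(pieces)


# A microspaced line: one word split into three runs, no space character.
line = [
    (0.0, 10.0, "Acc"),
    (10.2, 22.0, "ess"),
    (22.1, 35.0, "ible"),
    (39.0, 50.0, "PDF"),
]

naive = " ".join(text for _, _, text in sorted(line))
rebuilt = rebuild_text(line)
```

Here naive is "Acc ess ible PDF", while rebuilt is "Accessible PDF". The
point is only that the reader software has to reconstruct something the
authoring tool threw away.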

It seems to me that KWord-generated PDF (KWord uses Qt for this, plus its own
layout engine) is quite different from what you described above.
If you send me off-list a small (<30K) .doc, .html, or simple RTF file,
and explain *what* a good PDF file produced from that document should look
like, I will do the testing and post the results here.
 
|
|   PDF can be used with legacy documents to take a scanned image and make
|   it look, at first *sight*, as though it is the same as a document
|   prepared from the machine readable source.  In my experience, some
|   design consultancies don't understand PostScript and PDF and output

How many people understand PostScript and PDF? Not too many, IMO.
I was studying PostScript around 7-8 years ago, but since then PLRM
version 3.0 has been published; it's 950 pages, and I just don't have time
to go through it...
Most people use auto-generated PostScript (from the Windows or MacOS "PS
driver", some publishing software, or Adobe tools, after all).

|   bitmaps of brochures and then code them to PDF, but that's a wetware
|   problem.
|
|   For real scanned images, I believe that Adobe sell a tool that will
|   underlay the scanned image with an OCRed version of the text, so that
|   you can cut and paste the text (and presumably screen read) but still
|   have the accurate rendition of the legacy documentation, visually.

You may want to take a look at the Xerox Document Centre (DC330 or DC340,
for example).
A fully configured machine can not only print and copy documents, but also
scan to PC and automatically produce PDFs from the scanned pages.
IIRC, the text is recognized (OCR'ed) and stored as text in those PDFs.
[yes, I worked for Xerox some time ago, and although I was in a different
division, I used this from time to time]

I don't know, though, how accessible those PDFs are. I hope they were not
bad. :-)
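
The OCR-plus-scanned-image idea discussed above can be sketched, very
roughly, in Python. PDF has a text rendering mode 3 ("3 Tr") in which text
is neither filled nor stroked, so it is invisible but still present for
copy/paste or a screen reader. The sketch below only builds the text
operators for a page content stream from OCR results. The helper names,
the font resource name F1, and the word tuples are assumptions for
illustration, it handles plain ASCII only, and I am not claiming this is
how the Adobe or Xerox products actually do it.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_ops(words, font_name="F1"):
    """Build content-stream operators for an invisible text layer.

    words: list of (x, y, font_size, text) tuples, as an OCR step might
    produce them (hypothetical structure). font_name must match a font
    resource defined on the page; "F1" is just an assumed name.
    """
    lines = ["BT", "3 Tr"]  # begin text; render mode 3 = invisible
    for x, y, size, text in words:
        lines.append(f"/{font_name} {size} Tf")   # select font and size
        lines.append(f"1 0 0 1 {x} {y} Tm")       # position the word
        lines.append(f"({escape_pdf_string(text)}) Tj")  # show the word
    lines.append("ET")  # end text
    return "\n".join(lines)


ops = hidden_text_ops([(72, 700, 12, "Scanned"), (120, 700, 12, "page (1)")])
```

A real file would still need the page image, the font resource, and the
rest of the PDF object structure around this; the sketch only shows why
such a scan can look like a picture and still carry recoverable text.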

-- 

Vadim Plessky
http://kde2.newmail.ru  (English)
33 Window Decorations and 6 Widget Styles for KDE
http://kde2.newmail.ru/kde_themes.html
KDE mini-Themes
http://kde2.newmail.ru/themes/
Received on Friday, 21 December 2001 09:33:48 UTC