
RE: Re: Adobe And TRACE Launch Enhanced PDF Access Via Email

From: T. V. Raman <raman@Adobe.COM>
Date: Wed, 2 Sep 1998 10:32:28 -0700 (PDT)
Message-ID: <13805.33068.443701.72176@labrador>
To: Robert Neff <rcn@fenix2.dol-esa.gov>
Cc: "w3c-wai-ig@w3.org" <w3c-wai-ig@w3.org>, raman@Adobe.COM
Robert Neff writes:
In general PDF forms convert okay to the extent that you get
the content in the conversion--
but the form fields are not made active in the HTML.
The biggest nightmare scenario is one where a paper form is
scanned in to create an electronic version where portions
are subsequently made "active form fields".
If the creator does not take the effort to ensure that the
piece of paper is being OCR-ed correctly, there is little
one can do later to make the form accessible.

From the various error reports we have received over the
last 18 months, the majority of PDF documents that convert
poorly on some Federal WWW sites appear to have originated
from an OCR-based workflow -- and things vary all over the
map -- from PDF files where in fact no OCR was done, i.e. the
page just has a scanned image, to PDF files that were
probably produced from scanned images that were not
necessarily very clean. In these cases you either see
nothing in the output of the convertor (the convertor *does*
*not* do OCR) or fairly poor quality text -- the text is of
course a direct function of the OCR results.
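
The two failure modes above (no output at all versus poor quality
text) can be told apart with a rough check on whatever text the
convertor returns. The following is only a sketch under stated
assumptions: the function name, the labels, and the 0.6 threshold are
all hypothetical, and the "word-like token" test is a crude heuristic
rather than anything the access service itself does.

```python
import re

# Hypothetical heuristic: a token counts as "word-like" if it is purely
# alphabetic (apostrophes/hyphens allowed) with optional trailing punctuation.
WORDLIKE = re.compile(r"[A-Za-z][A-Za-z'\-]*[.,;:!?]?")


def triage_converter_output(text: str, min_word_ratio: float = 0.6) -> str:
    """Classify converted text as 'no-text', 'poor-ocr', or 'ok'.

    'no-text'  -> nothing came back; the PDF is probably a bare scanned image.
    'poor-ocr' -> text came back but few tokens look like words.
    'ok'       -> most tokens look like ordinary words.
    The threshold is an assumption, not a measured value.
    """
    tokens = text.split()
    if not tokens:
        return "no-text"
    wordlike = [t for t in tokens if WORDLIKE.fullmatch(t)]
    ratio = len(wordlike) / len(tokens)
    return "ok" if ratio >= min_word_ratio else "poor-ocr"
```

A check like this only flags documents for a human to look at before
posting; it cannot repair a bad scan.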

These kinds of problems are particularly hard to explain to
someone who is making a check-off decision on "is it
accessible?" -- in general this is a problem that the
accessibility field faces all over the map.

For a talk I put together on how to design accessible WWW
publishing solutions using all of today's technologies
including HTML, CSS and PDF
(given the constraints of time, cost and user benefits)
see http://cs.cornell.edu/home/raman/publications/talks/gsa-98/

The main point is to recognize what you point out below --
not all PDF files are created equal.
PDF documents get created today via a number of creation
paths -- ranging from drawing applications to word processors
to document scanners -- and the mileage from the access
conversion varies accordingly.

On the one hand it's frustrating to run into a PDF document
on the WWW that converts poorly and is consequently
inaccessible -- but it is important to examine how those
documents originated -- rather than simply blaming it on the
file format. If we walked up to the average Webmaster and said
"don't publish those scanned PDFs -- but HTML is accessible to
the blind" -- the average Webmaster would simply create a bunch
of scanned GIFs and hook them into an HTML page.

On the other hand, documents created from word processing
applications do perform reliably when processed by the
access convertor -- the amount of usage the service gets
bears witness to this fact.

Finally, the purpose of the access service is not, as some on
this list have alleged, an "attempt to pay lip service to
accessibility" --it's a service that makes information
accessible that would otherwise remain out of reach.

For those with long memories, I originally raised the
question of accessibility to final form content such as PDF
in 1993-- at the time I was laughed off of several blindness
related lists by blind users who made statements of the form
"We called WordPerfect and they said they would support it
-- so it will be accessible to us" --and other similar
foolish assertions.

I came to Adobe in 1995 and worked on the access problem
because I felt that even if I achieved 30% of what I
achieved in the world of LaTeX documents (see
http://cs.cornell.edu/home/raman for the work on audio
formatting of structured documents) with PDF documents, I
would have a greater impact on the amount of information
that is accessible --simply because I would be making a 30%
improvement to 80% of the world's documents --as opposed to
a 90% improvement to about 1% of the world's documents.  I
still believe in this decision --otherwise I would not be
doing what I am currently doing.
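
A rough worked version of the arithmetic above, assuming "impact" can
be modeled simply as improvement multiplied by share of the world's
documents. That model and the function name are simplifications
introduced here for illustration; the percentages are the ones quoted
in the paragraph.

```python
def impact(improvement: float, share_of_documents: float) -> float:
    """Overall impact under the simplifying assumption
    impact = improvement x share of documents affected."""
    return improvement * share_of_documents


pdf_impact = impact(0.30, 0.80)    # 30% improvement to 80% of documents
latex_impact = impact(0.90, 0.01)  # 90% improvement to about 1% of documents

print(f"PDF: {pdf_impact:.3f}  LaTeX: {latex_impact:.3f}")
```

Under this model the PDF work comes out at roughly 0.24 against 0.009
for the LaTeX work, which is the gap the argument relies on.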

At the time I started working on better access to PDF,
I had also gotten completely disillusioned with
the visual presentation slope down which HTML was sliding
--look at the mess on the WWW today and you will see the
realization of the nightmare I foresaw.

All is not gloom and doom though -- the advent of XML and
CSS and ACSS should help us recover some of the ground we lost.

 > This is a reply to Kelly:
 > I agree.  Not all files are seamlessly converted.  You need to be aware 
 > what you are converting.  If it is a document with text, then it should be 
 > ok.  However if it is a form, it may not convert well.  You should test 
 > before you post!
 > Does anyone know of a PDF form (where you can enter data)?  I would like to 
 > see how these convert.
 > -----Original Message-----
 > From:	Kelly Ford [SMTP:kford@teleport.com]
 > Sent:	Tuesday, September 01, 1998 7:57 PM
 > To:	w3c-wai-ig@w3.org
 > Subject:	RE: Re: Adobe And TRACE Launch Enhanced PDF Access Via Email
 > I don't want to start a debate here but my personal opinion is that all
 > these converters do as much harm as good.  If you ask me pdf is a
 > problematic file format at best for people that are blind.  I recently
 > tried to convert a batch of IRS documents and the results were disastrous.
 >  Yet the person at the IRS knew all about these converters and pointed me
 > directly to them and seemed oh so very pleased that the documents would be
 > made accessible by this convert technology.

Best Regards,

      Adobe Systems                 Tel: 1 (408) 536 3945   (W14-612)
      Advanced Technology Group     Fax: 1 (408) 537 4042 
      (W14 129) 345 Park Avenue     Email: raman@adobe.com 
      San Jose , CA 95110 -2704     Email:  raman@cs.cornell.edu
      http://labrador.corp.adobe.com/~raman/        (Adobe Intranet)
      http://cs.cornell.edu/home/raman/raman.html    (Cornell)
    Disclaimer: The opinions expressed are my own and in no way should be taken
as representative of my employer, Adobe Systems Inc.
Received on Wednesday, 2 September 1998 13:31:33 UTC
