Re: Adobe And TRACE Launch Enhanced PDF Access Via Email

From: Mencap in N.I. <mencap-ni@dnet.co.uk>
Date: Thu, 3 Sep 1998 16:39:46 +0100
To: <w3c-wai-ig@w3.org>
Message-ID: <01bdd751$143e6c40$5a022ec2@dna1.dnet.co.uk>
What have all these endless messages on PDFs got to do with HTML guidelines?
It really is getting a bit much.
Eamonn Quinn
-----Original Message-----
From: Bruce Bailey <bbailey@clark.net>
To: w3c-wai-ig@w3.org <w3c-wai-ig@w3.org>
Cc: Paul Stauffer 301-827-5694 FAX 301-443-6385 <STAUFFERP@cder.fda.gov>;
Robert Neff <rcn@fenix2.dol-esa.gov>; T. V. Raman <raman@Adobe.COM>
Date: 03 September 1998 16:31
Subject: Re: Adobe And TRACE Launch Enhanced PDF Access Via Email


>Much thanks to Robert Neff and Paul Stauffer.  I'll share the good stuff
>(links) first and then proceed with my pontificating!
>
>Paul has pointed me, skeptic that I am, to a mainstream public site where
>PDFs are, in my humble opinion, good and appropriate.  Some examples (and
>what happens with the Access Adobe translation):
> http://www.fda.gov/cder/guidance/1326fnl.pdf :  Started life as a word
>processing document and is pretty small compared to the other samples
>referenced.  The text comes through perfectly.  Most formatting is lost,
>although bold and italics are retained.  The graphic and emphasis from the
>title page are lost.
> http://www.fda.gov/cder/guidance/1716dft.pdf :  Also originally a word
>processing document, but this one contains formulas written using built-in
>tools (as opposed to being pasted in as a graphic).  These equations are
>handled in a consistent fashion, but too much information is lost for them
>to be usable.
>
> http://www.fda.gov/cder/foi/nda/97/020184ap.pdf : Is a good example of a
>composite document; it includes text of varying quality, graphical
>formulas, and attachments of typed in forms.  The whole thing has been run
>through a decent OCR process, but with predictable results.  I would guess
>it is about 95% accurate.  (Those who use OCR daily will tell you that this
>level of accuracy is not acceptable.)  The formulas are total gibberish and
>misspelled words and artifacts abound.
> http://www.fda.gov/cder/guidance/old098fn.pdf : Is an older document, and
>has not been run through OCR.  The translation shows only the page numbers!
>No warning messages of any kind are generated.
>
>It is interesting to note that all of the above look similar in an Acrobat
>Reader window!  The casual browser would have no way of knowing which is an
>image and which has text.  A word search simply returns "not found", even
>when the document contains no searchable text at all!
>
>It was interesting to me that Access Adobe returned translations to me
>faster than Acrobat Reader did.  This is not too surprising given that both
>Adobe and the FDA have T3 connections to the Internet and I am using
>dial-up!  Were the translations better, no doubt the service would be
>totally overwhelmed by home users.  I postulate that it is in Adobe's
>interest to keep this free service mediocre!  The curious might wish to try
>pasting the above links into their simple form
>(http://access.adobe.com/simple_form.html).
>
>The Food and Drug Administration basically has to choose between providing
>copious PDFs and trickling out HTML.  The documents are coming from a
>variety of sources (including paper).  This is the kind of
>devil-in-the-details choice disability rights advocates are regularly faced
>with.  Do we wage the impossible war against the system (and thereby be
>true to our ideals, but in the meantime accomplish little) or work from
>within the system to effect change (and in the meantime feel compromised,
>and probably give up the chance for radical improvements)?  Given the
>current (less than acceptable) state of the art with regard to PDF access,
>which (less than ideal) goal do we pursue:
>1)  Purge PDF from the web with the same vigor we fight missing ALT text.
>This would include removing Access Adobe, since it gives the mistaken
>impression that there are easy work-arounds to dealing with PDF.  Never
>mind the mainstream opposition we will face, nor our own brethren we will
>anger when what poor tools there are go away.  Organizations like the FDA
>will either break or be given more money to do this aspect of their job
>properly.
>2)  Accept the status quo.  We should be grateful for what tools are handed
>us and we can plead for help on a case-by-case basis when they don't work.
>In the meantime, we can educate, much like we do with the majority of WAI
>issues.
>
>Hopefully, work on PDF translation will continue.  I would guess that the
>kind of optical character recognition that is needed involves the same kind
>of artificial intelligence that is needed for understanding language and
>real voice recognition.
>
>At a minimum, Acrobat Reader (and Access Adobe) should warn the user when
>there is no text associated with the images displayed.  I would like to see
>Access Adobe offering free state-of-the-art OCR for PDF documents.
>
>Bruce
>bbailey@clark.net
>
Received on Thursday, 3 September 1998 11:31:44 GMT
