- From: Mencap in N.I. <mencap-ni@dnet.co.uk>
- Date: Thu, 3 Sep 1998 16:39:46 +0100
- To: <w3c-wai-ig@w3.org>
What have all these endless messages on PDFs got to do with HTML guidelines?
It really is getting a bit much.

Eamonn Quinn

-----Original Message-----
From: Bruce Bailey <bbailey@clark.net>
To: w3c-wai-ig@w3.org <w3c-wai-ig@w3.org>
Cc: Paul Stauffer 301-827-5694 FAX 301-443-6385 <STAUFFERP@cder.fda.gov>; Robert Neff <rcn@fenix2.dol-esa.gov>; T. V. Raman <raman@Adobe.COM>
Date: 03 September 1998 16:31
Subject: Re: Adobe And TRACE Launch Enhanced PDF Access Via Email

>Much thanks to Robert Neff and Paul Stauffer. I'll share the good stuff
>(links) first and then proceed with my pontificating!
>
>Skeptic that I am, Paul has pointed me to a mainstream public site where PDFs
>are, in my humble opinion, good and appropriate. Some examples (and what
>happens with the Access Adobe translation):
>
> http://www.fda.gov/cder/guidance/1326fnl.pdf : Started life as a word
>processing document and is pretty small compared to the other samples
>referenced. The text comes through perfectly. Most formatting is lost,
>although bold and italics are retained. The graphic and emphasis on the title
>page are lost.
>
> http://www.fda.gov/cder/guidance/1716dft.pdf : Also originally a word
>processing document, but this one contains formulas written using built-in
>tools (as opposed to being pasted in as a graphic). These equations are
>handled in a consistent fashion, but too much information is lost for them
>to be usable.
>
> http://www.fda.gov/cder/foi/nda/97/020184ap.pdf : A good example of a
>composite document; it includes text of varying quality, graphical formulas,
>and attachments of typed-in forms. The whole thing has been run through a
>decent OCR process, but with predictable results. I would guess it is about
>95% accurate. (Those who use OCR daily will tell you that this level of
>accuracy is not acceptable.) The formulas are total gibberish, and misspelled
>words and artifacts abound.
>
> http://www.fda.gov/cder/guidance/old098fn.pdf : An older document that has
>not been run through OCR. The translation shows only the page numbers! No
>warning messages of any kind are generated.
>
>It is interesting to note that all of the above look similar in an Acrobat
>Reader window! The casual browser would have no way of knowing which is an
>image and which has text. A word search simply returns "not found", even when
>the document contains no searchable text at all!
>
>It was interesting to me that Access Adobe returned translations to me faster
>than I could view the originals in Acrobat Reader. This is not too surprising
>given that both Adobe and the FDA have T3 connections to the internet and I am
>using dial-up! Were the translations better, no doubt the service would be
>totally overwhelmed by home users. I postulate that it is in Adobe's interest
>to keep this free service mediocre! The curious might wish to try pasting the
>above links into their simple form (http://access.adobe.com/simple_form.html).
>
>The Food and Drug Administration basically has to choose between providing
>copious PDFs or trickling out HTML. The documents are coming from a variety of
>sources (including paper). This is the kind of devil-in-the-details choice
>disability rights advocates are regularly faced with. Do we wage the
>impossible war against the system (and thereby stay true to our ideals, but in
>the meantime accomplish little) or work from within the system to effect
>change (and in the meantime feel compromised, and probably give up the chance
>for radical improvements)? Given the current (less than acceptable) state of
>the art with regard to PDF access, which (less than ideal) goal do we pursue?
>1) Purge PDF from the web with the same vigor we fight missing ALT text. This
>would include removing Access Adobe, since it gives the mistaken impression
>that there are easy workarounds for dealing with PDF. Never mind the
>mainstream opposition we will face, nor our own brethren we will anger when
>what poor tools there are go away. Organizations like the FDA will either
>break or be given more money to do this aspect of their job properly.
>2) Accept the status quo. We should be grateful for what tools are handed us,
>and we can plead for help on a case-by-case basis when they don't work. In the
>meantime, we can educate, much as we do with the majority of WAI issues.
>
>Hopefully, work on PDF translation will continue. I would guess that the kind
>of optical character recognition that is needed involves the same kind of
>artificial intelligence that is needed for understanding language and for
>real voice recognition.
>
>At a minimum, Acrobat Reader (and Access Adobe) should warn the user when
>there is no text associated with the images displayed. I would like to see
>Access Adobe offer free, state-of-the-art OCR for PDF documents.
>
>Bruce
>bbailey@clark.net
>
Received on Thursday, 3 September 1998 11:31:44 UTC