Re: PDF in WCAG 2

From: david poehlman <david.poehlman@handsontechnologeyes.com>
Date: Thu, 19 Aug 2004 15:38:16 -0400
Message-ID: <01f701c48624$131e6440$6401a8c0@DAVIDPC>
To: "WAI-IG" <w3c-wai-ig@w3.org>, "Joe Clark" <joeclark@joeclark.org>


You are correct, but with a caveat.  These tools, in their latest and
greatest incarnations, with the right operating systems, and when the moon
is full, will make a pretty good try at presenting PDFs to screen reader
users.  None of them, however, will render even carefully tagged complex
PDF data in an efficient manner.

Johnnie Apple Seed

----- Original Message ----- 
From: "Joe Clark" <joeclark@joeclark.org>
To: "WAI-IG" <w3c-wai-ig@w3.org>
Sent: Thursday, August 19, 2004 3:12 PM
Subject: Re: PDF in WCAG 2

At 21:29 -0400 2004.08.18, Access Systems wrote:
>>  Every platform in common use (including Linux) can read PDFs,
>>including via open-source tools. Plus, I dunno, have you considered
>Linux can read PDF VISUALLY but not with text browsers.  Sure, in
>Mozilla or others you can see PDFs, but how do you do it with a
>screen reader?

I don't know what you mean. The few text-only browsers in existence
(I use Lynx every day) have not been upgraded to read PDFs. That's
nobody's problem but the authors of text-only browsers-- and their

There may be some kind of plug-in for Moz that lets it read PDFs
right in the browser (like Schubert-It), but I don't have one.

"How do you do it with a screen reader?" At present, you use Jaws,
Window-Eyes, or Hal. I infer, based on publicly-released information,
that Apple VoiceOver-- freely built into OS X 10.4, according to the
information-- will read PDFs.


At 09:57 -0400 2004.08.19, John Foliot - WATS.ca wrote:
>  Some members of the Working Group fail to understand that
>presentation *is* content.
>Some members of this list fail to accept that for many users,
>presentation is a structural concept,

That's an oxymoron.

>not a "pretty picture" concept.
>The basic premise of the web, and accessible web development is,
>was, and (IMHO) should always be the separation of display and
>content (unless of course the whole idea of CSS, semantic web,
>structural development, etc. is just a crock and the web really is
>about pretty pictures)

Oh, stop. CSS can generate content. Structural markup is based *on*
content: "This chunk of text is a heading because of what it says and
does." We're not talking about "pretty pictures," but the fact that
John resorts to such a blandishment suggests he falls squarely into
the category I complain about all the time-- accessibility advocates
who are actively hostile to visual design. Let me just mention again
that (a) that horse won't hunt, (b) WCAG WG has to work at designers'
level if it expects them to work at its level, and (c) many people
with disabilities have fully- or mostly-functional vision and not
only benefit from but *expect* good visual design.

>>  It's been addressed. A wide variety of PDFs

Everything we do here is about "some but not all." I guarantee you
that you can find at least one person with a disability online who
cannot read or use any specific page.

>>  can be made adequately
>>  accessible to many groups.
>But NOT ALL groups!

Universal accessibility is a myth. And you're being disingenuous in
your use of the word "groups." Do you define people who refuse to
update their technology to keep pace with updates in *accessible*
technology as such a "group"? That has nothing to do with disability.

>>>  Maybe I'm over-interpreting, but: I would class PDFs as non-text
>>  Except for all that text inside them.
>Precisely... "inside them".  But what if you cannot get "inside them"?

Every platform in common use has tools that can read PDFs. Plus you
can also Google them. And if you don't think Google is an adaptive
technology, you've been asleep.

>Jesper, if the author has given you permission to re-print the
>article (because the original is no longer being hosted by IBM) it
>shifts the responsibility to you to handle it properly (IMO).

The author retains _droit moral_ and may insist that the original
format be preserved. Absent written permission to adapt to another
format, in many countries you may in fact *not* alter the original,
as by transforming to HTML.

>Whining and moaning about the extra work is not good enough... It
>took exactly 40 minutes to re-convert that PDF to accessible,
>structurally intact HTML... I know, because I did it and timed it
>(www.wats.ca/reprints/jesper.html - this will not remain live past
>Aug. 21st, 2004 due to possible copyright infringements).

"Possible"? Indisputable.

OK, 40 minutes times how many documents in a company's archive?

Nobody, but nobody, has solved the problem of updating archived or
legacy "content." My recommendation has always been to set a schedule
of conversion and to respond to requests or complaints as soon as
they are received. This seems like a fair solution. Very rich
companies may merit different (i.e., accelerated) requirements.

By the way, your easy HTML version has invalid code (36 errors:
and would not pass Priority 2.

>In this concept, sure, PDF's are fine, useful and should continue to
>be made available. But arguing that the file format is universally

We aren't. There *is* no such thing.

>Providing content exclusively in PDF means "one or more groups will
>find it impossible to access information in the document."

Be careful how you define "groups."


At 11:50 -0400 2004.08.19, Access Systems wrote:
>>  i.e. same person on a different system can access the content.
>>Content is not the problem, it's the user's set up.
>?? ADA says for something to be accessible the individual cannot be
>required to purchase something that everyone else is not required to
>purchase. (28CFR36.301[c])  this is called "disparate treatment"

We're not talking about the ADA.

>  As far as I know emacspeak does read PDF, but I copied Raman to let
>him explain it since he wrote it.
>but will it read pdf in all cases.

Of course not in "all" cases. Some PDFs are inaccessible, as are some
Web pages.

>>   Whether the content is in PDF, HTML, SMIL,
>but the problem is GETTING TO the content. once a pdf is untangled

I wish you'd stop using these approximate and incorrect terms. You
don't have to "untangle" a PDF; it isn't a ball of string.

>>or whatever, there are still requirements for accessible content in
>>that format. For example, if there is an image in either of these
>>formats, then the content of the image needs to be made available,
>>for example via an alt attribute in HTML, etc.
>yes, and how do you put that into a pdf?

I see that, as feared, Access Systems works in complete ignorance of
accessible PDF.

>>  Claiming that if something doesn't run in Lynx makes it
>>inaccessible is misinformation,
>I beg to differ..

Time has marched on. A browser that can't use CSS is outdated. I use
one each and every day. It has many advantages. But it cannot be
considered the baseline.


At 16:54 +0000 2004.08.19,  David Poehlman wrote:
>1> much of pdf comes directly from paper.  It's scanned and dumped
>directly into pdf.

That's increasingly rare. But it does seem to be the false
preconception held by many blind people and/or Lynx users on this
esteemed List.

>  if the pdf is truly textual in the first place and if and this is a
>big if, it has formatting intact which it almost never does, and

Acrobat 5 and later (and Acrobat 4 with a plug-in) can make sense
even of untagged PDFs that contain actual text, which is a very large
number of them. It'll read the text. It may not be pretty, but it'll

>If I am a customer of the us federal government and I use linux or
>dos or outspoken for the mac, I should not be denied access to
>information simply because of my choice or need of environment. This
>is accessibility.

Some parts of the U.S. federal government are mandated to place, for
example, forms online, but are also legally enjoined from altering
their appearance. That rules out HTML.


At 16:55 +0000 2004.08.19, RUST Randal  wrote:
>The data for many PDFs, especially reports, is stored in a database

I really don't think that's the case at all for "many" PDFs.

>The data is just extracted and turned into a PDF. Wouldn't it make
>sense to tell developers to give the user the choice of PDF or text
>version, and then generate content in the desired format?

Sure. Now give them ATAG-compliant tools to do it.

>In fact, a PDF always exists in some other format prior to being
>turned into a PDF,

That's true, but it is possible to directly generate a PDF from
scratch without a preliminary document.

>  and most, if not all of those applications allow for the file to be
>saved in many different formats which are more accessible than PDF.

No, that's false. Any application on Mac OS X can save in PDF, for
example. I strongly dispute the idea that typical OS X applications
can save in, for example, HTML.

>The point is, when a document is offered as a PDF, developers should
>be encouraged to provide the document in multiple formats, which is
>entirely reasonable (and pretty much what WCAG 1.0 says).

Time has marched on. PDF can be accessible unto itself.

Why isn't anybody making this argument about multimedia? Oh, but
that's what WCAG 1.0 tried to do-- in the Working Group's mania for
TEXT-ONLY ALL THE TIME, it insisted on "collated text transcripts"
and similar malarkey. But multimedia can carry its own accessibility
features, which the Working Group is growing to accept. What's the
difference? There isn't any.


     Joe Clark | joeclark@joeclark.org
     Accessibility <http://joeclark.org/access/>
     Expect criticism if you top-post
Received on Thursday, 19 August 2004 19:37:43 UTC
