Re: PDF in WCAG 2 from Joe Clark on 2004-08-19 (w3c-wai-ig@w3.org from July to September 2004)

From: Joe Clark <joeclark@joeclark.org>
Date: Thu, 19 Aug 2004 15:12:39 -0400
To: WAI-IG <w3c-wai-ig@w3.org>
Message-Id: <a061104aabd4a8dddc546@[192.168.1.100]>
At 21:29 -0400 2004.08.18, Access Systems wrote:
>>  Every platform in common use (including Linux) can read PDFs, 
>>including via open-source tools. Plus, I dunno, have you considered 
>>Googling?
>
>Linux can read PDF VISUALLY but not with text browsers. Sure, in 
>Mozilla or others you can see PDFs, but how do you do it with a 
>screen reader?

I don't know what you mean. The few text-only browsers in existence 
(I use Lynx every day) have not been upgraded to read PDFs. That's 
nobody's problem but the authors of text-only browsers-- and their 
users.

There may be some kind of plug-in for Moz that lets it read PDFs 
right in the browser (like Schubert-It), but I don't have one.

"How do you do it with a screen reader?" At present, you use JAWS, 
Window-Eyes, or Hal. I infer, based on publicly released information, 
that Apple VoiceOver-- built free into OS X 10.4, according to that 
information-- will read PDFs.

--

At 09:57 -0400 2004.08.19, John Foliot - WATS.ca wrote:
>  Some members of the Working Group fail to understand that 
>presentation *is* content.
>
>Some members of this list fail to accept that for many users, 
>presentation is a structural concept,

That's an oxymoron.

>not a "pretty picture" concept.
>
>The basic premise of the web, and accessible web development is, 
>was, and (IMHO) should always be the separation of display and 
>content (unless of course the whole idea of CSS, semantic web, 
>structural development, etc. is just a crock and the web really is 
>about pretty pictures)

Oh, stop. CSS can generate content. Structural markup is based *on* 
content: "This chunk of text is a heading because of what it says and 
does." We're not talking about "pretty pictures," but the fact that 
John resorts to such a blandishment suggests he falls squarely into 
the category I complain about all the time-- accessibility advocates 
who are actively hostile to visual design. Let me just mention again 
that (a) that horse won't hunt, (b) WCAG WG has to work at designers' 
level if it expects them to work at its level, and (c) many people 
with disabilities have fully- or mostly-functional vision and not 
only benefit from but *expect* good visual design.

>>  It's been addressed. A wide variety of PDFs
>
>(READ "SOME BUT NOT ALL")

Everything we do here is about "some but not all." I guarantee you 
that you can find at least one person with a disability online who 
cannot read or use any specific page.

>>  can be made adequately
>
>(READ "ALMOST BUT NOT QUITE")
>
>>  accessible to many groups.
>
>But NOT ALL groups!

Universal accessibility is a myth. And you're being disingenuous in 
your use of the word "groups." Do you define people who refuse to 
update their technology to keep pace with updates in *accessible* 
technology as such a "group"? That has nothing to do with disability.

>>>  Maybe I'm over-interpreting, but: I would class PDFs as non-text content
>
>>  Except for all that text inside them.
>
>Precisely... "inside them".  But what if you cannot get "inside them"?

Every platform in common use has tools that can read PDFs. Plus you 
can also Google them. And if you don't think Google is an adaptive 
technology, you've been asleep.

>Jesper, if the author has given you permission to re-print the 
>article (because the original is no longer being hosted by IBM) it 
>shifts the responsibility to you to handle it properly (IMO).

The author retains _droit moral_ and may insist that the original 
format be preserved. Absent written permission to adapt it to another 
format, in many countries you may in fact *not* alter the original, 
for example by transforming it into HTML.

>Whining and moaning about the extra work is not good enough... It 
>took exactly 40 minutes to re-convert that PDF to accessible, 
>structurally intact HTML... I know, because I did it and timed it 
>(www.wats.ca/reprints/jesper.html - this will not remain live past 
>Aug. 21st, 2004 due to possible copyright infringements).

"Possible"? Indisputable.

OK, 40 minutes times how many documents in a company's archive?

Nobody, but nobody, has solved the problem of updating archived or 
legacy "content." My recommendation has always been to set a schedule 
of conversion and to respond to requests or complaints as soon as 
they are received. This seems like a fair solution. Very rich 
companies may merit different (i.e., accelerated) requirements.

By the way, your easy HTML version has invalid code (36 errors: 
<http://validator.w3.org/check?uri=http://www.wats.ca/reprints/jesper.html>) 
and would not pass Priority 2.

>In this concept, sure, PDF's are fine, useful and should continue to 
>be made available. But arguing that the file format is universally 
>accessible

We aren't. There *is* no such thing.

>Providing content exclusively in PDF means "one or more groups will 
>find it impossible to access information in the document."

Be careful how you define "groups."

--

At 11:50 -0400 2004.08.19, Access Systems wrote:
>>  i.e. same person on a different system can access the content. 
>>Content is not the problem, it's the user's set up.
>
>?? ADA says for something to be accessible the individual cannot be 
>required to purchase something that everyone else is not required to 
>purchase (28 CFR 36.301(c)). This is called "disparate treatment".

We're not talking about the ADA.

>  As far as I know Emacspeak does read PDF, but I copied Raman to let 
>him explain it since he wrote it.
>
>But will it read PDF in all cases?

Of course not in "all" cases. Some PDFs are inaccessible, as are some 
Web pages.

>>   Whether the content is in PDF, HTML, SMIL,
>
>but the problem is GETTING TO the content. once a pdf is untangled

I wish you'd stop using these approximate and incorrect terms. You 
don't have to "untangle" a PDF; it isn't a ball of string.

>>or whatever, there are still requirements for accessible content in 
>>that format. For example, if there is an image in either of these 
>>formats, then the content of the image needs to be made available, 
>>for example via an alt attribute in HTML, etc.
>
>yes, and how do you put that into a pdf?

I see that, as feared, Access Systems works in complete ignorance of 
accessible PDF.

>>  Claiming that if something doesn't run in Lynx makes it 
>>inaccessible is misinformation,
>
>I beg to differ..

Time has marched on. A browser that can't use CSS is outdated. I use 
one each and every day. It has many advantages. But it cannot be 
considered the baseline.


--

At 16:54 +0000 2004.08.19,  David Poehlman wrote:
>1> much of PDF comes directly from paper.  It's scanned and dumped 
>directly into PDF.

That's increasingly rare. But it does seem to be the false 
preconception held by many blind people and/or Lynx users on this 
esteemed List.

>  if the pdf is truly textual in the first place and if and this is a 
>big if, it has formatting intact which it almost never does, and 
>then,

Acrobat 5 and later (and Acrobat 4 with a plug-in) can make sense 
even of untagged PDFs, provided they contain actual text, and a very 
large number of them do. It'll read the text. It may not be pretty, 
but it'll read.

>If I am a customer of the U.S. federal government and I use Linux or 
>DOS or outSPOKEN for the Mac, I should not be denied access to 
>information simply because of my choice or need of environment. This 
>is accessibility.

Some parts of the U.S. federal government are mandated to place, for 
example, forms online, but are also legally enjoined from altering 
their appearance. That rules out HTML.

--

At 16:55 +0000 2004.08.19, RUST Randal  wrote:
>The data for many PDFs, especially reports, is stored in a database somewhere.

I really don't think that's the case at all for "many" PDFs.

>The data is just extracted and turned into a PDF. Wouldn't it make 
>sense to tell developers to give the user the choice of PDF or text 
>version, and then generate content in the desired format?

Sure. Now give them ATAG-compliant tools to do it.

>In fact, a PDF always exists in some other format prior to being 
>turned into a PDF,

That's true, but it is possible to generate a PDF directly from 
scratch without a preliminary document.

>  and most, if not all of those applications allow for the file to be 
>saved in many different formats which are more accessible than PDF.

No, that's false. Any application on Mac OS X can save in PDF, for 
example. I strongly dispute the idea that typical OS X applications 
can save in, for example, HTML.

>The point is, when a document is offered as a PDF, developers should 
>be encouraged to provide the document in multiple formats, which is 
>entirely reasonable (and pretty much what WCAG 1.0 says).

Time has marched on. PDF can be accessible unto itself.

Why isn't anybody making this argument about multimedia? Oh, but 
that's what WCAG 1.0 tried to do-- in the Working Group's mania for 
TEXT-ONLY ALL THE TIME, it insisted on "collated text transcripts" 
and similar malarkey. But multimedia can carry its own accessibility 
features, which the Working Group is growing to accept. What's the 
difference? There isn't any.


-- 

     Joe Clark | joeclark@joeclark.org
     Accessibility <http://joeclark.org/access/>
     Expect criticism if you top-post
Received on Thursday, 19 August 2004 19:13:16 UTC