
RE: Removing PDFs and accessibility

From: Ozi, Selim <sozi1@mscd.edu>
Date: Mon, 26 Mar 2012 10:36:21 -0600
To: Andrew Kirkpatrick <akirkpat@adobe.com>, "wed@csulb.edu" <wed@csulb.edu>, David Woolley <forums@david-woolley.me.uk>
CC: "w3c-wai-ig@w3.org" <w3c-wai-ig@w3.org>
Message-ID: <988AD2761168FB4BA162C5F96A122ADA2B40DFFF2C@E2K7VS.services.metro>
Great thread of information about the Web, accessibility, and PDF.
My question is for Andrew:
Can Adobe allow the user to choose which PDF format to create, save, or view?
Something like below:
1- PDF/I  = image PDF
2- PDF/O  = OCR
3- PDF/TR = tagged, with TouchUp reading order

This way, developers and content providers could give end users the information they need to choose the medium: whether to view this format with a screen reader or to go to a structured HTML version instead.

Selim Özi
Accessible Technology Specialist
Access Center.
Metropolitan State University of Denver.

-----Original Message-----
From: Andrew Kirkpatrick [mailto:akirkpat@adobe.com] 
Sent: Monday, March 26, 2012 9:49 AM
To: wed@csulb.edu; David Woolley
Cc: w3c-wai-ig@w3.org
Subject: RE: Removing PDFs and accessibility

Unfortunately the original post doesn't allow comments.  My gripe with this post is that it makes many false claims and uses them as evidence to support a conclusion which may be true.  No actual data or scientific rigor is offered, which makes this interesting as anecdotal data, but nothing more.  I'd like to see more information on the study performed, and I offer the following questions to consider.

From the article, with comments:
Mark said major disadvantages of PDFs include:
*	not showing up in search results
PDF documents do show up in search results.  Google and Bing both index and include PDF documents in search results.

*	failing Australian Human Rights Commission requirements for being accessible to people with a disability, such as compatibility with screen readers
Differences do exist, to be sure, but NVDA, a free screen reader on Windows, provides nearly the same level of support as JAWS (support for headings is one of the main issues remaining and I expect we'll see that addressed soon).  VoiceOver with PDF documents on the Mac is not as good as the Windows options but the document content can be read and used.  The level of support is better than what is provided by a text-only or RTF document, which the AHRC does suggest is sufficient.
I realize that this department is in the state government, but it is worth noting that AGIMO in the federal government agrees that well-authored PDF documents can meet WCAG 2.0 and can be used within the government to comply with the National Transition Strategy:

(http://agimo.govspace.gov.au/2012/01/12/release-of-wcag-2-0-techniques-for-pdf/comment-page-1/#comment-5632) "As stated, the PDF Sufficient Techniques are now available, so technically an agency can rely on PDF by using the WCAG 2.0 PDF Sufficient Techniques and all applicable General Techniques, and will be considered to be complying with the NTS. This addresses one of the findings of our PDF study by ensuring the design of the PDF file is optimised for accessibility."

More on this in a bit...

*	penalising people who have slow internet connections
*	often extremely large document sizes.
These are really the same point, so I'll address them together.  Some PDF documents do get rather large, some outrageously so.  However, PDF documents can and should be authored to be as light as possible, so while it may be that a 300 page report is large no matter what an author does, PDF documents in general need not be bloated in size and authors who are tending to their work can easily avoid this.  Adobe Acrobat also offers a batch process which can watch a specific folder and when PDF documents are added there it can take the steps to reduce the file size automatically if desired.  Others have commented on the convenience of PDF documents for users also, so at a minimum offering a PDF document for some documents can be viewed as helping some users. 

Back to the main question:  Does replacing PDF documents with HTML documents increase web traffic?   I don't know the answer, but I am certain that the answer is not as simple as a quick look at the server log data.  There are complicated questions to be asked:

1)	were the PDF documents that were replaced built as tagged PDF documents to maximize their accessibility?
2)	How much of the additional traffic was bots?   Given a recent study on the amount of internet traffic that is non-human (http://www.itproportal.com/2012/03/14/51-internet-traffic-non-human/#ixzz1p7FFrR84) and the broad introduction of new pages and links, I wonder whether the percentage is greater than the 51% cited in the Incapsula report, because spiders and other bots may be exploring the new pages.  (Disclaimer: I haven't read the Incapsula report in any depth and can't say whether it is accurate or whether there are reasons that it may not apply to the Victoria DPI case.)
3)	What methodology for measuring the results was used?  If it is just hits on a page, it might make sense that going from 6000 pages and 9000 PDF files (15K URIs) to 22000 HTML pages would result in a larger number of hits.  Some quick "back of the envelope" math shows that there are now 1.47 times as many indexable pages (22000 / 15000), while the number of hits has risen by a factor of 1.38.
4)	Is it possible to review a collection of 10-20 representative PDF documents and the HTML analogs for them and see how the stats for those specific documents break down?  That would be interesting.
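
The arithmetic in point 3 can be sketched as below.  This is only an illustration: the variable names are hypothetical, and it assumes that every HTML page and every PDF file counts as one indexable URI, which the server logs may or may not bear out.

```python
# Back-of-the-envelope check of the figures quoted in point 3.
# Assumption: each HTML page and each PDF file counts as one indexable URI.
html_pages_before = 6000
pdf_files_before = 9000
html_pages_after = 22000
hits_growth = 1.38  # growth factor in hits, as quoted in the message

uris_before = html_pages_before + pdf_files_before
uri_growth = html_pages_after / uris_before

# If hits simply scaled with the number of URIs, this ratio would be 1.0.
hits_per_uri_change = hits_growth / uri_growth

print(f"URIs before: {uris_before}")
print(f"URI growth factor: {uri_growth:.2f}")
print(f"Relative hits per URI: {hits_per_uri_change:.2f}")
```

A ratio below 1.0 would suggest that hits per URI went down slightly even though total hits went up, which is why the raw hit count alone may not say much.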

I'm sure that there are other interesting questions, but that's a start.

To the question of whether you should take this approach and replace your PDF documents with HTML files - maybe you should, but I'm not convinced that the hit count is a reason that you can depend on.  If you are hearing from your users that they prefer HTML files over PDF, then offer HTML.  If you are finding that maintenance is easier with another format, use that other format.  There are many reasons why you may want to offer HTML documents, but you should also recognize that there are valid reasons for using PDF documents, and if you find that these reasons make sense for you, use PDF.  But, when you do use PDF, follow best practices for making sure the PDF documents meet WCAG 2.0.


Andrew Kirkpatrick
Group Product Manager, Accessibility
Adobe Systems 


-----Original Message-----
From: Wayne Dick [mailto:wayneedick@gmail.com]
Sent: Sunday, March 25, 2012 2:54 PM
To: David Woolley
Cc: w3c-wai-ig@w3.org
Subject: Re: Removing PDFs and accessibility

Just making an attempt to move away from PDF as a system to view web content is a great move forward.  It recognizes the issue that PDF is a poor online reading medium for many people with visual impairments.
Thank you Cosmic Muffin.

The primary application will be in the area of content meant for reading.  When an article is written in PDF it generally increases the workload for reading online, especially for a person with low vision.
This generally involves a significant change in workload.  Since most sighted people just print PDF articles, this introduces a major inequality of work for people with full sight vs. people with partial sight.

The ability to obtain high quality will be the trick.  The tag spaces are not isomorphic, and tagged PDF enables meaningful text styling to be embedded in blocks of untagged data.  As such I do not see a programmatically determined method of translation existing.  However, a good heuristic will probably suffice.

Thanks for the article, good luck Victoria.

Wayne Dick

On 3/25/12, David Woolley <forums@david-woolley.me.uk> wrote:
> David Woolley wrote:
>> Incidentally, I have often sought out PDFs because they are not 
>> fragmented into pages,
> The big problem I often find with lots of small hyperlinked pages, on 
> sites (typically governmental, or software support) that should be 
> information rich, is that one ends up going round in circles, never 
> actually getting to the detail you want.  I suspect that is often 
> because that level of detail just does not exist, but unless one maps 
> out the whole site and proves that you have seen all the pages, one 
> can never be sure of that.
> A single, linearised, document makes it much easier for the reader to 
> be sure that information is not present and makes it much harder for 
> the author to avoid answering difficult questions by just hyperlinking 
> you backwards and forwards.
Received on Monday, 26 March 2012 16:37:33 UTC
