RE: PDF Alternatives?

From: Dave J Woolley <DJW@bts.co.uk>
Date: Wed, 2 Aug 2000 20:29:47 +0100
Message-ID: <81E4A2BC03CED111845100104B62AFB58248C6@stagecoach.bts.co.uk>
To: "'w3c-wai-ig@w3.org'" <w3c-wai-ig@w3.org>
> From:	Waddell, Cynthia [SMTP:cynthia.waddell@ci.sj.ca.us]
> 
> The problem with PDF at this time is that screenreaders are unable to read
> the text as well as fill in the forms of PDF documents.  This is the basis
	[DJW:]  
	I think I really need longer than I can spare to cover this
	well, but, whilst I accept that a well designed HTML document
	will generally be better than current generation PDF, I think:

	- there is a tendency to compare HTML as Tim Berners-Lee 
	intended it with PDF in real life;

	- there is a tendency to confuse tools and formats;

	- there is a tendency not to consider the nature of the source
	material and the clerical procedures associated with it.

	On the first, most people, including, I suspect, many people 
	interested in communicating information, treat HTML as a
	WYSIWYG language.  As it's not, the result of trying to force
	it into that mould can be worse than the result of using a
	language that is intended for page layout.

	On tools versus formats.  The PDF format is actually designed
	to make linear reading of text quite easy, and I doubt that 
	much is needed to handle forms in a screen reader context.  However,
	it is possible that the reading tools don't give adequate 
	interfaces for screen readers, and it is certainly true that
	most of the authoring tools (i.e. standard word processors)
	make finding word boundaries difficult, by placing characters
	individually.
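
The word boundary problem can be illustrated with a minimal sketch.
It assumes each glyph has already been extracted as a hypothetical
(character, x position, advance width) tuple; the names join_glyphs
and gap_ratio are illustrative and do not come from any PDF tool or
library.  A space is inferred wherever the gap between two glyphs is
large relative to the width of the preceding glyph.

```python
from typing import List, Tuple

# Hypothetical glyph record: (character, x of left edge, advance width).
Glyph = Tuple[str, float, float]


def join_glyphs(glyphs: List[Glyph], gap_ratio: float = 0.3) -> str:
    """Rebuild one line of text from individually placed glyphs.

    gap_ratio is an assumed tuning value: a gap wider than this
    fraction of the previous glyph's width is treated as a word break.
    """
    if not glyphs:
        return ""
    # Order glyphs left to right; drawing order is not guaranteed.
    ordered = sorted(glyphs, key=lambda g: g[1])
    out = [ordered[0][0]]
    for prev, cur in zip(ordered, ordered[1:]):
        prev_end = prev[1] + prev[2]   # right edge of the previous glyph
        gap = cur[1] - prev_end        # white space before the current glyph
        if gap > gap_ratio * prev[2]:
            out.append(" ")
        out.append(cur[0])
    return "".join(out)


# Five glyphs placed one at a time, with a wider gap after "F".
line = [("P", 0.0, 6.0), ("D", 6.2, 6.5), ("F", 12.9, 5.5),
        ("i", 21.0, 2.5), ("s", 23.6, 4.5)]
print(join_glyphs(line))
```

A fixed ratio is only a heuristic; justified text or unusual fonts
would probably need a threshold derived from the line itself.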

	(Even so, if you have material in PostScript or paper, converting
	the PostScript to PDF will produce a document with 100% correct
	character identification, whereas using OCR on the paper 
	document will misread many characters.)

	As to source material.  If you only have hard copy and you
	have an imperative to reproduce it accurately, you would use
	GIF with HTML in the contexts where you would use scanned 
	material with PDF.  PDF can actually do better by matching the 
	OCRed text with the image.  Basically, before worrying about
	HTML versus PDF, you must first convert from paper to
	machine-readable documents and then to electronic submission.


> Adobe has committed to finding ways to incorporate structure into a PDF
> document upon creation and we all welcome that effort.  The first website
> 
	[DJW:]  
	The current PDF specification allows text to be annotated
	with structure information.  However, as well as having the
	tools to create and use this, you also need people who can
	think other than WYSIWYG, and they are very rare.  You really
	only need to undo the damage from printer driver microspacing
	in order to recover the contents of textual documents to the
	same quality as the average <font face...><br>++ flat HTML that
	most people write.

[DJW:]  Basically, in many contexts, plain text is the most
accessible format, in some you need features from HTML or
other structural markup languages, and in some, PDF may be the
most practical format, without expending a lot of skilled 
labour on reworking clerical procedures and marking up 
documents.  Government funded bodies tend to have limited 
amounts of this; commercial bodies look for a return on 
investment that exceeds what they could get by using those
resources in other ways; and people in between (like "agencies"
in the UK sense) tend to work like commercial organisations, but
minimising costs, rather than maximising profits. 

++ Even with fully revisable WP documents, it is not unknown
for professional people to tab round the end of the line to get
a new line.

-- 
--------------------------- DISCLAIMER ---------------------------------
Any views expressed in this message are those of the individual sender,
except where the sender specifically states them to be the views of BTS.
Received on Wednesday, 2 August 2000 15:29:54 GMT
