- From: David Woolley <forums@david-woolley.me.uk>
- Date: Sat, 11 Feb 2012 10:49:32 +0000
- To: w3c-wai-ig@w3.org
Adam Cooper wrote:
> The question is not whether it is possible, but why would you bother.
> The majority of PDFs on the web are derived from electronic source
> documents, do not utilise the content security features offered by the
> platform, and creating accessible PDFs is time-consuming, incurs time

In principle, mechanical construction of tagged PDF from HTML should be very easy once you have a PostScript renderer for HTML, as tagged PDF is essentially a standard PDF overlaid with a description of the corresponding HTML4 structure. It is marginally more likely that an HTML original will have proper semantic markup than that a Word document has been styled properly (i.e. both are rather unlikely in the real world).

> and monetary costs, and is beyond the skill level of most casual content
> creators, so I struggle to find compelling reasons why there is a need

As is creating accessible HTML. It doesn't take a lot of skill, but authors make the task difficult by concentrating on presentation. The skill is in creating the accessible document whilst maintaining the intended visual appearance.

> to use PDF at all, especially when there are tools and methods in
> existing non-proprietary technologies such as (X)HTML, CSS, and JS etc.
> which offer comparative content securing and (print) formatting
> functionality.

Breaking the content securing of PDF takes a certain amount of technical skill and/or specialist tools. Breaking the securing of HTML is simply a case of turning off scripting.

As to formatting, my view is that most of the accessibility problems with real-world HTML come from trying to treat it as a page description language. Using tagged PDF would at least be honest and, because one doesn't have to worry about the constraints imposed by a structural semantics language when creating the presentation, ought to produce much more resilient documents. Of course, this is an argument for not using HTML as an intermediate format.
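As a rough illustration of the point that tagged PDF largely mirrors HTML structure, here is a minimal Python sketch that walks an HTML fragment with the standard-library `html.parser` and builds a tree of the standard structure types defined for tagged PDF in ISO 32000-1 (`P`, `H1` to `H6`, `L`, `LI`, `Table`, `Link`, `Figure` and so on). The element-to-type mapping, the `StructTreeBuilder` class and the `outline` helper are hypothetical illustrations, not part of any existing tool. The sketch assumes XHTML-style markup where every non-void element is explicitly closed, and it does only the "structure" half of the job: a real converter would still need the renderer to lay out the pages and link each structure element to its marked content.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML 4 element names to tagged-PDF
# standard structure types (ISO 32000-1). Unmapped elements such as
# <b> or <font> contribute no structure, only their text.
PDF_TAG_FOR = {
    "div": "Div", "p": "P", "blockquote": "BlockQuote", "q": "Quote",
    "ul": "L", "ol": "L", "li": "LI", "a": "Link", "span": "Span",
    "table": "Table", "caption": "Caption", "thead": "THead",
    "tbody": "TBody", "tfoot": "TFoot", "tr": "TR", "th": "TH",
    "td": "TD", "img": "Figure", "code": "Code",
    **{f"h{n}": f"H{n}" for n in range(1, 7)},
}
# Elements that never have an end tag, so they are never pushed.
VOID = {"img", "br", "hr", "input", "meta", "link", "area", "col", "param"}


class StructTreeBuilder(HTMLParser):
    """Builds a nested dict tree of PDF structure types from HTML.

    Assumes every non-void element is explicitly closed; implied end
    tags (e.g. an unclosed <li>) would produce wrong nesting.
    """

    def __init__(self):
        super().__init__()
        self.root = {"type": "Document", "kids": []}
        self.stack = [(None, self.root)]  # (open HTML tag, structure node)

    def handle_starttag(self, tag, attrs):
        pdf_type = PDF_TAG_FOR.get(tag)
        if pdf_type is None:
            return
        node = {"type": pdf_type, "kids": []}
        if tag == "img":
            # The HTML alt text is what tagged PDF carries as /Alt.
            node["alt"] = dict(attrs).get("alt")
        self.stack[-1][1]["kids"].append(node)
        if tag not in VOID:
            self.stack.append((tag, node))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1][1]["kids"].append(text)


def outline(node, depth=0):
    """Return an indented text outline of the structure tree."""
    if isinstance(node, str):
        return ["  " * depth + repr(node)]
    lines = ["  " * depth + node["type"]]
    for kid in node["kids"]:
        lines.extend(outline(kid, depth + 1))
    return lines


builder = StructTreeBuilder()
builder.feed("<h1>Report</h1><p>See <a href='#t'>the table</a>.</p>"
             "<ul><li>One</li><li>Two</li></ul>"
             "<img src='chart.png' alt='Sales chart'>")
builder.close()
print("\n".join(outline(builder.root)))
```

Note that the output is only as good as the input: if the HTML uses `<p><b>` runs instead of `<h1>`, the structure tree will faithfully record paragraphs, which is the same problem as a badly styled Word document.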
>
> *From:* Tanguy.Loheac@sanofi.com [mailto:Tanguy.Loheac@sanofi.com]
> anyone had the chance to experiment with a java library that would convert
> accessible (x)html page to an accessible pdf document?

The Java constraint is too restrictive. I think all the pure-Java HTML renderers have reached a dead end. The only dedicated HTML-to-PostScript tool (as used to create the PDF versions of the HTML4 specification) was written in Perl, but it predates tagged PDF. A tool to create tagged PDF from HTML really needs to be based on one of the major HTML rendering engines, which means it should be written in C or C++.

-- 
David Woolley
Emails are not formal business letters, whatever businesses may want.
RFC1855 says there should be an address here, but, in a world of spam,
that is no longer good advice, as archive address hiding may not work.
Received on Saturday, 11 February 2012 10:50:09 UTC