- From: <accessys@smart.net>
- Date: Sat, 2 Mar 2013 15:36:30 -0500 (EST)
- To: Ian Sharpe <themanxsharpy@gmail.com>
- cc: "'David Woolley'" <forums@david-woolley.me.uk>, w3c-wai-ig@w3.org
I think you may have hit the nail on the head: no one considers the community of people with disabilities worth spending the research and money on. What is available is, for example:

- cobbled together from some other use (PDF)
- very expensive (JAWS)
- or done by users (Emacspeak)

No one has really taken this on as a serious project; the economic returns on investment are not "perceived" to be there.

Bob

On Sat, 2 Mar 2013, Ian Sharpe wrote:

> Date: Sat, 2 Mar 2013 20:18:54 -0000
> From: Ian Sharpe <themanxsharpy@gmail.com>
> To: 'David Woolley' <forums@david-woolley.me.uk>, w3c-wai-ig@w3.org
> Subject: RE: Accessible PDF Repair
> Resent-Date: Sat, 02 Mar 2013 20:19:27 +0000
> Resent-From: w3c-wai-ig@w3.org
>
> I'm no expert in PDF accessibility, tagging etc. But having worked on facial
> image recognition software over 15 years ago now and loosely followed
> progress in this area, I am really surprised that current OCR technology
> couldn't make at least a decent stab at automating the tagging process of
> scanned documents.
>
> I do totally appreciate that there are going to be times when an automated
> tagging approach might struggle, providing say alternative text for images
> for example (although maybe even that is starting to become possible these
> days), but surely it would be good enough to provide enough information to
> significantly improve the accessibility of the untagged document?
>
> Is it simply the case that nobody has chosen to use today's scanning and
> analysis technology to produce a tagged document or am I missing something?
>
> Apart from images, the only problem I can think of off the top of my head is
> how OCR technology could work out where a link references, but maybe there
> are other ways to obtain this information.
>
> As I said though, I'm not an expert in this area and am just curious to
> understand the problem.
>
> Cheers
> Ian
>
> -----Original Message-----
> From: David Woolley [mailto:forums@david-woolley.me.uk]
> Sent: 02 March 2013 09:47
> To: w3c-wai-ig@w3.org
> Subject: Re: Accessible PDF Repair
>
> Lars Ballieu Christensen wrote:
>>
>> You may want to consider the automated PDF conversion features of
>> RoboBraille. You can use the RoboBraille service to convert all types
>> of pdf files into more accessible formats, including tagged pdf.
>>
>
> Although there are heuristics that will often successfully detect
> re-flowable text, and there are even reasonable heuristics for working
> out word spaces in micro-spaced documents that didn't use the PDF
> support for micro-spacing (most Windows generated PDF contains no spaces
> and outputs printable characters without associating them into words and
> with a move between each character), I don't believe the state of AI is
> currently up to a level where it could properly tag a final form
> document, unless it had a machine readable definition of the style sheet
> and the document was properly authored to that style sheet.
>
> Note I don't mean a CSS style sheet; I mean a style I would be given to
> a human author. Although the SS in CSS comes from that concept, the way
> it is often used is not like the way that one would be used for a human
> author.
>
> Even with a style sheet, one would not be able to distinguish between
> the standard renderings of citation and emphasis, in Western languages,
> so one would have to tag them presentationally, as italics. To do
> otherwise, would require language understanding that goes beyond current
> internet machine translation capabilities.
>
> I'd therefore take any claim to recover tagged PDF, from pure final form
> PDF, with a pinch of salt. Basically, only humans can tag documents
> with any reasonable level of reliability, which makes it expensive, and
> is why documents which were not tagged properly when first written, are
> unlikely to get properly tagged thereafter.
>
> Also, I haven't tried the tools, but if they work on PDFs marked as copy
> and paste disallowed, I would have concerns that they may violate the
> DMCA, and the equivalent UK, etc., copyright law provisions.
> Accessibility interfaces tend to get some dispensation from copy
> protection schemes on the understanding that they are only used to
> create transient versions for the end user, not to extract the text into
> a revisable form.
>
> --
> David Woolley
> Emails are not formal business letters, whatever businesses may want.
> RFC1855 says there should be an address here, but, in a world of spam,
> that is no longer good advice, as archive address hiding may not work.
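Woolley's remark about recovering word spaces from micro-spaced PDF (characters emitted one at a time, each with its own positioning move and no space characters) can be illustrated with a minimal gap-threshold sketch. It assumes the glyphs have already been extracted with an x position, an advance width and a font size; the `Glyph` type, the `join_glyphs` function and the default threshold of a quarter of the font size are hypothetical choices for illustration only, not taken from RoboBraille or any real extraction tool. It also hints at why this stays a heuristic: tightly justified lines or letter-spaced headings can fool a fixed threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glyph:
    """One individually positioned character, as a micro-spaced PDF emits it.

    Hypothetical structure: assumes x, width and font_size have already been
    converted to the same text-space units during extraction.
    """
    char: str
    x: float          # left edge of the glyph
    width: float      # advance width of the glyph
    font_size: float


def join_glyphs(glyphs: List[Glyph], gap_ratio: float = 0.25) -> str:
    """Rebuild words from glyphs that carry no explicit space characters.

    A space is inserted when the gap between one glyph's right edge and the
    next glyph's left edge exceeds gap_ratio * font_size (an assumed
    threshold), or when the next glyph jumps back to the left, which this
    sketch treats as the start of a new line.
    """
    if not glyphs:
        return ""
    out = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        gap = cur.x - (prev.x + prev.width)
        new_line = cur.x < prev.x
        if new_line or gap > gap_ratio * cur.font_size:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


if __name__ == "__main__":
    # "Hi" and "yo" set glyph by glyph, with a 3-unit gap between the words;
    # at font size 10 the assumed threshold is 0.25 * 10 = 2.5 units.
    line = [
        Glyph("H", 0.0, 5.0, 10.0),
        Glyph("i", 5.0, 5.0, 10.0),
        Glyph("y", 13.0, 5.0, 10.0),
        Glyph("o", 18.0, 5.0, 10.0),
    ]
    print(join_glyphs(line))  # prints "Hi yo"
```

A real extractor would also have to consider vertical position, rotated text and per-font space widths; the sketch ignores them to keep the core idea visible. Note that it recovers words only, which is far short of the structural tagging (headings, lists, emphasis versus citation) that the thread is about.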
Received on Saturday, 2 March 2013 20:37:43 UTC