RE: PDF accessibility and complex script languages.

From: Jonathan Avila <jon.avila@ssbbartgroup.com>
Date: Tue, 5 Jan 2016 15:55:37 +0000
To: Andrew Kirkpatrick <akirkpat@adobe.com>, Andrew Cunningham <andj.cunningham@gmail.com>, "w3c-wai-ig@w3.org" <w3c-wai-ig@w3.org>
Message-ID: <BY2PR03MB2723827CE556AFD5B4C7AAC9BF30@BY2PR03MB272.namprd03.prod.outlook.com>
> Assuming that the OCR’d text has been embedded in the file as invisible text, then that text is what should be corrected.  Trying to override it using anything else is incorrect and shouldn’t be done at a tagging level.
It seems that editing scanned text is much better in Acrobat XI. In prior versions the text was invisible, and I had to copy and paste it out, edit it, and paste it back in.  With Acrobat XI I am able to make some edits directly to the OCR text on-screen, but the quality seems to degrade or change when editing and replacing characters, so I am changing the visual appearance of the scanned image.  It is my understanding that changing the visual appearance of the scanned image might not be acceptable to some.  It would appear that Acrobat replaces the text you type with copied or replicated characters that are based on the scanned image.  That is, if I delete an “r” and then type an “r”, it appears different from the “r” I replaced.

Sorry to go on about this – but I want to understand the requirements so I can make the correct recommendations in this area.

Jonathan

Jonathan Avila
Chief Accessibility Officer
SSB BART Group
jon.avila@ssbbartgroup.com
703.637.8957 (o)

From: Andrew Kirkpatrick [mailto:akirkpat@adobe.com]
Sent: Tuesday, January 05, 2016 10:22 AM
To: Jonathan Avila; Andrew Cunningham; w3c-wai-ig@w3.org
Subject: Re: PDF accessibility and complex script languages.

If actual text can’t be used, then I’d love to know where one should correct OCR’d text that was incorrectly identified by Adobe Acrobat during the OCR process but is not flagged as an OCR suspect/error.  Where might a user fix the OCR text?  Is there some contents key in the tag editor where this can be corrected?

Assuming that the OCR’d text has been embedded in the file as invisible text, then that text is what should be corrected.  Trying to override it using anything else is incorrect and shouldn’t be done at a tagging level.
AWK
Received on Tuesday, 5 January 2016 15:56:11 UTC