Re: PDF accessibility and complex script languages. from Olaf Drümmer on 2016-01-06 (w3c-wai-ig@w3.org from January to March 2016)

From: Olaf Drümmer <olaflist@callassoftware.com>
Date: Wed, 6 Jan 2016 01:58:46 +0100
To: Andrew Kirkpatrick <akirkpat@adobe.com>
Cc: Olaf Drümmer <olaflist@callassoftware.com>, Andrew Cunningham <andj.cunningham@gmail.com>, w3c WAI List <w3c-wai-ig@w3.org>
Message-Id: <BB1F873D-6FAD-4D27-A04E-30CAB11B5174@callassoftware.com>

Hi Andrew,

> On 05.01.2016, at 02:41, Andrew Kirkpatrick <akirkpat@adobe.com> wrote:
> 
> ActualText is NEVER EVER EVER used to represent the results of OCR – that would be a violation of the standard.

I’d love to learn how you arrived at this statement. AFAICT it can’t be derived from any of the PDF standards I know.

Just to make an extreme point: if each and every (recognized) character in an OCR-ed document were represented by an ActualText attribute, nothing would violate the applicable standards on a formal level (PDF per ISO 32000-1 or PDF/UA per ISO 14289-1). Whether such an approach makes any sense is a completely different story.
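
To make the extreme case concrete, here is a minimal Python sketch of what such per-character markup might look like in a page content stream. It is an illustration under stated assumptions, not the output of any real OCR tool: the helper names `pdf_literal` and `per_char_actualtext` are hypothetical, the text is assumed to be plain ASCII shown with a simple font, and the fragment leaves out the surrounding text object, font selection, and structure-tree tagging that a complete tagged PDF would need.

```python
def pdf_literal(text: str) -> str:
    """Write text as a PDF literal string, escaping backslash and parentheses.

    Assumption: ASCII input only; real OCR output would need proper encoding.
    """
    escaped = "".join("\\" + ch if ch in "\\()" else ch for ch in text)
    return "(" + escaped + ")"


def per_char_actualtext(recognized: str) -> str:
    """Wrap every recognized character in its own Span carrying ActualText.

    Each line is one marked-content sequence: BDC opens it with a property
    dictionary, Tj shows the character, EMC closes it.
    """
    lines = []
    for ch in recognized:
        lit = pdf_literal(ch)
        lines.append(f"/Span <</ActualText {lit}>> BDC {lit} Tj EMC")
    return "\n".join(lines)


# Hypothetical recognized text, used only for illustration.
print(per_char_actualtext("Hi"))
```

Each recognized character ends up in its own marked-content sequence with its own ActualText entry, which is the "extreme" arrangement described above: wasteful and arguably pointless, but a different question from whether it violates the standards.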

Olaf

Received on Wednesday, 6 January 2016 00:59:12 UTC