Hi Andrew,
> On 05.01.2016, at 02:41, Andrew Kirkpatrick <akirkpat@adobe.com> wrote:
>
> ActualText is NEVER EVER EVER used to represent the results of OCR – that would be a violation of the standard.
I’d love to learn how you arrived at this statement. AFAICT it can’t be derived from any of the PDF standards I know.
Just to make an extreme point: if each and every (recognized) character in an OCR-ed document were represented by an ActualText attribute, nothing would, on a formal level, be in violation of the applicable standards (PDF per ISO 32000-1 or PDF/UA per ISO 14289-1). Whether such an approach makes any sense is a completely different story.
Olaf