Re: PDF accessibility and complex script languages. from Andrew Cunningham on 2016-01-04 (w3c-wai-ig@w3.org from January to March 2016)

From: Andrew Cunningham <andj.cunningham@gmail.com>
Date: Tue, 5 Jan 2016 08:01:45 +1100
To: Duff Johnson <duff@duff-johnson.com>
Cc: WAI Interest Group <w3c-wai-ig@w3.org>
Message-ID: <CAOUP6K=MBxjZ9M+cEQUDU+zOtS1K66Xy-bLQ-R1=2gJ6416F_Q@mail.gmail.com>

Thanks Duff,

On 5 Jan 2016 7:25 am, "Duff Johnson" <duff@duff-johnson.com> wrote:
>
> Hi Andrew,
>
> I will leave the font-specific questions you asked to others; it’s not my
> area of expertise.
>
> > This leaves the possibility of ActualText. The most common use of
> > ActualText I have seen is the generation and embedding of a text layer into
> > a scanned PDF via OCR.
>
> Plenty of PDF generators will also deploy ActualText as a character
> replacement for rendered glyphs. In fact, ActualText is how (in PDF) one
> represents (inline) the exact text equivalent of a graphical illustration
> that has the appearance of text. So far as I am aware (please correct me if
> I am wrong) HTML does not provide any mechanism for this case.
>

Technically, though, ActualText isn't a character replacement for rendered
glyphs. I noticed nothing in the PDF spec that can do that.

ActualText would appear to be an alternative to the text in the PDF, or a
textual representation of an image of text. I am using "text" loosely
here, though, since a PDF seems to contain a collection of glyphs rather
than encoded text per se, as I understand it.
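
To make that concrete, here is a rough sketch of how I understand an
ActualText entry to sit around a run of glyphs in a content stream: a /Span
marked-content sequence whose property list carries the replacement string.
The function name and the glyph operators passed in are hypothetical, made
up for illustration; the string is written as a UTF-16BE hex string with a
leading byte order mark, which is one of the encodings PDF allows for text
strings.

```python
def actual_text_span(text: str, glyph_ops: str) -> str:
    """Wrap content-stream operators in a /Span carrying ActualText.

    `glyph_ops` is whatever draws the glyphs (hypothetical here); this
    helper only builds the surrounding marked-content sequence.
    """
    # UTF-16BE with a leading U+FEFF byte order mark, written as a hex
    # string so parentheses and backslashes need no escaping.
    hex_text = ("\ufeff" + text).encode("utf-16-be").hex().upper()
    return f"/Span <</ActualText <{hex_text}>>> BDC\n{glyph_ops}\nEMC"


# Example: glyph code 0012 (a made-up value) stands for MYANMAR LETTER KA.
print(actual_text_span("\u1000", "<0012> Tj"))
```

The glyph-showing operators stay exactly as the producer wrote them; only
the wrapper tells a consuming tool which characters the run stands for.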

Additionally, ActualText is plain text, so for some languages, scripts and
scenarios the rich text or marked-up text in the source document may have
to be edited, with bidi and other control characters added to the text.
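
As a small illustration of the kind of editing I mean, here is a sketch that
wraps a right-to-left run in Unicode directional isolates before it goes
into a plain-text string. The helper name is hypothetical, and whether
isolates (rather than embeddings or marks) are the right controls for a
given document is an assumption on my part.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_rtl(run: str) -> str:
    """Mark `run` as a right-to-left isolate for plain-text contexts.

    In marked-up source this direction would come from markup (for
    example a dir attribute); plain text has to carry it as characters.
    """
    return f"{RLI}{run}{PDI}"


# A left-to-right sentence with an embedded Arabic word (U+0633 U+0644
# U+0627 U+0645).
sentence = "Title: " + isolate_rtl("\u0633\u0644\u0627\u0645") + " (2016)"
print([hex(ord(ch)) for ch in sentence if ch in (RLI, PDI)])
```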

> > Assuming a PDF has both text and ActualText, which would be used by
> > indexing, searching and accessibility software? Are there any software
> > tools that would use ActualText in preference to the text in the PDF?
>
> It’s typical for PDF consuming tools that extract or process text to use
> ActualText. Software that cannot process ActualText cannot be claimed to
> support accessible (i.e., tagged) PDF.
>
> Duff.
>
>

Received on Monday, 4 January 2016 21:02:12 UTC