
Re: PDF accessibility and complex script languages.

From: Duff Johnson <duff@duff-johnson.com>
Date: Mon, 4 Jan 2016 15:25:39 -0500
Cc: WAI Interest Group <w3c-wai-ig@w3.org>
Message-Id: <531FBEAB-E6E2-491E-A134-A7C2B2CF103D@duff-johnson.com>
To: Andrew Cunningham <andj.cunningham@gmail.com>

Hi Andrew,

I will leave the font-specific questions you asked to others; it’s not my area of expertise.

> This leaves the possibility of ActualText. The most common use of ActualText I have seen is the generation and embedding of a text layer into a scanned PDF via OCR.

Plenty of PDF generators also use ActualText as a character-level replacement for rendered glyphs. In fact, ActualText is how PDF represents, inline, the exact text equivalent of a graphic that has the appearance of text. So far as I am aware (please correct me if I am wrong), HTML does not provide any mechanism for this case.
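
To make the glyph-replacement case concrete, here is a minimal sketch of how a generator might wrap content-stream operators in a Span marked-content sequence that carries ActualText. The function name and the sample operators are illustrative assumptions, not any particular tool's output; the replacement is written as a UTF-16BE hex string with a byte-order mark, which is one way PDF text strings can carry characters outside PDFDocEncoding.

```python
def actual_text_span(replacement: str, glyph_ops: str) -> str:
    """Wrap content-stream operators in a Span marked-content sequence
    whose property list carries an ActualText replacement.

    Hypothetical helper for illustration only.
    """
    # Encode the replacement as UTF-16BE with a leading byte-order mark
    # (FE FF) and write it as a hex string, so no escaping is needed.
    encoded = b"\xfe\xff" + replacement.encode("utf-16-be")
    hex_string = encoded.hex().upper()
    # BDC opens the marked-content sequence; EMC closes it.
    return f"/Span << /ActualText <{hex_string}> >> BDC\n{glyph_ops}\nEMC"


if __name__ == "__main__":
    # "(x) Tj" stands in for whatever operators actually paint the glyphs.
    print(actual_text_span("\u0915", "(x) Tj"))
```

A consumer that honors ActualText would report the replacement string for this span rather than whatever the painted glyphs map to.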

> Assuming a PDF has both text and ActualText, which would be used by indexing, searching and accessibility software? Is there any software tools that would use ActualText in preference to the text in the PDF?

PDF-consuming tools that extract or process text typically use ActualText. Software that cannot process ActualText cannot claim to support accessible (i.e., tagged) PDF.
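
The preference described above can be sketched with a toy model. This is an assumption-laden illustration, not how any specific reader or indexer is implemented: the `MarkedContent` class and `extract_text` function are hypothetical, and a real consumer would work from the parsed content stream and structure tree.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class MarkedContent:
    """Toy stand-in for one marked-content sequence (hypothetical model)."""
    glyph_text: str                    # text recovered from the glyphs themselves
    actual_text: Optional[str] = None  # ActualText replacement, if one is present


def extract_text(sequences: Iterable[MarkedContent]) -> str:
    """Concatenate text, preferring ActualText wherever it is supplied."""
    parts = []
    for seq in sequences:
        # Test against None rather than truthiness: an empty ActualText is
        # still a replacement (it says the content contributes no text).
        if seq.actual_text is not None:
            parts.append(seq.actual_text)
        else:
            parts.append(seq.glyph_text)
    return "".join(parts)


if __name__ == "__main__":
    page = [MarkedContent("Hel"), MarkedContent("1o", actual_text="lo")]
    print(extract_text(page))
```

In this model the second sequence's glyph text is ignored because a replacement is present, which is the behavior an indexer or screen reader would need for scripts where the glyph order does not match the logical text.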

Received on Monday, 4 January 2016 20:26:10 UTC
