- From: Andrew Cunningham <andj.cunningham@gmail.com>
- Date: Tue, 5 Jan 2016 08:01:45 +1100
- To: Duff Johnson <duff@duff-johnson.com>
- Cc: WAI Interest Group <w3c-wai-ig@w3.org>
- Message-ID: <CAOUP6K=MBxjZ9M+cEQUDU+zOtS1K66Xy-bLQ-R1=2gJ6416F_Q@mail.gmail.com>
Thanks Duff,

On 5 Jan 2016 7:25 am, "Duff Johnson" <duff@duff-johnson.com> wrote:
>
> Hi Andrew,
>
> I will leave the font-specific questions you asked to others; it’s not my area of expertise.
>
> > This leaves the possibility of ActualText. The most common use of ActualText I have seen is the generation and embedding of a text layer into a scanned PDF via OCR.
>
> Plenty of PDF generators will also deploy ActualText as a character replacement for rendered glyphs. In fact, ActualText is how (in PDF) one represents (inline) the exact text equivalent of a graphical illustration that has the appearance of text. So far as I am aware (please correct me if I am wrong) HTML does not provide any mechanism for this case.
>

Although ActualText isn't technically a character replacement for rendered glyphs ... there is nothing in the PDF spec that I noticed that can do that.

ActualText would appear to be an alternative to the text in the PDF, or a textual representation of an image of text. I am using "text" loosely here, though, since a PDF seems to contain a collection of glyphs rather than encoded text per se, as I understand it.

Additionally, ActualText is plain text, so for some languages, scripts and scenarios the rich text or marked-up text in the source document may have to be edited, with bidi and other control characters added to the text.

> > Assuming a PDF has both text and ActualText, which would be used by indexing, searching and accessibility software? Is there any software tools that would use ActualText in preference to the text in the PDF?
>
> It’s typical for PDF consuming tools that extract or process text to use ActualText. Software that cannot process ActualText cannot be claimed to support accessible (i.e., tagged) PDF.
>
> Duff.
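
For readers unfamiliar with the mechanism under discussion, here is a minimal sketch (not taken from either message) of how an ActualText value might be attached to a run of glyphs in a PDF content stream. It assumes the replacement text is written as a UTF-16BE text string with a byte order mark, in hex-string form, on a `/Span` marked-content sequence. The helper names `pdf_text_string` and `span_with_actual_text`, the font name `/F1`, and the drawing operators are all hypothetical and only for illustration. The example also shows that any bidi control (here U+200F RIGHT-TO-LEFT MARK) has to be placed in the plain-text value itself.

```python
import codecs


def pdf_text_string(text: str) -> str:
    """Encode text as a PDF hex string: UTF-16BE preceded by the FE FF byte order mark."""
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def span_with_actual_text(actual_text: str, drawing_ops: str) -> str:
    """Wrap content-stream operators in a /Span marked-content sequence carrying ActualText."""
    return (
        f"/Span <</ActualText {pdf_text_string(actual_text)}>> BDC\n"
        f"{drawing_ops}\n"
        "EMC"
    )


# Bidi control characters must be part of the plain-text value itself.
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Hypothetical glyph-drawing operators; a real stream depends on the font's encoding.
ops = "BT /F1 12 Tf (glyphs) Tj ET"

print(span_with_actual_text("abc" + RLM, ops))
```

A consumer that honours ActualText would report the decoded string for the span instead of whatever text it could derive from the glyphs drawn between `BDC` and `EMC`.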
Received on Monday, 4 January 2016 21:02:12 UTC