- From: Andrew Cunningham <andj.cunningham@gmail.com>
- Date: Thu, 7 Jan 2016 12:57:07 +1100
- To: Duff Johnson <duff@duff-johnson.com>
- Cc: "w3c-wai-ig@w3.org" <w3c-wai-ig@w3.org>
- Message-ID: <CAOUP6Kmb-bmT7BDxY+UMj0wbFhbqQ+QQ+5MxNGrLsOwb_=JShw@mail.gmail.com>
Hi Duff,

The scenario I was discussing was not the use of ActualText for images, but its use for text in those languages and writing scripts that are poorly supported by the PDF character model.

It is important to realise that PDF uses a glyph-based model more akin to pseudo-Unicode font solutions than to Unicode font solutions. OpenType features that do not modify cmap entries and reordered glyph sequences are particularly problematic.

When I get back to the office I will create sample PDF files with Burmese syllables in Unicode, using various OpenType fonts (a selection of fonts using the mymr and mym2 OpenType script codes). So far my tests have been with mymr-style fonts, but I will also test the newer mym2 fonts from Microsoft and Google.

mymr is problematic, but it uses a legacy approach that relies on the rlig, clig, and liga features. mym2 will be interesting to test. mym2 is the way Myanmar fonts should be developed and implemented in OpenType, while mymr fonts were hacks working within the restrictions of the DFLT script in rendering engines. I suspect that PDF files will have greater problems with mym2-based fonts, but I need to test it.

But recapping: my concerns relate to the accessibility of text in PDFs written in languages that use complex scripts. ActualText seems to be the only way to get meaningful Unicode into the PDF. But if, as Duff indicates, the actual use of ActualText is at the discretion of implementers, then I think we have an accessibility issue that PDF/UA inadequately addresses.

The reality, I suspect, is that any PDF in certain languages, as things currently stand, cannot be guaranteed to be accessible even if all other WCAG requirements are met, since the most fundamental issue, the text itself, is in question.
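To make the reordering concern concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the behaviour of any particular PDF library or viewer: the glyph order in `VISUAL_GLYPHS` and the helper `extract_text` are hypothetical, and real fonts may map glyphs differently. It shows how reading code points back in painted (visual) order can differ from the logical Unicode order for a Burmese syllable, and how a consumer that honours an ActualText-style replacement string would avoid that.

```python
# Logical (Unicode) order of the Burmese syllable KA + MEDIAL RA + VOWEL SIGN E.
LOGICAL = "\u1000\u103C\u1031"

# Assumed painted order for illustration: the vowel sign E and the medial RA
# are drawn to the left of (or around) the base consonant KA.
VISUAL_GLYPHS = ["\u1031", "\u103C", "\u1000"]


def extract_text(glyphs, actual_text=None):
    """Hypothetical extractor.

    If a replacement string (standing in for ActualText) is supplied and the
    consumer chooses to honour it, return that. Otherwise map each glyph back
    to a code point in the order it was painted.
    """
    if actual_text is not None:
        return actual_text
    return "".join(glyphs)


# A consumer that ignores ActualText gets the visual order.
without_actual_text = extract_text(VISUAL_GLYPHS)

# A consumer that honours ActualText gets the logical order.
with_actual_text = extract_text(VISUAL_GLYPHS, actual_text=LOGICAL)

# Same code points in both strings, but only one is in logical order.
print(without_actual_text == LOGICAL)
print(with_actual_text == LOGICAL)
```

In this sketch the two strings contain the same three code points, but only the second matches what a search, a screen reader, or a copy and paste operation would need.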
Andrew

On Thursday, 7 January 2016, Duff Johnson <duff@duff-johnson.com> wrote:
> Hi Andrew,
>
>> The results were
>>
>> 2) exporting as text file - generated text file used the visible text in PDF, it did not use the contents of the ActualText tags
>>
>> 3) cutting and pasting - pasted text was based on the visible text in PDF, it did not use the contents of the ActualText tags
>
> Do consider that these are very distinct functions, and that consuming implementations are within their rights to ignore ActualText if it’s not appropriate to the user’s needs.
>
> For example, when exporting a document to HTML it may or may not be appropriate to replace images with ActualText. Maybe the images themselves should be exported… (I am leaving aside the question of how to represent ActualText in HTML… that’s for another day…)
>
> On the other hand, when a search-engine consumes PDF, ActualText should *always* be used, otherwise there’s nothing to index… :-)
>
> Duff.
Received on Thursday, 7 January 2016 01:57:37 UTC