
Re: The murky intersection of accessibility and internationalization

From: <chaals@yandex-team.ru>
Date: Mon, 09 Jan 2017 07:10:26 +0100
To: Andrew Cunningham <andj.cunningham@gmail.com>, WAI Interest Group <w3c-wai-ig@w3.org>
Message-Id: <333201483942226@webcorp01f.yandex-team.ru>
Hi Andrew,

I suggest you look at the "understanding 3.1.1" section - https://www.w3.org/TR/UNDERSTANDING-WCAG20/meaning-doc-lang-id.html

It says, right at the top, 

"The intent of this Success Criterion is to ensure that content developers provide information in the Web page that user agents need to present text and other linguistic content correctly. Both assistive technologies and conventional user agents can render text more accurately when the language of the Web page is identified. Screen readers can load the correct pronunciation rules. Visual browsers can display characters and scripts correctly. Media players can show captions correctly. As a result, users with disabilities will be better able to understand the content."

If the text itself doesn't match the declared language, then it fails to meet the intent - i.e. it is not fit for purpose.

Likewise, in Understanding 1.1.1 - https://www.w3.org/TR/UNDERSTANDING-WCAG20/text-equiv.html

it says

"The purpose of this guideline is to ensure that all non-text content is also available in text. "Text" refers to electronic text, not an image of text. Electronic text has the unique advantage that it is presentation neutral. That is, it can be rendered visually, auditorily, tactilely, or by any combination. As a result, information rendered in electronic text can be presented in whatever form best meets the needs of the user. It can also be easily enlarged, spoken aloud so that it is easier for people with reading disabilities to understand, or rendered in whatever tactile form best meets the needs of a user."

So anything that is written using a visual trick to replace the underlying characters with other glyphs isn't "text" in the meaning of WCAG, and requires an alternative. The simplest one for the cases you describe would of course be proper Unicode text…
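The mismatch is easy to see at the codepoint level. Here is a minimal Python sketch (the example is illustrative: Zawgyi-style fonts store some signs in visual order rather than Unicode's logical order) showing that two strings which may render identically are different character sequences to a screen reader, a search engine, or any other codepoint-based consumer:

```python
import unicodedata

# Logical Unicode order for the Burmese syllable "ne" (as in "sun"):
# base letter NA (U+1014) followed by VOWEL SIGN E (U+1031).
unicode_order = "\u1014\u1031"

# Zawgyi-style visual order: the vowel sign is stored *before* the
# base letter, because the font simply draws glyphs left to right.
visual_order = "\u1031\u1014"

# The two may look alike under a Zawgyi font, but normalization cannot
# reconcile them: to a screen reader, search index or spell-checker
# they are simply different strings.
assert unicodedata.normalize("NFC", unicode_order) != \
       unicodedata.normalize("NFC", visual_order)

for ch in unicode_order:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The assertion is the whole point: no amount of Unicode normalization maps the visually ordered string back to the logically ordered one, so the "text" the user agent sees is not the text the reader sees.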

This issue is also noted in the glossary definition of "non-text content": https://www.w3.org/TR/UNDERSTANDING-WCAG20/text-equiv-all.html#non-text-contentdef

But I agree that in terms of Success Criteria this isn't immediately obvious. Since justifying the jobs of accessibility consultants as the only people who can understand WCAG isn't a goal, I think it would be good to think about how we could clarify this in WCAG.

aside …
I worked on an example last century, where a group of Aboriginal languages were written using a font so that various punctuation characters would be visually represented as the right glyph - but since the underlying words would have punctuation marks in place of some letters, they could not be presented by a screen reader, or represented accurately in a font designed for e.g. simplifying reading for people with dyslexia. If I recall correctly, an added problem was not having a language code.

cheers

Chaals

09.01.2017, 04:07, "Andrew Cunningham" <andj.cunningham@gmail.com>:
> Hi everyone,
>
> At the moment I am doing some work for a Victorian Government agency;
> the focus is on web internationalisation, specifically the integration
> of government information translated into community languages (the
> languages spoken and read by migrant and refugee communities, whose
> members may have limited fluency in English).
>
> Two common community languages used by our state government are
> Burmese and Sgaw Karen. Content can be found in HTML or PDF files. The
> Unicode Consortium's Myanmar Scripts and Languages FAQ [1] provides
> some background information, although the FAQ understates the
> complexity of the problem. Most Burmese translations are not provided
> in Unicode. Burmese is usually provided as a file using the Zawgyi
> pseudo-Unicode (or ad hoc) encoding.
>
> Sgaw Karen is usually provided in a number of 8-bit legacy encodings
> or occasionally using a pseudo-Unicode (or ad hoc) font.
>
> Web content in a pseudo-Unicode encoding identifies itself as UTF-8.
> Web content in an 8-bit legacy encoding declares itself as ISO/IEC
> 8859-1 or Windows-1252. From an internationalisation perspective,
> things are quite simple: the content should be in Unicode. Although
> I can just come out and say that, I also need to be
> able to justify the internationalisation decisions with reference to
> accessibility considerations. Web accessibility is important for state
> government agencies and departments. They aim to meet WCAG 2.0 AA
> requirements.
>
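For HTML at least, the mislabelling can be detected heuristically. A toy Python sketch: Zawgyi repurposes codepoints that Unicode assigns to S'gaw Karen, Mon and Shan characters (roughly U+1060–U+1097), so Burmese-language text containing them is likely Zawgyi rather than standard Unicode. The range check is a deliberate simplification; a production detector (for example Google's myanmar-tools) uses a trained statistical model instead.

```python
# Codepoints Unicode assigns to S'gaw Karen, Mon and Shan, but which
# Zawgyi repurposes for Burmese glyph variants. Illustrative only --
# a real detector scores the whole string with a statistical model.
SUSPECT = range(0x1060, 0x1098)

def looks_like_zawgyi(text: str) -> bool:
    """Flag Burmese-language text that uses Zawgyi-repurposed codepoints."""
    return any(ord(ch) in SUSPECT for ch in text)

# Standard Unicode Burmese for "Myanmar": no suspect codepoints.
assert not looks_like_zawgyi("\u1019\u103C\u1014\u103A\u1019\u102C")

# A Zawgyi-style string using U+107E (one of Zawgyi's medial-ra glyph
# variants; in Unicode this codepoint belongs to the Shan letters).
assert looks_like_zawgyi("\u107E\u1019\u1014\u1039\u1019\u102C")
```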
> My reading of the WCAG 2.0 recommendation is that the encoding issues
> for Burmese and Sgaw Karen directly impact principles 3 and 4, and that
> non-Unicode content would be considered inaccessible. But there are no
> specific guidelines that are relevant, and nothing for documents to
> comply with, other than a generic principle? Would this be correct?
>
> In HTML, it makes sense to require Unicode for Burmese and Sgaw Karen
> content, but there is no explicit accessibility requirement to do so.
>
> PDFs are a more complex problem. The ToUnicode mapping in Burmese and
> Sgaw Karen PDF files using pseudo-Unicode or legacy fonts is
> essentially useless. Such fonts work by deliberately mis-declaring
> the glyph-to-codepoint correspondences. Using Unicode alone also
> doesn't get us all the way to an accessible document for these
> languages, since the PDF specification cannot handle all aspects of
> resolving glyphs to codepoints for the font technologies employed for
> complex scripts (writing systems). The way around this may be to make
> use of the ActualText attribute.
>
> In PDF Techniques for WCAG 2.0 [2], PDF7 explicitly refers to the use
> of ActualText via OCR where the PDF contains images of text rather
> than a text layer. But I assume that ActualText would also be the
> appropriate way forward in a pseudo-Unicode or legacy-font scenario?
>
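For reference, ActualText is applied at the content-stream level as a marked-content sequence (ISO 32000-1, §14.9.4, "Replacement Text"). The Python sketch below only illustrates the shape of that wrapper; the helper name is mine, and a real workflow would write it through a PDF library rather than by string assembly:

```python
def actual_text_span(replacement: str, drawing_ops: str) -> str:
    """Wrap raw PDF drawing operators in a /Span marked-content
    sequence whose /ActualText carries the real Unicode string.
    Illustrative sketch only -- a real tool would emit this via a
    PDF library, not string concatenation."""
    # ActualText is a PDF text string; UTF-16BE with a leading BOM is
    # the portable spelling, written here as a hex string <FEFF...>.
    hex_utf16 = ("\ufeff" + replacement).encode("utf-16-be").hex().upper()
    return (f"/Span << /ActualText <{hex_utf16}> >> BDC\n"
            f"{drawing_ops}\n"
            "EMC")

# The visible glyph run can use whatever codes the legacy font needs;
# assistive technology reads the /ActualText string instead.
span = actual_text_span("\u1019\u103C", "BT /F1 12 Tf (legacy codes) Tj ET")
print(span)
```

The idea is that the glyph run inside BDC…EMC can stay in the legacy or pseudo-Unicode font, while the /ActualText entry supplies proper Unicode for extraction, search and screen readers.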
> Use of unsupported legacy encodings within PDF files has been fairly
> common for languages written in complex scripts, due to historical
> limitations in typesetting applications' handling of the OpenType
> features required by complex-script languages. So it is a wider
> problem than just the two languages I have been discussing.
>
> Do my assumptions sound reasonable from an accessibility perspective?
> Or are there alternative approaches from an accessibility perspective
> you think I may have overlooked? Or have I totally lost the plot?
>
> Feedback and input welcome.
>
> Andrew
>
> [1] http://www.unicode.org/faq/myanmar.html
> [2] https://www.w3.org/TR/2014/NOTE-WCAG20-TECHS-20140408/pdf.html
>
> Andrew Cunningham
> andj.cunningham@gmail.com

-- 
Charles McCathie Nevile - standards - Yandex
chaals@yandex-team.ru - - - Find more at http://yandex.com
Received on Monday, 9 January 2017 06:11:01 UTC
