RE: Re: The murky intersection of accessibility and internationalization from Sean Murphy (seanmmur) on 2017-01-10 (w3c-wai-ig@w3.org from January to March 2017)

From: Sean Murphy (seanmmur) <seanmmur@cisco.com>
Date: Tue, 10 Jan 2017 00:53:34 +0000
To: Andrew Cunningham <andj.cunningham@gmail.com>, WAI Interest Group <w3c-wai-ig@w3.org>
Message-ID: <cb59ffbbc1054fb1bc0850e013ecb73c@XCH-RCD-001.cisco.com>
Would this not fall to the vendor of the browser, the PDF viewer, or the product that creates the content, or even the OS vendor?

Sean Murphy
Accessibility Software engineer
seanmmur@cisco.com
Tel: +61 2 8446 7751       Cisco Systems, Inc.
The Forum 201 Pacific Highway
ST LEONARDS
2065
Australia
cisco.com

From: Andrew Cunningham [mailto:andj.cunningham@gmail.com]
Sent: Tuesday, 10 January 2017 11:40 AM
To: WAI Interest Group <w3c-wai-ig@w3.org>
Subject: Fwd: Re: The murky intersection of accessibility and internationalization

Forgot to reply to the list.

---------- Forwarded message ----------
From: "Andrew Cunningham" <andj.cunningham@gmail.com>
Date: 10 Jan 2017 11:10 AM
Subject: Re: The murky intersection of accessibility and internationalization
To: <chaals@yandex-team.ru>
Cc:

Hi,


On 9 Jan 2017 17:10, <chaals@yandex-team.ru> wrote:
Hi Andrew,

I suggest you look at the "understanding 3.1.1" section - https://www.w3.org/TR/UNDERSTANDING-WCAG20/meaning-doc-lang-id.html


It says, right at the top,

"The intent of this Success Criterion is to ensure that content developers provide information in the Web page that user agents need to present text and other linguistic content correctly. Both assistive technologies and conventional user agents can render text more accurately when the language of the Web page is identified. Screen readers can load the correct pronunciation rules. Visual browsers can display characters and scripts correctly. Media players can show captions correctly. As a result, users with disabilities will be better able to understand the content."

It isn't a language identification issue; rather, it is a character encoding issue. Even if the language is correctly tagged, the problem remains.


If the text itself doesn't match the language, then it fails to meet the intent - i.e. it is not fit for purpose.

This is more likely a failing in PDF and other file formats, where the ability to select the correct language at the authoring stage is much more limited than in HTML. It would also be an argument for why HTML should be used in preference to other rich text file formats.


Likewise, in Understanding 1.1.1 - https://www.w3.org/TR/UNDERSTANDING-WCAG20/text-equiv.html


it says

"The purpose of this guideline is to ensure that all non-text content is also available in text. "Text" refers to electronic text, not an image of text. Electronic text has the unique advantage that it is presentation neutral. That is, it can be rendered visually, auditorily, tactilely, or by any combination. As a result, information rendered in electronic text can be presented in whatever form best meets the needs of the user. It can also be easily enlarged, spoken aloud so that it is easier for people with reading disabilities to understand, or rendered in whatever tactile form best meets the needs of a user."

So anything that is written using a visual trick to replace the underlying characters with other glyphs isn't "text", in the meaning of WCAG, and requires an alternative. The simplest one for the cases you describe would of course be proper Unicode text…

This issue is also noted in the glossary definition of "non-text content": https://www.w3.org/TR/UNDERSTANDING-WCAG20/text-equiv-all.html#non-text-contentdef


But I agree that in terms of Success Criteria this isn't immediately obvious. Since justifying the jobs of accessibility consultants as the only people who can understand WCAG isn't a goal, I think it would be good to think about how we could clarify this in WCAG.

Initially I did think about the text and non-text distinction in WCAG 2.0, but thought that using it would be too radical. Since you posit it, though, it is worth further thought.

I would also argue that this interpretation is obscure enough for many accessibility specialists to stumble on.

The problem is that WCAG 2.0 does not directly address issues relating to character encoding. There are no normative requirements for textual content. For a document to be considered accessible, the character encoding would need to be identified AND supported by the software in use.

So in theory you need to use a subset of encodings likely to be widely implemented for a document to be considered accessible, unless you include a "textual alternative". Essentially this comes down to "Use Unicode, or add a Unicode alternative if required".
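
The "identified AND supported" condition can be illustrated with a small check. This is only a sketch: Python's codec registry stands in for "the software in use", and the helper names `is_supported` and `decode_or_none`, as well as the label `x-hypothetical-legacy`, are made up for the example.

```python
import codecs
from typing import Optional


def is_supported(label: str) -> bool:
    """Return True if this Python runtime has a codec registered for `label`."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def decode_or_none(raw: bytes, label: str) -> Optional[str]:
    """Decode `raw` only if the declared encoding is known and fits the bytes."""
    if not is_supported(label):
        return None  # encoding is identified, but not supported by this software
    try:
        return raw.decode(label)
    except UnicodeDecodeError:
        return None  # encoding is supported, but the bytes do not match it


# A Unicode (UTF-8) document decodes; a document in an unsupported legacy
# encoding (hypothetical label) yields nothing a screen reader could use.
print(decode_or_none("Tiếng Việt".encode("utf-8"), "utf-8"))
print(decode_or_none(b"\xab\xcd", "x-hypothetical-legacy"))
```

The same bytes can move from the first case to the second over time if the software drops a codec, which is the archiving concern raised further down.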

It also has interesting implications for PDF. If any glyph in a font cannot be resolved to Unicode codepoints via the ToUnicode mapping, then the text layer contains non-text content. In such cases ActualText must be added.
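
A toy model of that check, for illustration only. It does not parse a real PDF: the ToUnicode mapping is represented as a plain dictionary, and the glyph IDs, the mappings, and the function names `unmapped_glyphs` and `needs_actual_text` are all invented for the sketch.

```python
from typing import Dict, List

# Toy stand-in for a font's ToUnicode CMap: glyph ID -> Unicode string.
ToUnicode = Dict[int, str]


def unmapped_glyphs(glyph_ids: List[int], to_unicode: ToUnicode) -> List[int]:
    """Return the glyph IDs in a text run that have no usable Unicode mapping."""
    return [g for g in glyph_ids if not to_unicode.get(g)]


def needs_actual_text(glyph_ids: List[int], to_unicode: ToUnicode) -> bool:
    """A run with any unmapped glyph cannot be treated as text on its own."""
    return bool(unmapped_glyphs(glyph_ids, to_unicode))


cmap: ToUnicode = {1: "a", 2: "b"}  # glyph 3 has no mapping (invented data)
run = [1, 2, 3]

print(unmapped_glyphs(run, cmap))    # [3]
print(needs_actual_text(run, cmap))  # True
```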

Even if Unicode is used, PDFs cannot, for a wide range of Unicode blocks, resolve the codepoints into the correct sequence, creating malformed Unicode sequences. This is an inherent problem of the format.

So for various languages, PDF files must always contain ActualText attributes.
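
One way a malformed sequence can show up, sketched under assumptions. The Devanagari syllable below is my own illustrative choice (no script is named above), and `has_orphan_mark` is a made-up helper. The sketch assumes text were recovered in glyph (visual) order: a vowel sign that is drawn to the left of its consonant would then come out before the consonant, leaving a combining mark with no base character.

```python
import unicodedata


def has_orphan_mark(text: str) -> bool:
    """Return True if a combining mark appears with no preceding base character."""
    prev_can_carry_mark = False
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and not prev_can_carry_mark:
            return True
        # Letters, digits, and earlier marks can carry a following mark;
        # whitespace and punctuation cannot.
        prev_can_carry_mark = is_mark or ch.isalnum()
    return False


logical = "\u0915\u093f"  # KA followed by VOWEL SIGN I (correct storage order)
visual = "\u093f\u0915"   # same glyphs in left-to-right visual order

print(has_orphan_mark(logical))  # False
print(has_orphan_mark(visual))   # True
```

A check like this only flags one symptom; it is a heuristic, not a test of whether a PDF needs ActualText.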

All of the above assumes the definitions of text and non-text content in WCAG 2.0.

An interesting aside is that a file that was once accessible could later fail to be accessible because software no longer supports the character encoding used.

For instance, Web browsers over time have supported fewer encodings, preferring Unicode while continuing support for key legacy encodings. At one time there were key browsers that supported a number of Tamil and Vietnamese character encodings. Web pages of that vintage that met WCAG 2.0 requirements could be considered accessible. The same document in modern browsers would have to be considered inaccessible. This has interesting implications for archiving.


aside …
I worked on an example last century, where a group of Aboriginal languages were written using a font so that various punctuation characters would be visually represented as the right glyph - but since the underlying word would have punctuation marks in place of some letters, they could not be presented by a screen reader or represented accurately in a font designed for e.g. simplifying reading for people with dyslexia. If I recall correctly, an added problem was not having a language code.
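
A rough sketch of that situation and of the "proper Unicode text" fix. Everything specific here is invented: the mapping `HACK_FONT_MAP`, the helper `to_unicode_text`, and the sample string are not taken from any real font or language.

```python
# Invented mapping for a hypothetical hacked font: the character stored in
# the file -> the letter the font actually draws for it.
HACK_FONT_MAP = {"[": "ŋ", "]": "Ŋ"}


def to_unicode_text(stored: str, font_map: dict) -> str:
    """Replace each stand-in character with the Unicode letter it displays as."""
    return stored.translate(str.maketrans(font_map))


stored = "[ama"  # what the file contains; the hacked font draws it as "ŋama"
fixed = to_unicode_text(stored, HACK_FONT_MAP)

print(stored.isalpha())  # False: software sees a bracket, not a letter
print(fixed)             # ŋama
print(fixed.isalpha())   # True: every character is now a real letter
```

A blanket remap like this would also rewrite genuine brackets, so a real conversion would need to know which runs of text used the hacked font.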


I remember discussing this with you, way back in the past at meetings at RMIT, if I remember correctly.

Andrew
Received on Tuesday, 10 January 2017 00:54:08 UTC