
Fwd: Re: The murky intersection of accessibility and internationalization

From: Andrew Cunningham <andj.cunningham@gmail.com>
Date: Tue, 10 Jan 2017 11:39:38 +1100
Message-ID: <CAOUP6KngdTEUcC4r45-=bUs12mSjMJyDf+U=uY=Z+YMf20OxEQ@mail.gmail.com>
To: WAI Interest Group <w3c-wai-ig@w3.org>
Forgot to reply to the list.

---------- Forwarded message ----------
From: "Andrew Cunningham" <andj.cunningham@gmail.com>
Date: 10 Jan 2017 11:10 AM
Subject: Re: The murky intersection of accessibility and
To: <chaals@yandex-team.ru>


On 9 Jan 2017 17:10, <chaals@yandex-team.ru> wrote:

> Hi Andrew,
> I suggest you look at the "understanding 3.1.1" section -
> https://www.w3.org/TR/UNDERSTANDING-WCAG20/meaning-doc-lang-id.html
> It says, right at the top,
> "The intent of this Success Criterion is to ensure that content developers
> provide information in the Web page that user agents need to present text
> and other linguistic content correctly. Both assistive technologies and
> conventional user agents can render text more accurately when the language
> of the Web page is identified. Screen readers can load the correct
> pronunciation rules. Visual browsers can display characters and scripts
> correctly. Media players can show captions correctly. As a result, users
> with disabilities will be better able to understand the content."

It isn't a language identification issue; rather, it is a character encoding
issue. Even if the language is correctly tagged, the problem remains.

> If the text itself doesn't match the language, then fails to meet the
> intent - i.e. it is not fit for purpose.

This is more likely a failing of PDF and other file formats, where the ability
to select the correct language at the authoring stage is much more limited
than in HTML. That would be an argument for using HTML in preference to other
rich text file formats.

> Likewise, in Understanding 1.1.1 -
> https://www.w3.org/TR/UNDERSTANDING-WCAG20/text-equiv.html
> it says
> "The purpose of this guideline is to ensure that all non-text content is
> also available in text. "Text" refers to electronic text, not an image of
> text. Electronic text has the unique advantage that it is presentation
> neutral. That is, it can be rendered visually, auditorily, tactilely, or by
> any combination. As a result, information rendered in electronic text can
> be presented in whatever form best meets the needs of the user. It can also
> be easily enlarged, spoken aloud so that it is easier for people with
> reading disabilities to understand, or rendered in whatever tactile form
> best meets the needs of a user."
> So anything that is written using a visual trick to replace the underlying
> characters with other glyphs isn't "text", in the meaning of WCAG, and
> requires an alternative. The simplest one for the cases you describe would
> of course be proper unicode text…
> This issue is also noted in the glossary definition of "non-text content":
> https://www.w3.org/TR/UNDERSTANDING-WCAG20/text-equiv-all.html#non-text-contentdef
> But I agree that in terms of Success Criteria this isn't immediately
> obvious. Since justifying the jobs of accessibility consultants as the only
> people who can understand WCAG isn't a goal, I think it would be good to
> think about how we could clarify this in WCAG.

I did initially consider the text and non-text distinction in WCAG 2.0, but
thought that using it would be too radical. Since you posit it, though, it
is worth further thought.

I would also argue that this interpretation is obscure enough that many
accessibility specialists would stumble over it.

The problem is that WCAG 2.0 does not directly address issues relating to
character encoding. There are no normative requirements for textual
content. For a document to be considered accessible, the character encoding
would need to be identified AND supported by the software in use.

So in theory, for a document to be considered accessible, you need to use
one of a subset of encodings likely to be widely implemented, unless you
include a "textual alternative". Essentially this comes down to "Use
Unicode, or add a Unicode alternative if required".

It also has interesting implications for PDF. If the glyphs in a font cannot
all be resolved to Unicode code points via the ToUnicode mapping, then the
text layer contains non-text content. In such cases ActualText must be added.

Even when Unicode is used, for a wide range of Unicode blocks PDFs cannot
resolve the code points into the correct sequence, which creates malformed
Unicode sequences. This is an inherent problem of the format.

So for various languages, PDF files must always contain ActualText.
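
To make the sequencing problem concrete: in Tamil, the vowel sign E (U+0BC6)
is stored after its consonant but drawn before it. Text recovered in visual
order would put the mark first, with no base character in front of it. The
Python sketch below is a simplified heuristic of my own for spotting that,
not a full validity check, and the function name is hypothetical.

```python
import unicodedata


def misplaced_marks(text):
    """Return indexes of combining marks with no base character before them.

    Heuristic only: a letter or digit counts as a base, and a mark may
    follow another mark that itself follows a base.
    """
    problems = []
    prev_is_base = False
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category.startswith("M"):
            if not prev_is_base:
                problems.append(i)
            # Leave prev_is_base unchanged so stacked marks are allowed.
        else:
            prev_is_base = category[0] in ("L", "N")
    return problems


logical_order = "\u0b95\u0bc6"  # consonant KA, then vowel sign E
visual_order = "\u0bc6\u0b95"   # vowel sign E first, as drawn on the page
```

Here `misplaced_marks(logical_order)` finds nothing, while
`misplaced_marks(visual_order)` flags the leading vowel sign.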

All of the above assumes the definitions of text and non-text content in WCAG.

An interesting aside is that a file that was once accessible can later fail
to be accessible because software no longer supports the character encoding
used.

Web browsers have, over time, supported fewer encodings, preferring Unicode
while continuing to support key legacy encodings. For instance, at one time
there were key browsers that supported a number of Tamil and Vietnamese
character encodings. Web pages of that vintage that met WCAG 2.0
requirements could be considered accessible. The same document in a modern
browser would have to be considered inaccessible. This has interesting
implications for archiving.
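
An archive could at least record whether the software it has today still
knows a document's declared encoding. The sketch below uses Python's own
codec registry as a stand-in for a browser's list of supported encodings,
which is an assumption: the two lists are not the same. The function name
and the sample labels are hypothetical.

```python
import codecs


def is_supported(label):
    """True if this Python installation has a codec for the given label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


# Labels an archived page might declare; results depend on the runtime.
declared = ["utf-8", "tscii", "viscii"]
report = {label: is_supported(label) for label in declared}
```

Any label that comes back unsupported marks a document that would need
transcoding to Unicode, or a Unicode alternative, to stay readable.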

> aside …
> I worked on an example last century, where a group of aboriginal languages
> were written using a font so that various punctuation characters would be
> visually represented as the right glyph - but since the underlying word
> would have punctuation marks in place of some letters, they could not be
> presented by a screen reader or represented accurately in a font designed
> for e.g. simplifying reading for people with dyslexia. If I recall
> correctly, an added problem was not having language code.

I remember discussing this with you way back, at meetings at RMIT if I
recall correctly.

Received on Tuesday, 10 January 2017 00:40:12 UTC
