Raw notes from Martin Duerst / Ian Jacobs UAAG/I18N discussion

Hello,

Martin and I just reviewed (while he visits my home)
his internationalization (I18N) comments on the last call 
draft of the User Agent Accessibility Guidelines 
(UAAG) 1.0:
  http://www.w3.org/TR/2000/WD-UAAG10-20001023/

These are the raw notes from that discussion. 
The points raised by Martin here should be considered
last call comments. In these notes, we have not yet 
labeled issues to be formally addressed by the UA 
Working Group. 

People are encouraged to comment on these notes,
especially to add either I18N or accessibility
perspectives. In particular, Max Nakane is invited
to provide further information about these issues
in the context of Japanese.

Thank you,

 - Ian

=============================================
Charset information
=============================================
IJ: If it affects everyone equally, then we have a convention
    not to include such requirements in such a document.

MD: In a US context, this isn't much of a problem. It's helpful
to point out to user agent developers, however, that this is
something that they should do.

MD: Here's why there's an impact on ATs:
 - The DOM representation is always UTF-16.
 - Therefore, the UA that constructs the DOM must do the
   right thing in order for the AT to have access to characters,
   not just bytes.

IJ: But if it's a DOM conformance issue, then we distribute that
to the DOM by virtue of requiring conformance to it.
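For illustration, a rough TypeScript sketch of MD's point (the
function name and the UTF-8 fallback are assumptions of the sketch,
not anything UAAG or the DOM require): decode the incoming bytes
according to the declared charset so that the DOM, and therefore the
AT, sees characters rather than bytes.

  // Sketch only: decode a document's bytes into a (UTF-16) string
  // before building the DOM, using the charset declared by the
  // transport or the document. TextDecoder follows the WHATWG
  // Encoding Standard and understands labels such as "shift_jis",
  // "euc-jp" and "iso-2022-jp".
  function decodeDocument(bytes: Uint8Array, declaredCharset?: string): string {
    // Assumption for this sketch: fall back to UTF-8 when no charset
    // was declared (a real UA would also sniff <meta> and BOMs).
    const label = declaredCharset ?? "utf-8";
    try {
      return new TextDecoder(label).decode(bytes);
    } catch {
      // Unknown label: fall back rather than exposing raw bytes.
      return new TextDecoder("utf-8").decode(bytes);
    }
  }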
/* On APIs */

MD: Each API probably has its own specific character encoding.
For instance, in Japan, there are three widely-used encodings.
When the UA passes information through an API, it must ensure
that the information is converted per the encoding rules of that
API. [The DOM is a special case of this.]

IJ: I see, so even if the DOM case is covered by
the conformance requirement, we do make more general API
requirements where we would not be "covered".
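Similarly, a rough sketch of the general API point (TypeScript again;
the third-party iconv-lite package and the Shift_JIS example are
assumptions of this sketch, not something the guidelines name): text
held internally as UTF-16 must be re-encoded to whatever the receiving
API expects.

  // Sketch: re-encode the UA's internal (UTF-16) string to the byte
  // encoding a particular platform API expects, e.g. Shift_JIS on a
  // Japanese system.
  import * as iconv from "iconv-lite";

  function encodeForApi(text: string, apiEncoding: string): Buffer {
    // iconv-lite returns a Buffer of bytes in the requested encoding.
    return iconv.encode(text, apiEncoding);
  }

  // e.g. encodeForApi(title, "Shift_JIS") before handing the bytes to
  // an API that only understands that encoding.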

Actions:
  - Talk to Masafumi about what happens when a UA
    doesn't get the proper character encoding information.

Possible solutions:

 - Add a checkpoint
 - Add examples / explanation in a Note after checkpoint
   (Guidelines 1, 5?)
 - Add to definition of API
 - Add to techniques
 - In G5 prose, to "using standard communication channels for this
   exchange", add information about using the appropriate character
   encoding for these channels.

=============================================
4.12 Speech rate
=============================================

MD: Please do not specify the upper and lower bounds in 
    terms of the English language!

MD: Plus, the concept of "word" is hard to define for, say,
    Japanese. (It may be easier to talk about syllables...)

IJ Proposes: 
   - Make the range an example for English.
   - Suggest that upper and lower bounds vary according
     to language.
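A possible shape for such per-language bounds, as a TypeScript sketch;
the units and all numbers below are purely illustrative, not proposed
values.

  // Sketch: speech-rate bounds expressed per language, in units that
  // make sense for that language, instead of a single English
  // words-per-minute range. All numbers are illustrative only.
  interface RateRange {
    unit: "words/min" | "syllables/min" | "characters/min";
    min: number;
    max: number;
  }

  const speechRateBounds: Record<string, RateRange> = {
    en: { unit: "words/min", min: 120, max: 500 },
    ja: { unit: "characters/min", min: 300, max: 1200 },
  };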

=============================================
Checkpoint 2.7
=============================================

MD: It's unclear what the boundaries of 
"recognized but unsupported" are:

 1) Doesn't understand lang="fr"
 2) Doesn't understand "fr" (or something like "qx", or
    a string that isn't bound to a language code). The set
    of language codes grows!
 3) Understands "fr" but doesn't have a French dictionary
    (for speech).
  
MD: If you know what "fr" means but don't have resources
to handle it (e.g., a dictionary), that's one thing (e.g.,
you can ask the user if using the Italian dictionary you
do have is ok). If the user agent doesn't know how to
translate the code (e.g., "zh") to a language name that
most users will be familiar with (and most users will not
be familiar with language codes), then the UA won't help
a user decide whether the information (e.g., in Chinese)
is useful at all.
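As an illustration of the "zh" point, a TypeScript sketch using the
(present-day) Intl.DisplayNames API to turn an unsupported code into a
name the user can act on:

  // Sketch: translate a language code the UA does not support into a
  // human-readable name, so the user can decide whether the content
  // is of any use to them at all.
  function describeLanguage(code: string, uiLocale = "en"): string {
    try {
      const names = new Intl.DisplayNames([uiLocale], { type: "language" });
      return names.of(code) ?? code;
    } catch {
      return code; // malformed or unrecognized code: show it as-is
    }
  }

  // describeLanguage("zh") -> "Chinese"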

IJ: Note that in the case where the UA hands off rendering to
a speech synthesizer, "recognize" could just mean
"pass it off to some appropriate speech synthesizer".

MD: Another issue: if the purpose of 2.7 is to avoid confusion, then
there's little difference between perfectly rendered Chinese to
someone who doesn't understand Chinese, and garbage sounds or
glyphs. "It's all Greek to me."

IJ: While there may not be a difference, we can be pretty sure
that doing it one way (producing garbage) will cause confusion,
whereas I can't guarantee that doing it right in the other case
won't cause confusion.

MD: This actually points back to the charset issue again
(to ensure that you don't get garbage).

=============================================
"Natural language"
=============================================

MD: For text-to-speech and text-to-braille, it's important that you
know what language it is. For graphical rendering of text, the
language of the text is only of minor importance. What counts
in the latter case is the "script".

IJ: We've avoided the term "script" in this context to
avoid confusion with "scripting languages". 

MD: So 2.7 needs adjusting because you want to talk about speech
(natural language) as well as graphical rendering (script).

MD: "case" (e.g., uppercase) is a special case. If you translate
to braille, you lose the distinction between katakana and
hiragana. 

Action MD: 
 - Verify this with Max Nakane.

MD: For 7.5, I think that "script" is a more appropriate term.

Action:
 - Ask Glen Adams what to use.

IJ: We could put two entries in for "script". But I need a
good definition of script.

MD: Check out Unicode.

=============================================
Configurations based on natural language
=============================================

MD: Should say that the user agent should allow
configuration for script/natural language. Suppose
you have a speech engine for fr and one for en.
The user will want to specify different speech
rates and thus needs sufficient configuration.

MD: For braille text, a user may want to have
contracted braille for English and uncontracted braille
for German.

MD: For graphical display, you can imagine that
people with visual disabilities will be better
at guessing English at smaller sizes.

MD: For Japanese, for example, many people can
figure out the text at 9-10 pixels. But this is
inadequate to represent the characters. Visually
impaired users can probably, with a language
they are familiar with, put up with more fuzziness.

IJ: How do you prevent combinatorial explosions
of configurations (i.e., for every setting, N languages)?

MD: If the language information is provided in the source
document, and if you use CSS2, such configurations are easy
to do.

IJ: I would limit these requirements to text information.
For instance, I don't think it's important to be able
to say "I want the background to be blue when I'm viewing
a French document".

MD: The most important info:
 - Text size
 - Speech rate.
 - [For braille, contracted or expanded style.]
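A TypeScript sketch of the kind of per-language configuration being
described (the setting names and all values are invented for the
example; MD's point is that with lang information in the source and
CSS2's language-based selection, applying such settings is easy):

  // Sketch: user preferences keyed by natural language, limited to
  // the settings listed above as most important. Values are invented.
  interface LanguagePrefs {
    textSizePx?: number;          // graphical rendering
    speechRate?: number;          // in whatever unit the engine uses
    brailleContracted?: boolean;  // contracted vs. uncontracted braille
  }

  const prefsByLang: Record<string, LanguagePrefs> = {
    en: { textSizePx: 12, speechRate: 300, brailleContracted: true },
    ja: { textSizePx: 16, speechRate: 500, brailleContracted: false },
  };

  // The effective language of a node is that of its nearest ancestor
  // with a lang attribute (which is also what CSS2 :lang() matches).
  function prefsFor(node: Element): LanguagePrefs | undefined {
    const lang = node.closest("[lang]")?.getAttribute("lang");
    return lang ? prefsByLang[lang.toLowerCase().split("-")[0]] : undefined;
  }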

=============================================
Language information and heuristics
=============================================

MD: In most documents, despite WCAG 1.0, language information is not
explicit. 

IJ: This is a repair issue. Our document doesn't have many
repair requirements.

MD: You may hand off content to a speech synthesizer saying that it's
English, and the speech synthesizer may give it back and say "no, I
don't think this is English". This works for graphical rendering on a
script level, but doesn't work on a language level because the
graphical rendering doesn't have a dictionary. If the user knows
French and English, has French and English speech synthesizers
installed, and the document says lang="fr", the UA will pick the French
speech synthesizer. Most documents will not say whether they are
French or English. One speech synthesizer will do a much better job of
rendering. It would be convenient (and probably not very difficult) to
switch speech synthesizers automatically, without prompting the user.

IJ: I think that this is a convenience functionality: if the user can
select dictionaries manually, then they have access. So clearly not
impossible (thus not P1).

IJ formulating MD's requirement. Three cases:

 - Author has specified natural language correctly.
 - Author has specified natural language incorrectly.
 - Author has not specified natural language.

In latter two cases, user must be able to select from
among available dictionaries (e.g., to override choices
by the author or user agent).

MD: In most cases, it's not necessary to specify natural
language information. 

MD:
 - Authors should identify the natural language of content.
 - User agents must allow users to select natural language
   (e.g., dictionary, different engine).
 - User agents may do automatic natural language selection 
   when authors haven't specified.
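A TypeScript sketch of the precedence these three bullets imply (the
Voice shape and the detectLanguage hook are hypothetical stand-ins for
whatever engines and heuristics a real UA has, not actual APIs):

  // Sketch: user choice wins (covers the "author is wrong" case),
  // then the author's declaration, then optional automatic detection.
  interface Voice { lang: string; name: string; }

  function chooseVoice(
    voices: Voice[],
    authorLang: string | undefined,   // e.g. from lang="fr"; may be wrong or absent
    userOverride: string | undefined, // the user must be able to force this
    detectLanguage: (text: string) => string | undefined,
    text: string
  ): Voice | undefined {
    const wanted = userOverride ?? authorLang ?? detectLanguage(text);
    return voices.find(v => v.lang.startsWith(wanted ?? "")) ?? voices[0];
  }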

=============================================
4.2 Font family
=============================================

MD: Suppose you want to say "Use Helvetica". This is problematic for
multilingual documents. I think you have to be careful with the
wording. 

MD: Both NN and IE do the right thing. You don't want 4.2
to mean "Choose Helvetica, and if Helvetica doesn't work give up."

MD: You should be able to say "Sans serif" and get something
reasonable.

MD: "the font family of *all* text". You need to qualify "all".
Not everything *has* to be in helvetica; only really what
the user expects to be in helvetica (e.g., English text).

Proposed:

 - Clarify that the UA can choose another font family if
   it can't find a way to represent a particular character.
 - Clarify that the user must be able to select a preferred font
   family but that some content in a given document may not be rendered
   using that font family.
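A minimal TypeScript sketch of that reading of 4.2 (the generic
sans-serif fallback is only an example): the user's preferred family
heads a fallback list rather than being the only choice, so the UA can
still substitute for characters that family cannot represent.

  // Sketch: honour the user's preferred family without making it the
  // only option; the UA's own font matching then substitutes per
  // character (e.g. for Japanese text in an otherwise Latin document).
  function applyPreferredFont(root: HTMLElement, preferred: string): void {
    root.style.fontFamily = `"${preferred}", sans-serif`;
  }

  // applyPreferredFont(document.body, "Helvetica");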

=============================================
7.5 Search
=============================================

MD: In English, you assume phonetic spelling. You assume
that people search by inputting the spelling. People who listen to
text-to-speech may not know the spelling. People who are used to
contracted braille may not know the extended version. People in Japan
might know the pronunciation of text, but not know which Kanji
to use. If you use the "plain text" based search interface, this might
not allow some people to find some information. I think this is very
much a disability issue.

MD: You might want to search on a transformed version.

IJ: Our limits are:
 - Search for text characters in source
 - Search only on those that have been rendered (or
   will be, through a viewport)

MD: In Japanese, text is rendered graphically as Kanji.
Kanji can be read in different ways, and different Kanji can be
read the same way. If someone knows shorthand braille, they want to search
using the shorthand.

IJ: Our checkpoint doesn't talk about input methods - we
only talk about the search functionality. How the user gets
shorthand braille into the machine is not covered by 7.5.

IJ: I think the issue you raise is the same for other
input situations such as form controls. I also think that
this issue may extend beyond users with disabilities:
e.g., how Japanese users input Kanji.
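A TypeScript sketch of MD's "search on a transformed version" point
(toReading is a hypothetical hook standing in for a Kanji-reading or
braille-contraction dictionary that this sketch does not supply):

  // Sketch: compare query and content in a common, transformed form
  // instead of comparing raw source characters. Unicode normalization
  // and case folding are real transforms; toReading is hypothetical.
  function fold(s: string, toReading?: (s: string) => string): string {
    const base = s.normalize("NFKC").toLowerCase();
    return toReading ? toReading(base) : base;
  }

  function matches(content: string, query: string,
                   toReading?: (s: string) => string): boolean {
    return fold(content, toReading).includes(fold(query, toReading));
  }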

=============================================
9.2 Input configuration
=============================================

MD: For Japanese and Chinese, input happens in two stages: the keys
you press, then there's the "IME" (input method editor). With some
user interaction, this combination produces the final text. 

IJ: From 1.2:

 "When available, developers should use APIs at a higher level of
  abstraction than the standard device APIs for the operating system."

MD: 

 - Add a technique to 1.2 about higher level APIs (after IME)
   for character input.
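One concrete form such a technique could take, as a TypeScript sketch
(assuming the platform exposes the IME through DOM composition events):
read the text the IME finally produced instead of interpreting the raw
key presses.

  // Sketch: "compositionend" delivers the characters the user actually
  // composed with the IME, whereas "keydown" only reports the raw keys
  // pressed before the IME did its work.
  function watchCharacterInput(field: HTMLInputElement,
                               onText: (text: string) => void): void {
    field.addEventListener("compositionend", (e: CompositionEvent) => {
      onText(e.data); // the final text chosen in the IME
    });
  }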

IJ: What about single-key bindings (checkpoint 9.5) in Japanese?

MD: Similar situation to form controls: some keys will not
be available since they have a particular function.

IJ: We cover different "modes" in 9.5.

MD: Alt-Shift may be used for switching input methods, for example.

IJ: That's covered by checkpoint 9.2.

MD: There are arguments for deleting "accessibility"
(and leaving "conventions") in 9.2. For example, Alt-tab to switch
viewports is not clearly for accessibility, but should not be 
interfered with.

=============================================
Conformance
=============================================

MD: "Must implement the standard API for the keyboard".
MD: There may be two APIs:
 - Raw keys
 - Final text after IME.

MD: We had DOM/I18N brainstorming on this issue. Minutes
available:
   http://www.w3.org/DOM/Group/meetings/m19991201.html




-- 
Ian Jacobs (jacobs@w3.org)   http://www.w3.org/People/Jacobs
Tel:                         +1 831 457-2842
Cell:                        +1 917 450-8783

Received on Saturday, 11 November 2000 22:38:00 UTC