(Action) Issue 327: Proposed checkpoint for character encoding support in APIs from Ian Jacobs on 2001-01-15 (w3c-wai-ua@w3.org from January to March 2001)

From: Ian Jacobs <ij@w3.org>
Date: Mon, 15 Jan 2001 12:48:03 -0500
To: w3c-wai-ua@w3.org
Message-ID: <3A6337D3.9C9726FB@w3.org>

Hello,

Per my action item from the 16 November 2000 face-to-face at
AOL [1], please consider the following new Priority 1
checkpoint to address issue 327 [2]:

<NEW>
For an API implemented to satisfy requirements of this document,
support the character encodings required for that API. 

 Note: Support for character encodings is important so that
 text is not "broken" when communicated to assistive 
 technologies. The DOM Level 2 Core Specification
 [DOM2CORE], section 1.1.5 requires that the DOMString type
 be encoded using UTF-16. 
</NEW>
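
To illustrate what the UTF-16 requirement means for a string passed
through an API, here is a minimal sketch in Java, whose String type
also uses 16-bit code units. The class and method names are
hypothetical, and codePointCount assumes a Java release later than
the 1.3 version discussed in this message. A character outside the
Basic Multilingual Plane occupies two code units, so an assistive
technology that splits a string between them would see "broken" text.

```java
class Utf16CodeUnits {
    // Number of 16-bit UTF-16 code units in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (characters) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'A' followed by U+1D11E (MUSICAL SYMBOL G CLEF), written as a surrogate pair.
        String s = "A\uD834\uDD1E";
        System.out.println(codeUnits(s));   // prints 3
        System.out.println(codePoints(s));  // prints 2
    }
}
```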

----------------------------
For the Techniques document:
----------------------------

1) For Java v 1.3, the list of encodings [JAVA13] that any
   conforming implementation must support is: US-ASCII,
   ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16.

2) MSAA relies on COM, which relies on Unicode,
   so in practice this means supporting UTF-16. From COM documentation:

 "Finally, and quite significantly, all strings passed through
  all COM interfaces (and, at least on Microsoft platforms, all
  COM APIs) are Unicode strings. There simply is no other
  reasonable way to get interoperable objects in the face of
  (i) location transparency, and (ii) a high-efficiency object
  architecture that doesn't in all cases intervene
  system-provided code between client and server. Further, this
  burden is in practice not large."

  [From Chapter 3 on Interfaces: "Interface Binary Standard"]
  http://msdn.microsoft.com/library/toc.asp?PaneName=Contents&tocPath=specs4-3-0&ShowPane=true#sel
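
As a rough illustration of item 1 above, the following sketch checks
at run time that the six encodings named in the Java list are
available. It is an assumption-laden example, not part of the
proposal: it uses the java.nio.charset.Charset class, which was added
in a release after Java 1.3, and the class name RequiredCharsets and
method name missing are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class RequiredCharsets {
    // The encodings listed in item 1 as required of every conforming implementation.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Returns the names from REQUIRED that this Java runtime does not support.
    static List<String> missing() {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Charset.isSupported(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing();
        if (missing.isEmpty()) {
            System.out.println("All required encodings are supported.");
        } else {
            System.out.println("Missing encodings: " + missing);
        }
    }
}
```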

------------------- 
For the references:
-------------------

 [DOM2CORE] DOM Level 2 Core Specification, section 1.1.5
   http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578
 List of registered charset ids:
   ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
 Character Model for the World Wide Web 
   http://www.w3.org/TR/charmod
 Unicode glossary
   http://www.unicode.org/glossary/
 [JAVA13] Java 1.3 documentation
   http://java.sun.com/j2se/1.3/docs/api/java/lang/package-summary.html#charenc

-----------------
For the glossary:
-----------------

  A "character encoding" is a mapping from a character set
  definition to the actual code units used to represent the
  data. Please refer to the Unicode 3.0 standard [UNICODE]
  for more information.
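
  The definition above can be illustrated with a small sketch in
  Java (the class and method names are hypothetical, and
  StandardCharsets assumes a Java release later than 1.3). The same
  character maps to different byte sequences under different
  encodings, and decoding with the wrong encoding produces the
  kind of "broken" text the checkpoint is meant to prevent.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Renders a byte array as uppercase hexadecimal, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes text as UTF-8 and then (incorrectly) decodes it as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // prints E9
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // prints C3A9
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));   // prints 00E9
        // The mismatched decode yields two characters, U+00C3 and U+00A9.
        System.out.println(misdecode(s).length());                        // prints 2
    }
}
```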

 - Ian


[1] http://www.w3.org/WAI/UA/2000/11/minutes-20001116
[2] http://server.rehab.uiuc.edu/ua-issues/issues-linear-lc2.html#327
-- 
Ian Jacobs (jacobs@w3.org)   http://www.w3.org/People/Jacobs
Tel:                         +1 831 457-2842
Cell:                        +1 917 450-8783

Received on Monday, 15 January 2001 12:48:06 UTC