[XHR] Internationalization comments

Hi all,

Following are my initial comments having read the spec at 
http://www.w3.org/TR/2007/WD-XMLHttpRequest-20071026/

Any comments on these?

--
1. (observation) The introduction is rather stiffly written and could be 
brushed up for accessibility.

2. The definition of "Conforming XML user agent" is not written in the 
same normative style as that of "conforming user agent" (which appears 
just above it in section 1.2). In particular, it should use normative 
RFC 2119 language.

3. Section 1.2.1 requires that conforming user agents support some 
subset of DOM Events, DOM Core, and Window Object, but doesn't specify 
what that subset is. This makes the normative language difficult to 
test: what, exactly, must a conforming user agent support?

4. Section 1.2.2 defines case-insensitive matching as follows:

--
There is a case-insensitive match of strings s1 and s2 if after 
uppercasing both strings (by mapping a-z to A-Z) they are identical.
--

This has two problems.

   a. First, it only specifies case-insensitive matching for the ASCII 
letter set, ignoring the remainder of Unicode.

   b. Second, it doesn't make clear that this is the default, 
locale-independent mapping. There are languages (Turkish, for example) 
whose mappings differ from the default, and this potentially causes 
problems for matching: case-mapping performed under those locales does 
the "wrong thing" here by default. I do note that this definition is 
only used in the document in the context of HTTP header *names* (which 
are restricted to ASCII). However, if it ever were applied to HTTP 
header values (which can contain encoded non-ASCII strings), it could 
encompass a larger character set. In any case, I would propose that 
this be changed as suggested below, since some programmers forget about 
SpecialCasing rules in their default case-mappings:

--
There is a case-insensitive match of strings s1 and s2 if they compare 
identically using the default case foldings defined by Unicode. Note 
that these do not include language-specific mappings, such as the 
dotted/dotless 'i' mappings in Turkish or Azerbaijani (see Unicode 
SpecialCasing).
--
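
For illustration only (this is not part of the proposed wording), here 
is a minimal TypeScript sketch of why the distinction matters, assuming 
a modern engine with the locale-aware toLocaleUpperCase() from ECMA-402:

--
// Default, locale-independent mapping (akin to Unicode default case folding):
console.log("title".toUpperCase() === "TITLE");            // true

// Turkish-locale mapping: 'i' uppercases to dotted 'İ' (U+0130), so it fails:
console.log("title".toLocaleUpperCase("tr") === "TITLE");  // false ("TİTLE")
--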

5. In Section 2, where charset detection is described, step 6 should 
mention that these byte sequences correspond to the BOM (Byte Order 
Mark) in Unicode.
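
For reference, a minimal sketch of the byte sequences in question; the 
function and its name are mine, purely for illustration:

--
// Sniff the Unicode Byte Order Mark at the start of the response entity body.
function sniffBOM(bytes: Uint8Array): string | null {
  if (bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) return "UTF-8";
  if (bytes[0] === 0xFE && bytes[1] === 0xFF) return "UTF-16BE";
  if (bytes[0] === 0xFF && bytes[1] === 0xFE) return "UTF-16LE";
  return null; // no BOM present
}
--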

6. Also in Section 2, Step 8 says:

--
Return the result of decoding the response entity body using charset. 
Or, if that fails, return null.
--

It is possible that the charset detection falls through to the default 
encoding, UTF-8, for a response that is not actually UTF-8. Such a 
buffer will most likely fail conversion, since valid UTF-8 byte 
sequences are highly patterned. I assume this means that the response 
entity body would be 'null' in that case?
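
To illustrate the failure mode (using the later WHATWG Encoding API 
purely as a stand-in for "decoding the response entity body"; it is not 
something this draft defines):

--
// 0xE9 0x74 0xE9 is "été" in ISO-8859-1 but is not well-formed UTF-8.
const latin1Bytes = new Uint8Array([0xE9, 0x74, 0xE9]);
try {
  new TextDecoder("utf-8", { fatal: true }).decode(latin1Bytes);
} catch (e) {
  // 0xE9 needs two continuation bytes and 0x74 is not one, so decoding
  // fails; under step 8 the result would presumably be null.
  console.log("decode failed:", e);
}
--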

7. Before the section on charset detection in Section 2, there should be 
a health warning stating something like:

--
For interoperability, the use of a Unicode encoding, particularly UTF-8, 
is RECOMMENDED. Non-Unicode encodings are difficult to detect and 
effectively limit the range of character data that can be transmitted 
reliably.
--

8. Section 2 really should be divided into subsections. The current 
organization lumps several distinct topics together and, among other 
things, is difficult to reference in its current state.

9. The section on setRequestHeader says:

--
If the header argument is in the list of request headers either use 
multiple headers, combine the values or use a combination of those
--

It then gives an example in which the headers are combined 
algorithmically. However, some headers, such as Accept-Language, use 
q-weights and other internal structure, and this approach may not work 
acceptably in those cases (see the sketch below). Perhaps provide some 
guidance for those cases?
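
A sketch of the concern (the header values and URL are invented, and 
the combining behaviour shown is only one of the options the draft 
allows):

--
const xhr = new XMLHttpRequest();
xhr.open("GET", "/resource");
xhr.setRequestHeader("Accept-Language", "fr-CA, fr;q=0.8");
xhr.setRequestHeader("Accept-Language", "en;q=0.9");
// If the user agent combines the values, the request might carry
//   Accept-Language: fr-CA, fr;q=0.8, en;q=0.9
// which may or may not express the precedence the caller intended,
// particularly if the user agent's own default value is also merged in.
xhr.send();
--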

10. In the send() method, if 'data' is a DOMString, it is always encoded 
as UTF-8 (good). But this seems at odds with the ability to declare 
different encodings in the headers, etc. To be clear, the currently 
specified behavior is the behavior we want; perhaps the spec should 
state explicitly that string data is always sent as UTF-8, regardless of 
any charset declared elsewhere?
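
For example, a hypothetical request (the header value is mine) whose 
body bytes would be the UTF-8 encoding of the string even though the 
author-supplied Content-Type declares a different charset:

--
const xhr = new XMLHttpRequest();
xhr.open("POST", "/submit");
xhr.setRequestHeader("Content-Type", "text/plain;charset=ISO-8859-1");
// Per the draft, the DOMString is encoded as UTF-8 for transmission,
// regardless of the charset declared in the header above.
xhr.send("Grüße");
--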

11. At the end of the section on the send method, this para appears:

--
If the user agent implements server-driven content-negotiation it should 
set Accept-Language, Accept-Encoding and Accept-Charset headers as 
appropriate; it must not automatically set the Accept header. Responses 
to such requests must have the content-encodings automatically decoded. 
[RFC2616]
--

Several comments on this:

  a. The normative word "must" appears in a non-normative form. This 
should either be corrected or expanded upon.

  b. The encoding/charset headers certainly relate to the handling of 
the content. But the Accept-Language header is different. It controls 
the language negotiation process.

  c. The Accept-Language header is currently the only mechanism provided 
for XHR locale management. Since ECMAScript in particular has no locale 
management capabilities or locale facet, it may be important to convey a 
language or locale to the server (where such functionality resides) in 
certain interactions; a sketch of one such interaction follows this 
list. Thus, it would be useful to mention the Accept-Language header 
separately and its use in informing the server of language/locale 
preference. In addition, we would probably recommend the use of the 
BCP 47 (in particular RFC 4647) Lookup algorithm for matching the 
Accept-Language header here.
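
A hypothetical sketch of such an interaction (the URL and header value 
are mine; the server would be expected to apply RFC 4647 Lookup against 
the locales it actually supports):

--
const xhr = new XMLHttpRequest();
xhr.open("GET", "/api/messages");
// Convey the application's language/locale preference to the server,
// since ECMAScript itself offers no locale facet to consult.
xhr.setRequestHeader("Accept-Language", "de-CH, de;q=0.9, en;q=0.5");
xhr.send();
--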

12. General internationalization note: there is barely any mention of 
language or locale negotiation or considerations in this document. This 
is probably appropriate given the document's scope, which is focused 
strictly on the XMLHttpRequest object. However, it should be noted that 
the lack of these capabilities will require non-interoperable custom 
implementations. Standardization of language/locale negotiation for 
AJAX- and REST-style interactions (which rely on XHR) should be 
described somewhere. This may represent a work item for the 
Internationalization Core WG.


--

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.
