- From: Addison Phillips <addison@yahoo-inc.com>
- Date: Wed, 14 Nov 2007 10:48:40 -0800
- To: public-i18n-core@w3.org
Hi all,

Following are my initial comments having read the spec at
http://www.w3.org/TR/2007/WD-XMLHttpRequest-20071026/

Any comments on these?

--

1. (observation) The introduction is rather stiffly written and could
be brushed up for accessibility.

2. The definition of "Conforming XML user agent" is not written in the
same normative style as that of "conforming user agent" (which appears
just above it in section 1.2). In particular, it should use normative
RFC 2119 language.

3. Section 1.2.1 requires that conforming user agents support some
subset of DOM Events, DOM Core, and Window Object, but doesn't specify
what that subset is. This makes the normative language questionable.

4. Section 1.2.2 defines case-insensitive matching as follows:

--
There is a case-insensitive match of strings s1 and s2 if after
uppercasing both strings (by mapping a-z to A-Z) they are identical.
--

This has two problems.

a. First, it only specifies case-insensitive matching for the ASCII
letter set, ignoring the remainder of Unicode.

b. Second, it doesn't make clear that this is the default mapping.
There are languages (Turkish, for example) in which the default
mapping doesn't apply, and this potentially causes problems for
matching: when case mapping is instantiated in these locales, it does
the "wrong thing" by default.

I do note that this definition is only used in the document in the
context of HTTP header *names* (which are restricted to ASCII).
However, if it ever were applied to HTTP header bodies (which can
contain encoded non-ASCII strings), it might encompass a larger set.
In any case, I would propose that this be changed as suggested below,
since some programmers forget about SpecialCasing rules in their
default case-mappings:

--
There is a case-insensitive match of strings s1 and s2 if they compare
identically using the default case foldings defined by Unicode. Note
that these do not include language-specific mappings, such as the
dotted/dotless 'i' mappings in Turkish or Azerbaijani (see Unicode
SpecialCasing).
--
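To illustrate, here is a rough TypeScript sketch of the difference.
The function names are mine, and toUpperCase() is only an
approximation of the default Unicode case foldings; a real
implementation would use the Unicode CaseFolding data.

    // ASCII-only matching, as the spec currently defines it:
    // uppercase a-z only, then compare the results.
    function asciiCaseInsensitiveMatch(s1: string, s2: string): boolean {
      const up = (s: string) => s.replace(/[a-z]/g, (c) => c.toUpperCase());
      return up(s1) === up(s2);
    }

    // Language-independent Unicode matching, approximated here with
    // toUpperCase(), which applies the default Unicode case mappings
    // regardless of the runtime locale.
    function unicodeCaseInsensitiveMatch(s1: string, s2: string): boolean {
      return s1.toUpperCase() === s2.toUpperCase();
    }

    asciiCaseInsensitiveMatch("content-type", "CONTENT-TYPE"); // true
    asciiCaseInsensitiveMatch("stra\u00DFe", "STRASSE");       // false
    unicodeCaseInsensitiveMatch("stra\u00DFe", "STRASSE");     // true: U+00DF maps to "SS"
    // And the language-specific trap from the proposal: under Turkish
    // rules, "i".toLocaleUpperCase("tr") yields "\u0130" (capital I
    // with dot above), not "I", so a locale-sensitive mapping would
    // fail to match "i" against "I".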
5. In Section 2, where charset detection is described, step 6 should
mention that these byte sequences correspond to the BOM (Byte Order
Mark) in Unicode. (A sketch of BOM-based detection appears at the end
of these comments.)

6. Also in Section 2, step 8 says:

--
Return the result of decoding the response entity body using charset.
Or, if that fails, return null.
--

It is possible that a buffer has been assigned the default encoding,
which is UTF-8. Any non-UTF-8 buffer will most likely fail conversion
using UTF-8, since UTF-8 is highly patterned. I assume this means that
the response entity body would be 'null' in that case?

7. Before the section on charset detection in Section 2, there should
be a health warning stating something like:

--
For interoperability, the use of a Unicode encoding, particularly
UTF-8, is RECOMMENDED. Non-Unicode encodings are difficult to detect
and effectively limit the range of character data that can be
transmitted reliably.
--

8. Section 2 really should be divided into subsections. The current
organization is something of a mish-mash; among other things, it is
difficult to reference in its current state.

9. The section on setRequestHeader says:

--
If the header argument is in the list of request headers either use
multiple headers, combine the values or use a combination of those
--

It then gives an example in which the headers are combined
algorithmically. However, some headers, such as Accept-Language, use
q-weights and other structure, and this approach may not work
acceptably in those cases. Perhaps provide some guidance on these
cases?

10. In the send() method, if 'data' is a DOMString, it is always
encoded as UTF-8 (good). But this seems at odds with the ability to
specify different encodings in the headers, etc. The currently
specified behavior really is the behavior we want; perhaps the spec
should explicitly state that string data is always sent as UTF-8?

11. At the end of the section on the send() method, this paragraph
appears:

--
If the user agent implements server-driven content-negotiation it
should set Accept-Language, Accept-Encoding and Accept-Charset headers
as appropriate; it must not automatically set the Accept header.
Responses to such requests must have the content-encodings
automatically decoded. [RFC2616]
--

Several comments on this:

a. The normative word "must" appears in a non-normative form. This
should either be corrected or expanded upon.

b. The encoding/charset headers certainly relate to the handling of
the content. But the Accept-Language header is different: it controls
the language negotiation process.

c. The Accept-Language header is currently the only mechanism provided
for XHR locale management. Since ECMAScript in particular has no
locale management capabilities or locale facet, it may be important to
convey a language or locale to the server (where such functionality
resides) in certain interactions. Thus, it would be useful to
separately mention the Accept-Language header and its use in informing
the server of language/locale preference. In addition, we would
probably recommend the use of the BCP 47 (in particular RFC 4647)
Lookup algorithm for matching the Accept-Language header here. (A
sketch of Lookup appears at the end of these comments.)

12. General internationalization note: there is barely any mention of
language or locale negotiation or considerations in this document.
This is probably appropriate given the scope of this document, which
is focused strictly on the XMLHttpRequest object. However, it should
be noted that the lack of these capabilities will require
non-interoperable custom implementations. Standardization of
language/locale negotiation for AJAX- and REST-type interactions
(which rely on XHR) should be described somewhere. This may represent
a work item for the Internationalization Core WG.
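First, the sketch promised in point 5: the byte sequences checked by
the charset-detection steps are the Unicode byte order marks. This is
only an illustration of the idea, in the same TypeScript style as
above (the function name is mine, not the draft's):

    // Map the leading bytes of the response entity body to the
    // encoding signalled by a Unicode byte order mark, if any.
    function sniffBOM(bytes: Uint8Array): string | null {
      if (bytes.length >= 3 &&
          bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
        return "UTF-8";
      }
      if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
        return "UTF-16BE";
      }
      if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
        return "UTF-16LE";
      }
      return null; // no BOM; fall through to the remaining detection steps
    }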
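Second, the sketch promised in point 11.c: the RFC 4647 "Lookup"
algorithm, which progressively truncates a language range until it
matches a tag available on the server. The names and the fallback
parameter are mine; a production version would first need to extract
the ranges from the Accept-Language header and sort them by q-weight
before calling this.

    // RFC 4647, section 3.4 ("Lookup"): for each language range in
    // the user's priority list, remove subtags from the end until one
    // of the available tags matches.
    function lookup(ranges: string[], available: string[], fallback: string): string {
      const avail = new Set(available.map((t) => t.toLowerCase()));
      for (const range of ranges) {
        let tag = range.toLowerCase();
        while (tag.length > 0) {
          if (avail.has(tag)) {
            return tag;
          }
          const cut = tag.lastIndexOf("-");
          if (cut < 0) {
            break; // down to the primary subtag; try the next range
          }
          tag = tag.slice(0, cut);
          // If truncation leaves a single-character subtag at the end
          // (e.g. the "x" of a private-use sequence), remove it too.
          if (tag.length > 2 && tag.charAt(tag.length - 2) === "-") {
            tag = tag.slice(0, tag.length - 2);
          }
        }
      }
      return fallback; // no range matched any available tag
    }

    lookup(["de-CH-1996", "fr"], ["de", "fr", "en"], "en"); // returns "de"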
Addison

--
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture. It is not a feature.

Received on Wednesday, 14 November 2007 18:49:07 UTC