- From: Martin Duerst <duerst@w3.org>
- Date: Sat, 13 Jul 2002 21:10:33 +0900
- To: xmlp-comments@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear XML Protocol WG,

Please receive the following i18n-related comments on your last call versions of SOAP 1.2 Parts 0-2. Please note that these comments are currently not approved by the I18N WG, but that they will most probably be discussed, and modified and/or approved, at the WG's next teleconference next Tuesday. I'm sending these comments already to give you as much time as possible to reflect them in your specs. Please copy the I18N IG on any discussion regarding these comments.

I18N IG: Please note that discussions in the XML Protocol WG are by default public.

General:

When printing on A4 paper, many of the examples get cut off on the right. Examples should be re-edited so that they are somewhat less wide and can be printed on paper around the world without loss.

Part 0:

- The examples should be changed to be more international. People travel all around the world, to places that have names with characters outside US-ASCII,... Web Services can easily take care of this, and this should be shown. (Please ask your chair, who knows how to do this from the XML Schema primer :-).
- Example 6: The use of xml:lang="en-US" is very good. A comment saying why xml:lang is important would be even better.
- Example 8b: <x:date>12-14-01</x:date>: This is not interoperable! Please use either XML Schema dates (<x:date>2001-12-14</x:date>), because this is machine-to-machine communication, or something like <x:date xml:lang='en-US'>December 14, 2001</x:date> if this is intended for human viewers.
- Example 11: charset="utf-8": This would be a good chance to briefly explain the rules for the charset parameter with application/soap+xml (because otherwise, the reader has to follow two references). The best recommendation is probably: Don't use a 'charset' parameter on 'Content-Type', because then the rules for freestanding XML (UTF-8 and UTF-16 (the latter always with BOM) as defaults, otherwise <?xml ... encoding='foo'...) apply.
- 4.2: "A binding, if using XML 1.0...
  MAY mandate that a particular character encoding or set of encodings be used.": This is good, but should be changed to say that in such a case, UTF-8/UTF-16 should be chosen (in accordance with XML 1.0 and the Character Model).
- 5., last paragraph: This is written as if all white space is by default ignored. But it is probably meant to apply only to insignificant whitespace (e.g. between elements in element content).
- 5.4.2: <reason>: xml:lang is optional, but there should be a note saying that it is strongly recommended.
- 5.4.2: <reason>: xml:lang is said to have a namespace name of "http://www.w3.org/XML/1998/namespace". This alone does not guarantee that the prefix will be 'xml' in XML 1.0 serialization, because the Infoset spec doesn't say so (or at least I didn't find anything to that effect). This has to be nailed down here to avoid serializations such as <reason xmlns:foo='http://www.w3.org/XML/1998/namespace' foo:lang='...
- 5.4.2: <reason> is a human-readable string, but there is no way for the request side to indicate which language would be preferred. This is a serious problem. Solutions may include the definition of a SOAP feature (preferably a module) for this, or requirements/recommendations for bindings to make use of mechanisms they have available (e.g. Accept-Language for the HTTP binding).
- 5.4.2: In some cases, it can make sense to send <reason> in more than one language. Is this allowed? It may be a good idea.
- <reason> is currently the only place where human-readable text is used. But despite Web Services being primarily machine-to-machine, we expect that quite a few applications will include data that is ultimately targeted at humans, or will have to make some part of their processing dependent on human language and culture. This seems to indicate that some more work will have to be done.
- 6.
  This section should make it clear that SOAP uses the XML Schema type 'anyURI', and that therefore characters outside US-ASCII are allowed, but have to be mapped to URIs via UTF-8 (i.e. SOAP essentially uses IRIs, refer to XML Schema or XLink for conversion details) if e.g. the underlying protocol doesn't support IRIs. There should also be a requirement to deal with this in the binding framework, and an explanation of how to deal with this in the HTTP binding, as well as some tests (I can help with the tests). Also, a note in the Primer would help.
- 7.3: feature conflict between SOAP features and features in the underlying protocol: This is a general issue, not only for security. It should be mentioned in the chapter on bindings.

Part 2:

- 2: How is XML mixed content represented in this graph model? Can it be represented? If not, this is a serious problem for internationalization.
- 3.1.2: encoding simple values: There should be a note mentioning that most characters in the C0 range cannot be represented in XML.
- 5.1.1: Properties are restricted to simple datatypes. This may cause serious problems for internationalization.
- 7.5.1.3: response MAY be of content type other than application/soap+xml: add a note saying that care is needed because different content types may have different rules for the 'charset' parameter.
- Appendix A: This needs a major overhaul (Masahiro Sekiguchi already pointed out some problems quite a while ago).
  - Start with some introductory text explaining what's going on.
  - XML Name has two parts -> An XML Name ...
  - Let Prefix be computed: There is really no computation going on at all.
  - In order from left to right -> In order from first to last (otherwise, you get problems with bidirectionality) [but this will drop out anyway]
  - 2: change to: Let TAG be a name in an application, represented as a sequence of characters encoded in a particular character encoding.
  - 3: change to: Let UNI be the sequence of characters of TAG transcoded to Unicode with a normalizing transcoder (using NFC), and let M<sub>1</sub>, M<sub>2</sub>, ... , M<sub>N</sub> be the characters of UNI, in order from first to last.
  - Add a note: The number of characters in TAG is not necessarily the same as the number of characters in UNI, because transcoding may be one-to-many or many-to-one. The details of transcoding may be implementation-defined. There may be (very rarely) cases where there is no equivalent Unicode representation for TAG; such cases are not covered here.
  - Remove 4.
  - Change all T<sub>foo</sub> to M<sub>foo</sub> in the rest.
  - Remove 5.1, moving up 5.2,...
  - Say explicitly that hex digits always use upper case letters.
  - Add examples with non-ASCII characters, both in the BMP (not only Latin-1) and outside the BMP.

Are we supposed to review the test cases, too?

Regards, Martin.
Received on Saturday, 13 July 2002 08:11:12 UTC
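
The Appendix A comments in the message above ask for transcoding with NFC normalization and for hex digits in upper case. The following is a minimal Python sketch of just those two points, under stated assumptions: the function names `to_nfc_characters` and `upper_hex` are hypothetical, the four-digit minimum width is an assumption, and this is not the Appendix A mapping algorithm itself.

```python
import unicodedata


def to_nfc_characters(tag_bytes, encoding):
    """Transcode TAG to Unicode, normalize to NFC, and return the
    resulting characters in order from first to last."""
    uni = unicodedata.normalize("NFC", tag_bytes.decode(encoding))
    return list(uni)


def upper_hex(ch):
    """Return the code point of ch in upper-case hex digits.

    Padding to at least four digits is an assumption made for this sketch.
    """
    return format(ord(ch), "04X")


# TAG here is 'e' followed by U+0301 COMBINING ACUTE ACCENT (two characters).
tag = "e\u0301".encode("utf-8")
chars = to_nfc_characters(tag, "utf-8")

# After NFC normalization the two characters compose into U+00E9, so the
# character count of UNI differs from that of TAG (many-to-one).
print(len(chars), [upper_hex(c) for c in chars])

# A BMP character outside Latin-1 and a character outside the BMP.
print(upper_hex("\u3042"), upper_hex("\U00010400"))
```

The first `print` shows a two-character TAG becoming a one-character UNI, which is the many-to-one case the proposed note describes. The second shows why examples outside Latin-1 and outside the BMP are worth including: the non-BMP code point needs more than four hex digits.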