- From: Martin Duerst <duerst@w3.org>
- Date: Sat, 13 Jul 2002 21:10:33 +0900
- To: xmlp-comments@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear XML Protocol WG,
please find below some i18n-related comments on your
Last Call versions of SOAP 1.2 Parts 0-2. Please note that
these comments are currently not approved by the I18N WG,
but that they will most probably be discussed, and modified
and/or approved, at the WG's teleconference next Tuesday.
I'm sending these comments now to give you as much time
as possible to reflect them in your specs.
Please copy the I18N IG on any discussion regarding these
comments.
I18N IG: Please note that discussions in the XML
Protocol WG are by default public.
General:
When printing on A4 paper, many of the examples get cut off
on the right. Examples should be re-edited so that they are
somewhat less wide and can be printed on paper around the
world without loss.
Part 0:
- The examples should be changed to be more international.
People travel all around the world, to places that have
names with characters outside US-ASCII,... Web Services
can easily take care of this, and this should be shown.
(please ask your chair, who knows how to do this from
the XML Schema primer :-).
- Example 6: The use of xml:lang="en-US" is very good. A comment
saying why xml:lang is important would be even better.
- Example 8b: <x:date>12-14-01</x:date>: This is not interoperable!
Please use either XML Schema dates (<x:date>2001-12-14</x:date>),
because this is machine-to-machine communication, or something
like <x:date xml:lang='en-US'>December 14, 2001</x:date> if this
is intended for human viewers.
- Example 11: charset="utf-8": This would be a good place to briefly
explain the rules for the charset parameter with application/soap+xml
(because otherwise, the reader has to follow two references).
The best recommendation is probably: Don't use a 'charset' parameter
on 'Content-Type', because then the rules for freestanding
XML (UTF-8 and UTF-16 (the latter always with BOM) as defaults,
otherwise <?xml ... encoding='foo'...) apply.
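
To illustrate the rules for freestanding XML mentioned in the Example 11
comment, here is a minimal Python sketch. It assumes that no external
'charset' parameter is present, so only the BOM and the encoding
declaration are consulted. The function name sniff_xml_encoding is
hypothetical, and the sketch does not handle UTF-32 or a UTF-16 entity
without a BOM.

```python
import codecs
import re

# \x22 and \x27 are the double and single quote characters.
_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\x22\x27]([A-Za-z][A-Za-z0-9._-]*)[\x22\x27]"
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of a freestanding XML entity (no external charset)."""
    # A byte order mark settles the question for UTF-8 and UTF-16.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise look for <?xml ... encoding='foo' ...?> at the very start.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: the default is UTF-8.
    return "utf-8"


print(sniff_xml_encoding(b"<?xml version='1.0' encoding='ISO-8859-1'?><a/>"))
print(sniff_xml_encoding(b"<a/>"))
```

With a 'charset' parameter on 'Content-Type', a receiver would have to
reconcile that parameter with the result of such a check, which is the
extra complexity the recommendation above tries to avoid.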
Part 1:
- 4.2: "A binding, if using XML 1.0... MAY mandate that a particular
character encoding or set of encodings be used.": This is good,
but should be changed to say that in such a case, UTF-8/UTF-16
should be chosen (in accordance with XML 1.0 and the Character
Model).
- 5., last paragraph: This is written as if all white space is
by default ignored. But it is probably meant to apply only
to insignificant whitespace (e.g. between elements in element content).
- 5.4.2: <reason>: xml:lang is optional, but there should be a note
saying that it is strongly recommended.
- 5.4.2: <reason>: xml:lang is said to have a namespace name of
"http://www.w3.org/XML/1998/namespace". This alone does not
guarantee that the prefix will be 'xml' in XML 1.0 serialization,
because the Infoset spec doesn't say so (or at least I didn't
find anything to that effect). This has to be nailed
down here to avoid serializations such as
<reason xmlns:foo='http://www.w3.org/XML/1998/namespace' foo:lang='...
- 5.4.2: <reason> is a human-readable string, but there is no way
for the request side to indicate which language would be preferred.
This is a serious problem. Solutions may include the definition
of a SOAP feature (preferably a module) for this, or requirements
or recommendations for bindings to use the mechanisms they already
have available (e.g. Accept-Language for the HTTP binding).
- 5.4.2: In some cases, it can make sense to send <reason> in
more than one language. Is this allowed? It may be a good idea.
- <reason> is currently the only place where human readable text is
used. But despite Web Services being primarily machine-to-machine,
we expect that quite a few applications will include data that is
ultimately targeted at humans, or will have to make some part
of their processing dependent on human language and culture.
This seems to indicate that some more work will have to be done.
- 6. This section should make it clear that SOAP uses the XML Schema
type 'anyURI', and that therefore characters outside US-ASCII are
allowed (i.e. SOAP essentially uses IRIs), but have to be mapped
to URIs via UTF-8 (see XML Schema or XLink for conversion
details) if e.g. the underlying protocol doesn't
support IRIs. There also should be a requirement to deal with this
in the binding framework, and an explanation of how to deal with
this in the HTTP binding, as well as some tests (I can help
with the tests). Also, a note in the Primer would help.
- 7.3: Feature conflicts between SOAP features and features in the
underlying protocol: This is a general issue, not only for
security. It should be mentioned in the chapter on bindings.
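
To illustrate the mapping asked for in the comment on section 6, here
is a minimal Python sketch. It assumes the XLink-style conversion, in
which each character outside US-ASCII is encoded as UTF-8 and every
resulting byte is written as %HH. The function name iri_to_uri is
hypothetical; the sketch leaves all ASCII characters untouched and
ignores internationalized host names.

```python
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character."""
    return "".join(
        # quote() encodes to UTF-8 by default and emits uppercase hex.
        ch if ord(ch) < 0x80 else quote(ch, safe="")
        for ch in iri
    )


# A character in Latin-1 (two UTF-8 bytes).
print(iri_to_uri("http://example.org/M\u00fcnchen"))
# -> http://example.org/M%C3%BCnchen

# A character outside the BMP (four UTF-8 bytes).
print(iri_to_uri("http://example.org/\U00010348"))
# -> http://example.org/%F0%90%8D%88
```

A binding whose underlying protocol accepts only URIs could apply such
a conversion just before handing the value to that protocol.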
Part 2:
- 2: How is XML mixed content represented in this graph model?
Can it be represented? If not, this is a serious problem for
internationalization.
- 3.1.2: encoding simple values:
There should be a note mentioning that most characters in the
C0 range cannot be represented in XML.
- 5.1.1: Properties are restricted to simple datatypes. This may
cause serious problems for internationalization.
- 7.5.1.3: response MAY be of content type other than application/soap+xml:
add a note saying that care is needed because different
content types may have different rules for the 'charset' parameter.
- Appendix A: This needs a major overhaul (Masahiro Sekiguchi already
pointed out some problems quite a while ago).
- Start with some introductory text explaining what's going on.
- XML Name has two parts -> An XML Name ...
- Let Prefix be computed: There is really no computation going on at all.
- In order from left to right -> In order from first to last
(otherwise, you get problems with bidirectionality)
[but this will drop out anyway]
- 2: change to: Let TAG be a name in an application, represented
as a sequence of characters encoded in a particular character encoding.
- 3: change to: Let UNI be the sequence of characters of TAG
transcoded to Unicode with a normalizing transcoder (using NFC),
and let M<sub>1</sub>, M<sub>2</sub>, ... , M<sub>N</sub> be the
characters of UNI, in order from first to last.
- Add a note: The number of characters in TAG is not necessarily
the same as the number of characters in UNI, because transcoding
may be one-to-many or many-to-one. The details of transcoding may
be implementation-defined. There may be (very rarely) cases where
there is no equivalent Unicode representation for TAG; such cases
are not covered here.
- remove 4.
- Change all T<sub>foo</sub> to M<sub>foo</sub> in the rest.
- Remove 5.1, moving up 5.2,...
- Say explicitly that hex digits always use upper case letters.
- Add examples with non-ASCII characters, both in the BMP (not
only Latin-1) and outside the BMP.
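
As a rough illustration of the revised steps for Appendix A, here is a
minimal Python sketch. Several things in it are assumptions and not
taken from these comments: the escape form _xHHHH_ (four hex digits in
the BMP, eight outside), the helper names, and the is_name_char test,
which is only a crude stand-in for the XML 1.0 Name productions. The
special cases for names containing "_x" or starting with "xml" are
left out.

```python
import unicodedata


def is_name_char(ch: str, first: bool) -> bool:
    """Crude approximation of the XML 1.0 name character rules (assumption)."""
    if ch == "_" or ch.isalpha():
        return True
    return (not first) and (ch.isdigit() or ch in "-.")


def escape_char(ch: str) -> str:
    """Escape one character, always with uppercase hex digits."""
    cp = ord(ch)
    return f"_x{cp:04X}_" if cp <= 0xFFFF else f"_x{cp:08X}_"


def tag_to_xml_name(tag: str) -> str:
    # UNI: the characters of TAG as Unicode, normalized to NFC.
    uni = unicodedata.normalize("NFC", tag)
    out = []
    # Walk M1 ... MN in order from first to last.
    for i, m in enumerate(uni):
        out.append(m if is_name_char(m, first=(i == 0)) else escape_char(m))
    return "".join(out)


print(tag_to_xml_name("Hello world"))    # -> Hello_x0020_world
print(tag_to_xml_name("cafe\u0301"))     # NFC merges e + U+0301 into U+00E9
print(tag_to_xml_name("a\U0001F600"))    # -> a_x0001F600_
print(tag_to_xml_name("1st"))            # -> _x0031_st
```

The second call shows why the number of characters in TAG and in UNI
can differ: five input characters become four after normalization.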
Are we supposed to review the test cases, too?
Regards, Martin.
Received on Saturday, 13 July 2002 08:11:12 UTC