I18n comments on XML Schema: Structures

Summary of disposition

13 October 2000
Henry S. Thompson
C. M. Sperberg-McQueen

This document reproduces the comments made by the W3C i18n WG on the 7 April 2000 last call draft of XML Schema, and provides a quick summary, for each point, of what the XML Schema WG and the editors of the spec have done (or in some cases are doing) in response.

Message-Id: <4.2.0.58.J.20000531142526.03428d70@sh.w3.mag.keio.ac.jp>
Date: Wed, 31 May 2000 14:26:01 +0900
To: www-xml-schema-comments@w3.org
From: "Martin J. Duerst" <duerst@w3.org>
Subject: Fwd: I18N Last call comments on Schema Part 1

Forwarded at the request of C. M. Sperberg-McQueen.

Date: Tue, 30 May 2000 18:11:01 +0900
From: "Martin J. Duerst" <duerst@w3.org>
Subject: I18N Last call comments on Schema Part 1

Dear Schema WG,

[This mail is crossposted to the I18N IG to allow for further discussion. Please feel free to forward these comments to another list, including a public list, but please make sure that you don't reveal the mail addresses of the various groups.]

These are the last-call comments on XML Schema Part 1: Structures from the I18N WG/IG.

The comments are numbered by [n], but their order does not reflect their importance.

[1] The spec repeatedly contains language such as "the string composed of the [character code] of each of the element information item's character information item [children] in order"

This is overly complex and confusing. First, a string is composed of characters, not of character codes (which are numbers). This has to be corrected. Second, the phrase is used so often and the concept behind it so obvious that it would help a lot to define a term for it once.

[Similar phrases are also found in Part 2; this comment should also be reflected there, but it is made here only once for both parts.]

LC-206

All occurrences of this phrasing have been eliminated in connection with the introduction of the schema-normalized value property, the value of which actually is a string.
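Comment [1] asks for a single named term for "the string composed of an element's character children, in order". The following is a minimal sketch of that computation, assuming Python's ElementTree data model, in which character data before the first child element is stored in `.text` and character data after each child is stored in that child's `.tail`. The helper name `character_children_string` is hypothetical, and the sketch does not perform the whitespace normalization that the schema-normalized value also involves.

```python
import xml.etree.ElementTree as ET


def character_children_string(element: ET.Element) -> str:
    """Hypothetical helper: the string formed by an element's own
    character children, in document order.

    The result is a string of characters (a Python str), not a sequence
    of character codes. Text nested inside child elements is excluded,
    because it belongs to those children, not to this element.
    """
    parts = [element.text or ""]   # character data before the first child
    for child in element:          # direct children only, in order
        parts.append(child.tail or "")  # character data after this child
    return "".join(parts)


doc = ET.fromstring("<p>Caf\u00e9 <b>au</b> lait</p>")
print(character_children_string(doc))
```

For `<p>Café <b>au</b> lait</p>` this prints `Café  lait` (with two spaces): the word `au` is omitted because it is content of the child `b`, not a character child of `p`.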

[2] Section 3.12 says: 'In the case of {user information}, indication may be given as to the identity of the (human) language used in the contents, using the xml:lang attribute.' Please change 'may' to 'should'. Also see points [3], [4], [5].

LC-206

Done, thanks.

[3] Please indicate how annotations in multiple languages are done. Being able to make annotations in multiple languages in a clearly defined and interoperable way is important.

LC-206

This appears unnecessary (see your point 4a).

Or do you simply mean that we should mention that, if both French and Chinese documentation is provided, they can be distinguished by using the xml:lang attribute?
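The approach suggested in the response, several `documentation` elements distinguished by `xml:lang`, can be illustrated with a small sketch. It assumes the namespace URI of the final XML Schema 1.0 Recommendation (`http://www.w3.org/2001/XMLSchema`), which postdates the drafts discussed here, and uses a hypothetical helper `documentation_by_language` to pick out the documentation for one language. It compares language tags case-insensitively and, for simplicity, ignores `xml:lang` values inherited from ancestor elements.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Assumption: namespace of the final XML Schema 1.0 Recommendation.
XSD = "{http://www.w3.org/2001/XMLSchema}"

ANNOTATION = """
<xs:annotation xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:documentation xml:lang="fr">Prix en euros.</xs:documentation>
  <xs:documentation xml:lang="zh">以欧元计价。</xs:documentation>
</xs:annotation>
"""


def documentation_by_language(annotation, lang):
    """Hypothetical helper: text of each xs:documentation child whose
    xml:lang matches lang (case-insensitively). Inherited xml:lang
    values are not considered in this sketch."""
    return [
        "".join(doc.itertext())  # also collects text inside nested markup
        for doc in annotation.findall(XSD + "documentation")
        if doc.get(XML_LANG, "").lower() == lang.lower()
    ]


annotation = ET.fromstring(ANNOTATION.strip())
print(documentation_by_language(annotation, "fr"))
```

Using `itertext()` rather than `.text` keeps the sketch usable when `documentation` contains additional markup, as point [5] below says it may.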

[4] Section 5.9, in point 2, says that the value of xml:lang must conform to the requirements set out in XML 1.0. There are two problems here:

Part of LC-206

Done, thanks.
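For reference, the XML 1.0 requirements mentioned in point [4] are, in the first and second editions of XML 1.0, the `LanguageID` productions [33] to [38] (later editions of XML 1.0 removed them). The sketch below encodes those productions as a Python regular expression; the function name `is_xml10_language_id` is hypothetical, and the sketch only shows which values the productions accept.

```python
import re

# Productions [33]-[38] of XML 1.0 (First and Second Editions):
#   LanguageID ::= Langcode ('-' Subcode)*
#   Langcode   ::= ISO639Code | IanaCode | UserCode
#   ISO639Code ::= ([a-z] | [A-Z]) ([a-z] | [A-Z])
#   IanaCode   ::= ('i' | 'I') '-' ([a-z] | [A-Z])+
#   UserCode   ::= ('x' | 'X') '-' ([a-z] | [A-Z])+
#   Subcode    ::= ([a-z] | [A-Z])+
LANGUAGE_ID = re.compile(
    r"(?:[a-zA-Z]{2}"      # ISO639Code: exactly two letters
    r"|[iI]-[a-zA-Z]+"     # IanaCode: "i-" followed by letters
    r"|[xX]-[a-zA-Z]+)"    # UserCode: "x-" followed by letters
    r"(?:-[a-zA-Z]+)*"     # zero or more letter-only Subcodes
)


def is_xml10_language_id(value: str) -> bool:
    """Hypothetical helper: True if value matches the LanguageID production."""
    return LANGUAGE_ID.fullmatch(value) is not None


for tag in ["en-US", "i-klingon", "x-foo-bar", "haw", "es-419"]:
    print(tag, is_xml10_language_id(tag))
```

Under these productions, values such as `en-US` and `i-klingon` are accepted, while a three-letter primary code such as `haw` or a numeric subtag such as `es-419` is rejected.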

[5] It should be made clear that <documentation> can contain additional markup. As neither <annotation> nor <documentation> is defined in App. A, this isn't clear.

Part of LC-206

Done, thanks.

[6] It should be made clear that all references to URIs/URI references are to be understood as including the provisions of the relevant section of the W3C Character Model (http://www.w3.org/TR/charmod/#URIs). Please see point [30] of our comments to Part 2.

Part of LC-206

Datatypes issue (no action apparently needed in Part 1).

[7] In http://lists.w3.org/Archives/Member/w3c-xml-schema-wg/1999Nov/0007.html we made a detailed request to ensure that XML Schemas can address the problems of i18n-related markup. This detailed request was listed as issue 209 but summarily abandoned (see http://www.w3.org/XML/Group/xmlschema-current/issues.html#easyAddIns).

We have not received any response that would allow us to determine that these issues are addressed satisfactorily in the current spec. We herewith resubmit the above-mentioned mail as part of this last call comment, and request the XML Schema WG to provide a detailed answer as part of the resolution process so that we can decide whether our requirements are met. Apart from this general answer with followup, we mention a few specific points below [8].

See LC-215.

We believe that it is easy to add attributes and elements in the manner described.

Note: objection has been raised to our use of the verb 'believe' instead of a verb like 'know'. Let us be clear: the verb reflects the inherent vagueness of the test you prescribe ("It must be very easy") rather than uncertainty on our part. We know it is possible; we find the techniques involved sufficiently simple and easy that we believe we have met the requirement. But statements of the form "X is easy" are not suitable objects of formal proof.

[8] The mail mentioned in [7] mentions addition of elements and attributes in general, but one particular and particularly frequent case is the addition of child elements to elements that do not have any child elements defined yet. In the current draft, such elements can be defined in two ways, either as 'mixed' without any elements specified or as 'string'.

[There may be a third one, 'textOnly', as can be guessed from 4.3.3. However, the spec does not seem consistent on this. For example, it says: "{base type definition} The type definition resolved to by the value of the base [attribute], if present, otherwise the simple ur-type definition if the content [attribute] is textOnly, otherwise the complex ur-type definition." But earlier there is only one ur-type, so this is confusing.]

In order to make extensions easy, the 'mixed' type without child elements and the string type (as long as it is not restricted by a facet; see also point [9]) should be merged. In terms of functionality, this should not pose any problems at all, because it is just a question of deferring decisions until they really are necessary.

It may be claimed that instead of merging 'mixed' and 'string' as above, it would suffice to always use 'mixed' in cases where further addition of elements is desired. However, we feel that this is not sufficient: 'string' is too easy to use and will be used in too many instances.

See LC-216.

Requested action not taken, sorry.

We believe it is impossible to eliminate the string type for elements entirely without doing violence to the type system; we believe it would be more desirable to define a library of useful types which includes a suitable definition of generic text, and we invite the i18n WG to collaborate with us in creating suitable definitions for such a library. We also believe that best-practice guidelines should encourage the use of mixed content for elements instead of string, unless very specific reasons exist for using string.

[9] As explained in item [35]/[36] of our comments to part 2, it will often be necessary to include character repertoire constraints in XML Schema. Such constraints should also be applicable to character children even if an element also has element children. This can easily be done by allowing a pattern facet even on complex types provided that this pattern facet only consists of a character class expression. This does not pose any problems with respect to the interleaving order of characters conforming to the pattern and elements conforming to the content model.

See LC-217.

Sorry, not in 1.0. See the formal response for the rationale.
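Comment [9] argues that a pattern facet consisting only of a character class expression could be applied to the character children of an element with mixed content without any problem from the interleaving of text and child elements. A minimal sketch of that argument follows. The repertoire (tab, LF, CR and printable ASCII) and the helper names are hypothetical, and the sketch uses Python's `re` syntax, which differs from XML Schema regular expressions (for example, Python has no `\p{IsBasicLatin}` block escape).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical repertoire constraint of the form [class]*:
# tab, LF, CR and printable ASCII characters.
ALLOWED = re.compile(r"[\t\n\r\x20-\x7E]*")


def character_children(element):
    """Yield each run of character data that is a direct child of element."""
    yield element.text or ""
    for child in element:
        yield child.tail or ""


def satisfies_char_class(element, pattern=ALLOWED):
    """Check every run of character children against a [class]* pattern.

    Child elements are skipped: their content is governed by their own
    types, and where they occur does not affect the result.
    """
    return all(pattern.fullmatch(run) for run in character_children(element))


ok = ET.fromstring("<p>Price: <b>10</b> EUR</p>")
bad = ET.fromstring("<p>Prix : <b>10</b> \u20ac</p>")

print(satisfies_char_class(ok))   # True
print(satisfies_char_class(bad))  # False

# Because [class]* constrains each character independently, checking the
# runs one at a time agrees with checking their concatenation.
joined = "".join(character_children(bad))
print(ALLOWED.fullmatch(joined) is not None)  # False
```

The final check illustrates the design point of the comment: for a pattern that is only a repeated character class, splitting the text at element boundaries cannot change the outcome, so no interaction with the content model arises.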

[10] The verbal complexity of the XML Schema specs, in particular part 1, is extremely high. We have serious doubts regarding understandability by non-native speakers as well as translatability. We ask the XML Schema WG and the editors to undertake every effort to use clear and simple language.

We have tried our best.