- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Wed, 06 Mar 2002 17:17:38 -0700
- To: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
A colleague has just made me aware of the following document; the extracts he sent from the abstract contain some explicit and some implicit error reports, which I am trying to get into the record by sending this note.

-CMSMcQ

http://www.uddi.org/pubs/SchemaCentricCanonicalization-20020213.htm

Abstract

Existing XML canonicalization algorithms such as Canonical XML and Exclusive XML Canonicalization suffer from several limitations and design artifacts (enumerated herein) which significantly limit their utility in many XML applications, particularly those which validate and process XML data according to the rules of and flexibilities afforded by XML Schema. The Schema Centric Canonicalization algorithm addresses these concerns.

1.1 Limitations of Existing Canonicalization Algorithms

It should be noted that for these six data types, XML Schema Datatypes does in fact normatively define a corresponding canonical lexical representation. For example, the canonical lexical representation of boolean permits only the use of values in the set {true, false}. However, XML Schema makes use of this canonicalization only in certain circumstances, such as the interpretation of default values of attributes and elements.

There are further data type canonicalization issues which appear to have been overlooked by XML Schema Datatypes:

vii. (minor) It is not precisely clear from the XML Schema Datatypes specification whether leading zeros are permitted in instances of gYearMonth and gYear when (the absolute value of) the year in question is outside the range of 0001 to 9999. However, in the otherwise analogous passage of the specification of dateTime, such ambiguity is not present (such leading zeros are prohibited), and a reasonable interpretation in these other two cases is to straightforwardly follow that precedent.

viii. the use of mixed case language-tags in data of type language; this is permitted per section "2. The language tag" of RFC 1766, which is (ultimately) the referenced normative specification for the value space of language. (Note: this same value space is used by the xml:lang attribute as defined by the XML 1.0 Recommendation; thus, the omission of the canonicalization of the case of xml:lang attributes should reasonably be considered a flaw in even the existing canonicalization algorithms.)

ix. More generally, it is often the case in real-world schemas that various string-valued attributes and elements defined therein are interpreted at the application level as being case-insensitive. This should be capable of being captured by the canonicalization algorithm; were it not, then applications may be forced to remember the exact case used for certain data, a requirement in tension with the application semantic, and quite possibly thus a significant implementation burden.
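The data type canonicalization discussed in the extract can be illustrated with a minimal Python sketch. It covers only two cases: mapping the four lexical forms of boolean ("true", "false", "1", "0") onto the canonical set {true, false}, and normalizing the year numeral used by gYear, gYearMonth and dateTime under the interpretation suggested in item vii (at least four digits, no leading zeros beyond that, year 0000 rejected as in XML Schema 1.0). The function names canonical_boolean and canonical_year are hypothetical, the year helper handles only the year field (not months or timezones), and none of this is the Schema Centric Canonicalization algorithm itself.

```python
import re

# XML whitespace characters (space, tab, LF, CR); xs:boolean uses
# whiteSpace=collapse, so leading and trailing occurrences are ignored.
_XML_WS = " \t\n\r"

# Year numeral as it appears in gYear, gYearMonth and dateTime:
# an optional minus sign followed by four or more ASCII digits.
_YEAR = re.compile(r"(-?)([0-9]{4,})")


def canonical_boolean(lexical: str) -> str:
    """Map any xs:boolean lexical form to its canonical form (hypothetical helper)."""
    value = lexical.strip(_XML_WS)
    if value in ("true", "1"):
        return "true"
    if value in ("false", "0"):
        return "false"
    raise ValueError(f"not an xs:boolean lexical form: {lexical!r}")


def canonical_year(lexical: str) -> str:
    """Normalize a year numeral (hypothetical helper, item vii interpretation).

    Years whose absolute value is within 0001..9999 are zero-padded to four
    digits; longer years carry no leading zeros, following the dateTime rule.
    """
    m = _YEAR.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not a year numeral: {lexical!r}")
    sign, digits = m.groups()
    year = int(digits)  # drops any leading zeros
    if year == 0:
        raise ValueError("year 0000 is not allowed in XML Schema 1.0")
    # Pad to at least four digits; wider values are printed as-is.
    return f"{sign}{year:04d}"
```

For instance, canonical_boolean("1") gives "true", and canonical_year("012345") gives "12345" while canonical_year("0012") keeps its four-digit form.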
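Items viii and ix amount to folding the case of selected values before documents are compared or signed. A minimal sketch using Python's xml.etree.ElementTree, assuming lowercase is chosen as the canonical case: it lowercases every xml:lang attribute value (RFC 1766 treats tags as case-insensitive) and case-folds the values of attributes named in a hypothetical CASE_INSENSITIVE_ATTRS set. A schema-aware canonicalizer would derive that set from the schema rather than hard-code it; this illustrates the idea only and is not the algorithm described in the UDDI document.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical: names of attributes the application treats as case-insensitive.
# A schema-aware canonicalizer would derive this from the schema instead.
CASE_INSENSITIVE_ATTRS = {"country"}


def canonicalize_case(root: ET.Element) -> ET.Element:
    """Fold the case of xml:lang values and of listed case-insensitive attributes."""
    for elem in root.iter():
        # Copy the items so the attribute map can be updated while looping.
        for name, value in list(elem.attrib.items()):
            if name == XML_LANG:
                # RFC 1766 tags are ASCII and case-insensitive; choose lowercase.
                elem.set(name, value.lower())
            elif name in CASE_INSENSITIVE_ATTRS:
                # casefold() is one possible choice for general Unicode text.
                elem.set(name, value.casefold())
    return root


doc = ET.fromstring('<msg xml:lang="EN-US" country="DE"><note xml:lang="fr-CA">hi</note></msg>')
canonicalize_case(doc)
print(doc.get(XML_LANG), doc.get("country"), doc.find("note").get(XML_LANG))
# prints: en-us de fr-ca
```

Lowercasing is safe for language tags because they are ASCII; for arbitrary strings, casefold() can change length (for example, "ß" folds to "ss"), so a real canonicalization algorithm would have to fix that choice explicitly.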
Received on Wednesday, 6 March 2002 19:30:10 UTC