[www-xml-schema-comments] <none>

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Wed, 06 Mar 2002 17:17:38 -0700
To: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
A colleague has just made me aware of the following document;
the extracts he sent from the abstract contain some explicit and
some implicit error reports, which I am trying to get into the
record by sending this note.



    Existing XML canonicalization algorithms such as Canonical XML and
    Exclusive XML Canonicalization suffer from several limitations and
    design artifacts (enumerated herein) which significantly limit their
    utility in many XML applications, particularly those which validate
    and process XML data according to the rules of and flexibilities
    afforded by XML Schema. The Schema Centric Canonicalization algorithm
    addresses these concerns.

   1.1 Limitations of Existing Canonicalization Algorithms

             It should be noted that for these six data types, XML Schema
             Datatypes does in fact normatively define a corresponding
             canonical lexical representation. For example, the canonical
             lexical representation of boolean permits only the use of
             values in the set {true, false}. However, XML Schema makes
             use of this canonicalization only in certain circumstances,
             such as the interpretation of default values of attributes
             and elements.
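
To make the boolean example concrete, here is a minimal sketch of mapping the lexical forms of boolean onto the canonical set {true, false}. It assumes the lexical space defined by XML Schema Datatypes ("true", "false", "1", "0") and whitespace collapsing; the function name canonical_boolean is hypothetical and is not taken from the quoted document.

```python
# Lexical forms of xs:boolean in XML Schema Part 2: "true", "false", "1", "0".
_BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}


def canonical_boolean(lexical: str) -> str:
    """Map a valid xs:boolean lexical form to its canonical form.

    Hypothetical helper for illustration only.
    """
    # xs:boolean uses whiteSpace=collapse, so surrounding XML whitespace
    # (space, tab, CR, LF) is not significant.
    token = lexical.strip(" \t\r\n")
    if token not in _BOOLEAN_LEXICAL:
        # The lexical space is case-sensitive: "TRUE" is not valid.
        raise ValueError(f"not a valid xs:boolean lexical form: {lexical!r}")
    return "true" if _BOOLEAN_LEXICAL[token] else "false"
```

A canonicalizer that applied such a mapping everywhere, and not only when interpreting default values, would treat "1" and "true" as the same content.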
             There are further data type canonicalization issues which
             appear to have been overlooked by XML Schema Datatypes:
         vii. (minor) It is not precisely clear from the XML Schema
             Datatypes specification whether leading zeros are permitted
             in instances of gYearMonth and gYear when (the absolute value
             of) the year in question is outside the range of 0001 to
             9999. However, in the otherwise analogous passage of the
             specification of dateTime, such ambiguity is not present
             (such leading zeros are prohibited), and a reasonable
             interpretation in these other two cases is to
             straightforwardly follow that precedent.
         viii. the use of mixed case language-tags in data of type
             language; this is permitted per section "2. The language tag"
             of RFC 1766, which is (ultimately) the referenced normative
             specification for the value space of language. (Note: this
             same value space is used by the xml:lang attribute as defined
             by the XML 1.0 Recommendation; thus, the omission of the
             canonicalization of the case of xml:lang attributes should
             reasonably be considered a flaw in even the existing
             canonicalization algorithms.)
         ix. More generally, it is often the case in real-world schemas
             that various string-valued attributes and elements defined
             therein are interpreted at the application level as being
             case-insensitive. This should be capable of being captured by
             the canonicalization algorithm; were it not, then
             applications may be forced to remember the exact case used
             for certain data, a requirement in tension with the
             application semantic, and quite possibly thus a significant
             implementation burden.
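
As a rough illustration of items vii and viii, the sketch below normalizes the two kinds of value discussed. It rests on two assumptions that the extract does not state: that lower case is chosen as the canonical case for language tags (RFC 1766 only says that case is not significant), and that gYear follows the dateTime precedent of at least four year digits with no leading zeros beyond that. The names canonical_language and canonical_gyear are hypothetical.

```python
import re

# Optional minus sign, four or more year digits, optional timezone.
_GYEAR = re.compile(r"(-?)([0-9]{4,})(Z|[+-][0-9]{2}:[0-9]{2})?")


def canonical_language(tag: str) -> str:
    """Fold a language tag to one case (lower case is an assumption here)."""
    return tag.strip(" \t\r\n").lower()


def canonical_gyear(lexical: str) -> str:
    """Drop leading zeros from a gYear, keeping at least four digits.

    Follows the dateTime precedent mentioned in item vii. This sketch
    does not check whether the year itself is allowed (e.g. 0000).
    """
    match = _GYEAR.fullmatch(lexical.strip(" \t\r\n"))
    if match is None:
        raise ValueError(f"not a gYear lexical form: {lexical!r}")
    sign, digits, tz = match.group(1), match.group(2), match.group(3) or ""
    # Remove all leading zeros, then pad back up to the four-digit minimum.
    digits = digits.lstrip("0").rjust(4, "0")
    return f"{sign}{digits}{tz}"
```

The application-level case-insensitivity described in item ix could be handled the same way as canonical_language, but only for attributes and elements that the schema or application marks as case-insensitive; that marking mechanism is not shown here.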
Received on Wednesday, 6 March 2002 19:30:10 UTC
