- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Wed, 06 Mar 2002 17:17:38 -0700
- To: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
A colleague has just made me aware of the following document;
the extracts he sent from the abstract contain some explicit and
some implicit error reports, which I am trying to get into the
record by sending this note.
-CMSMcQ
http://www.uddi.org/pubs/SchemaCentricCanonicalization-20020213.htm
Abstract
Existing XML canonicalization algorithms such as Canonical XML and
Exclusive XML Canonicalization suffer from several limitations and
design artifacts (enumerated herein) which significantly limit their
utility in many XML applications, particularly those which validate
and process XML data according to the rules of and flexibilities
afforded by XML Schema. The Schema Centric Canonicalization algorithm
addresses these concerns.
1.1 Limitations of Existing Canonicalization Algorithms
It should be noted that for these six data types, XML Schema
Datatypes does in fact normatively define a corresponding
canonical lexical representation. For example, the canonical
lexical representation of boolean permits only the use of
values in the set {true, false}. However, XML Schema makes
use of this canonicalization only in certain circumstances,
such as the interpretation of default values of attributes
and elements.
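To make the boolean example concrete, here is a minimal sketch of mapping a lexical form onto the canonical one. It assumes the xs:boolean lexical space {true, false, 1, 0} from XML Schema Datatypes; the function name canonical_boolean is hypothetical and is not taken from the quoted document.

```python
# Lexical forms of xs:boolean and the canonical form each one maps to.
_BOOLEAN_CANONICAL = {
    "true": "true",
    "1": "true",
    "false": "false",
    "0": "false",
}


def canonical_boolean(lexical: str) -> str:
    """Return the canonical lexical representation of an xs:boolean literal.

    Hypothetical helper for illustration only.
    """
    # xs:boolean uses whiteSpace=collapse, so surrounding XML whitespace
    # is not significant and is removed before the lookup.
    token = lexical.strip(" \t\r\n")
    try:
        return _BOOLEAN_CANONICAL[token]
    except KeyError:
        # Literals are case-sensitive: "TRUE" is not in the lexical space.
        raise ValueError(f"not a valid xs:boolean literal: {lexical!r}") from None
```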
There are further data type canonicalization issues which
appear to have been overlooked by XML Schema Datatypes:
vii. (minor) It is not precisely clear from the XML Schema
Datatypes specification whether leading zeros are permitted
in instances of gYearMonth and gYear when (the absolute value
of) the year in question is outside the range of 0001 to
9999. However, in the otherwise analogous passage of the
specification of dateTime, such ambiguity is not present
(such leading zeros are prohibited), and a reasonable
interpretation in these other two cases is to
straightforwardly follow that precedent.
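The reading proposed in item vii can be sketched as a lexical check for gYear. This is only an illustration of that interpretation (at least four year digits, no leading zero once there are more than four, and year 0000 rejected as in the XML Schema 1.0 dateTime rules); the name is_strict_gyear is hypothetical, and the time zone part is checked for shape only, not for range.

```python
import re

# Optional sign, then either exactly four digits starting with 0
# or four-or-more digits starting with a non-zero digit,
# then an optional time zone (shape only; hours/minutes are not range-checked).
_GYEAR = re.compile(r"-?([1-9][0-9]{3,}|0[0-9]{3})(Z|[+-][0-9]{2}:[0-9]{2})?")


def is_strict_gyear(lexical: str) -> bool:
    """Check a gYear literal under the 'follow the dateTime precedent' reading.

    Hypothetical helper for illustration only.
    """
    match = _GYEAR.fullmatch(lexical)
    if match is None:
        return False
    # Assumption: year 0000 is rejected, as XML Schema 1.0 does for dateTime.
    return match.group(1) != "0000"
```

Under this reading "12002" is accepted while "02002" is rejected.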
viii. the use of mixed case language-tags in data of type
language; this is permitted per section "2. The language tag"
of RFC 1766, which is (ultimately) the referenced normative
specification for the value space of language. (Note: this
same value space is used by the xml:lang attribute as defined
by the XML 1.0 Recommendation; thus, the omission of the
canonicalization of the case of xml:lang attributes should
reasonably be considered a flaw in even the existing
canonicalization algorithms.)
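For item viii, a small sketch of what case canonicalization of a language tag might look like. It assumes the RFC 1766 grammar (a primary tag and subtags of one to eight ASCII letters each, joined by hyphens); folding to lower case is an arbitrary choice made here for illustration, not something the quoted document prescribes, and canonical_language_tag is a hypothetical name.

```python
import re

# RFC 1766 grammar: Primary-tag *( "-" Subtag ), each of 1 to 8 ASCII letters.
_RFC1766_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def canonical_language_tag(tag: str) -> str:
    """Fold an RFC 1766 language tag to a single case.

    Hypothetical helper; lower case is an assumed canonical form.
    """
    if _RFC1766_TAG.fullmatch(tag) is None:
        raise ValueError(f"not an RFC 1766 language tag: {tag!r}")
    # The tag is pure ASCII letters and hyphens at this point,
    # so str.lower() performs a plain ASCII case fold.
    return tag.lower()
```

With such a step in place, "en-US" and "EN-us" canonicalize to the same string and so compare equal after canonicalization.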
ix. More generally, it is often the case in real-world schemas
that various string-valued attributes and elements defined
therein are interpreted at the application level as being
case-insensitive. This should be capable of being captured by
the canonicalization algorithm; were it not, then
applications may be forced to remember the exact case used
for certain data, a requirement in tension with the
application semantic, and quite possibly thus a significant
implementation burden.
Received on Wednesday, 6 March 2002 19:30:10 UTC