Re: Canonical lexical representation

From: <noah_mendelsohn@us.ibm.com>
Date: Tue, 12 Feb 2002 10:29:09 -0500
To: kellyly@hotmail.com
Cc: xmlschema-dev@w3.org
Message-ID: <OF922EA01A.D7E12CC3-ON85256B5E.0053E925@lotus.com>
Kelly Lynch writes:

>> From my reading of the XML schema datatypes, I would expect that
>> the datatype value is supposed to be formatted in the "canonical
>> lexical representation". However, I've found XML validation tools
>> (namely MSXMLDOM 4.0) that enforce the less restrictive "lexical
>> representation".

>> What is supposed to be used? I really can't imagine that the W3
>> committee responsible for the datatype definitions would really
>> waste the time to define the canonical representation if they
>> were not intended to be used.    

The schema WG carefully defined a set of lexical representations
for each type and, where more than one representation is legal
for a given value, designated one of them as the canonical
representation.  So "3" and "003" are equally acceptable
representations of the integer 3; "3" is merely the canonical
one.
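
To make the distinction concrete, here is a minimal sketch in Java
of mapping several lexical representations of an integer onto the
single canonical one.  The class and method names are made up for
illustration, and java.math.BigInteger is only a stand-in for a
real schema processor: its parser is close to, but not identical
to, the lexical space the datatypes spec defines for integer.

```java
import java.math.BigInteger;

public class CanonicalInteger {

    // Hypothetical helper: parse a lexical representation into a value,
    // then write the value back out with no sign and no leading zeros.
    // Assumption: BigInteger parsing approximates the integer lexical space.
    static String canonical(String lexical) {
        BigInteger value = new BigInteger(lexical.trim());
        return value.toString();
    }

    public static void main(String[] args) {
        // Three different lexical representations of the same value.
        for (String lexical : new String[] {"3", "003", "+3"}) {
            System.out.println("\"" + lexical + "\" -> \"" + canonical(lexical) + "\"");
        }
    }
}
```

All three inputs denote the same value, so they all map to the
same canonical string; none of them is "more valid" than the
others.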

>> Does anyone know to what level of strictness the various XML
>> schema validation tools are enforcing the datatype value
>> representations?

To be compliant with the recommendation, all schema validation
tools MUST accept all lexical representations of any given
value.  The schema recommendation mandates no situations in which
the canonical form must be used.  In practice, you will see the
canonical form used quite a bit in situations where the value does
not start out in XML.  If a program in Java does:

        int i = 3;

and you want to serialize that in XML, you'll find a lot of
middleware that outputs "3" instead of "003".  Validators MUST
accept both forms.  Particular applications of schema might
mandate canonical form only, but the schema recommendation itself
does not, and there is no single facet in the datatypes spec that
can be used to enforce use of canonical values.
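
If you want to check how a particular validator behaves, a rough
sketch along the following lines may help.  It uses the standard
JAXP validation API (javax.xml.validation); the one-element
schema, the element name "count", and the class and method names
are all assumptions invented for this example, and the result
depends on whichever schema processor your Java runtime supplies.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class LexicalFormsCheck {

    // Hypothetical schema: a single element whose type is the built-in integer.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"count\" type=\"xs:integer\"/>"
      + "</xs:schema>";

    // Returns true if <count>lexical</count> validates against the schema above.
    static boolean isValid(String lexical) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            String instance = "<count>" + lexical + "</count>";
            try {
                validator.validate(new StreamSource(new StringReader(instance)));
                return true;
            } catch (SAXException rejected) {
                // With no ErrorHandler set, a validity error is thrown here.
                return false;
            }
        } catch (SAXException | IOException e) {
            // The schema itself failed to load, or the input could not be read.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String lexical : new String[] {"3", "003", "three"}) {
            System.out.println("\"" + lexical + "\" valid: " + isValid(lexical));
        }
    }
}
```

A compliant processor should report both "3" and "003" as valid
and reject "three".  A tool that rejects "003" here is enforcing
something the recommendation does not ask for.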

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Tuesday, 12 February 2002 12:40:22 GMT