Surprising tokens!

From: Eric van der Vlist <vdv@dyomedea.com>
Date: Thu, 18 Oct 2001 12:39:57 +0200
Message-ID: <3BCEB17D.30600@dyomedea.com>
To: xmlschema-dev@w3.org

I find the definition of the token datatype highly confusing:

http://www.w3.org/TR/xmlschema-2/#token

[Definition:]   token represents tokenized strings. The ·value space· of 
token is the set of strings that do not contain the line feed (#xA) nor 
tab (#x9) characters, that have no leading or trailing spaces (#x20) and 
that have no internal sequences of two or more spaces. The ·lexical 
space· of token is the set of strings that do not contain the line feed 
(#xA) nor tab (#x9) characters, that have no leading or trailing spaces 
(#x20) and that have no internal sequences of two or more spaces.

and

<xs:simpleType name="token" id="token">
  <xs:annotation>
   <xs:documentation
         source="http://www.w3.org/TR/xmlschema-2/#token"/>
  </xs:annotation>
  <xs:restriction base="xs:normalizedString">
   <xs:whiteSpace value="collapse" id="token.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>

What's the point of mentioning that "the ·value space· of token is the 
set of strings that do not contain the line feed (#xA) nor tab (#x9) 
characters, that have no leading or trailing spaces (#x20) and that have 
no internal sequences of two or more spaces" since xs:token has a 
whitespace behavior set to "collapse", which means that #xA, #x9 (and 
also #xD) will have been replaced by #x20, that leading and trailing 
spaces will have been trimmed, and that any occurrence of more than a 
single #x20 will have been replaced by a single #x20?

Then, do we really want to give the same constraint on the lexical space?

Why do we have a special treatment for #xD? If I read all this 
correctly, "t&#x20;&#xD;&#x20;oken" is a valid xs:token. Is this expected?
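
To make the case concrete, here is a minimal instance sketch. The 
element name "word" is hypothetical, it assumes some schema declares 
that element as xs:token, and it assumes the last character reference 
in the example above is meant to be &#x20;. Because the carriage return 
is written as a character reference, it is not touched by the XML 
parser's line-end handling and reaches the schema processor as #xD:

```xml
<?xml version="1.0"?>
<!-- Hypothetical element; assumed to be declared with type xs:token. -->
<!-- The raw content is: "t", #x20, #xD, #x20, "oken". -->
<word>t&#x20;&#xD;&#x20;oken</word>
```

Under whiteSpace="collapse", #xD would first be replaced by #x20 and 
the resulting run of three spaces collapsed to one, so the value 
checked would be "t oken".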

And, if we want to restrict the lexical value, wouldn't it have been 
possible to do it through a pattern? Or is this something that cannot 
be expressed by the pattern syntax?

Finally, if the purpose of xs:token is to represent "tokenized strings", 
wouldn't it have been better named "xs:tokenized" or 
"xs:tokenizedString" to avoid confusion with the "real" tokens 
(xs:NMTOKEN)?

Thanks.

Eric (puzzled)
-- 
See you in Paris for the Forum XML.
                    http://www.technoforum.fr/Pages/forumXML01/index.html
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
http://xsltunit.org      http://4xt.org           http://examplotron.org
------------------------------------------------------------------------
Received on Thursday, 18 October 2001 06:39:35 GMT
