- From: Kay, Michael <Michael.Kay@softwareag.com>
- Date: Thu, 18 Jul 2002 18:15:16 +0200
- To: ht@cogsci.ed.ac.uk, "Kay, Michael" <Michael.Kay@softwareag.com>
- Cc: www-xml-schema-comments@w3.org
Thanks for the response. I had missed the fact that the lexical space
represents the value after applying the whiteSpace normalization. This
is a useful insight that I think we need to take account of in defining
the casts and constructors for XQuery and XPath: these are currently
defined to require a value from the lexical space as input.

Michael Kay

> -----Original Message-----
> From: ht@cogsci.ed.ac.uk [mailto:ht@cogsci.ed.ac.uk]
> Sent: 18 July 2002 15:41
> To: Kay, Michael
> Cc: www-xml-schema-comments@w3.org
> Subject: Re: normalizedString and its subtypes
>
> "Kay, Michael" <Michael.Kay@softwareag.com> writes:
>
> > I am confused by the definitions of the built-in types
> > normalizedString and its subtypes, in Schema Part 2.
> >
> > (1). The value space of normalizedString allows all characters
> > except xD, xA, and x9. The lexical space allows all characters
> > except xD and x9. What is the mapping from the lexical space to the
> > value space: what happens to an xA character in the lexical space
> > (is it removed? replaced by an x20?). The canonical lexical
> > representation, presumably, is the same as the string in the value
> > space: I think we should be told.
>
> The mapping from the lexical to the value space is 1-to-1 (I think),
> so I think this is in fact a bug. The builtin derived type
> normalizedString is defined as having the value 'replace' for its
> whiteSpace facet, which in turn means that all strings offered for
> validation as normalizedStrings will have had "[a]ll occurrences of
> #x9 (tab), #xA (line feed) and #xD (carriage return) . . . replaced
> with #x20 (space)" [1].
>
> > Presumably the lexical space represents the value after the XML
> > parser has done its normalization.
>
> No, after that _and_ the _further_ normalization specified by its
> whiteSpace facet.
>
> > So in practice, a tab character is allowed in an attribute of type
> > normalizedString (because the XML parser will turn it to a space),
> > but a tab character is not allowed in an element of type
> > normalizedString (because the XML parser will leave it unchanged).
> > Is this interpretation correct?
>
> No, because the reference quoted above takes care of the
> attribute/element difference.
>
> > I find it hard to understand why the lexical space doesn't allow
> > any string, with a mapping to the value space achieved by
> > normalizing whitespace characters. Alternatively, the lexical space
> > should be identical to the value space. The current definition
> > seems nonsensical.
>
> I agree there's a bug. I believe the 2nd alternative is correct.
> There is a residual problem here to do with the attempt to make the
> Datatypes REC usable independent of the Structures REC, and the WG
> probably needs to step up to some clarification here.
>
> > (2). The type "token" ("tokens" would have been a better name) says
> > that the value space allows all characters except xA or x9. But
> > since it is a restriction of normalizedString, it actually appears
> > to allow all characters except xA, xD, or x9. If the restriction is
> > going to be restated here, it should be restated in full.
>
> There's an erratum pending [2] which will say precisely this.
>
> > (3). The three subtypes of "token" do not allow any whitespace
> > characters in the value. Why is there no supertype for these
> > ("token" would have been a good name) that allows any string
> > containing no whitespace characters? I would have thought this type
> > would be vastly more useful than most of the other built-in
> > subtypes of string.
>
> Good idea -- perhaps we'll add this in 1.1
>
> ht
>
> [1] http://www.w3.org/TR/xmlschema-1/#section-White-Space-Normalization-during-Validation
> [2] http://www.w3.org/2001/05/xmlschema-rec-comments.html/#pfitoken
> --
> Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
> W3C Fellow 1999--2002, part-time member of W3C Team
> 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
> Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
> URL: http://www.ltg.ed.ac.uk/~ht/
> [mail really from me _always_ has this .sig -- mail without it is forged spam]
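
For anyone following the thread, here is a minimal sketch of the whiteSpace normalization being discussed, assuming the usual reading of the three facet values (preserve, replace, collapse). The function names normalize_whitespace and in_normalized_string_value_space are made up for illustration and do not come from any schema processor.

```python
import re


def normalize_whitespace(value: str, facet: str) -> str:
    """Apply a whiteSpace facet value (hypothetical helper, not a real API)."""
    if facet == "preserve":
        # preserve: the string is left untouched
        return value
    # replace: every tab, line feed and carriage return becomes one space
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        # collapse: after replace, runs of spaces shrink to a single space
        # and leading and trailing spaces are dropped
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace facet value: {facet!r}")


def in_normalized_string_value_space(value: str) -> bool:
    """True if the string contains no #x9, #xA or #xD (illustrative check)."""
    return re.search(r"[\t\n\r]", value) is None


raw = "a\tb\nc"
normalized = normalize_whitespace(raw, "replace")
print(repr(normalized))
print(in_normalized_string_value_space(raw))
print(in_normalized_string_value_space(normalized))
```

On this reading, a tab or line feed in the input never reaches the normalizedString value space, whether it came from an attribute or from element content, because the 'replace' step runs before the value is checked.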
Received on Thursday, 18 July 2002 12:15:25 UTC