- From: Kay, Michael <Michael.Kay@softwareag.com>
- Date: Thu, 18 Jul 2002 18:15:16 +0200
- To: ht@cogsci.ed.ac.uk, "Kay, Michael" <Michael.Kay@softwareag.com>
- Cc: www-xml-schema-comments@w3.org
Thanks for the response. I had missed the fact that the lexical space
represents the value after applying the whiteSpace normalization. This is a
useful insight that I think we need to take account of in defining the casts
and constructors for XQuery and XPath: these are currently defined to
require a value from the lexical space as input.
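As an illustration only, here is a minimal Python sketch of the ordering described above. It assumes a cast first applies the target type's whiteSpace facet and only then tests the result against the lexical space. The function names (`normalize_whitespace`, `in_token_lexical_space`, `cast_to_token`) are hypothetical and do not come from the XQuery/XPath drafts or any library. Only the whitespace constraints of `token` are checked; the XML `Char` production is ignored.

```python
import re

def normalize_whitespace(s: str, mode: str) -> str:
    """Apply an XML Schema whiteSpace facet value: 'preserve', 'replace' or 'collapse'."""
    if mode == "preserve":
        return s
    # 'replace': each #x9 (tab), #xA (line feed) and #xD (carriage return) becomes #x20
    replaced = re.sub(r"[\t\n\r]", " ", s)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # 'collapse': after 'replace', squeeze runs of #x20 to one and strip them at both ends
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {mode!r}")

def in_token_lexical_space(s: str) -> bool:
    """Whitespace constraints of xs:token only (the XML Char production is not checked)."""
    return (re.search(r"[\t\n\r]", s) is None
            and not s.startswith(" ")
            and not s.endswith(" ")
            and "  " not in s)

def cast_to_token(s: str) -> str:
    """Hypothetical cast: normalize with token's whiteSpace='collapse', then check."""
    normalized = normalize_whitespace(s, "collapse")
    if not in_token_lexical_space(normalized):
        raise ValueError(f"not in the lexical space of token: {s!r}")
    return normalized

raw = "  hello\tworld \n"
print(in_token_lexical_space(raw))  # False: the raw string fails a strict lexical check
print(repr(cast_to_token(raw)))     # 'hello world': accepted once collapse is applied first
```

Because 'collapse' always yields a string that meets token's whitespace constraints, a cast that demands raw lexical-space input without normalizing first would reject strings that the type accepts during validation.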
Michael Kay
> -----Original Message-----
> From: ht@cogsci.ed.ac.uk [mailto:ht@cogsci.ed.ac.uk]
> Sent: 18 July 2002 15:41
> To: Kay, Michael
> Cc: www-xml-schema-comments@w3.org
> Subject: Re: normalizedString and its subtypes
>
>
> "Kay, Michael" <Michael.Kay@softwareag.com> writes:
>
> > I am confused by the definitions of the built-in types
> > normalizedString and its subtypes, in Schema Part 2.
> >
> > (1). The value space of normalizedString allows all characters except
> > xD, xA, and x9. The lexical space allows all characters except xD and
> > x9. What is the mapping from the lexical space to the value space:
> > what happens to an xA character in the lexical space (is it removed?
> > replaced by an x20?). The canonical lexical representation,
> > presumably, is the same as the string in the value space: I think we
> > should be told.
>
> The mapping from the lexical to the value space is 1-to-1 (I
> think), so I think this is in fact a bug. The builtin
> derived type normalizedString is defined as having the value
> 'replace' for its whiteSpace facet, which in turn means that
> all strings offered for validation as normalizedStrings will
> have had "[a]ll occurrences of #x9 (tab), #xA (line feed) and
> #xD (carriage return) . . . replaced with #x20 (space)" [1].
>
> > Presumably the lexical space represents the value after the XML parser
> > has done its normalization.
>
> No, after that _and_ the _further_ normalization specified by
> its whiteSpace facet.
>
> > So in practice, a tab character is allowed in an attribute of type
> > normalizedString (because the XML parser will turn it to a space), but
> > a tab character is not allowed in an element of type normalizedString
> > (because the XML parser will leave it unchanged). Is this
> > interpretation correct?
>
> No, because the reference quoted above takes care of the
> attribute/element difference.
>
> > I find it hard to understand why the lexical space doesn't allow any
> > string, with a mapping to the value space achieved by normalizing
> > whitespace characters. Alternatively, the lexical space should be
> > identical to the value space. The current definition seems
> > nonsensical.
>
> I agree there's a bug. I believe the 2nd alternative is
> correct. There is a residual problem here to do with the
> attempt to make the Datatypes REC usable independent of the
> Structures REC, and the WG probably needs to step up to some
> clarification here.
>
> > (2). The type "token" ("tokens" would have been a better name) says
> > that the value space allows all characters except xA or x9. But since
> > it is a restriction of normalizedString, it actually appears to allow
> > all characters except xA, xD, or x9. If the restriction is going to be
> > restated here, it should be restated in full.
>
> There's an erratum pending [2] which will say precisely this.
>
> > (3). The three subtypes of "token" do not allow any whitespace
> > characters in the value. Why is there no supertype for these ("token"
> > would have been a good name) that allows any string containing no
> > whitespace characters? I would have thought this type would be vastly
> > more useful than most of the other built-in subtypes of string.
>
> Good idea -- perhaps we'll add this in 1.1
>
> ht
>
> [1] http://www.w3.org/TR/xmlschema-1/#section-White-Space-Normalization-during-Validation
> [2] http://www.w3.org/2001/05/xmlschema-rec-comments.html/#pfitoken
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
W3C Fellow 1999--2002, part-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Thursday, 18 July 2002 12:15:25 UTC