normalizedString and its subtypes

I am confused by the definitions of the built-in types normalizedString and
its subtypes, in Schema Part 2.

(1). The value space of normalizedString allows all characters except xD,
xA, and x9. The lexical space allows all characters except xD and x9. What
is the mapping from the lexical space to the value space? In particular,
what happens to an xA character in the lexical space: is it removed, or
replaced by an x20? The canonical lexical representation, presumably, is the
same as the string in the value space, but I think we should be told.
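
To make the question concrete, here is a minimal Python sketch of the two
mappings I can imagine for an xA character. The function names are my own
invention, and I am not claiming that either one is what Schema Part 2
intends.

```python
# Two hypothetical mappings from the lexical space to the value space.
# Neither is taken from Schema Part 2; they only illustrate the question.

def map_by_removal(lexical: str) -> str:
    """Candidate 1: drop every xA character."""
    return lexical.replace("\n", "")

def map_by_replacement(lexical: str) -> str:
    """Candidate 2: replace every xA character with x20."""
    return lexical.replace("\n", " ")

sample = "one\ntwo"
print(repr(map_by_removal(sample)))      # 'onetwo'
print(repr(map_by_replacement(sample)))  # 'one two'
```

The two candidates give different values for the same lexical form, which is
why the mapping needs to be stated.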

Presumably the lexical space represents the value after the XML parser has
done its normalization. So in practice, a tab character is allowed in an
attribute of type normalizedString (because the XML parser will turn it into a
space), but a tab character is not allowed in an element of type
normalizedString (because the XML parser will leave it unchanged). Is this
interpretation correct?
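
The parser behaviour I am assuming here can be checked with a small Python
sketch using the standard xml.etree.ElementTree module. The element name "e"
and the attribute name "a" are just placeholders. It shows only what the XML
parser hands on, not what a schema processor would then do with it.

```python
import xml.etree.ElementTree as ET

# The same literal tab appears once in an attribute value and once in
# element content.
doc = ET.fromstring('<e a="x\ty">x\ty</e>')

attribute_value = doc.get("a")  # attribute-value normalization applies
element_content = doc.text      # character data is passed through

print(repr(attribute_value))  # 'x y'
print(repr(element_content))  # 'x\ty'
```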

I find it hard to understand why the lexical space doesn't allow any string,
with a mapping to the value space achieved by normalizing whitespace
characters. Alternatively, the lexical space should be identical to the
value space. The current definition seems nonsensical.

(2). The definition of the type "token" ("tokens" would have been a better
name) says that the value space allows all characters except xA or x9. But
since it is a restriction of normalizedString, it actually appears to allow
all characters except xA, xD, or x9. If the restriction is going to be
restated here, it should be restated in full.
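
Read literally, the two statements describe different sets of strings. A
small Python sketch of the difference, assuming the value spaces are defined
purely by the excluded characters quoted above and ignoring any other
constraints on token (the function names are hypothetical):

```python
def in_token_as_stated(value: str) -> bool:
    """The restriction as restated for token: no xA and no x9."""
    return not any(ch in "\n\t" for ch in value)

def in_token_as_restriction(value: str) -> bool:
    """The restriction inherited from normalizedString: no xA, xD or x9."""
    return not any(ch in "\n\r\t" for ch in value)

sample = "a\rb"  # contains an xD character
print(in_token_as_stated(sample))       # True
print(in_token_as_restriction(sample))  # False
```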

(3). The three subtypes of "token" do not allow any whitespace characters in
the value. Why is there no supertype for these ("token" would have been a
good name) that allows any string containing no whitespace characters? I
would have thought this type would be vastly more useful than most of the
other built-in subtypes of string.
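
The type I have in mind could be described by a check as simple as the
following Python sketch. The function name and the choice of x20, x9, xA and
xD as the whitespace characters are my assumptions; as far as I can see,
Schema Part 2 defines no such type.

```python
# Assumed whitespace set: x20, x9, xA, xD.
XML_WHITESPACE = " \t\n\r"

def is_whitespace_free(value: str) -> bool:
    """True if the string contains none of the assumed whitespace characters."""
    return not any(ch in XML_WHITESPACE for ch in value)

print(is_whitespace_free("abc-123"))  # True
print(is_whitespace_free("abc 123"))  # False
```

Note that this sketch accepts the empty string; whether such a type should do
so is a separate question.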

Michael Kay
Software AG

Received on Tuesday, 16 July 2002 06:23:47 UTC