Re: comment on current deliberations in RDF Core concerning the lexical space of XML Schema datatypes

From: pat hayes <phayes@ihmc.us>
Subject: Re: comment on current deliberations in RDF Core concerning the lexical space of XML Schema datatypes
Date: Fri, 12 Sep 2003 23:45:43 -0700

> >I see several claims in the RDF Core WG mailing list archives to the effect
> >that the lexical space of XML Schema datatypes is not well-defined.
> >However, when I look at http://www.w3.org/TR/xmlschema-2/, I see very
> >specific definitions of the value space of XML Schema datatypes.  For
> >example http://www.w3.org/TR/xmlschema-2/#decimal states
> >
> >	3.2.3.1 Lexical representation
> >
> >	decimal has a lexical representation consisting of a finite-length
> >	sequence of decimal digits (#x30-#x39) separated by a period as a
> >	decimal indicator. If totalDigits is specified, the number of
> >	digits must be less than or equal to totalDigits. If
> >	fractionDigits is specified, the number of digits following
> >	the decimal point must be less than or equal to the
> >	fractionDigits. An optional leading sign is allowed. If
> >	the sign is omitted, "+" is assumed. Leading and trailing zeroes
> >	are optional. If the fractional part is zero, the period and
> >	following zero(es) can be omitted. For example: -1.23,
> >	12678967.543233, +100000.00, 210.
> >
> >This appears to me to be completely well-defined.  In particular, leading
> >and trailing spaces are *NOT* allowed in the lexical representation of the
> >XML schema decimal datatype.
> 
> I agree that this reads unambiguously. Unfortunately there is also 
> other wording in the same document which suggests an alternative 
> interpretation: to wit, the inclusion of the whiteSpace facet as a 
> constraining facet on xsd:integer, which is meaningless if one 
> understands the above text strictly in the way you suggest.  And even 
> more regrettably, there is ample evidence in the text of the document 
> that it is not intended to be read quite so strictly, viz. it is 
> internally incoherent if so read.  Overall, therefore, I think that 
> the situation is ambiguous, or at least underdetermined. In any case, 
> we have decided to ask the XSD WG to rule on the matter and will 
> include a test case illustrating their ruling.

I am unaware of any ambiguities in the XML Schema documents in this area.
It is certainly the case that different values for the whiteSpace facet may
change the value space for datatypes derived from string, but this does not
raise any ambiguity.

The following text in XML Schema Part 1 makes the entire processing
painfully clear in my view

	Part 1 - 2.2.1.2 Simple Type Definition

	A simple type definition is a set of constraints on strings and
	information about the values they encode, applicable to the
	normalized value of an attribute information item or of an
	element information item with no element children. Informally, it
	applies to the values of attributes and the text-only content of
	elements.

	[...]

	Part 1 - 3.1.4 White Space Normalization during Validation

	Throughout this specification, [Definition:] the initial value of
	some attribute information item is the value of the [normalized
	value] property of that item. Similarly, the initial value of an
	element information item is the string composed of, in order, the
	[character code] of each character information item in the
	[children] of that element information item.

	[...]

	[Definition:] The normalized value of an element or attribute
	information item is an initial value whose white space, if any, has
	been normalized according to the value of the whiteSpace facet of
	the simple type definition used in its validation:

	preserve
	    No normalization is done, the value is the normalized value        
	replace
	    All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage
	    return) are replaced with #x20 (space). 
	collapse
	    Subsequent to the replacements specified above under replace,
	    contiguous sequences of #x20s are collapsed to a single #x20,
	    and initial and/or final #x20s are deleted. 

There you have it.  Simple type definitions work on normalized values,
which are the result of white space processing.

[...]

Peter F. Patel-Schneider

Received on Saturday, 13 September 2003 17:06:46 UTC