Review of XSD Datatypes 1.1 Changes

Per ACTION-136 - Review changes in W3C XML Schema Definition Language (XSD)
-- http://www.w3.org/TR/2012/PR-xmlschema11-2-20120119/#changes

I've completed my review of the changes in XSD Datatypes 1.1. Rather than
go through the exhaustive list of changes, I'll summarize the areas that I
think are relevant to RDF:

1. Datatype definitions, including definitions of lexical spaces, value
spaces, L2V mappings, and canonical mappings, underwent a thorough
revision. This is a good thing, because the new definitions are much more
precisely stated and leave less room for confusion. In general, RDF defers
to XSD for datatype definitions so I don't think any action on our part is
required here in terms of the RDF specs. However, implementors of XSD
datatype processing in RDF will want to review these changes so we might
want to call their attention to them. I did verify that the short-form
literal definitions in Turtle for boolean, double, decimal, and integer are
still valid subsets of the respective lexical spaces in XSD 1.1.
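
For anyone who wants to repeat a rough version of that check, here is a
minimal Python sketch. The regular expressions are my transcription of the
XSD 1.1 lexical spaces, and the sample tokens are made-up illustrations of
Turtle short forms (assumptions, not taken from either grammar). It is only
a spot check on a few tokens, not a proof of the subset relation.

```python
import re

# XSD 1.1 lexical spaces, transcribed by hand as regular expressions
# (an assumption of this sketch; consult the spec for the normative text).
XSD_LEXICAL = {
    "boolean": r"true|false|1|0",
    "integer": r"[+-]?[0-9]+",
    "decimal": r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)",
    "double": r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?|[+-]?INF|NaN",
}

# Hypothetical sample tokens of the kind Turtle allows as short-form literals.
TURTLE_SAMPLES = {
    "boolean": ["true", "false"],
    "integer": ["0", "-5", "+42"],
    "decimal": ["4.2", "-0.5", ".5"],
    "double": ["1.0e6", "2E-3", "-.5e+1"],
}


def in_lexical_space(token: str, datatype: str) -> bool:
    """Return True if the whole token matches the datatype's lexical space."""
    return re.fullmatch(XSD_LEXICAL[datatype], token) is not None


# Collect every sample token that falls outside its XSD lexical space.
failures = [
    (datatype, token)
    for datatype, tokens in TURTLE_SAMPLES.items()
    for token in tokens
    if not in_lexical_space(token, datatype)
]
print(failures)
```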

2. XSD 1.1 distinguishes between the identity of values and the (numeric)
equality of values. As far as I can tell, RDF Semantics is defined strictly
in terms of identities (I would appreciate confirmation of this from one of
the editors). To avoid confusion, it might be worth noting this distinction
in the section on datatype entailment and explicitly stating that datatype
entailment deals with identity and not equality, if that is indeed our
position. [For SPARQL, pattern matching deals with identity and the '='
operator deals with equality.]

3. The float and double datatypes introduce positive and negative zero to
the value space; these values are distinct but equal. Conversely, NaN is
identical to but not equal to itself. This does have implications for RDF
(and SPARQL). For instance, take the statements:

<s> <p> "+0"^^xsd:double .
<s> <p> "-0"^^xsd:double .

These two statements are equivalent under XSD entailment using the
definition of double from XSD 1.0 (because "+0" and "-0" both mapped to the
value zero), but are distinct under XSD entailment using the definition
from XSD 1.1.

Even so, given a graph with these statements, the SPARQL query

SELECT * { <s> <p> ?o FILTER ( ?o = "0"^^xsd:double ) }

should return two rows, because '=' tests equality and both zeros are equal
to zero.

Meanwhile, given the graph:

<s> <p> "NaN"^^xsd:double .

SELECT * { ?s <p> "NaN"^^xsd:double } should return one row.
SELECT * { <s> <p> ?o FILTER ( ?o = "NaN"^^xsd:double ) } should return
zero rows.
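
The identity/equality split for zero and NaN can be sketched in Python,
whose floats are IEEE 754 doubles. The function names xsd_equal and
xsd_identical are hypothetical, chosen only for this illustration.

```python
import math


def xsd_equal(a: float, b: float) -> bool:
    """Numeric equality, the relation a '=' comparison would use."""
    return a == b


def xsd_identical(a: float, b: float) -> bool:
    """Identity: both arguments are the same point in the value space."""
    if math.isnan(a) or math.isnan(b):
        # NaN is identical to itself and to nothing else.
        return math.isnan(a) and math.isnan(b)
    # Equal values are identical only if their signs agree, which is what
    # separates positive zero from negative zero.
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


pos_zero, neg_zero, nan = float("+0"), float("-0"), float("NaN")

print(xsd_equal(pos_zero, neg_zero), xsd_identical(pos_zero, neg_zero))
print(xsd_equal(nan, nan), xsd_identical(nan, nan))
```

Under this reading, graph pattern matching corresponds to xsd_identical
and the FILTER comparison corresponds to xsd_equal.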

4. The value spaces of the primitive datatypes are disjoint. This is not
actually a change in XSD 1.1, but it is given more prominence: it moved from
Section 4, where it was buried in the definition of the equality facet, to
Section 2, in the definition of the datatype system. So, strictly speaking,
the graph { <s> <p> "1.0"^^xsd:decimal } does not XSD-entail the graph
{ <s> <p> "1.0"^^xsd:double } because decimal and double are different primitive
types. This came as a surprise to me, even though I've spent some time
poking around in the XSD specs, so I thought I'd call attention to it here.
I had just presumed that the value denoted by both literals was simply the
number 1.
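
One way to picture the disjointness is to tag every value with its
primitive type. The sketch below is only an illustration: TypedValue,
PARSERS, and l2v are hypothetical names, and Python's Decimal and float
stand in for the decimal and double value spaces.

```python
from decimal import Decimal
from typing import NamedTuple


class TypedValue(NamedTuple):
    primitive: str  # name of the primitive datatype
    value: object   # the value within that datatype's value space


# Stand-in lexical-to-value mappings for two primitive types.
PARSERS = {"decimal": Decimal, "double": float}


def l2v(lexical: str, datatype: str) -> TypedValue:
    """Map a lexical form to a value tagged with its primitive type."""
    return TypedValue(datatype, PARSERS[datatype](lexical))


a = l2v("1.0", "decimal")
b = l2v("1.0", "double")

print(a.value == b.value)  # the bare numbers compare as equal
print(a == b)              # the tagged values are different
```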

5. The definition of the xsd:duration datatype has been significantly
revised. We should revisit the statement that "xsd:duration does not have a
well-defined value space" and therefore should not be used in RDF. To begin
with, I don't know what "well-defined" means in the context of this
sentence. I do know that the confusion surrounding xsd:duration stems from
the fact that different months have different numbers of days, which makes
it difficult to compare a duration with a month component to one whose day,
hour, minute, and second components total 28 days or more.

The duration definition in XSD 1.1 does have a clearly defined:
   - lexical space, which is the same as that in 1.0
   - value space, which is modeled as a [ months as xsd:integer, seconds as
xsd:decimal ] tuple.
   - identity condition: two durations are identical if and only if their
months and seconds components are both identical.
   - equality relation, which is the same as its identity relation.
   - partial ordering.
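
A minimal Python sketch of that value space and its partial ordering
follows. It assumes, as I read the spec, that two durations are ordered by
adding each to four reference dateTimes and comparing the results. The
names REFERENCE_DATES, add_to_reference, and compare are hypothetical, and
durations are plain (months, seconds) tuples.

```python
from datetime import date
from decimal import Decimal

# The four reference dates (each at 00:00:00Z) used to order durations.
REFERENCE_DATES = [
    date(1696, 9, 1),
    date(1697, 2, 1),
    date(1903, 3, 1),
    date(1903, 7, 1),
]


def add_to_reference(ref, months, seconds):
    """Add a duration to a reference date; return seconds on a common scale."""
    # Every reference date is the first of a month, so adding months never
    # needs day-of-month clamping.
    total_months = ref.year * 12 + (ref.month - 1) + months
    year, month_index = divmod(total_months, 12)
    shifted = date(year, month_index + 1, 1)
    return shifted.toordinal() * 86400 + seconds


def compare(a, b):
    """Return '<', '=', or '>', or None when the durations are incomparable."""
    outcomes = set()
    for ref in REFERENCE_DATES:
        x = add_to_reference(ref, *a)
        y = add_to_reference(ref, *b)
        outcomes.add("<" if x < y else ">" if x > y else "=")
    # The durations are ordered only if all four reference dates agree.
    return outcomes.pop() if len(outcomes) == 1 else None


P1M = (1, Decimal(0))
P30D = (0, Decimal(30 * 86400))
P32D = (0, Decimal(32 * 86400))

print(compare(P1M, P30D))  # the four reference dates disagree
print(compare(P1M, P32D))
```

Identity is plain tuple equality here, so P1Y and P12M, which both map to
(12, 0), are the same value.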

Given these revisions, we should consider including xsd:duration in the
list of RDF-compatible XSD types.

6. We should add the following types, new in XSD 1.1, to the list of
RDF-compatible XSD types:
   - xsd:dateTimeStamp, derived from xsd:dateTime by requiring a timezone
offset.
   - xsd:dayTimeDuration, derived from xsd:duration by restricting the
months component in the value space to be zero.
   - xsd:yearMonthDuration, derived from xsd:duration by restricting the
seconds component in the value space to be zero.

Regardless of what is decided for xsd:duration, we should include
dayTimeDuration and yearMonthDuration since both of these types are totally
ordered.
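
To illustrate why the two restricted types are totally ordered, here is a
small Python sketch that again treats a duration as a hypothetical
(months, seconds) tuple. Once one component is fixed at zero, comparing
durations reduces to comparing the single remaining number.

```python
from decimal import Decimal


def is_day_time_duration(d) -> bool:
    """dayTimeDuration: the months component is zero."""
    return d[0] == 0


def is_year_month_duration(d) -> bool:
    """yearMonthDuration: the seconds component is zero."""
    return d[1] == 0


# Sample values written as (months, seconds) tuples.
day_time = [(0, Decimal(86400)), (0, Decimal("0.5")), (0, Decimal(3600))]
year_month = [(14, Decimal(0)), (1, Decimal(0)), (12, Decimal(0))]

assert all(is_day_time_duration(d) for d in day_time)
assert all(is_year_month_duration(d) for d in year_month)

# With one component fixed at zero, tuple comparison is a total order on
# the other component, so sorting is always well defined.
sorted_day_time = sorted(day_time)
sorted_year_month = sorted(year_month)
print(sorted_day_time)
print(sorted_year_month)
```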

Regards,
Alex

Received on Thursday, 2 February 2012 22:09:41 UTC