Re: Review of XSD Datatypes 1.1 Changes

On Feb 2, 2012, at 2:08 PM, Alex Hall wrote:

> Per ACTION-136 - Review changes in W3C XML Schema Definition Language (XSD) -- http://www.w3.org/TR/2012/PR-xmlschema11-2-20120119/#changes
> 
> I've completed my review of the changes in XSD Datatypes 1.1. Rather than go through the exhaustive list of changes, I'll summarize the areas that I think are relevant to RDF:
> 
> 1. Datatype definitions, including definitions of lexical spaces, value spaces, L2V mappings, and canonical mappings, underwent a thorough revision. This is a good thing, because the new definitions are much more precisely stated and leave less room for confusion. In general, RDF defers to XSD for datatype definitions so I don't think any action on our part is required here in terms of the RDF specs. However, implementors of XSD datatype processing in RDF will want to review these changes so we might want to call their attention to them. I did verify that the short-form literal definitions in Turtle for boolean, double, decimal, and integer are still valid subsets of the respective lexical spaces in XSD 1.1.
> 
> 2. XSD 1.1 distinguishes between the identity of values and the (numeric) equality of values.

In order for this to make sense, the value spaces have to be defined so that there are distinct but numerically identical values. As far as I can understand it, this means that for example 3 and 3.0 are different values in XSD, and the value spaces of xsd:number and xsd:real (for example) are not what a mathematician would mean by 'natural number' and 'real number' respectively. Given this, then...

> As far as I can tell, RDF Semantics is defined strictly in terms of identities (I would appreciate confirmation of this from one of the editors). 

...yes. But identity in this sense is conventionally indicated by the equality sign '=', which might get confusing. 

FWIW, I do not myself find this distinction to be meaningful (two *numbers* can be distinct but have the same *numerical* value? Does that make sense to you?) but no doubt that is due to my early brainwashing as a mathematician. 

> To avoid confusion, it might be worth noting this distinction in the section on datatype entailment and explicitly stating that datatype entailment deals with identity and not equality, if that is indeed our position. [For SPARQL, pattern matching deals with identity and the '=' operator deals with equality.

Is that a previous decision, or do you just presume that it must work this way? I would greatly prefer the case where the equality sign means identity, consistently (as it does everywhere else in the known universe). If we need to distinguish identity of value-space elements from identity of the numerical values of value-space elements, then the tidy and clear way to do this is to have a function from the former to the latter, and write things like 

numerical-value(xsd:number("+0")) = numerical-value(xsd:number("-0"))

even though xsd:number("+0") =/= xsd:number("-0")

i.e., in a nutshell, equality is identity of numerical-values. 

> ]
> 
> 3. The float and double datatypes introduce positive and negative zero to the value space; these values are distinct but equal. Conversely, NaN is identical to but not equal to itself.

So numerical equality is not reflexive? This is a very strange world that XSD has invented.
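
For anyone who wants to poke at this, the same behaviour can be seen in any IEEE 754 implementation. The following is only a rough sketch in Python, assuming that Python floats stand in for the xsd:double value space; the helper names `identical` and `equal` are my own labels, not anything defined by XSD or RDF.

```python
import math
import struct

def bits(x: float) -> bytes:
    """Big-endian IEEE 754 binary64 encoding of x."""
    return struct.pack(">d", x)

def identical(a: float, b: float) -> bool:
    # Assumed stand-in for XSD 1.1 identity: a single NaN value,
    # otherwise the same bit pattern (so +0 and -0 differ).
    if math.isnan(a) and math.isnan(b):
        return True
    return bits(a) == bits(b)

def equal(a: float, b: float) -> bool:
    # Numeric equality as IEEE 754 defines it.
    return a == b

pos_zero, neg_zero, nan = 0.0, -0.0, float("nan")

print(identical(pos_zero, neg_zero), equal(pos_zero, neg_zero))
print(identical(nan, nan), equal(nan, nan))
```

The two zeros come out distinct but equal, and NaN comes out identical to itself but not equal to itself, which is the non-reflexive case.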

> This does have implications for RDF (and SPARQL). For instance, take the statements:
> 
> <s> <p> "+0"^^xsd:double .
> <s> <p> "-0"^^xsd:double .
> 
> These two statements are equivalent under XSD entailment using the definition of double from XSD 1.0 (because "+0" and "-0" both mapped to the value zero), but are distinct under XSD entailment using the definition from XSD 1.1.
> 
> But, given a graph with these statements, the SPARQL query: SELECT * { <s> <p> ?o FILTER ( ?o = "0"^^xsd:double ) } should return two rows.
> 
> Meanwhile, given the graph:
> 
> <s> <p> "NaN"^^xsd:double .
> 
> SELECT * { ?s <p> "NaN"^^xsd:double } should return one row.
> SELECT * { <s> <p> ?o FILTER ( ?o = "NaN"^^xsd:double ) } should return zero rows.
> 
> 4. The value spaces of the primitive datatypes are disjoint. This is not actually a change in XSD 1.1, but is given more prominence (moved from Section 4, where it was buried in the definition of the equality facet, to Section 2, in the definition of the datatype system). So, strictly speaking, the graph { <s> <p> "1.0"^^xsd:decimal } does not XSD-entail the graph { <s> <p> "1.0"^^xsd:double } because decimal and double are different primitive types. This came as a surprise to me, even though I've spent some time poking around in the XSD specs, so I thought I'd call attention to it here. I had just presumed that the value denoted by both literals was simply the number 1.

Whatever these values are, they are not numbers, for sure. 
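
One way to picture disjoint primitive value spaces is to treat each value as a number tagged with its primitive datatype. This is a sketch under that assumption only; `TypedValue`, `identical` and `numerically_equal` are hypothetical names, and XSD does not say that its values are built this way.

```python
from decimal import Decimal
from typing import NamedTuple, Union

class TypedValue(NamedTuple):
    primitive: str                  # name of the primitive datatype
    number: Union[Decimal, float]   # the numerical value it carries

def identical(a: TypedValue, b: TypedValue) -> bool:
    # Identity needs the same primitive type as well as the same number.
    return a.primitive == b.primitive and a.number == b.number

def numerically_equal(a: TypedValue, b: TypedValue) -> bool:
    # Compare only the numbers, ignoring the datatype tag.
    return a.number == b.number

one_decimal = TypedValue("decimal", Decimal("1.0"))
one_double = TypedValue("double", 1.0)

print(identical(one_decimal, one_double))
print(numerically_equal(one_decimal, one_double))
```

On this picture the two literals denote different things even though both carry the number 1, which is why neither graph entails the other.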

> 
> 5. The definition of the xsd:duration datatype has been significantly revised. We should revisit the statement that "xsd:duration does not have a well-defined value space" and therefore should not be used in RDF.

Indeed. 

> To begin with, I don't know what "well-defined" means in the context of this sentence. I do know that the confusion surrounding xsd:duration has to do with the fact that different months have different numbers of days, and the difficulty that arises when trying to compare a duration with a month component to one with (day/hour/minutes/seconds) components that total 28 days or more.

Not to mention leap years, leap seconds, etc.

> 
> The duration definition in XSD 1.1 does have a clearly defined:
>    - lexical space, which is the same as that in 1.0
>    - value space, which is modeled as a [ months as xsd:integer, seconds as xsd:decimal ] tuple.
>    - identity condition: two durations are identical if and only if their months and seconds components are both identical.
>    - equality relation, which is the same as its identity relation.
>    - partial ordering.
> 
> Given these revisions, we should consider including xsd:duration in the list of RDF-compatible XSD types.

Absolutely. I don't think this value space makes calendric sense as a specification of an actual duration, but that isn't our business.
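
To make the partial ordering concrete, here is a small Python sketch of the (months, seconds) value space. It assumes, from my reading of the XSD 1.1 text (worth checking against the spec), that x < y holds when s + x < s + y for each of four reference dateTimes. The names `Duration`, `add`, `less_than` and `comparable` are hypothetical, and time zones and leap seconds are ignored.

```python
from datetime import datetime, timedelta
from decimal import Decimal
from typing import NamedTuple

class Duration(NamedTuple):
    months: int
    seconds: Decimal

# Reference start instants (assumed from the XSD 1.1 ordering of duration).
REFERENCE_STARTS = [
    datetime(1696, 9, 1),
    datetime(1697, 2, 1),
    datetime(1903, 3, 1),
    datetime(1903, 7, 1),
]

def add(start: datetime, d: Duration) -> datetime:
    # Shift by whole months first (every start is the 1st, so no day clamping),
    # then by the seconds component.
    total = start.year * 12 + (start.month - 1) + d.months
    year, month0 = divmod(total, 12)
    shifted = start.replace(year=year, month=month0 + 1)
    return shifted + timedelta(microseconds=int(d.seconds * 1_000_000))

def less_than(x: Duration, y: Duration) -> bool:
    return all(add(s, x) < add(s, y) for s in REFERENCE_STARTS)

def comparable(x: Duration, y: Duration) -> bool:
    return x == y or less_than(x, y) or less_than(y, x)

P1M = Duration(1, Decimal(0))
P27D = Duration(0, Decimal(27 * 86400))
P30D = Duration(0, Decimal(30 * 86400))

print(less_than(P27D, P1M))   # 27 days is shorter than any month
print(comparable(P1M, P30D))  # one month vs 30 days depends on the start date
```

Identity here is just equality of the two components, and P1M against P30D is the kind of pair that the ordering leaves incomparable.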

> 
> 6. We should include the following types, new in XSD 1.1, to the list of RDF-compatible XSD types:
>    - xsd:dateTimeStamp, derived from xsd:dateTime by requiring a timezone offset.
>    - xsd:dayTimeDuration, derived from xsd:duration by restricting the months component in the value space to be zero.
>    - xsd:yearMonthDuration, derived from xsd:duration by restricting the seconds component in the value space to be zero.
> 
> Regardless of what is decided for xsd:duration, we should include dayTimeDuration and yearMonthDuration since both of these types are totally ordered.

Agreed.

Pat

> 
> Regards,
> Alex
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Friday, 3 February 2012 20:50:15 UTC