Re: Wikidata, SPARQL Y0K Problem

From: Markus Kroetzsch <markus.kroetzsch@tu-dresden.de>
Date: Fri, 27 Mar 2015 14:53:48 +0100
Message-ID: <551560EC.1050608@tu-dresden.de>
To: Andy Seaborne <andy@apache.org>, public-sparql-dev@w3.org

Hi Andy, hi Birte,

Thanks for the swift replies. I will try to consolidate your 
answers into one consistent view (which I hope you will agree with). You 
said two important things:

 > The D-Entailment Regime explicitly refers to XSD 1.1.

That's true, and this is a normative reference in the normative part of 
the specification [1]. Therefore, it seems clear that SPARQL 1.1 
implementations that support datatype semantics should accept year 
"0000" and understand it as 1 BCE.

 > SPARQL refers to XSD Schema 1.0

This is also true, and again the reference is normative [2]. It seems 
from the sentence where this reference is used that the IRI xsd:dateTime 
refers to the datatype of XSD 1.0, but Andy offered an alternative 
reading: "because XSD 1.1 does not change the URI for datatypes or 
functions, it's sort of an "upgrade in place"." In any case, this 
section only refers to the meaning of literals when used as operands in 
FILTER functions/operators.


Some things are immediately clear from these observations. In 
particular, no SPARQL 1.1 processor should ever reject the year "0000" 
in the input data or BGP. Either the processor uses simple entailment 
(then "0000" is just a string) or the processor supports D-entailment 
(then "0000" must be interpreted as per XSD 1.1). This is reassuring.
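To make the year numbering concrete, here is a minimal Python sketch of 
how the year field of an xsd:dateTime lexical form maps to a calendar 
year under each XSD version. The function name describe_year and the 
version strings "1.0" and "1.1" are made up for this illustration and do 
not come from any specification or library; only the year field is 
handled.

```python
def describe_year(lexical_year: str, xsd_version: str) -> str:
    """Return a label such as '1 BCE' for the year field of a lexical form.

    Hypothetical helper for illustration only.
    """
    year = int(lexical_year)
    if xsd_version == "1.0":
        # XSD 1.0 has no year zero: "-0001" is 1 BCE and "0000" is illegal.
        if year == 0:
            raise ValueError('year "0000" is not a legal lexical form in XSD 1.0')
        return f"{year} CE" if year > 0 else f"{-year} BCE"
    if xsd_version == "1.1":
        # XSD 1.1 counts a year zero: "0000" is 1 BCE and "-0001" is 2 BCE.
        return f"{year} CE" if year > 0 else f"{1 - year} BCE"
    raise ValueError(f"unknown XSD version: {xsd_version}")


print(describe_year("0000", "1.1"))   # 1 BCE
print(describe_year("-0001", "1.1"))  # 2 BCE
print(describe_year("-0001", "1.0"))  # 1 BCE
```

The same lexical form "-0001" ends up one year apart depending on the 
version, which is the incompatibility under discussion.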

In the case of FILTERs, there seems to be some leeway for 
interpretation. Either way, there is no contradiction with 
D-entailment, since D-entailment is only about BGPs while the XSD 1.0 
reference is only used in a section about FILTERs.

I would be in favour of adopting the view that Andy has proposed, namely 
that the meaning of IRIs has been "upgraded in place" when XSD 1.1 
became a standard. If this interpretation is not used, one would get a 
very weird conforming behaviour, where the following two queries would 
have the same answers:

SELECT * WHERE {
  ?S ?P "0000-01-01T00:00:00Z"^^xsd:dateTime .
}

SELECT * WHERE {
  ?S ?P ?X  FILTER( ?X = xsd:dateTime("-0001-01-01T00:00:00Z") )
}

Note that this would be true even for processors that do not support 
D-entailment, as long as they support the xsd:dateTime FILTER at all. 
Clearly, this is not something we would want, so the only sane 
interpretation of xsd:dateTime in SPARQL 1.1 is the one given by XSD 1.1. 
Fortunately, this is also the interpretation that current RDF and OWL 
standards require, so that the meaning of year "0000" in the input file 
is the same as in the query. I hope others agree.
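The overlap between the two queries can be checked with a small Python 
sketch. It places lexical years on one astronomical scale (0 stands for 
1 BCE) and compares the data literal, read as per XSD 1.1, with the 
FILTER operand read once as per XSD 1.0 and once as per XSD 1.1. The 
helper astronomical_year is a hypothetical name used only for this 
illustration, and only the year field is considered.

```python
def astronomical_year(lexical_year: str, xsd_version: str) -> int:
    """Map a lexical year to astronomical numbering (0 = 1 BCE, -1 = 2 BCE).

    Hypothetical helper for illustration only.
    """
    year = int(lexical_year)
    if xsd_version == "1.1":
        # XSD 1.1 lexical years already use astronomical numbering.
        return year
    if xsd_version == "1.0":
        # XSD 1.0 skips year zero, so negative years shift up by one.
        if year == 0:
            raise ValueError('year "0000" is not a legal lexical form in XSD 1.0')
        return year if year > 0 else year + 1
    raise ValueError(f"unknown XSD version: {xsd_version}")


# Literal in the data or BGP, read as per XSD 1.1.
data_year = astronomical_year("0000", "1.1")

# FILTER operand "-0001", read as per XSD 1.0 (mixed reading).
filter_year_mixed = astronomical_year("-0001", "1.0")

# FILTER operand "-0001", read as per XSD 1.1 ("upgraded in place").
filter_year_upgraded = astronomical_year("-0001", "1.1")

print(data_year == filter_year_mixed)     # True: both denote 1 BCE
print(data_year == filter_year_upgraded)  # False: 1 BCE versus 2 BCE
```

Under the mixed reading the two different lexical forms denote the same 
year; under a uniform XSD 1.1 reading they do not.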

Note there is a non-technical dimension to all of this: if a SPARQL 
endpoint or LOD service returns data on the web, consuming applications 
must know what it means (not only to calculate durations or to check 
validity, but already to display the data correctly to their users in a 
non-technical syntax). The point of view that XSD literals are "just 
strings" may work for a DBMS implementer, but as a user of the 
technology you have to decide how to encode and query your content, 
i.e., you must know how "1 BCE" is represented. People who are building 
applications based on RDF and SPARQL therefore must make this decision, 
and I can only see them going with the most recent XSD, RDF, and OWL 
standards -- it's great to know that SPARQL 1.1 agrees with those, even 
if one has to do some interpretation to recognize this ;-)

Cheers,

Markus


[1] http://www.w3.org/TR/sparql11-entailment/#DEntRegime
[2] http://www.w3.org/TR/sparql11-query/#operandDataTypes


On 27.03.2015 12:54, Andy Seaborne wrote:
> The root change is in: ISO 8601:2000 Second Edition
> where year "0000" went from illegal to 1 BCE.
>
> Yes - I can see that's a genuine problem for wikidata.
>
> Two answers: spec effect and implementation reality.
>
> 1/ Spec answer.
>
> For just plain retrieval of data, SPARQL returns RDFterms, not related
> to their legality or value so it's the form that is returned,
> "-0001-02-03T12:11:10+00:00"^^xsd:dateTime.
>
> Or even
> "0000-02-03T12:11:10+00:00"^^xsd:dateTime
>
> If used in a FILTER, the value then does matter.
>
> SPARQL only formally requires xsd:dateTime, not xsd:date, and even then
> a limited subset of operations; comparison but not subtraction.  Many
> implementations include xsd:date as well.
>
> I can see two ways of observing the change:
>
> A/ If there are illegal lexical forms: the year "0000" was illegal and
> became legal, and a FILTER may go from being an error to returning true
> or false.  This happens if the data has year "0000" or the FILTER
> mentions it explicitly.
>
> A FILTER expression that evaluates to an error is effectively false
> overall anyway.
>
> # Different days, year 0000
> FILTER (
>     "0000-02-03T12:11:10+00:00"^^xsd:dateTime >=
>     "0000-02-02T12:11:10+00:00"^^xsd:dateTime )
>
> changes from filter error, do not return the row, to true.
>
> but comparison around the boundary is not changed.  It is the mentioning
> of 0000, explicitly or in the data, that is the problem.
>
> B/ As an extension, xsd:duration may be supported.
>
> # Across BCE/CE boundary:
> BIND("-0001-02-03T12:11:10+00:00"^^xsd:dateTime AS ?d1)
> BIND("0001-02-03T12:11:10+00:00"^^xsd:dateTime AS ?d2)
> BIND(?d2 - ?d1 AS ?duration)
>
> SPARQL refers to XSD Schema 1.0 but the effect of extensions is
> implementation-dependent.  Functions are named by URI and because XSD 1.1
> does not change the URI for datatypes or functions, it's sort of an
> "upgrade in place".
>
> So specification-wise, there is an impact; it's confused by the
> change-in-place of XSD URIs.
>
>  > What does "-0001-02-03"^^xsd:date mean?
>
> When that is the RDFterm returned, it's up to the application.
> When it's used in a FILTER, it's exposed to the change.
> Extensions to the core spec are impacted.
>
> 2/ Implementation answer:
>
> Implementations may rely on a 3rd party library to do the parsing and
> calculation and it will do whatever that library does.
>
> For example, Jena uses Apache Xerces for parsing and the Java runtime,
> which provides XMLGregorianCalendar which is W3C XML Schema 1.0 (Java8
> and Java9), for calculation of durations.
>
>      Andy
>
> On 27/03/15 09:56, Markus Kroetzsch wrote:
>> Dear all, especially former members of the SPARQL WG,
>>
>> As you might know, the Wikimedia Foundation is currently working on
>> setting up an official public SPARQL service for Wikidata. This was done
>> not to integrate with RDF or to add to the semantic web, but simply
>> because it seems to be the best technology for the query problem at
>> hand. I think this should be considered a success :-) You are also
>> welcome to play around with the preliminary test SPARQL endpoint of
>> Wikidata, see [0], and of course to comment on the wikidata-l list
>> regarding nice SPARQL queries or other ideas.
>>
>> However, on the way to making this a reality as a fully integrated
>> feature of Wikidata/Wikipedia, there are many issues to be solved. One
>> that came up recently is about xsd:date(Time) in SPARQL 1.1. As you will
>> know, XML Schema has changed the semantics of its date types in
>> incompatible ways between XSD 1.0 and XSD 1.1:
>>
>> * XSD 1.1: "-0001-02-03"^^xsd:date means "3rd Feb 2 BCE"  [1]
>> * XSD 1.0: "-0001-02-03"^^xsd:date means "3rd Feb 1 BCE"  [2]
>>
>> Needless to say that this is a big deal in applications like Wikidata,
>> where you have a lot of historical dates. The obvious question now is:
>> What does "-0001-02-03"^^xsd:date mean when used in SPARQL? RDF? OWL?
>> Here is what I have found so far:
>>
>> * RDF 1.0: year 1 BCE
>> * OWL 1: year 1 BCE
>> * SPARQL 1.0: year 1 BCE
>> (all as expected)
>>
>> * RDF 1.1: year 2 BCE [3]
>> * OWL 2: year 2 BCE [4]
>> * SPARQL 1.1: ???
>>
>> It is interesting to note that the semantic changes in XSD, RDF and OWL
>> each are breaking changes, which change the meaning of existing
>> documents (where the document itself may not contain any hint as to
>> whether it was created before or after the change).
>>
>> I am not sure what is the case for SPARQL 1.1. It seems very much
>> preferable if SPARQL would follow the other W3C standards in this
>> matter, but I did not find out yet what was the intention of the SPARQL
>> WG. All comments are welcome, but in the end we are looking for a
>> normative answer here.
>>
>> Best regards,
>>
>> Markus
>>
>>
>> [0]
>> https://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg05601.html
>> (gives you the Wikidata endpoint URL, but more importantly also example
>> queries for our current RDF translation, which we are currently revising
>> in several places)
>> [1] http://www.w3.org/TR/xmlschema11-2/#dateTime
>> [2] http://www.w3.org/TR/xmlschema-2/#dateTime
>> [3] http://www.w3.org/TR/rdf11-concepts/#section-Datatypes
>> [4] http://www.w3.org/TR/owl2-syntax/#Datatype_Maps
>>
>
>
>

-- 
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Received on Friday, 27 March 2015 13:54:13 UTC
