
Re: Wikidata, SPARQL Y0K Problem

From: Andy Seaborne <andy@apache.org>
Date: Fri, 27 Mar 2015 11:54:59 +0000
Message-ID: <55154513.8090204@apache.org>
To: public-sparql-dev@w3.org
The root change is in: ISO 8601:2000 Second Edition
where year "0000" went from illegal to 1 BCE.

Yes - I can see that's a genuine problem for wikidata.

Two answers: spec effect and implementation reality.

1/ Spec answer.

For plain retrieval of data, SPARQL returns RDF terms as written, 
regardless of their legality or value, so it is the lexical form that 
is returned:
"-0001-02-03T12:11:10+00:00"^^xsd:dateTime

Or even
"0000-02-03T12:11:10+00:00"^^xsd:dateTime

If used in a FILTER, the value then does matter.

SPARQL only formally requires xsd:dateTime, not xsd:date, and even then 
only a limited subset of operations: comparison but not subtraction.  
Many implementations include xsd:date as well.

I can see two ways of observing the change:

A/ If there are illegal lexical forms: the year "0000" was illegal and 
became legal, so a FILTER may go from being an error to returning true 
or false.  This happens if the data has year "0000" or the FILTER 
mentions it explicitly.

A FILTER expression that evaluates to an error is effectively false 
overall anyway.

# Different days, year 0000
FILTER (
    "0000-02-03T12:11:10+00:00"^^xsd:dateTime >=
    "0000-02-02T12:11:10+00:00"^^xsd:dateTime )

changes from a filter error (the row is not returned) to true,

but comparison around the boundary is not changed.  It is the mentioning 
of 0000, explicitly or in the data, that is the problem.
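
To make case A concrete, here is a minimal Python sketch of the FILTER behaviour described above. It is not how any SPARQL engine is implemented. The names parse_utc and filter_ge and the "1.0"/"1.1" switch are made up for illustration, and the parser only accepts dateTimes written with a Z or +00:00 offset, so that comparing the fields directly is valid.

```python
import re

# Assumption: only UTC dateTimes ("Z" or "+00:00") are handled in this sketch.
_DATETIME = re.compile(
    r"(-?\d{4,})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:Z|\+00:00)"
)


def parse_utc(lexical, xsd_version):
    """Return (year, month, day, hour, minute, second) or raise ValueError."""
    match = _DATETIME.fullmatch(lexical)
    if match is None:
        raise ValueError(f"unsupported lexical form: {lexical!r}")
    year, *rest = (int(group) for group in match.groups())
    if year == 0 and xsd_version == "1.0":
        # Year 0000 is an illegal lexical form under XSD 1.0.
        raise ValueError("year 0000 is not legal in XSD 1.0")
    return (year, *rest)


def filter_ge(left, right, xsd_version):
    """Mimic FILTER(left >= right): an evaluation error drops the row."""
    try:
        return parse_utc(left, xsd_version) >= parse_utc(right, xsd_version)
    except ValueError:
        return False  # error is "effectively false overall"


later = "0000-02-03T12:11:10+00:00"
earlier = "0000-02-02T12:11:10+00:00"
print(filter_ge(later, earlier, "1.0"))  # False: filter error, row dropped
print(filter_ge(later, earlier, "1.1"))  # True: 0000 is now a legal year
```

A comparison that does not mention 0000, such as "0001-02-03T12:11:10+00:00" against "-0001-02-03T12:11:10+00:00", gives the same answer under either version in this sketch.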

B/ As an extension, xsd:duration may be supported.

# Across BCE/CE boundary:
BIND("-0001-02-03T12:11:10+00:00"^^xsd:dateTime AS ?d1)
BIND("0001-02-03T12:11:10+00:00"^^xsd:dateTime AS ?d2)
BIND(?d2 - ?d1 AS ?duration)
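
As a rough illustration of why the subtraction is exposed to the change, the Python sketch below counts whole years between the two lexical years under each reading. The function names and the "1.0"/"1.1" switch are hypothetical, and a real implementation would return an xsd:duration rather than an integer.

```python
def astronomical_year(lexical_year, xsd_version):
    """Map a lexical year to astronomical numbering (0 = 1 BCE, -1 = 2 BCE)."""
    if xsd_version == "1.0":
        if lexical_year == 0:
            raise ValueError("year 0000 is not legal in XSD 1.0")
        # XSD 1.0 has no year zero: -0001 is 1 BCE, i.e. astronomical year 0.
        return lexical_year + 1 if lexical_year < 0 else lexical_year
    # XSD 1.1: the lexical year already is the astronomical year.
    return lexical_year


def years_between(start_year, end_year, xsd_version):
    """Whole years from start_year to end_year for the same month and day."""
    return (astronomical_year(end_year, xsd_version)
            - astronomical_year(start_year, xsd_version))


# ?d1 has lexical year -0001 and ?d2 has lexical year 0001.
print(years_between(-1, 1, "1.0"))  # 1: from 1 BCE to 1 CE
print(years_between(-1, 1, "1.1"))  # 2: from 2 BCE to 1 CE
```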

SPARQL refers to XML Schema 1.0 but the effect of extensions is 
implementation-dependent.  Functions are named by URI and because XSD 
1.1 does not change the URI for datatypes or functions, it's sort of an 
"upgrade in place".

So specification-wise, there is an impact, and it's confused by the 
change-in-place of XSD URIs.

 > What does "-0001-02-03"^^xsd:date mean?

When that is the RDF term returned, it's up to the application.
When it's used in a FILTER, it's exposed to the change.
Extensions to the core spec are impacted.
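
For an application that has to decide what a returned year means, the two readings can be written down as a small Python helper. This is only a sketch: year_label and its "1.0"/"1.1" argument are made-up names, not part of any specification or library.

```python
def year_label(lexical_year, xsd_version):
    """Describe the year denoted by an XSD lexical year such as "-0001"."""
    year = int(lexical_year)
    if year > 0:
        return f"{year} CE"
    if xsd_version == "1.0":
        if year == 0:
            raise ValueError("year 0000 is not legal in XSD 1.0")
        # XSD 1.0: no year zero, so -0001 is 1 BCE.
        return f"{-year} BCE"
    # XSD 1.1: 0000 is 1 BCE, so -0001 is 2 BCE.
    return f"{1 - year} BCE"


print(year_label("-0001", "1.0"))  # 1 BCE
print(year_label("-0001", "1.1"))  # 2 BCE
print(year_label("0000", "1.1"))   # 1 BCE
```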

2/ Implementation answer:

Implementations may rely on a 3rd party library to do the parsing and 
calculation, and the result will be whatever that library does.

For example, Jena uses Apache Xerces for parsing and, for calculation of 
durations, the Java runtime, whose XMLGregorianCalendar follows W3C XML 
Schema 1.0 (Java8 and Java9).
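
The same dependence on a library shows up in other languages. As an analogy only (this is not Jena or Xerces behaviour), the Python standard library's datetime type has a minimum year of 1, so code that hands lexical forms straight to it rejects both year 0000 and negative years. The helper name library_accepts is made up for this sketch.

```python
from datetime import MINYEAR, datetime


def library_accepts(lexical):
    """Report whether datetime.fromisoformat can parse the lexical form."""
    try:
        datetime.fromisoformat(lexical)
    except ValueError:
        return False
    return True


print(MINYEAR)                                        # 1
print(library_accepts("0001-02-03T12:11:10+00:00"))   # True
print(library_accepts("0000-02-03T12:11:10+00:00"))   # False
print(library_accepts("-0001-02-03T12:11:10+00:00"))  # False
```

An application built this way inherits the library's limits whatever the XSD version says.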

	Andy

On 27/03/15 09:56, Markus Kroetzsch wrote:
> Dear all, especially former members of the SPARQL WG,
>
> As you might know, the Wikimedia Foundation is currently working on
> setting up an official public SPARQL service for Wikidata. This was done
> not to integrate with RDF or to add to the semantic web, but simply
> because it seems to be the best technology for the query problem at
> hand. I think this should be considered a success :-) You are also
> welcome to play around with the preliminary test SPARQL endpoint of
> Wikidata, see [0], and of course to comment on the wikidata-l list
> regarding nice SPARQL queries or other ideas.
>
> However, on the way to making this a reality as a fully integrated
> feature of Wikidata/Wikipedia, there are many issues to be solved. One
> that came up recently is about xsd:date(Time) in SPARQL 1.1. As you will
> know, XML Schema has changed the semantics of its date types in
> incompatible ways between XSD 1.0 and XSD 1.1:
>
> * XSD 1.1: "-0001-02-03"^^xsd:date means "3rd Feb 2 BCE"  [1]
> * XSD 1.0: "-0001-02-03"^^xsd:date means "3rd Feb 1 BCE"  [2]
>
> Needless to say that this is a big deal in applications like Wikidata,
> where you have a lot of historical dates. The obvious question now is:
> What does "-0001-02-03"^^xsd:date mean when used in SPARQL? RDF? OWL?
> Here is what I have found so far:
>
> * RDF 1.0: year 1 BCE
> * OWL 1: year 1 BCE
> * SPARQL 1.0: year 1 BCE
> (all as expected)
>
> * RDF 1.1: year 2 BCE [3]
> * OWL 2: year 2 BCE [4]
> * SPARQL 1.1: ???
>
> It is interesting to note that the semantic changes in XSD, RDF and OWL
> each are breaking changes, which change the meaning of existing
> documents (where the document itself may not contain any hint as to
> whether it was created before or after the change).
>
> I am not sure what is the case for SPARQL 1.1. It seems very much
> preferable if SPARQL would follow the other W3C standards in this
> matter, but I did not find out yet what was the intention of the SPARQL
> WG. All comments are welcome, but in the end we are looking for a
> normative answer here.
>
> Best regards,
>
> Markus
>
>
> [0]
> https://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg05601.html (gives
> you the Wikidata endpoint URL, but more importantly also example queries
> for our current RDF translation, which we are currently revising in
> several places)
> [1] http://www.w3.org/TR/xmlschema11-2/#dateTime
> [2] http://www.w3.org/TR/xmlschema-2/#dateTime
> [3] http://www.w3.org/TR/rdf11-concepts/#section-Datatypes
> [4] http://www.w3.org/TR/owl2-syntax/#Datatype_Maps
>
Received on Friday, 27 March 2015 11:55:29 UTC
