
Re: Wikidata, SPARQL Y0K Problem

From: Birte Glimm <birte.glimm@uni-ulm.de>
Date: Fri, 27 Mar 2015 15:02:53 +0100
Message-ID: <CABt65OdpCqfWfFpwaJY7m6_UfivD0nwfUNZVGOoOo3h6YKU9Aw@mail.gmail.com>
To: Markus Kroetzsch <markus.kroetzsch@tu-dresden.de>
Cc: Andy Seaborne <andy@apache.org>, SPARQL <public-sparql-dev@w3.org>
Nice summary to which I can agree.

Cheers,

Birte

On 27 March 2015 at 14:53, Markus Kroetzsch
<markus.kroetzsch@tu-dresden.de> wrote:
> Hi Andy, hi Birte,
>
> Thanks for the swift replies. I will carefully try to consolidate your
> answers into a consistent view (which I hope you will agree with). You said
> two important things:
>
>> The D-Entailment Regime explicitly refers to XSD 1.1.
>
> That's true, and this is a normative reference in the normative part of the
> specification [1]. Therefore, it seems clear that SPARQL 1.1 implementations
> that support datatype semantics should accept year "0000" and understand it
> as 1 BC.
>
>> SPARQL refers to XSD Schema 1.0
>
> This is also true, and again the reference is normative [2]. It seems from
> the sentence where this reference is used that the IRI xsd:dateTime refers
> to the datatype of XSD 1.0, but Andy offered an alternative reading:
> "because XSD 1.1 does not change the URI for datatypes or functions, it's
> sort of an "upgrade in place"." In any case, this section only refers to the
> meaning of literals when used as operands in FILTER functions/operators.
>
>
> Some things are immediately clear from these observations. In particular, no
> SPARQL 1.1 processor should ever reject the year "0000" in the input data or
> BGP. Either the processor uses simple entailment (then "0000" is just a
> string) or the processor supports D-entailment (then "0000" must be
> interpreted as per XSD 1.1). This is reassuring.
>
> In the case of FILTERs, there seems to be some leeway for interpretations.
> In either case, there is no contradiction with D-entailment, since
> D-entailment is only about BGPs while the XSD 1.0 reference is only used in
> a section about FILTERs.
>
> I would be in favour of adopting the view that Andy has proposed, namely
> that the meaning of IRIs has been "upgraded in place" when XSD 1.1 became a
> standard. If this interpretation is not used, one would get a very weird
> conforming behaviour, where the following two queries would have the same
> answers:
>
> SELECT * WHERE {
>  ?S ?P "0000-01-01T00:00:00Z"^^xsd:dateTime .
> }
>
> SELECT * WHERE {
>  ?S ?P ?X  FILTER( ?X = xsd:dateTime("-0001-01-01T00:00:00Z") )
> }
>
> Note that this would be true even for processors that do not support
> D-entailment, as long as they support the xsd:dateTime FILTER at all.
> Clearly, this is not something we would want, so the only sane
> interpretation of xsd:dateTime in SPARQL 1.1 would be to use XSD 1.1.
> Fortunately, this is also the interpretation that current RDF and OWL
> standards require, so that the meaning of year "0000" in the input file is
> the same as in the query. I hope others agree.
>
> Note there is a non-technical dimension to all of this: if a SPARQL endpoint
> or LOD service returns data on the web, consuming applications must know
> what it means (not to calculate durations or to check validity -- but
> already to correctly display the data to their users in a non-technical
> syntax). The point of view that XSD literals are "just strings" may work for
> a DBMS implementer, but as a user of the technology you have to decide how
> to encode and query your content, i.e., you must know how "1 BCE" is
> represented. People who are building applications based on RDF and SPARQL
> therefore must make this decision, and I can only see them going with the
> most recent XSD, RDF, and OWL standards -- it's great to know that SPARQL
> 1.1 agrees with those, even if one has to do some interpretation to
> recognize this ;-)
>
> Cheers,
>
> Markus
>
>
> [1] http://www.w3.org/TR/sparql11-entailment/#DEntRegime
> [2] http://www.w3.org/TR/sparql11-query/#operandDataTypes
>
>
>
> On 27.03.2015 12:54, Andy Seaborne wrote:
>>
>> The root change is in: ISO 8601:2000 Second Edition
>> where year "0000" went from illegal to 1 BCE.
>>
>> Yes - I can see that's a genuine problem for wikidata.
>>
>> Two answers: spec effect and implementation reality.
>>
>> 1/ Spec answer.
>>
>> For just plain retrieval of data, SPARQL returns RDF terms regardless
>> of their legality or value, so it's the lexical form that is returned,
>> "-0001-02-03T12:11:10+00:00"^^xsd:dateTime.
>>
>> Or even
>> "0000-02-03T12:11:10+00:00"^^xsd:dateTime
>>
>> If used in a FILTER, the value then does matter.
>>
>> SPARQL only formally requires xsd:dateTime, not xsd:date, and even then
>> only a limited subset of operations: comparison but not subtraction.  Many
>> implementations include xsd:date as well.
>>
>> I can see two ways of observing the change:
>>
>> A/ If there are illegal lexical forms: the year "0000" was illegal and
>> became legal, so a FILTER may go from being an error to returning true
>> or false.  This happens if the data has year "0000" or the FILTER
>> mentions it explicitly.
>>
>> A FILTER expression that evaluates to an error is effectively false
>> overall anyway.
>>
>> # Different days, year 0000
>> FILTER (
>>     "0000-02-03T12:11:10+00:00"^^xsd:dateTime >=
>>     "0000-02-02T12:11:10+00:00"^^xsd:dateTime )
>>
>> changes from a filter error (the row is not returned) to true,
>> but comparison around the boundary is not changed.  It is the mentioning
>> of 0000, explicitly or in the data, that is the problem.
>>
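The change described in A/ can be sketched in Python. This is only an illustration under stated assumptions, not how any SPARQL engine is implemented: the names `parse_datetime`, `filter_ge` and `keep_row` and the `xsd_version` parameter are hypothetical, fractional seconds are ignored, only operands with identical timezone offsets are compared, and `None` stands in for a SPARQL expression error.

```python
import re

# Simplified xsd:dateTime lexical shape for this sketch (no fractional seconds).
_DATETIME = re.compile(
    r"^(-?\d{4,})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)(Z|[+-]\d\d:\d\d)$"
)


def parse_datetime(lexical, xsd_version):
    """Return a comparable tuple, or None when the literal is ill-typed."""
    match = _DATETIME.match(lexical)
    if match is None:
        return None
    year = int(match.group(1))
    if year == 0 and xsd_version == "1.0":
        return None  # year 0000 is not a legal lexical form in XSD 1.0
    fields = tuple(int(match.group(i)) for i in range(2, 7))
    return (year,) + fields + (match.group(7),)


def filter_ge(left, right, xsd_version):
    """Evaluate left >= right; None models a FILTER expression error."""
    a = parse_datetime(left, xsd_version)
    b = parse_datetime(right, xsd_version)
    if a is None or b is None:
        return None
    if a[-1] != b[-1]:
        # Assumption of this sketch: no timezone normalisation is attempted.
        raise NotImplementedError("sketch only compares equal timezone offsets")
    return a[:-1] >= b[:-1]


def keep_row(left, right, xsd_version):
    """A row survives the FILTER only if the expression is true (not error)."""
    return filter_ge(left, right, xsd_version) is True


later = "0000-02-03T12:11:10+00:00"
earlier = "0000-02-02T12:11:10+00:00"
print(keep_row(later, earlier, "1.0"))  # False: filter error, row dropped
print(keep_row(later, earlier, "1.1"))  # True
```

A comparison that does not mention year 0000, such as one between "-0001-…" and "0001-…", gives the same answer under both readings in this sketch, which matches the remark that comparison around the boundary is not changed.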
>> B/ As an extension, xsd:duration may be supported.
>>
>> # Across BCE/CE boundary:
>> BIND("-0001-02-03T12:11:10+00:00"^^xsd:dateTime AS ?d1)
>> BIND("0001-02-03T12:11:10+00:00"^^xsd:dateTime AS ?d2)
>> BIND(?d2 - ?d1 AS ?duration)
>>
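To make the arithmetic in B/ concrete, here is a rough Python sketch of how many whole years separate the two literals above under each reading. It assumes only the year parts matter (month, day and time are identical in the example); `astronomical_year`, `years_between` and the `xsd_version` parameter are hypothetical names for this illustration and do not come from any library.

```python
def astronomical_year(lexical_year: int, xsd_version: str) -> int:
    """Map a lexical year to astronomical numbering (0 = 1 BCE, -1 = 2 BCE)."""
    if xsd_version == "1.1":
        # XSD 1.1 lexical years already follow astronomical numbering.
        return lexical_year
    if xsd_version == "1.0":
        if lexical_year == 0:
            raise ValueError("year 0000 is not a legal lexical form in XSD 1.0")
        # XSD 1.0 has no year zero: -0001 is 1 BCE, i.e. astronomical year 0.
        return lexical_year + 1 if lexical_year < 0 else lexical_year
    raise ValueError(f"unknown XSD version: {xsd_version}")


def years_between(start_year: int, end_year: int, xsd_version: str) -> int:
    """Whole years from start to end, given the same month/day/time in both."""
    return (astronomical_year(end_year, xsd_version)
            - astronomical_year(start_year, xsd_version))


print(years_between(-1, 1, "1.0"))  # 1 (1 BCE to 1 CE)
print(years_between(-1, 1, "1.1"))  # 2 (2 BCE to 1 CE)
```

So an implementation that supports the subtraction would report a span of one year under the XSD 1.0 reading and two years under the XSD 1.1 reading for the same pair of literals.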
>> SPARQL refers to XSD Schema 1.0 but the effect of extensions is
>> implementation-dependent.  Functions are named by URI and because XSD 1.1
>> does not change the URI for datatypes or functions, it's sort of an
>> "upgrade in place".
>>
>> So, specification-wise, there is an impact; it's confused by the
>> change-in-place of XSD URIs.
>>
>>  > What does "-0001-02-03"^^xsd:date mean?
>>
>> When that is the RDFterm returned, it's up to the application.
>> When it's used in a FILTER, it's exposed to the change.
>> Extensions to the core spec are impacted.
>>
>> 2/ Implementation answer:
>>
>> Implementations may rely on a 3rd party library to do the parsing and
>> calculation and it will do whatever that library does.
>>
>> For example, Jena uses Apache Xerces for parsing and the Java runtime,
>> which provides XMLGregorianCalendar which is W3C XML Schema 1.0 (Java8
>> and Java9), for calculation of durations.
>>
>>      Andy
>>
>> On 27/03/15 09:56, Markus Kroetzsch wrote:
>>>
>>> Dear all, especially former members of the SPARQL WG,
>>>
>>> As you might know, the Wikimedia Foundation is currently working on
>>> setting up an official public SPARQL service for Wikidata. This was done
>>> not to integrate with RDF or to add to the semantic web, but simply
>>> because it seems to be the best technology for the query problem at
>>> hand. I think this should be considered a success :-) You are also
>>> welcome to play around with the preliminary test SPARQL endpoint of
>>> Wikidata, see [0], and of course to comment on the wikidata-l list
>>> regarding nice SPARQL queries or other ideas.
>>>
>>> However, on the way to making this a reality as a fully integrated
>>> feature of Wikidata/Wikipedia, there are many issues to be solved. One
>>> that came up recently is about xsd:date(Time) in SPARQL 1.1. As you will
>>> know, XML Schema has changed the semantics of its date types in
>>> incompatible ways between XSD 1.0 and XSD 1.1:
>>>
>>> * XSD 1.1: "-0001-02-03"^^xsd:date means "3rd Feb 2 BCE"  [1]
>>> * XSD 1.0: "-0001-02-03"^^xsd:date means "3rd Feb 1 BCE"  [2]
>>>
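The two readings listed above differ only in how the lexical year is mapped to a calendar year. A minimal Python sketch of that mapping, assuming the year has already been parsed to an integer; the function name `year_label` and the `xsd_version` parameter are made up for this illustration and are not part of any specification or library:

```python
def year_label(lexical_year: int, xsd_version: str) -> str:
    """Return a BCE/CE label for the year part of an xsd:date(Time) literal.

    xsd_version is "1.0" or "1.1" (a hypothetical switch for this sketch).
    """
    if xsd_version == "1.0":
        # XSD 1.0: there is no year 0000, and -0001 denotes 1 BCE.
        if lexical_year == 0:
            raise ValueError("year 0000 is not a legal lexical form in XSD 1.0")
        if lexical_year < 0:
            return f"{-lexical_year} BCE"
        return f"{lexical_year} CE"
    if xsd_version == "1.1":
        # XSD 1.1: 0000 denotes 1 BCE, -0001 denotes 2 BCE, and so on.
        if lexical_year <= 0:
            return f"{1 - lexical_year} BCE"
        return f"{lexical_year} CE"
    raise ValueError(f"unknown XSD version: {xsd_version}")


print(year_label(-1, "1.0"))  # 1 BCE
print(year_label(-1, "1.1"))  # 2 BCE
print(year_label(0, "1.1"))   # 1 BCE
```

An application that displays such literals to users needs a switch of this kind somewhere, since the literal itself carries no hint of which reading its producer intended.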
>>> Needless to say, this is a big deal in applications like Wikidata,
>>> where you have a lot of historical dates. The obvious question now is:
>>> What does "-0001-02-03"^^xsd:date mean when used in SPARQL? RDF? OWL?
>>> Here is what I have found so far:
>>>
>>> * RDF 1.0: year 1 BCE
>>> * OWL 1: year 1 BCE
>>> * SPARQL 1.0: year 1 BCE
>>> (all as expected)
>>>
>>> * RDF 1.1: year 2 BCE [3]
>>> * OWL 2: year 2 BCE [4]
>>> * SPARQL 1.1: ???
>>>
>>> It is interesting to note that the semantic changes in XSD, RDF and OWL
>>> each are breaking changes, which change the meaning of existing
>>> documents (where the document itself may not contain any hint as to
>>> whether it was created before or after the change).
>>>
>>> I am not sure what the case is for SPARQL 1.1. It seems very much
>>> preferable for SPARQL to follow the other W3C standards in this
>>> matter, but I have not yet found out what the intention of the SPARQL
>>> WG was. All comments are welcome, but in the end we are looking for a
>>> normative answer here.
>>>
>>> Best regards,
>>>
>>> Markus
>>>
>>>
>>> [0] https://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg05601.html
>>> (gives you the Wikidata endpoint URL, but more importantly also example
>>> queries for our current RDF translation, which we are currently revising
>>> in several places)
>>> [1] http://www.w3.org/TR/xmlschema11-2/#dateTime
>>> [2] http://www.w3.org/TR/xmlschema-2/#dateTime
>>> [3] http://www.w3.org/TR/rdf11-concepts/#section-Datatypes
>>> [4] http://www.w3.org/TR/owl2-syntax/#Datatype_Maps
>>>
>>
>>
>>
>
> --
> Markus Kroetzsch
> Faculty of Computer Science
> Technische Universität Dresden
> +49 351 463 38486
> http://korrekt.org/
>



-- 
Jun. Prof. Dr. Birte Glimm            Tel.:    +49 731 50 24125
Inst. of Artificial Intelligence         Secr:  +49 731 50 24258
University of Ulm                         Fax:   +49 731 50 24188
D-89069 Ulm                               birte.glimm@uni-ulm.de
Germany
Received on Friday, 27 March 2015 14:03:25 UTC
