Re: SPARQL Query Problem - perhaps solvable in 1.1? from Lee Feigenbaum on 2009-08-25 (public-rdf-dawg@w3.org from July to September 2009)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Tue, 25 Aug 2009 08:45:57 -0400
To: Toby Inkster <tai@g5n.co.uk>
CC: public-rdf-dawg@w3.org, public-sparql-dev@w3.org
Message-ID: <4A93DD05.4080905@thefigtrees.net>

Hi Toby,

I've CCed the SPARQL WG list, but also the public-sparql-dev list, which
in general is better suited to "how do I..." SPARQL questions. If I'm
right that this question is of that kind, please drop
public-rdf-dawg@w3.org from future messages in this thread.

If I understand you correctly, the pattern you're looking for is a
prioritized list of predicates to try for a particular value, in this
case a date. The "canonical" way to do this in SPARQL (or, at least,
what I've always done and seen done) is to use a series of OPTIONAL
clauses that all bind the same variable:

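PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX ex:  <http://example.org/ns#>   # placeholder namespace for ex:Item

SELECT ?item ?date {
    ?item a ex:Item .
    OPTIONAL { ?item dct:created ?date }
    OPTIONAL { ?item dct:issued ?date }
    OPTIONAL { ?item dct:date ?date }
    OPTIONAL { ?item dc:date ?date }
} ORDER BY ?date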

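For each ex:Item, this binds ?date to the value of the first predicate in
the list that has one. Once ?date is bound, a later OPTIONAL can only
match a triple carrying that same value; if it doesn't, the solution is
kept unchanged, so a lower-priority predicate never overrides a
higher-priority one. (An item with several values for the winning
predicate will still produce one result row per value.)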
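A minimal Python sketch of why the OPTIONAL chain works, assuming solutions are modelled as dicts from variable names to values and each OPTIONAL as a left join: an optional match only extends a solution it is compatible with. The items and dates are hypothetical, and the starting solutions stand in for `?item a ex:Item`.

```python
# Model of SPARQL OPTIONAL (left join) over solution mappings.
DCT = "http://purl.org/dc/terms/"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical data: item1 has created (2007) and issued (2008);
# item2 has dcterms:date (2008) and dc:date (2009).
triples = [
    ("item1", DCT + "created", "2007-03-01"),
    ("item1", DCT + "issued", "2008-06-01"),
    ("item2", DCT + "date", "2008-01-15"),
    ("item2", DC + "date", "2009-02-02"),
]

def compatible(mu1, mu2):
    """Solutions are compatible if they agree on every shared variable."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())

def match(predicate):
    """Solutions for the triple pattern { ?item <predicate> ?date }."""
    return [{"item": s, "date": o} for s, p, o in triples if p == predicate]

def left_join(left, right):
    """Extend each left solution with every compatible right solution,
    or keep it unchanged when none is compatible."""
    out = []
    for mu1 in left:
        extended = [{**mu1, **mu2} for mu2 in right if compatible(mu1, mu2)]
        out.extend(extended or [mu1])
    return out

solutions = [{"item": "item1"}, {"item": "item2"}]  # ?item a ex:Item
for pred in (DCT + "created", DCT + "issued", DCT + "date", DC + "date"):
    solutions = left_join(solutions, match(pred))

print(solutions)
# [{'item': 'item1', 'date': '2007-03-01'}, {'item': 'item2', 'date': '2008-01-15'}]
```

item1 keeps its 2007 dcterms:created date even though dcterms:issued is in 2008, because the later OPTIONAL is incompatible with the existing binding.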

Please let me know if I'm misunderstanding your needs.

thanks,
Lee

P.S. In general, please send feedback for the SPARQL WG to
public-rdf-dawg-comments@w3.org, as the group uses
public-rdf-dawg@w3.org for internal Working Group business. Thanks.

Toby Inkster wrote:
> OK, I've hit a problem that sounds simple enough at first reading, but
> that I'm pretty sure is unsolvable in SPARQL 1.0. I'll outline the
> problem below. Any suggestions as to how it can be solved in SPARQL
> 1.0 would be gratefully received. But if you're as convinced as I am
> that it's unsolvable, I also have a suggested feature for SPARQL 1.1
> that should solve it.
> 
> So the problem. I'm collecting a bunch of RSS feeds into a triple store
> and trying to create a single list of articles from them. I need to be
> able to filter the list by date range and order it by date. The people
> behind the RSS 1.0 spec (in their infinite wisdom) decided that a date
> property would not be needed, so RSS feeds will typically contain a
> mixed bag of different properties that could describe the publication
> date of the items. I'm focusing on the following four:
> 
>  <http://purl.org/dc/terms/created>
>  <http://purl.org/dc/terms/issued>
>  <http://purl.org/dc/terms/date>
>  <http://purl.org/dc/elements/1.1/date>
> 
> When an RSS item has a dcterms:created date, then I essentially want to
> treat that as authoritative and ignore everything else. If there is no
> dcterms:created, then I'd fall back to dcterms:issued, treating that as
> authoritative and ignoring the other two terms, and so on.
> 
> Filtering for a date range - say, all the items published in 2008 - is
> pretty tricky, but is achievable. My solution is to bind each date to a
> different variable (dcterms:created is ?date1, dcterms:issued is ?date2,
> etc) in OPTIONAL clauses and then do a filter like this:
> 
> FILTER (
>   (bound(?date1) && inRange(?date1))
>   ||
>   (!bound(?date1) && bound(?date2) && inRange(?date2))
>   ||
>   (!bound(?date1) && !bound(?date2) && bound(?date3) && inRange(?date3))
>   ||
>   (!bound(?date1) && !bound(?date2) && !bound(?date3) && bound(?date4) && inRange(?date4))
> )
> 
> where "inRange" in the above pseudo-code actually represents some
> xsd:dateTime type casts and greater-than and less-than comparisons.
> 
> This is ugly, yes, but it works. A lot of the complexity comes from
> the fact that the date property found first in my order of priorities
> needs to be treated as completely authoritative. For example, if an
> item's dcterms:created date is in 2007, then when filtering for items
> in 2008, that item will not be found, even if its dcterms:issued,
> dcterms:date and dc:date values all fall in 2008!
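The four-clause filter can be transcribed into Python to check the "authoritative first date" behaviour described above. This is a sketch under stated assumptions: None models an unbound variable, and the `bound` and `in_range` helpers (with a calendar-2008 range) are hypothetical stand-ins for SPARQL's bound() and the xsd:dateTime casts and comparisons.

```python
from datetime import date

START, END = date(2008, 1, 1), date(2009, 1, 1)  # assumed range: calendar 2008

def in_range(value):
    # Hypothetical stand-in for the xsd:dateTime casts and comparisons.
    return START <= date.fromisoformat(value) < END

def bound(value):
    # None plays the role of an unbound SPARQL variable.
    return value is not None

def four_clause_filter(d1, d2, d3, d4):
    # Direct transcription of the FILTER pseudo-code above.
    return (
        (bound(d1) and in_range(d1))
        or (not bound(d1) and bound(d2) and in_range(d2))
        or (not bound(d1) and not bound(d2) and bound(d3) and in_range(d3))
        or (not bound(d1) and not bound(d2) and not bound(d3)
            and bound(d4) and in_range(d4))
    )

# created in 2007, all other properties in 2008: excluded.
print(four_clause_filter("2007-05-01", "2008-02-01", "2008-03-01", "2008-04-01"))  # False
# no created date, issued in 2008: included.
print(four_clause_filter(None, "2008-02-01", "2007-03-01", None))  # True
```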
> 
> Anyway, as I said, this is ugly, but it works. However, results are of
> course returned in no particular order. Right now, I pull these results
> into my application and sort them into date order there. But I'd like
> the SPARQL query engine to take care of the sorting itself - in
> particular, that way I'd be able to get the query engine to apply any
> LIMIT and OFFSET I wanted, saving a lot of communications overhead
> between the query engine and the application in the case where, say,
> there are 500 matching items but I only want the first 10 ordered by
> date.
> 
> Michael Hausenblas on #swig suggested a solution which at first glance
> looks like it might work, and looks really easy:
> 
>  ORDER BY (?date1 || ?date2 || ?date3 || ?date4)
> 
> However, this doesn't work, as the || operator seems to always return an
> xsd:boolean - not (as is the case in many other programming languages)
> the first non-false literal value that was passed to it.
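The contrast with "many other programming languages" can be seen in Python, whose `or` returns the first truthy operand itself. A small sketch with hypothetical bindings (None standing for unbound); wrapping the result in `bool` is only a rough analogue of SPARQL's ||, which yields an xsd:boolean rather than one of its operands.

```python
created, issued, date_ = None, None, "2008-06-01"  # hypothetical bindings

# Python's `or` returns the first truthy operand itself...
print(created or issued or date_)        # 2008-06-01
# ...while SPARQL's || only yields a boolean, roughly like this:
print(bool(created or issued or date_))  # True
# Note: `or` also skips falsy values such as an empty string.
print("" or "2008-06-01")                # 2008-06-01
```

The last line is one reason a COALESCE-style function tests whether a value is bound (not None) rather than whether it is "true".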
> 
> I'm fairly convinced that ordering my results is not possible without
> breaking my filter.
> 
> If you need some test data to play with, I've put some here:
> 
> http://buzzword.org.uk/2009/sparql-test-data-1.ttl
> 
> The expected results should be :inRange4, :inRange3, :inRange2
> and :inRange1 - in that order.
> 
> A very simple solution, which would solve the ordering problem (and also
> greatly simplify the filter) would be for SPARQL to borrow the COALESCE
> function from SQL. For those not familiar with COALESCE, it takes a
> variable number of arguments, and returns the first of those arguments
> which is not null. (In the SPARQL case, it would be the first which is
> bound.)
> 
> That would make my filter as simple as:
> 
>  FILTER (inRange(COALESCE(?date1,?date2,?date3,?date4)))
> 
> And my sorting as easy as:
> 
>  ORDER BY (COALESCE(?date1,?date2,?date3,?date4))
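The proposed COALESCE semantics (return the first bound argument) and their use in both the filter and the ordering can be sketched in Python. The rows, the calendar-2008 range and the `coalesce` and `in_range` helpers are hypothetical; None models an unbound variable, and the final slice plays the role of LIMIT/OFFSET.

```python
from datetime import date

def coalesce(*args):
    """Return the first argument that is not None (i.e. bound), else None."""
    for arg in args:
        if arg is not None:
            return arg
    return None

# Hypothetical rows: (item, ?date1, ?date2, ?date3, ?date4); None = unbound.
rows = [
    ("a", "2007-05-01", "2008-02-01", None, None),
    ("b", None, "2008-11-20", None, None),
    ("c", None, None, "2008-03-04", "2006-01-01"),
    ("d", None, None, None, "2008-07-09"),
]

def in_range(value):
    # Assumed range: calendar year 2008; unbound never matches.
    return value is not None and date(2008, 1, 1) <= date.fromisoformat(value) < date(2009, 1, 1)

# FILTER (inRange(COALESCE(?date1, ?date2, ?date3, ?date4)))
matching = [r for r in rows if in_range(coalesce(*r[1:]))]
# ORDER BY (COALESCE(...)); ISO yyyy-mm-dd strings sort chronologically.
ordered = sorted(matching, key=lambda r: coalesce(*r[1:]))
# LIMIT 2 OFFSET 0
print([r[0] for r in ordered[0:2]])  # ['c', 'd']
```

Item "a" is excluded because its first bound date (2007) is authoritative, which matches the behaviour of the longer four-clause filter.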
> 
> Failing that, even an if-then-else ternary operator would be useful:
> 
>  ORDER BY (
>    if bound(?date1)
>    then ?date1
>    else (
>      if bound(?date2) ...
>    )
>  )
>
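The nested if-then-else proposed above reduces to the same "first bound value" selection as COALESCE. A sketch using Python's conditional expression, which chains right-associatively; the function name is hypothetical and None models an unbound variable.

```python
def if_chain(d1, d2, d3, d4):
    # Nested if-then-else: d1 if bound, else d2 if bound, else d3 if bound, else d4.
    return (d1 if d1 is not None
            else d2 if d2 is not None
            else d3 if d3 is not None
            else d4)

print(if_chain(None, "2008-02-01", "2007-01-01", None))  # 2008-02-01
```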
Received on Tuesday, 25 August 2009 12:46:47 UTC