Re: SPARQL Query Problem - perhaps solvable in 1.1?

Hi Toby,

I've CCed the SPARQL WG list, and also the public-sparql-dev list, which 
in general is better suited for "how do I..." SPARQL questions. If I'm 
correct that it best suits the nature of this question, please drop the 
WG list from future messages in this thread.

If I understand you correctly, the pattern you're looking for is that 
you have a prioritized list of predicates that you want to use for a 
particular value, in this case date. The "canonical" way to do this in 
SPARQL (or, at least, what I've always done and seen done) is to use a 
series of OPTIONAL clauses that all bind to the same variable:

SELECT ?date {
    ?item a ex:Item .
    OPTIONAL { ?item dct:created ?date }
    OPTIONAL { ?item dct:issued ?date }
    OPTIONAL { ?item dct:date ?date }
    OPTIONAL { ?item dc:date ?date }
} ORDER BY ?date

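For each ex:Item, this binds ?date to the value of the first predicate 
in the list that has one. Once an OPTIONAL clause has bound ?date, a 
later OPTIONAL clause can only match a triple with that same value, so 
it cannot override the earlier binding.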
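
If you also want the date-range filter and paging done by the query 
engine, a minimal sketch along the same lines might look like the 
following. The ex: namespace URI, the 2008 range, and the LIMIT of 10 
are placeholders assumed for illustration, and the xsd:dateTime casts 
assume your date values are in a castable lexical form, so treat this 
as untested against your data:

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX ex:  <http://example.org/>   # placeholder namespace (assumption)

SELECT ?item ?date
WHERE {
    ?item a ex:Item .
    # Highest-priority predicate first; later clauses cannot rebind ?date.
    OPTIONAL { ?item dct:created ?date }
    OPTIONAL { ?item dct:issued ?date }
    OPTIONAL { ?item dct:date ?date }
    OPTIONAL { ?item dc:date ?date }
    # Keep only items whose chosen date falls in 2008 (example range).
    FILTER ( xsd:dateTime(?date) >= "2008-01-01T00:00:00Z"^^xsd:dateTime
          && xsd:dateTime(?date) <  "2009-01-01T00:00:00Z"^^xsd:dateTime )
}
ORDER BY xsd:dateTime(?date)
LIMIT 10
```

Items with no date at all, or with a value that cannot be cast to 
xsd:dateTime, cause the FILTER to raise an error and are dropped. Since 
the filter only ever sees the one chosen ?date, an item whose 
dct:created is in 2007 is excluded even if its lower-priority dates 
fall in 2008.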

Please let me know if I'm misunderstanding your needs.


P.S. In general, please send feedback to the SPARQL WG to the group's comments list, as the group uses the WG list itself for internal Working Group business. Thanks.

Toby Inkster wrote:
> OK, I've hit up against a problem which at first reading, sounds simple
> enough, but I'm pretty sure is unsolvable in SPARQL 1.0. I'll outline
> this problem below. Any suggestions as to how it can be solved in SPARQL
> 1.0 would be gratefully received. But if you're as convinced as I am
> that it's unsolvable, I also have a suggested feature for SPARQL 1.1
> that should solve it.
> So the problem. I'm collecting a bunch of RSS feeds into a triple store
> and trying to create a single list of articles from them. I need to be
> able to filter the list by date range and order it by date. The people
> behind the RSS 1.0 spec (in their infinite wisdom) decided that a date
> property would not be needed, so RSS feeds will typically contain a
> mixed bag of different properties that could describe the publication
> date of the items. I'm focusing on the following four:
>  dcterms:created
>  dcterms:issued
>  dcterms:date
>  dc:date
> When an RSS item has a dcterms:created date, then I essentially want to
> treat that as authoritative and ignore everything else. If there is no
> dcterms:created, then I'd fall back to dcterms:issued, treating that as
> authoritative and ignoring the other two terms, and so on.
> Filtering for a date range - say, all the items published in 2008 - is
> pretty tricky, but is achievable. My solution is to bind each date to a
> different variable (dcterms:created is ?date1, dcterms:issued is ?date2,
> etc) in OPTIONAL clauses and then do a filter like this:
>  FILTER (
>   (bound(?date1) && inRange(?date1))
>   ||
>   (!bound(?date1) && bound(?date2) && inRange(?date2))
>   ||
>   (!bound(?date1) && !bound(?date2) && bound(?date3) && inRange(?date3))
>   ||
>   (!bound(?date1) && !bound(?date2) && !bound(?date3) && bound(?date4) && inRange(?date4))
>  )
> where "inRange" in the above pseudo-code actually represents some
> xsd:dateTime type casts and greater-than and less-than comparisons.
> This is ugly, yes, but it works. A lot of the complicatedness comes from
> the fact that the date property found first in my order of priorities
> needs to be treated as completely authoritative. So that, for example,
> if an item has dcterms:created only in 2007, then when filtering for
> items in 2008, that item will not be found, even if it has
> dcterms:issued, dcterms:date and dc:date properties all with values in
> 2008!
> Anyway, as I said, this is ugly, but it works. However, results are of
> course returned in no particular order. Right now, I pull these results
> into my application and sort them into date order there. But I'd like
> the SPARQL query engine to take care of the sorting itself - in
> particular, that way I'd be able to get the query engine to apply any
> LIMIT and OFFSET I wanted, saving a lot of communications overhead
> between the query engine and the application in the case where, say,
> there are 500 matching items but I only want the first 10 ordered by
> date.
> Michael Hausenblas on #swig suggested a solution which at first glance
> looks like it might work, and looks really easy:
>  ORDER BY (?date1 || ?date2 || ?date3 || ?date4)
> However, this doesn't work, as the || operator seems to always return an
> xsd:boolean - not (as is the case in many other programming languages)
> the first non-false literal value that was passed to it.
> I'm fairly convinced that ordering my results is not possible without
> breaking my filter.
> If you need some test data to play with, I've put some here:
> The expected results should be :inRange4, :inRange3, :inRange2
> and :inRange1 - in that order.
> A very simple solution, which would solve the ordering problem (and also
> greatly simplify the filter) would be for SPARQL to borrow the COALESCE
> function from SQL. For those not familiar with COALESCE, it takes a
> variable number of arguments, and returns the first of those arguments
> which is not null. (In the SPARQL case, it would be the first which is
> bound.)
> That would make my filter as simple as:
>  FILTER (inRange(COALESCE(?date1,?date2,?date3,?date4)))
> And my sorting as easy as:
>  ORDER BY (COALESCE(?date1,?date2,?date3,?date4))
> Failing that, even an if-then-else ternary operator would be useful:
>  (
>    if bound(?date1)
>    then ?date1
>    else (
>      if bound(?date2) ...
>    )
>  )

Received on Tuesday, 25 August 2009 12:46:45 UTC