- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Tue, 25 Aug 2009 08:45:57 -0400
- To: Toby Inkster <tai@g5n.co.uk>
- CC: public-rdf-dawg@w3.org, public-sparql-dev@w3.org
Hi Toby,

I've CCed the SPARQL WG list but also the public-sparql-dev list which
in general is better suited for "how do I..." SPARQL questions. If I'm
correct that that best suits the nature of this question, please drop
public-rdf-dawg@w3.org from future messages in this thread.

If I understand you correctly, the pattern you're looking for is that
you have a prioritized list of predicates that you want to use for a
particular value, in this case date. The "canonical" way to do this in
SPARQL (or, at least, what I've always done and seen done) is to use a
series of OPTIONAL clauses that all bind to the same variable:

  SELECT ?date {
    ?item a ex:Item .
    OPTIONAL { ?item dct:created ?date }
    OPTIONAL { ?item dct:issued ?date }
    OPTIONAL { ?item dct:date ?date }
    OPTIONAL { ?item dc:date ?date }
  } ORDER BY ?date

This will bind ?date to the first of the predicates that have a value
for each ex:Item.

Please let me know if I'm misunderstanding your needs.

thanks,
Lee

P.S. In general, please send feedback to the SPARQL WG to
public-rdf-dawg-comments@w3.org, as the group uses
public-rdf-dawg@w3.org for internal Working Group business. Thanks.

Toby Inkster wrote:
> OK, I've hit up against a problem which at first reading, sounds simple
> enough, but I'm pretty sure is unsolvable in SPARQL 1.0. I'll outline
> this problem below. Any suggestions as to how it can be solved in SPARQL
> 1.0 would be gratefully received. But if you're as convinced as I am
> that it's unsolvable, I also have a suggested feature for SPARQL 1.1
> that should solve it.
>
> So the problem. I'm collecting a bunch of RSS feeds into a triple store
> and trying to create a single list of articles from them. I need to be
> able to filter the list by date range and order it by date. The people
> behind the RSS 1.0 spec (in their infinite wisdom) decided that a date
> property would not be needed, so RSS feeds will typically contain a
> mixed bag of different properties that could describe the publication
> date of the items. I'm focusing on the following four:
>
>   <http://purl.org/dc/terms/created>
>   <http://purl.org/dc/terms/issued>
>   <http://purl.org/dc/terms/date>
>   <http://purl.org/dc/elements/1.1/date>
>
> When an RSS item has a dcterms:created date, then I essentially want to
> treat that as authoritative and ignore everything else. If there is no
> dcterms:created, then I'd fall back to dcterms:issued, treating that as
> authoritative and ignoring the other two terms, and so on.
>
> Filtering for a date range - say, all the items published in 2008 - is
> pretty tricky, but is achievable. My solution is to bind each date to a
> different variable (dcterms:created is ?date1, dcterms:issued is ?date2,
> etc) in OPTIONAL clauses and then do a filter like this:
>
>   FILTER (
>     (bound(?date1) && inRange(?date1))
>     ||
>     (!bound(?date1) && bound(?date2) && inRange(?date2))
>     ||
>     (!bound(?date1) && !bound(?date2) && bound(?date3) && inRange(?date3))
>     ||
>     (!bound(?date1) && !bound(?date2) && !bound(?date3) && bound(?date4) && inRange(?date4))
>   )
>
> where "inRange" in the above pseudo-code actually represents some
> xsd:dateTime type casts and greater-than and less-than comparisons.
>
> This is ugly, yes, but it works. A lot of the complicatedness comes from
> the fact that the date property found first in my order of priorities
> needs to be treated as completely authoritative. So that, for example,
> if an item has dcterms:created only in 2007, then when filtering for
> items in 2008, that item will not be found, even if it has
> dcterms:issued, dcterms:date and dc:date properties all with values in
> 2008!
>
> Anyway, as I said, this is ugly, but it works. However, results are of
> course returned in no particular order. Right now, I pull these results
> into my application and sort them into date order there. But I'd like
> the SPARQL query engine to take care of the sorting itself - in
> particular, that way I'd be able to get the query engine to apply any
> LIMIT and OFFSET I wanted, saving a lot of communications overhead
> between the query engine and the application in the case where, say,
> there are 500 matching items but I only want the first 10 ordered by
> date.
>
> Michael Hausenblas on #swig suggested a solution which at first glance
> looks like it might work, and looks really easy:
>
>   ORDER BY (?date1 || ?date2 || ?date3 || ?date4)
>
> However, this doesn't work, as the || operator seems to always return an
> xsd:boolean - not (as is the case in many other programming languages)
> the first non-false literal value that was passed to it.
>
> I'm fairly convinced that ordering my results is not possible without
> breaking my filter.
>
> If you need some test data to play with, I've put some here:
>
>   http://buzzword.org.uk/2009/sparql-test-data-1.ttl
>
> The expected results should be :inRange4, :inRange3, :inRange2
> and :inRange1 - in that order.
>
> A very simple solution, which would solve the ordering problem (and also
> greatly simplify the filter) would be for SPARQL to borrow the COALESCE
> function from SQL. For those not familiar with COALESCE, it takes a
> variable number of arguments, and returns the first of those arguments
> which is not null. (In the SPARQL case, it would be the first which is
> bound.)
>
> That would make my filter as simple as:
>
>   FILTER (inRange(COALESCE(?date1,?date2,?date3,?date4)))
>
> And my sorting as easy as:
>
>   ORDER BY (COALESCE(?date1,?date2,?date3,?date4))
>
> Failing that, even an if-then-else ternary operator would be useful:
>
>   ORDER BY (
>     if bound(?date1)
>     then ?date1
>     else (
>       if bound(?date2) ...
>     )
>   )
>
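
For illustration, the shared-variable OPTIONAL pattern from the reply can be combined with the date-range filter and paging described in the quoted message. This is only a sketch and not part of the original exchange. The ex: namespace IRI is a placeholder, and the 2008 bounds and the limit of 10 are taken from the examples in the thread. It also assumes the date values are either typed as xsd:dateTime or are plain literals in xsd:dateTime lexical form, since otherwise the cast fails and the row is dropped.

```sparql
PREFIX ex:  <http://example.org/>            # placeholder namespace (assumption)
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?item ?date
WHERE {
  ?item a ex:Item .
  # Once an earlier OPTIONAL has bound ?date, a later one can only match
  # if it yields the same value, so the highest-priority property wins.
  OPTIONAL { ?item dct:created ?date }
  OPTIONAL { ?item dct:issued  ?date }
  OPTIONAL { ?item dct:date    ?date }
  OPTIONAL { ?item dc:date     ?date }
  # If ?date is unbound the comparison raises an error and the row is
  # removed, so items with no date at all are excluded.
  FILTER ( xsd:dateTime(?date) >= "2008-01-01T00:00:00Z"^^xsd:dateTime &&
           xsd:dateTime(?date) <  "2009-01-01T00:00:00Z"^^xsd:dateTime )
}
ORDER BY xsd:dateTime(?date)
LIMIT 10
```

Because the filter tests only ?date, an item whose dct:created value falls in 2007 stays bound to that value and is rejected, even if its lower-priority properties carry 2008 dates. One caveat: an item with several values for the same property produces one row per value.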
Received on Tuesday, 25 August 2009 12:46:45 UTC