- From: James Leigh <james@3roundstones.com>
- Date: Thu, 27 Oct 2011 10:05:30 -0400
- To: public-rdf-dawg-comments@w3.org
- Cc: David Wood <david@3roundstones.com>
Hello,

We recently ran into some unexpected behaviour regarding the ORDER BY clause that we want to bring to this group's attention. When ordering RDF literals and URIs, the same literal or the same URI will always be arranged together. However, there is no guarantee with blank nodes that the same blank node will be arranged together.

The following SPARQL query lists all the vcard addresses in the default graph along with their properties. A single address is represented in multiple result bindings, one for each property in the data store.

    SELECT ?vcard ?adr ?pred ?obj {
      ?vcard a vcard:VCard ;
             vcard:adr ?adr .
      ?adr ?pred ?obj .
    }
    ORDER BY ?vcard ?adr ?pred

The query author's expected result is to have all result bindings ordered first by the vcard they belong to and, if there are multiple addresses on the vcard, to have the properties of each address ordered together. For example, the following binding sets are a valid result set. Notice that the entire home address comes before any of the work address properties. This order is predictable because of the ORDER BY clause in the query above.

    vcard=<me>, adr=<me#home>, pred=vcard:country-name, obj="Australia"
    vcard=<me>, adr=<me#home>, pred=vcard:locality, obj="WonderCity"
    vcard=<me>, adr=<me#home>, pred=vcard:postal-code, obj="5555"
    vcard=<me>, adr=<me#home>, pred=vcard:street-address, obj="111 Lake Drive"
    vcard=<me>, adr=<me#work>, pred=vcard:country-name, obj="Australia"
    vcard=<me>, adr=<me#work>, pred=vcard:locality, obj="WonderCity"
    vcard=<me>, adr=<me#work>, pred=vcard:postal-code, obj="5555"
    vcard=<me>, adr=<me#work>, pred=vcard:street-address, obj="33 Enterprise Drive"

However, it would be incorrect (in SPARQL 1.0 and the SPARQL 1.1 draft) for the author to assume the addresses will always be ordered together like this. Consider the result set if blank nodes were used for the address nodes. The result might look like the one below.

    vcard=<me>, adr=_:b1, pred=vcard:locality, obj="WonderCity"
    vcard=<me>, adr=_:b1, pred=vcard:street-address, obj="111 Lake Drive"
    vcard=<me>, adr=_:b2, pred=vcard:street-address, obj="33 Enterprise Drive"
    vcard=<me>, adr=_:b2, pred=vcard:country-name, obj="Australia"
    vcard=<me>, adr=_:b1, pred=vcard:country-name, obj="Australia"
    vcard=<me>, adr=_:b2, pred=vcard:postal-code, obj="5555"
    vcard=<me>, adr=_:b1, pred=vcard:postal-code, obj="5555"
    vcard=<me>, adr=_:b2, pred=vcard:locality, obj="WonderCity"

Although the results for each vcard are ordered together, because the vcard is a URI, the ordering of the adr blank nodes looks random and is unpredictable. Sesame 2.x, as implemented, appears to arrange results randomly when ordering by blank nodes, as shown above. When the data contains blank nodes, there is no way to control the ordering.

The author would expect that _:b1 is ordered before or after _:b2, but the author would not expect the _:b1 results to be mixed among the _:b2 results. Although there is no defined order between _:b1 and _:b2, SPARQL should provide guidance on how to arrange blank nodes. Many people still use blank nodes, and this issue causes unexpected results for SPARQL users.

My colleagues and I propose that the group seriously consider adding a restriction to ORDER BY in SPARQL 1.1 that guarantees that, when ordering by any RDF term, identical terms are arranged together. Although an order among different blank nodes cannot be fixed, SPARQL should require that identical RDF terms are ordered together.

Thanks,
James
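
To illustrate the grouping behaviour proposed above, here is a minimal Python sketch of one way such interleaving can arise. It is not Sesame's actual implementation, and all names in it (ROWS, compare_terms, sort_rows, is_grouped) are hypothetical. It assumes a comparator that treats any two blank nodes as a tie, so the sort falls through to the next ORDER BY key, and contrasts that with a comparator that breaks ties between distinct blank nodes by an arbitrary but consistent label.

```python
from functools import cmp_to_key

# Hypothetical rows (vcard, adr, pred), in the arbitrary order a store
# might produce them before ORDER BY is applied.
ROWS = [
    ("<me>", "_:b1", "vcard:locality"),
    ("<me>", "_:b1", "vcard:street-address"),
    ("<me>", "_:b2", "vcard:street-address"),
    ("<me>", "_:b2", "vcard:country-name"),
    ("<me>", "_:b1", "vcard:country-name"),
    ("<me>", "_:b2", "vcard:postal-code"),
    ("<me>", "_:b1", "vcard:postal-code"),
    ("<me>", "_:b2", "vcard:locality"),
]


def is_bnode(term):
    """Terms written as _:label stand for blank nodes in this sketch."""
    return term.startswith("_:")


def compare_terms(a, b, group_bnodes):
    # SPARQL orders blank nodes before IRIs, so handle the mixed case first.
    if is_bnode(a) != is_bnode(b):
        return -1 if is_bnode(a) else 1
    if is_bnode(a) and not group_bnodes:
        # No order is defined between blank nodes: treat every pair as a tie.
        return 0
    # Assumption: plain string comparison stands in for IRI ordering and for
    # an arbitrary but consistent tiebreak between blank node labels.
    return (a > b) - (a < b)


def sort_rows(rows, group_bnodes):
    """Sort rows by vcard, then adr, then pred, like the ORDER BY clause."""
    def compare_rows(r1, r2):
        for a, b in zip(r1, r2):
            result = compare_terms(a, b, group_bnodes)
            if result != 0:
                return result
        return 0
    return sorted(rows, key=cmp_to_key(compare_rows))


def is_grouped(rows):
    """True if every adr value occupies one contiguous run of rows."""
    seen, previous = set(), None
    for _, adr, _ in rows:
        if adr != previous and adr in seen:
            return False
        seen.add(adr)
        previous = adr
    return True


if __name__ == "__main__":
    for group_bnodes in (False, True):
        result = sort_rows(ROWS, group_bnodes)
        print(group_bnodes, is_grouped(result), [adr for _, adr, _ in result])
```

With group_bnodes set to False, the rows end up sorted by pred alone and the _:b1 and _:b2 rows interleave. With it set to True, each blank node's rows stay together, even though the relative order of _:b1 and _:b2 remains arbitrary.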
Received on Friday, 28 October 2011 12:03:22 UTC