Blank Node Ordering

Hello,

We recently ran into some unexpected behaviour with the ORDER BY clause
that we want to bring to this group's attention.

When ordering by RDF literals and URIs, results that bind the same
literal or the same URI will always be arranged together. However, there
is no guarantee that results binding the same blank node will be
arranged together.

The following SPARQL query lists all the vcard addresses in the default
graph along with their properties. A single address is represented in
multiple result bindings, one for each property in the data store.

PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>

SELECT ?vcard ?adr ?pred ?obj {
  ?vcard a vcard:VCard; vcard:adr ?adr .
  ?adr ?pred ?obj .
} ORDER BY ?vcard ?adr ?pred

The (author's) expected result is to have all result bindings ordered
first by the vcard they belong to and, if there are multiple addresses
on the vcard, to have the properties of each address grouped together.

For example, the following binding sets are a valid result set. Notice that
the entire home address comes before any of the work address properties.
This order is predictable because of the ORDER BY clause in the query
above.

vcard=<me>, adr=<me#home>, pred=vcard:country-name, obj="Australia"
vcard=<me>, adr=<me#home>, pred=vcard:locality, obj="WonderCity"
vcard=<me>, adr=<me#home>, pred=vcard:postal-code, obj="5555"
vcard=<me>, adr=<me#home>, pred=vcard:street-address, obj="111 Lake Drive"
vcard=<me>, adr=<me#work>, pred=vcard:country-name, obj="Australia"
vcard=<me>, adr=<me#work>, pred=vcard:locality, obj="WonderCity"
vcard=<me>, adr=<me#work>, pred=vcard:postal-code, obj="5555"
vcard=<me>, adr=<me#work>, pred=vcard:street-address, obj="33 Enterprise Drive"

However, it would be incorrect (in SPARQL 1.0 and the SPARQL 1.1 draft)
for the author to assume the addresses will always be ordered together
like this.

Consider the result set if blank nodes were used for the address node.
The result might look like the one below.

vcard=<me>, adr=_:b1, pred=vcard:locality, obj="WonderCity"
vcard=<me>, adr=_:b1, pred=vcard:street-address, obj="111 Lake Drive"
vcard=<me>, adr=_:b2, pred=vcard:street-address, obj="33 Enterprise Drive"
vcard=<me>, adr=_:b2, pred=vcard:country-name, obj="Australia"
vcard=<me>, adr=_:b1, pred=vcard:country-name, obj="Australia"
vcard=<me>, adr=_:b2, pred=vcard:postal-code, obj="5555"
vcard=<me>, adr=_:b1, pred=vcard:postal-code, obj="5555"
vcard=<me>, adr=_:b2, pred=vcard:locality, obj="WonderCity"

Although the results for each vcard are ordered together, because the
vcard is a URI, the ordering of the adr blank nodes looks random and is
unpredictable. When ordering by blank nodes, Sesame 2.x appears to
arrange the results randomly, as shown above. When the data contains
blank nodes, there is no way to control the ordering.
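
As an illustration only, the Python sketch below shows one way such
interleaving can arise. It assumes a comparator that treats every pair
of blank nodes as equal, so the sort falls through to ?pred. This is a
hypothetical comparator, not Sesame's actual implementation, and the
row format (dicts of strings) is invented for the example.

```python
from functools import cmp_to_key


def is_bnode(term):
    # Assumption: blank nodes are written with the "_:" label syntax.
    return term.startswith("_:")


def compare_terms(a, b):
    """Hypothetical comparator: all blank nodes tie with each other."""
    if is_bnode(a) and is_bnode(b):
        return 0  # no order is defined between distinct blank nodes
    if is_bnode(a):
        return -1  # blank nodes sort before other terms
    if is_bnode(b):
        return 1
    return (a > b) - (a < b)  # plain string comparison otherwise


def compare_rows(r1, r2, keys=("vcard", "adr", "pred")):
    # Compare rows key by key, as ORDER BY ?vcard ?adr ?pred would.
    for k in keys:
        c = compare_terms(r1[k], r2[k])
        if c != 0:
            return c
    return 0


rows = [
    {"vcard": "<me>", "adr": "_:b1", "pred": "vcard:locality"},
    {"vcard": "<me>", "adr": "_:b1", "pred": "vcard:country-name"},
    {"vcard": "<me>", "adr": "_:b2", "pred": "vcard:locality"},
    {"vcard": "<me>", "adr": "_:b2", "pred": "vcard:country-name"},
]

ordered = sorted(rows, key=cmp_to_key(compare_rows))
adr_sequence = [r["adr"] for r in ordered]
print(adr_sequence)
```

Because the two blank nodes tie on ?adr, the rows end up ordered by
?pred alone, and the rows for _:b1 and _:b2 alternate.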

The author would expect _:b1 to be ordered entirely before or entirely
after _:b2, not interleaved with it. Although there is no defined order
between _:b1 and _:b2, SPARQL should provide guidance on how to arrange
blank nodes.

Many people still use blank nodes and this issue causes unexpected
results for SPARQL users.

My colleagues and I propose that the group seriously consider adding a
restriction to ORDER BY in SPARQL 1.1 guaranteeing that, when ordering
by any RDF term, identical terms are arranged together.

Although an order among different blank nodes cannot be fixed, SPARQL
should require that identical RDF terms be ordered together.
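
Until such a guarantee exists, the grouping can be approximated on the
client side. The Python sketch below re-sorts result rows using the
blank node labels from the result set as a tie-breaker, so identical
terms end up adjacent. The row format and the term_key and group_sort
helpers are hypothetical; the labels are only meaningful within a single
result set, and the order chosen between distinct blank nodes is
arbitrary.

```python
def term_key(term):
    """Sort key that keeps identical terms adjacent.

    Assumed term syntax: "_:label" for blank nodes, a leading double
    quote for literals, anything else (<...> or prefixed name) is an IRI.
    The rank follows the SPARQL ordering of term kinds:
    blank nodes, then IRIs, then literals.
    """
    if term.startswith("_:"):
        rank = 0
    elif term.startswith('"'):
        rank = 2
    else:
        rank = 1
    # The lexical form breaks ties, so _:b1 never mixes with _:b2.
    return (rank, term)


def group_sort(rows, keys=("vcard", "adr", "pred")):
    # Sort rows by each key in turn, mirroring ORDER BY ?vcard ?adr ?pred.
    return sorted(rows, key=lambda row: tuple(term_key(row[k]) for k in keys))


shuffled = [
    {"vcard": "<me>", "adr": "_:b1", "pred": "vcard:locality"},
    {"vcard": "<me>", "adr": "_:b2", "pred": "vcard:country-name"},
    {"vcard": "<me>", "adr": "_:b1", "pred": "vcard:country-name"},
    {"vcard": "<me>", "adr": "_:b2", "pred": "vcard:locality"},
]

grouped = group_sort(shuffled)
for row in grouped:
    print(row["vcard"], row["adr"], row["pred"])
```

Here all rows for _:b1 come out before all rows for _:b2, and within
each address the rows are ordered by ?pred.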

Thanks,
James

Received on Friday, 28 October 2011 12:03:22 UTC