Re: Blank Node Ordering from Steve Harris on 2011-10-28 (public-rdf-wg@w3.org from October 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 28 Oct 2011 09:35:27 +0100
To: David Wood <david@3roundstones.com>
Cc: public-rdf-wg WG <public-rdf-wg@w3.org>, James Leigh <james@3roundstones.com>
Message-Id: <26DF52F1-811C-47B7-8A61-1ACE5BDCBE14@garlik.com>
It's really an inevitable consequence of the (silly IMHO) way in which blank nodes are defined in RDF: how can you define a stable ordering on existential variables?

However, bNode skolemisation is one solution to this issue, as it gives each bNode a stable URI identifier, and URIs have an order defined by http://www.w3.org/TR/rdf-sparql-query/#modOrderBy
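
A minimal Python sketch of the idea, assuming (hypothetically) that each blank node has a 64-bit internal id and that skolem IRIs mimic the layout of the 4store URI quoted in this message. The authority `http://example.org` and the helper `skolem_iri` are placeholders, and plain string comparison stands in for SPARQL's IRI ordering as a simplification:

```python
# Assumptions (hypothetical): each blank node has a 64-bit internal id, and the
# skolem IRI layout imitates the 4store example URI in this message.
AUTHORITY = "http://example.org"  # placeholder, not a real genid service
STORE_UUID = "0F1BAE7E-B38C-4556-813E-342B60693BD0"

def skolem_iri(internal_id: int) -> str:
    """Deterministically map a blank node's internal id to a skolem IRI."""
    return f"{AUTHORITY}/.well-known/genid/{STORE_UUID}/{internal_id:016x}"

# (adr, pred) pairs in the arbitrary order a store might return them;
# the adr blank nodes are identified by hypothetical internal ids 1 and 2.
rows = [
    (2, "vcard:locality"), (1, "vcard:street-address"), (2, "vcard:country-name"),
    (1, "vcard:locality"), (2, "vcard:street-address"), (1, "vcard:country-name"),
]

# Once skolemised, the addresses are ordinary IRIs with a fixed sort order,
# so every row for the same address lands in one contiguous run.
ordered = sorted((skolem_iri(node), pred) for node, pred in rows)
for adr, pred in ordered:
    print(adr.rsplit("/", 1)[-1], pred)
```

Because the IRI is derived from the internal id rather than assigned per query, the same bNode sorts to the same position every time.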

Incidentally, I have (mostly) implemented bNode skolemisation in 4store. It has been about a day's work so far, but quite complex. Also, the skolem constant URIs are pretty unwieldy compared to the non-standard hack we were using before, but I think that's a worthwhile cost for interoperability and safety.

A typical 4store skolem URI looks like this:
http://4store.org/.well-known/genid/0F1BAE7E-B38C-4556-813E-342B60693BD0/10420f0000000041

It could be made shorter, e.g. by using base64 encoding rather than hex, but the UUID in the skolem constant is the UUID of the store, which is externalised in other ways.
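
To make the size trade-off concrete, here is a sketch that packs the store UUID and the local id from the example URI into unpadded URL-safe base64. The `shorten`/`expand` helpers are hypothetical, and splitting the path into a 16-byte UUID plus an 8-byte id is an assumption read off the example, not 4store's actual encoding:

```python
import base64
import uuid

def shorten(store_uuid: str, local_hex: str) -> str:
    """Hypothetical: pack a store UUID and a hex local id into unpadded URL-safe base64."""
    raw = uuid.UUID(store_uuid).bytes + bytes.fromhex(local_hex)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def expand(token: str) -> tuple[str, str]:
    """Inverse of shorten(): recover the upper-case UUID and the lower-case hex id."""
    padded = token + "=" * (-len(token) % 4)  # restore any stripped padding
    raw = base64.urlsafe_b64decode(padded)
    return str(uuid.UUID(bytes=raw[:16])).upper(), raw[16:].hex()

token = shorten("0F1BAE7E-B38C-4556-813E-342B60693BD0", "10420f0000000041")
print(len(token), token)
print(expand(token))
```

For the example URI this yields a 32-character token in place of the 53 characters of `<UUID>/<hex id>`, at the price of no longer showing the store UUID in the form in which it is externalised elsewhere.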

Has any consensus been reached yet on how to signal in RDF that a URI is a skolemised bNode, e.g. a class that skolem constant URIs belong to?
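
Until such a class is agreed, a consumer could fall back on the path convention that the 4store URI above already follows. A minimal sketch, assuming the `/.well-known/genid/` path prefix is the only available signal; `looks_like_skolem_iri` is a hypothetical helper and a heuristic, not an agreed mechanism:

```python
from urllib.parse import urlsplit

# Assumption: skolem IRIs are recognised purely by this well-known path prefix.
GENID_PREFIX = "/.well-known/genid/"

def looks_like_skolem_iri(iri: str) -> bool:
    """Heuristic: an http(s) IRI whose path starts with the genid prefix."""
    parts = urlsplit(iri)
    return parts.scheme in ("http", "https") and parts.path.startswith(GENID_PREFIX)

example = ("http://4store.org/.well-known/genid/"
           "0F1BAE7E-B38C-4556-813E-342B60693BD0/10420f0000000041")
print(looks_like_skolem_iri(example))                        # True
print(looks_like_skolem_iri("http://example.org/me#home"))   # False
```

A path-only check cannot tell a skolem constant from any other resource someone chooses to publish under that path, which is why an explicit signal such as a class would be more robust.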

- Steve

On 2011-10-27, at 22:39, David Wood wrote:

> Hi all,
> 
> FYI.  This is a real-world use case worth considering as we discuss blank nodes.  Although it is mostly a SPARQL issue, I felt this group should be aware of the discussion.
> 
> Regards,
> Dave
> 
> 
> 
> 
> Begin forwarded message:
> 
>> From: James Leigh <james@3roundstones.com>
>> Subject: Blank Node Ordering
>> Date: October 27, 2011 10:05:30 EDT
>> To: public-rdf-dawg-comments@w3.org
>> Cc: David Wood <david@3roundstones.com>
>> 
>> Hello,
>> 
>> We recently ran into some unexpected behaviour regarding the ORDER BY
>> clause that we want to bring to this group's attention.
>> 
>> When ordering RDF literals and URIs, the same literal or the same URI
>> will always be arranged together. However, there is no guarantee with
>> blank nodes that the same blank nodes will be arranged together.
>> 
>> The following SPARQL query lists all the vCard addresses in the default
>> graph along with their properties. A single address is represented in
>> multiple result bindings, one for each of its properties in the data store.
>> 
>> PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
>> 
>> SELECT ?vcard ?adr ?pred ?obj {
>>  ?vcard a vcard:VCard; vcard:adr ?adr .
>>  ?adr ?pred ?obj .
>> } ORDER BY ?vcard ?adr ?pred
>> 
>> The (author's) expected result is to have all result bindings ordered
>> first by the vcard they belong to and, if there are multiple addresses on
>> the vcard, to have the properties of each address grouped together.
>> 
>> For example, the following binding sets form a valid result set. Notice
>> that the entire home address comes before any of the work address
>> properties. This order is predictable because of the ORDER BY clause in
>> the query above.
>> 
>> vcard=<me>, adr=<me#home>, pred=vcard:country-name, obj="Australia"
>> vcard=<me>, adr=<me#home>, pred=vcard:locality, obj="WonderCity"
>> vcard=<me>, adr=<me#home>, pred=vcard:postal-code, obj="5555"
>> vcard=<me>, adr=<me#home>, pred=vcard:street-address, obj="111 Lake Drive"
>> vcard=<me>, adr=<me#work>, pred=vcard:country-name, obj="Australia"
>> vcard=<me>, adr=<me#work>, pred=vcard:locality, obj="WonderCity"
>> vcard=<me>, adr=<me#work>, pred=vcard:postal-code, obj="5555"
>> vcard=<me>, adr=<me#work>, pred=vcard:street-address, obj="33 Enterprise Drive"
>> 
>> However, it would be incorrect (in SPARQL 1.0 and SPARQL 1.1 draft) for
>> the author to assume the addresses will always be ordered together like
>> this.
>> 
>> Consider the result set if blank nodes were used for the address node.
>> The result might look like the one below.
>> 
>> vcard=<me>, adr=_:b1, pred=vcard:locality, obj="WonderCity"
>> vcard=<me>, adr=_:b1, pred=vcard:street-address, obj="111 Lake Drive"
>> vcard=<me>, adr=_:b2, pred=vcard:street-address, obj="33 Enterprise Drive"
>> vcard=<me>, adr=_:b2, pred=vcard:country-name, obj="Australia"
>> vcard=<me>, adr=_:b1, pred=vcard:country-name, obj="Australia"
>> vcard=<me>, adr=_:b2, pred=vcard:postal-code, obj="5555"
>> vcard=<me>, adr=_:b1, pred=vcard:postal-code, obj="5555"
>> vcard=<me>, adr=_:b2, pred=vcard:locality, obj="WonderCity"
>> 
>> Although the results for each vcard are grouped together, because the
>> vcard is a URI, the ordering of the adr blank nodes looks random and is
>> unpredictable. Sesame 2.x appears to arrange blank node results randomly
>> when ordering by blank nodes, as shown above. When the data contains
>> blank nodes, there is no way to control the ordering.
>> 
>> The author would expect _:b1 to be ordered either before or after _:b2,
>> but would not expect _:b1 to be interleaved with _:b2. Although there is
>> no defined order between _:b1 and _:b2, SPARQL should provide guidance
>> on how to arrange blank nodes.
>> 
>> Many people still use blank nodes and this issue causes unexpected
>> results for SPARQL users.
>> 
>> My colleagues and I propose that the group seriously consider adding a
>> restriction to ORDER BY in SPARQL 1.1 ensuring that, when ordering by any
>> RDF term, identical terms are arranged together.
>> 
>> Although an order among different blank nodes cannot be fixed, SPARQL
>> should require that identical RDF terms be ordered together.
>> 
>> Thanks,
>> James
>> 
> 
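
The interleaving described in the forwarded message can be reproduced with a small Python sketch. It is an illustration under stated assumptions, not Sesame's actual code: terms are modelled with a hypothetical `Term` tuple, IRIs are compared by their string form only (a simplification), one sort key treats all blank nodes as equal (so ORDER BY ?vcard ?adr ?pred falls through to ?pred and the addresses interleave), and the other compares blank nodes by an arbitrary but consistent internal label, which is all the proposal asks for:

```python
from typing import NamedTuple, Optional

class Term(NamedTuple):
    kind: str   # "bnode", "iri" or "literal"
    value: str  # blank node label, IRI (prefixed for brevity) or lexical form

# SPARQL fixes this order between kinds of term: unbound < blank node < IRI < literal.
KIND_RANK = {"bnode": 1, "iri": 2, "literal": 3}

def key_bnodes_tied(term: Optional[Term]) -> tuple[int, str]:
    """Hypothetical engine: every blank node compares equal to every other."""
    if term is None:
        return (0, "")
    return (KIND_RANK[term.kind], "" if term.kind == "bnode" else term.value)

def key_grouping(term: Optional[Term]) -> tuple[int, str]:
    """Proposed behaviour: blank nodes ordered by an arbitrary but consistent
    internal label, so identical blank nodes always end up adjacent."""
    if term is None:
        return (0, "")
    # Simplification: IRIs and literals are compared by their string form only.
    return (KIND_RANK[term.kind], term.value)

def order_by(rows, key, variables=("vcard", "adr", "pred")):
    """ORDER BY ?vcard ?adr ?pred, using `key` to compare individual terms."""
    return sorted(rows, key=lambda row: tuple(key(row.get(v)) for v in variables))

def contiguous(values) -> bool:
    """True if every distinct value occupies a single unbroken run."""
    seen, previous = set(), object()
    for value in values:
        if value != previous and value in seen:
            return False
        seen.add(value)
        previous = value
    return True

ME, B1, B2 = Term("iri", "<me>"), Term("bnode", "b1"), Term("bnode", "b2")

def pred(local: str) -> Term:
    return Term("iri", "vcard:" + local)

# Solutions in the (arbitrary) order a store might produce them before sorting.
rows = [
    {"vcard": ME, "adr": adr, "pred": pred(p)}
    for adr, p in [
        (B1, "locality"), (B2, "street-address"), (B1, "street-address"),
        (B2, "country-name"), (B1, "country-name"), (B2, "postal-code"),
        (B1, "postal-code"), (B2, "locality"),
    ]
]

tied = [row["adr"].value for row in order_by(rows, key_bnodes_tied)]
grouped = [row["adr"].value for row in order_by(rows, key_grouping)]
print(tied)     # ['b2', 'b1', 'b1', 'b2', 'b2', 'b1', 'b2', 'b1']
print(grouped)  # ['b1', 'b1', 'b1', 'b1', 'b2', 'b2', 'b2', 'b2']
```

Python's sort is stable, so with tied blank nodes the rows are ordered by ?pred and then by their arrival order, which is exactly the interleaving the proposal wants to rule out; the grouping key still says nothing about whether _:b1 should precede _:b2.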

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Friday, 28 October 2011 08:36:01 UTC