Re: Blank Node Ordering from Andy Seaborne on 2011-10-28 (public-rdf-wg@w3.org from October 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 28 Oct 2011 10:48:17 +0100
To: public-rdf-wg@w3.org
Message-ID: <4EAA7A61.6000200@epimorphics.com>

Skolemization is certainly one way to solve the problem.  It's a 
stronger condition than the use case is asking for (clustering within a 
single result set) but it's a fairly likely next request to have bNode 
references that can be used in a subsequent query (e.g. RDF lists) or 
that the sorting is the same for two separate queries.
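
A rough sketch of why skolemization gives both properties (illustration 
only, not the scheme the WG is defining).  The base URI, the store id and 
the function names are made up for the example, and the bNode label stands 
in for whatever internal identifier a store keeps.

```python
# Hypothetical authority; a real store would mint URIs under its own domain.
GENID_BASE = "http://example.org/.well-known/genid/"

def skolemize(label, store_id):
    """Map a bNode label such as '_:b1' to a stable URI string."""
    return f"{GENID_BASE}{store_id}/{label[2:]}"

def deskolemize(uri, store_id):
    """Map a skolem URI minted by this store back to its bNode label."""
    prefix = f"{GENID_BASE}{store_id}/"
    return "_:" + uri[len(prefix):] if uri.startswith(prefix) else uri

def skolemize_row(row, store_id):
    # Replace every bNode value in a binding with its skolem URI.
    return {var: skolemize(val, store_id) if val.startswith("_:") else val
            for var, val in row.items()}

rows = [
    {"vcard": "<me>", "adr": "_:b2", "pred": "vcard:locality"},
    {"vcard": "<me>", "adr": "_:b1", "pred": "vcard:street-address"},
    {"vcard": "<me>", "adr": "_:b2", "pred": "vcard:country-name"},
    {"vcard": "<me>", "adr": "_:b1", "pred": "vcard:locality"},
]

store_id = "store-1"  # made-up store identifier
skolemized = [skolemize_row(r, store_id) for r in rows]

# URIs have a defined order, so an ordinary sort clusters each address.
ordered = sorted(skolemized, key=lambda r: (r["vcard"], r["adr"], r["pred"]))
```

Because the URI is the same every time, `ordered[0]["adr"]` could be 
pasted into a later query, and `deskolemize` recovers the original node.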

Other solutions:

1/ Use GROUP BY ?vcard ?adr ?pred ?obj
    This clusters but does not sort.  It is legal, strict SPARQL 1.1

2/ Implementations may extend "<" to define a stable ordering

3/ Implementations may extend ORDER BY to define a stable ordering

4/ (SPARQL 1.1) extend URI or STR to return something that labels the 
bNodes.
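
To show what "clusters but does not sort" means for option 1: grouping 
needs only term equality, never "<".  The sketch below is an illustration 
in Python, not how any SPARQL engine evaluates GROUP BY; the `cluster` 
helper and the sample rows are made up.

```python
def cluster(rows, keys):
    """Put rows with equal key values next to each other.

    Groups come out in first-seen order, so no ordering between
    different blank nodes is ever required.
    """
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[k] for k in keys), []).append(row)
    return [row for group in groups.values() for row in group]

rows = [
    {"vcard": "<me>", "adr": "_:b2", "pred": "vcard:locality"},
    {"vcard": "<me>", "adr": "_:b1", "pred": "vcard:street-address"},
    {"vcard": "<me>", "adr": "_:b2", "pred": "vcard:country-name"},
    {"vcard": "<me>", "adr": "_:b1", "pred": "vcard:locality"},
]

clustered = cluster(rows, ["vcard", "adr"])
```

Here _:b2 happens to come first because it was seen first; the rows for 
each address are adjacent, but the order of the groups is arbitrary.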

Apache Jena provides stable ordering of sorted results (2) - any ORDER 
BY is stable within and across requests.  Bnodes are ordered using the 
internal identifier.
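
The idea of ordering bNodes by an internal identifier can be sketched as 
a sort key.  This is not Jena's code: terms are plain strings here, the 
bNode label stands in for the internal identifier, and literals are 
compared as raw strings for brevity.

```python
def term_key(term):
    """Sort key following the SPARQL ORDER BY ranking of term kinds:
    blank nodes, then IRIs, then literals."""
    if term.startswith("_:"):
        return (0, term[2:])  # blank node: order by internal label (assumption)
    if term.startswith('"'):
        return (2, term)      # literal: raw string comparison only
    return (1, term)          # IRI or prefixed name

rows = [
    ("<me>", "_:b1", "vcard:locality"),
    ("<me>", "_:b2", "vcard:street-address"),
    ("<me>", "_:b1", "vcard:country-name"),
    ("<me>", "_:b2", "vcard:postal-code"),
]

# Equivalent in spirit to ORDER BY ?vcard ?adr ?pred.
ordered = sorted(rows, key=lambda r: tuple(term_key(t) for t in r))
```

As long as the identifier does not change, the same query gives the same 
order each time, and rows for one bNode are never interleaved with 
another's.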

Jena also provides (4). (4) isn't the skolemization scheme that the RDF-WG 
is proposing - the extension pre-dates this WG - but it can be used to 
round-trip bNode references.

(Steve - it's good to hear that the design works for 4Store).

The fact it changes Sesame actually makes it harder for a SPARQL change. 
  The SPARQL-WG charter is strongly worded against making changes that 
alter SPARQL 1.0 queries.  Has there been discussion on the Sesame lists 
as to a change here?

What we-all must be aware of is slipping into defining "subsets of RDF". 
  Skolemization means that there is one approach, for SPARQL and for API 
use.

	Andy

On 28/10/11 09:35, Steve Harris wrote:
> It's really an inevitable consequence of the (silly IMHO) way in which
> blank nodes are defined in RDF: how can you define a stable ordering on
> existential variables?
>
> However bNode skolemisation is one solution to this issue, as it
> provides a stable URI identifier for bNodes which has an order defined
> by http://www.w3.org/TR/rdf-sparql-query/#modOrderBy
>
> Incidentally, I have (mostly) implemented bNode skolemisation in 4store,
> it was about a day's work so far, but quite complex. Also the skolem
> constant URIs are pretty unwieldy, compared to the non-standard hack we
> were using before, but I think it's a worthwhile cost for
> interoperability and safety.
>
> A typical 4store skolem URI looks like
> http://4store.org/.well-known/genid/0F1BAE7E-B38C-4556-813E-342B60693BD0/10420f0000000041
>
> It could be made shorter using e.g. base64 encoding, rather than hex,
> but the UUID in the skolem constant is the UUID of the store, which is
> externalised in other ways.
>
> Has any consensus been arrived at on how to signal that some URI is a
> skolemised bNode in RDF yet? e.g. a class that skolem constant URIs
> belong to?
>
> - Steve
>
> On 2011-10-27, at 22:39, David Wood wrote:
>
>> Hi all,
>>
>> FYI. This is a real-world use case worth considering as we discuss
>> blank nodes. Although it is mostly a SPARQL issue, I felt this group
>> should be aware of the discussion.
>>
>> Regards,
>> Dave
>>
>> Begin forwarded message:
>>
>>> From: James Leigh <james@3roundstones.com>
>>> Subject: Blank Node Ordering
>>> Date: October 27, 2011 10:05:30 EDT
>>> To: public-rdf-dawg-comments@w3.org
>>> Cc: David Wood <david@3roundstones.com>
>>>
>>> Hello,
>>>
>>> We recently ran into some unexpected behaviour that we want to bring to
>>> this group's attention regarding the ORDER BY clause.
>>>
>>> When ordering RDF literals and URIs, the same literal or the same URI
>>> will always be arranged together. However, there is no guarantee with
>>> blank nodes that the same blank nodes will be arranged together.
>>>
>>> The following SPARQL query lists all the vcard addresses in the default
>>> graph along with their properties. A single address is represented in
>>> multiple result bindings, one for each property in the data store.
>>>
>>> SELECT ?vcard ?adr ?pred ?obj {
>>> ?vcard a vcard:VCard; vcard:adr ?adr .
>>> ?adr ?pred ?obj .
>>> } ORDER BY ?vcard ?adr ?pred
>>>
>>> The (author's) expected result is to have all result bindings ordered
>>> first by the vcard they belong to and, if there are multiple addresses on
>>> the vcard, to have each address's properties ordered together.
>>>
>>> For example, the following binding sets are a valid result set. Notice that
>>> the entire home address comes before any of the work address properties.
>>> This order is predictable because of the ORDER BY clause in the query
>>> above.
>>>
>>> vcard=<me>, adr=<me#home>, pred=vcard:country-name, obj="Australia"
>>> vcard=<me>, adr=<me#home>, pred=vcard:locality, obj="WonderCity"
>>> vcard=<me>, adr=<me#home>, pred=vcard:postal-code, obj="5555"
>>> vcard=<me>, adr=<me#home>, pred=vcard:street-address, obj="111 Lake Drive"
>>> vcard=<me>, adr=<me#work>, pred=vcard:country-name, obj="Australia"
>>> vcard=<me>, adr=<me#work>, pred=vcard:locality, obj="WonderCity"
>>> vcard=<me>, adr=<me#work>, pred=vcard:postal-code, obj="5555"
>>> vcard=<me>, adr=<me#work>, pred=vcard:street-address, obj="33 Enterprise Drive"
>>>
>>> However, it would be incorrect (in SPARQL 1.0 and SPARQL 1.1 draft) for
>>> the author to assume the addresses will always be ordered together like
>>> this.
>>>
>>> Consider the result set if blank nodes were used for the address node.
>>> The result might look like the one below.
>>>
>>> vcard=<me>, adr=_:b1, pred=vcard:locality, obj="WonderCity"
>>> vcard=<me>, adr=_:b1, pred=vcard:street-address, obj="111 Lake Drive"
>>> vcard=<me>, adr=_:b2, pred=vcard:street-address, obj="33 Enterprise Drive"
>>> vcard=<me>, adr=_:b2, pred=vcard:country-name, obj="Australia"
>>> vcard=<me>, adr=_:b1, pred=vcard:country-name, obj="Australia"
>>> vcard=<me>, adr=_:b2, pred=vcard:postal-code, obj="5555"
>>> vcard=<me>, adr=_:b1, pred=vcard:postal-code, obj="5555"
>>> vcard=<me>, adr=_:b2, pred=vcard:locality, obj="WonderCity"
>>>
>>> Although the results for each vcard are ordered together, because it is a
>>> URI, the ordering of the adr blank nodes looks random and is
>>> unpredictable. Sesame 2.x is implemented in a way that appears to arrange
>>> blank node results randomly when ordering by blank nodes, as shown above.
>>> When the data contains blank nodes there is no way to control the ordering.
>>>
>>> The author would expect that _:b1 is ordered before or after _:b2, but
>>> the author would not expect that _:b1 is mixed among _:b2. Although
>>> there is no order between _:b1 and _:b2, SPARQL should provide guidance
>>> on how to arrange blank nodes.
>>>
>>> Many people still use blank nodes and this issue causes unexpected
>>> results for SPARQL users.
>>>
>>> My colleagues and I propose that the group seriously consider adding a
>>> restriction to ORDER BY in SPARQL 1.1 that will ensure that ordering by any
>>> RDF term guarantees that the same terms are arranged together.
>>>
>>> Although an order among different blank nodes could not be fixed,
>>> SPARQL should require that the same RDF terms are ordered together.
>>>
>>> Thanks,
>>> James
>>>
>>
>
> --
> Steve Harris, CTO, Garlik Limited
> 1-3 Halford Road, Richmond, TW10 6AW, UK
> +44 20 8439 8203 http://www.garlik.com/
> Registered in England and Wales 535 7233 VAT # 849 0517 11
> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
>
Received on Friday, 28 October 2011 09:48:53 UTC