Re: Blank Node Ordering from Andy Seaborne on 2011-11-02 (public-rdf-dawg-comments@w3.org from November 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Wed, 02 Nov 2011 10:55:32 +0000
To: public-rdf-dawg-comments@w3.org
CC: James Leigh <james@leighnet.ca>
Message-ID: <4EB121A4.4040000@epimorphics.com>

James,

The RDF Working Group is defining how blank nodes can be skolemized by a 
store. This can be used (internally) as a way to order blank nodes.
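A minimal sketch of how skolemization could serve as an internal sort key, under stated assumptions: the `BNode` and `IRI` classes, the store-internal `label`, and the `SKOLEM_BASE` IRI are all hypothetical. The `/.well-known/genid/` path follows the convention RDF 1.1 later recommended for skolem IRIs. Because identical blank nodes map to identical IRIs, sorting on the skolem IRI places them next to each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    label: str  # hypothetical store-internal identifier


@dataclass(frozen=True)
class IRI:
    value: str


# Assumed base IRI; "/.well-known/genid/" is the path RDF 1.1 suggests for skolem IRIs.
SKOLEM_BASE = "http://example.org/.well-known/genid/"


def skolemize(term):
    """Replace a blank node with a skolem IRI built from its internal label;
    any other term is returned unchanged."""
    if isinstance(term, BNode):
        return IRI(SKOLEM_BASE + term.label)
    return term


def internal_sort_key(term):
    # Identical blank nodes yield identical IRIs, so they sort adjacently.
    return skolemize(term).value


adrs = [BNode("b1"), BNode("b2"), BNode("b1"), BNode("b2")]
print([t.label for t in sorted(adrs, key=internal_sort_key)])  # ['b1', 'b1', 'b2', 'b2']
```

The key is only used inside the store; the query still sees blank nodes, so this does not change SPARQL 1.0 results, it only makes the arbitrary order between blank nodes consistent.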

In addition, SPARQL 1.1 does provide a way to group related items, blank 
nodes included, together by using GROUP BY.


   PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
   SELECT ?card ?adr ?pred ?obj {
     ?card a vcard:VCard; vcard:adr ?adr .
     ?adr ?pred ?obj .
   } GROUP BY ?card ?adr ?pred ?obj

While this clusters RDF terms that are the same, it does not sort them.
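The difference between clustering and sorting can be stated as two properties of a column of results. A minimal sketch, assuming the column is a list of hashable, comparable values (here blank node labels written as strings, which is itself an assumption, since SPARQL exposes no such labels): `is_clustered` checks that equal values form one contiguous run, while `is_sorted` additionally requires ascending order.

```python
def is_clustered(values) -> bool:
    """True if every distinct value occupies a single contiguous run."""
    seen = set()         # values whose run has already started
    previous = object()  # sentinel that equals no real value
    for value in values:
        if value != previous:
            if value in seen:
                return False  # value reappears after a different value
            seen.add(value)
            previous = value
    return True


def is_sorted(values) -> bool:
    """True if the list is in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


grouped = ["b2", "b2", "b1", "b1"]  # clustered but not sorted
mixed = ["b1", "b2", "b1", "b2"]    # neither clustered nor sorted
print(is_clustered(grouped), is_sorted(grouped))  # True False
print(is_clustered(mixed), is_sorted(mixed))      # False False
```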

Implementations are also free to provide extensions to "<" or ORDER BY 
that place rows with the same blank node together in the sort order.
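Such an extension can be sketched as a sort key. This is a simplified model, not any store's implementation: the `BNode`, `IRI` and `Literal` classes and the internal blank node `label` are hypothetical, and literals are compared by lexical form only (real SPARQL literal ordering is more detailed). The kind order (unbound, then blank nodes, then IRIs, then literals) follows the SPARQL ORDER BY rules. `naive_key` ties all blank nodes with each other, which lets the next key (?pred) interleave the two addresses; this is one way the mixing James describes can arise, not a claim about how Sesame works. `extended_key` breaks the tie with the internal label.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class BNode:
    label: str  # hypothetical store-internal identifier, never exposed to queries


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str  # simplification: literals compared by lexical form only


Term = Union[BNode, IRI, Literal]


def naive_key(term: Optional[Term]):
    """SPARQL kind order (unbound < blank node < IRI < literal), with all
    blank nodes tied because SPARQL defines no order between them."""
    if term is None:
        return (0, "")
    if isinstance(term, BNode):
        return (1, "")
    if isinstance(term, IRI):
        return (2, term.value)
    return (3, term.lexical)


def extended_key(term: Optional[Term]):
    """Same kind order, but blank nodes compare by internal label, so
    identical blank nodes sort together; their relative order stays arbitrary."""
    if isinstance(term, BNode):
        return (1, term.label)
    return naive_key(term)


def order_by(rows, key):
    # Lexicographic multi-key sort, like ORDER BY ?card ?adr ?pred.
    return sorted(rows, key=lambda row: tuple(key(t) for t in row))


me = IRI("me")
b1, b2 = BNode("b1"), BNode("b2")
rows = [
    (me, b1, IRI("vcard:country-name")),
    (me, b1, IRI("vcard:locality")),
    (me, b2, IRI("vcard:country-name")),
    (me, b2, IRI("vcard:locality")),
]

print([r[1].label for r in order_by(rows, naive_key)])     # ['b1', 'b2', 'b1', 'b2']
print([r[1].label for r in order_by(rows, extended_key)])  # ['b1', 'b1', 'b2', 'b2']
```

With `naive_key` the addresses interleave because ?pred decides once the blank nodes tie; with `extended_key` each address's rows stay contiguous.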

The working group did not choose to make changes in this area of the 
specification during the "Features and Requirements" phase and the 
working group charter discourages any change that would alter the 
results of a query that was valid according to SPARQL 1.0.

Therefore the working group has decided not to make a change in this area.

We would be grateful if you would acknowledge that your comment has been 
answered by sending a reply to this mailing list.

Andy, On behalf of the SPARQL WG


On 27/10/11 15:05, James Leigh wrote:
> Hello,
>
> We recently ran into some unexpected behaviour that we want to bring to
> this group's attention regarding the ORDER BY clause.
>
> When ordering RDF literals and URIs, the same literal or the same URI
> will always be arranged together. However, there is no guarantee with
> blank nodes that the same blank nodes will be arranged together.
>
> The following SPARQL query lists all the vcard addresses in the default
> graph along with their properties. A single address is represented in
> multiple result bindings, one for each of its properties in the data store.
>
> SELECT ?card ?adr ?pred ?obj {
>    ?card a vcard:VCard; vcard:adr ?adr .
>    ?adr ?pred ?obj .
> } ORDER BY ?card ?adr ?pred
>
> The (author's) expected result is to have all result bindings ordered
> first by the vcard they belong to and, if there are multiple addresses on
> the vcard, to have the properties of each address ordered together.
>
> For example, the following binding sets are a valid result set. Notice that
> the entire home address comes before any of the work address properties.
> This order is predictable because of the ORDER BY clause in the query
> above.
>
> vcard=<me>, adr=<me#home>, pred=vcard:country-name, obj="Australia"
> vcard=<me>, adr=<me#home>, pred=vcard:locality, obj="WonderCity"
> vcard=<me>, adr=<me#home>, pred=vcard:postal-code, obj="5555"
> vcard=<me>, adr=<me#home>, pred=vcard:street-address, obj="111 Lake Drive"
> vcard=<me>, adr=<me#work>, pred=vcard:country-name, obj="Australia"
> vcard=<me>, adr=<me#work>, pred=vcard:locality, obj="WonderCity"
> vcard=<me>, adr=<me#work>, pred=vcard:postal-code, obj="5555"
> vcard=<me>, adr=<me#work>, pred=vcard:street-address, obj="33 Enterprise Drive"
>
> However, it would be incorrect (in SPARQL 1.0 and SPARQL 1.1 draft) for
> the author to assume the addresses will always be ordered together like
> this.
>
> Consider the result set if blank nodes were used for the address node.
> The result might look like the one below.
>
> vcard=<me>, adr=_:b1, pred=vcard:locality, obj="WonderCity"
> vcard=<me>, adr=_:b1, pred=vcard:street-address, obj="111 Lake Drive"
> vcard=<me>, adr=_:b2, pred=vcard:street-address, obj="33 Enterprise Drive"
> vcard=<me>, adr=_:b2, pred=vcard:country-name, obj="Australia"
> vcard=<me>, adr=_:b1, pred=vcard:country-name, obj="Australia"
> vcard=<me>, adr=_:b2, pred=vcard:postal-code, obj="5555"
> vcard=<me>, adr=_:b1, pred=vcard:postal-code, obj="5555"
> vcard=<me>, adr=_:b2, pred=vcard:locality, obj="WonderCity"
>
> Although the results for each vcard are ordered together, because the
> vcard is a URI, the ordering of the adr blank nodes looks random and is
> unpredictable. Sesame 2.x appears to arrange blank node results randomly
> when ordering by blank nodes, as shown above. When the data contains
> blank nodes, there is no way to control the ordering.
>
> The author would expect that _:b1 is ordered before or after _:b2, but
> would not expect _:b1 to be mixed among _:b2. Although there is no
> defined order between _:b1 and _:b2, SPARQL should provide guidance
> on how to arrange blank nodes.
>
> Many people still use blank nodes and this issue causes unexpected
> results for SPARQL users.
>
> My colleagues and I propose that the group seriously consider adding a
> restriction to ORDER BY in SPARQL 1.1 guaranteeing that, when ordering
> by any RDF term, identical terms are arranged together.
>
> Although an order among different blank nodes cannot be fixed,
> SPARQL should require identical RDF terms to be ordered together.
>
> Thanks,
> James
Received on Wednesday, 2 November 2011 10:56:18 UTC