Re: Blank Nodes and SPARQL from Seaborne, Andy on 2005-07-03 (public-rdf-dawg-comments@w3.org from July 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Sun, 03 Jul 2005 20:31:20 +0100
To: Ron Alford <ronwalf@umd.edu>
Cc: Eric Prud'hommeaux <eric@w3.org>, Dan Connolly <connolly@w3.org>, public-rdf-dawg-comments@w3.org, Amy Alford <aloomis@glue.umd.edu>
Message-ID: <42C83D08.3050907@hp.com>
Ron Alford wrote:
 > Eric Prud'hommeaux wrote:
 >
 >>To figure this out, I'd like to see a use case where you extend SPARQL
 >>to use session-persistent bNodes. Then I'd like to see if a client
 >>that doesn't know of this extension will get different answers than it
 >>expects.
 >
 >
 > How about this use case:
 >
 > Ron, a horrible kitchen danger, is looking up recipes for buttermilk
 > cornbread.  Ron needs to retrieve the order to add the ingredients
 > (represented as a list) so he doesn't add the baking soda straight into
 > the buttermilk.
 >
 > Now for the split.
 >
 > Option 1) Assume bnodes are no longer being used as sugar for variables.
 > Suppose they explicitly match nothing when used over the normal SPARQL
 > protocol.  Or it could be unspecified.  At any rate, it means that
 > straight SPARQL protocol clients would not use bnodes.

Wouldn't both of these options lead to incompatible usages of SPARQL?

"unspecified" would lead to different answers at different servers.

"match nothing" would mean that any other use of bNodes, such as your session
extension, also gets different answers.

 > Assume we have session support where labeled bnodes in the query can
 > match bnodes in the store.  Either that, or we're using a local store
 > with the same bnode matching ability (this might match up with
 > http://www.w3.org/TR/rdf-dawg-uc/#r3.5  ).
 >
 > In this case, Ron requests a session from the server.
 > He queries the server for the recipe data, including the head of the
 > list, using a hand-written query [1] with some values plugged in.
 >
 > If Ron doesn't get back the whole list of instructions, he takes his
 > generic list query template [2] and fills in the last part of the list
 > he received.  Rinse and repeat until all the instructions are downloaded.
 >
 >
 > Option 2) Either we don't have session support in our client, or bnodes
 > are still sugar for variables.
 >
 > Ron queries the server for the basic recipe data.
 > Ron notes that there is a bnode response for :steps, and uses the
 > previous query as a context for the list query, binding what things he
 > can to make the query smaller [3].
 >
 > Now repeat, using the previous query as the context for the next one
 > [4].  Before, binding parts of the query significantly reduced the size.
 >  Now we're starting to build up baggage.

The data provider could have chosen to give that graph node a URI.  By using a
blank node, they are stopping clients from directly addressing that node in
the graph.  Maybe there is a reason for that.  There seems to be a tension
between the publisher and the consumer of the data here.  Why did the data
publishers choose that data model over, say, an rdf:Seq?

 >
 >
 > Ok, we got a handle on this.  So move it to the general case.  We've got
 > complicated, heterogeneous bnode structures used for OWL complex classes,
 > Rule languages, and funny looking FOAF files.  Are we starting to feel
 > the pain yet?

The working group has decided to postpone the issue of accessing RDF collections:

http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Jun/0016.html

One of the reasons is that there are non-query ways of addressing the
matter.  FOAF's approach is inverse functional properties; inference may also
be used.
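To make the inverse-functional-property idea concrete, here is a small Python sketch; it is an illustration, not anything FOAF or the working group specifies. It assumes triples are plain tuples, that blank nodes are a hypothetical BNode class, and that foaf:mbox is the key property. The client names the person by mailbox rather than by bNode label, and every node carrying that mailbox is taken to be the same individual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """Hypothetical stand-in for a blank node in the publisher's graph."""
    label: str


def describe_by_key(graph, prop, value):
    """Merge the statements about every node whose `prop` equals `value`.

    Assumes `prop` is inverse functional, so all matching nodes denote the
    same individual, whatever their blank-node labels happen to be.
    """
    nodes = {s for s, p, o in graph if p == prop and o == value}
    return {(p, o) for s, p, o in graph if s in nodes}


graph = {
    (BNode("a"), "foaf:mbox", "mailto:cook@example.org"),
    (BNode("a"), "foaf:name", "Ron"),
    (BNode("b"), "foaf:mbox", "mailto:cook@example.org"),
    (BNode("b"), "foaf:nick", "kitchen-danger"),
    (BNode("c"), "foaf:mbox", "mailto:baker@example.org"),
}
print(sorted(describe_by_key(graph, "foaf:mbox", "mailto:cook@example.org")))
```

The client never sees or sends the labels "a" and "b"; the mailbox value alone is enough to address the node.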

 >
 > So to implement this in an automated way on the client side, we have to
 > either:
 > a) Preclude hand-written queries.  Generate queries using an API,
 > keeping track of the contexts which generate bnode responses.
 > b) Include query parsing and analysis on the client side, so that
 > queries can be rewritten to include the contexts without stomping on
 > anything.
 > c) <Insert your idea here>

Thank you, I will :-)

There are three classes of approach I can think of: following through your
point about local query by applying it to the protocol; employing the existing
SPARQL extensibility points; and making the graph bNodes addressable in the
session.  Each has many variations.

==== Protocol

RDQL/Jena has for some while had the ability to pass in values for variables
at the start of query execution.  One use of this is to pass in programming
language level objects, including bNodes, so that all solutions of the query
have a fixed value for a variable.  It's a mechanism akin to SQL client
templates, but done by name rather than by position.

This can be extended to the SPARQL protocol:

        ?query=SELECT...&varX=bNode:xyz&...

For example, with varX binding ?x to the store's bNode labelled xyz, the query
   SELECT ?item ?tail WHERE { ?x rdf:first ?item ; rdf:rest ?tail }
becomes, at the server:
   SELECT ?item ?tail WHERE { <bnode(xyz)> rdf:first ?item ; rdf:rest ?tail }
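A minimal Python sketch of the protocol idea, under loud assumptions: the varX=bNode:xyz parameter convention (and reading varX as binding ?x) is hypothetical and not part of the SPARQL protocol; BNode, parse_request, unify and match_bgp are illustrative names; and the query text is not actually parsed, its two triple patterns are written out by hand. What it shows is the Jena-style mechanism: evaluation starts from a solution in which ?x is already bound to the store's own bNode, so every answer carries that fixed value.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlencode


@dataclass(frozen=True)
class BNode:
    """A blank node held in the server's store, known by its internal label."""
    label: str


def parse_request(query_string):
    """Split a request into the query text and any pre-bound variables.

    Hypothetical convention: varX=bNode:xyz binds ?x to the store bNode
    labelled xyz. Only `query` is a standard protocol parameter.
    """
    params = parse_qs(query_string)
    query = params.pop("query")[0]
    initial = {}
    for name, values in params.items():
        if name.startswith("var") and values[0].startswith("bNode:"):
            initial["?" + name[3:].lower()] = BNode(values[0][len("bNode:"):])
    return query, initial


def unify(term, value, binding):
    """Match one pattern term against one data term, extending `binding`."""
    if isinstance(term, str) and term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match_bgp(graph, patterns, initial):
    """Evaluate triple patterns, starting from the pre-bound solution."""
    solutions = [dict(initial)]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in graph:
                binding = dict(solution)
                if all(unify(p, t, binding) for p, t in zip(pattern, triple)):
                    extended.append(binding)
        solutions = extended
    return solutions


graph = {
    (BNode("xyz"), "rdf:first", ":AddMix"),
    (BNode("xyz"), "rdf:rest", BNode("abc")),
    (BNode("abc"), "rdf:first", ":Shake"),
    (BNode("abc"), "rdf:rest", "rdf:nil"),
}
request = urlencode({
    "query": "SELECT ?item ?tail WHERE { ?x rdf:first ?item ; rdf:rest ?tail }",
    "varX": "bNode:xyz",
})
query, initial = parse_request(request)
# The query text is not parsed in this sketch; its patterns are spelled out.
patterns = [("?x", "rdf:first", "?item"), ("?x", "rdf:rest", "?tail")]
print(match_bgp(graph, patterns, initial))
```

Without the pre-binding the same patterns match both list cells; with it, only the cell the client named comes back.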


==== SPARQL Extensibility

SPARQL has two extension points: value functions and DESCRIBE.

== SPARQL Function Extension

(idea from Steve Harris)

Have a custom function that tests the bNode label.  This isn't covered by the
SPARQL value model - it's using the function extension point as a tunnel
between client and server inside the SPARQL syntax.

      FILTER ext:bNodeLabel(?x, "label")

SELECT ?item ?tail WHERE { ?x rdf:first ?item ;
                                rdf:rest ?tail .
                             FILTER ext:bNodeLabel(?x, "xyz") }
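A Python sketch of the server side of this tunnel. The name ext:bNodeLabel comes from the message; its behaviour here (true only when the bound term is a store bNode with exactly that internal label) is an assumption, as are the BNode class and the hand-evaluated graph pattern. Unless the server optimises the filter, the pattern is still matched against every list cell and the filter then discards all but one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """A blank node held in the server's store, known by its internal label."""
    label: str


def bnode_label(term, label):
    """Hypothetical ext:bNodeLabel(?x, "label").

    True only if `term` is a store bNode whose internal label is `label`.
    As the message says, this is outside the SPARQL value model.
    """
    return isinstance(term, BNode) and term.label == label


def list_cells(graph):
    """Unfiltered solutions of { ?x rdf:first ?item ; rdf:rest ?tail }."""
    firsts = {s: o for s, p, o in graph if p == "rdf:first"}
    rests = {s: o for s, p, o in graph if p == "rdf:rest"}
    return [{"?x": s, "?item": firsts[s], "?tail": rests[s]}
            for s in firsts if s in rests]


graph = {
    (BNode("xyz"), "rdf:first", ":AddMix"),
    (BNode("xyz"), "rdf:rest", BNode("abc")),
    (BNode("abc"), "rdf:first", ":Shake"),
    (BNode("abc"), "rdf:rest", "rdf:nil"),
}
# FILTER ext:bNodeLabel(?x, "xyz") applied to every candidate solution.
solutions = [s for s in list_cells(graph) if bnode_label(s["?x"], "xyz")]
print(solutions)
```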

== DESCRIBE

Accessing list elements one by one isn't nice if the list is of any size, so
get it all at once.  Your use case is about a description of the whole recipe.
That could be the CBD (Concise Bounded Description) of the thing, or some other
scheme by which the information provider gives an answer whose exact extent the
client can't determine in advance.  In SPARQL, the DESCRIBE result form
provides a hook for this: it enables the server to return the whole recipe in a
single call.

The CBD definition can be found at http://sw.nokia.com/uriqa/CBD.html
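A rough Python sketch of a CBD-style answer for the recipe, to show why one DESCRIBE call can carry the whole list. It follows the core rule of the CBD definition (take every statement with the resource as subject, then recurse into any blank-node objects) but leaves out the reification part of that definition; the triple-tuple graph and the BNode class are assumptions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """A blank node in the provider's graph."""
    label: str


def concise_bounded_description(graph, resource):
    """Simplified CBD of `resource` (the reification rule is omitted).

    Collect every statement with `resource` as subject, then do the same
    for each blank node reached as an object, until none are left.
    """
    description = set()
    seen = {resource}
    queue = deque([resource])
    while queue:
        subject = queue.popleft()
        for triple in graph:
            s, _, o = triple
            if s != subject:
                continue
            description.add(triple)
            if isinstance(o, BNode) and o not in seen:
                seen.add(o)
                queue.append(o)
    return description


graph = {
    (":CornBread", "rdf:type", ":Recipe"),
    (":CornBread", "dc:description", "A simple corn bread recipe"),
    (":CornBread", ":steps", BNode("l1")),
    (BNode("l1"), "rdf:first", ":AddMix"),
    (BNode("l1"), "rdf:rest", BNode("l2")),
    (BNode("l2"), "rdf:first", ":Shake"),
    (BNode("l2"), "rdf:rest", "rdf:nil"),
    (":Muffin", "rdf:type", ":Recipe"),
}
for triple in sorted(concise_bounded_description(graph, ":CornBread"), key=repr):
    print(triple)
```

The recursion stops at URIs such as :Recipe and :AddMix, so the answer is bounded, yet it still includes every cell of the blank-node list.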


==== Make nodes addressable

== Dynamically assign identifiers

To give bNodes an identity without modifying the target graph, the SPARQL
processor could replace each bNode with a URI on the way out, and map it back
on the way in.  Given your ontology editing scenario this might be especially
helpful.  Suitable URI schemes would include tag: and urn:uuid:, because
neither scheme has a direct resolution mechanism.

This approach uses the fact that the client wishes to reference specific parts
of the graph, so the system allocates a suitable web reference.  The URI may
encode the bNode label, or it may depend on a mapping held in the server.
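A Python sketch of the "mapping held in the server" variant, assuming urn:uuid: URIs minted with the standard uuid module; the BNodeURIMap class and its method names are hypothetical. The target graph is never touched: result terms pass through outgoing(), and terms in later queries pass through incoming().

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """A blank node in the target graph, known by its internal label."""
    label: str


class BNodeURIMap:
    """Hypothetical server-side table between store bNodes and minted URIs.

    The target graph is never modified; the URIs exist only in this table.
    """

    def __init__(self):
        self._to_uri = {}
        self._to_bnode = {}

    def outgoing(self, term):
        """On the way out: replace a bNode with a stable urn:uuid: URI."""
        if not isinstance(term, BNode):
            return term
        if term not in self._to_uri:
            uri = uuid.uuid4().urn
            self._to_uri[term] = uri
            self._to_bnode[uri] = term
        return self._to_uri[term]

    def incoming(self, term):
        """On the way in: map a minted URI back to the original bNode."""
        return self._to_bnode.get(term, term)


mapping = BNodeURIMap()
row = (":CornBread", ":steps", BNode("xyz"))
sent = tuple(mapping.outgoing(t) for t in row)
print(sent)
back = tuple(mapping.incoming(t) for t in sent)
print(back == row)  # True: the client's URI leads back to the same bNode
```

Since the table lives in the server, a minted URI presumably stays meaningful only as long as the server keeps the table.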

Some may not like automatically assigning URIs to replace the bNodes.  True.
But you want to reference the blank nodes by their identity.  Exposing the
labels is no different.

== Split the label space of bNodes

Use a different prefix to identify the two spaces of bNodes.

_:a for ones that are query bNodes and
_!:xyz for ones in the target graph.

Pick marker characters to your heart's content.

A variation is to put the marker inside the label space: _:!xyz

This is a bit like syntax support for the dynamically assigned identifiers.
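A Python sketch of how a parser might tell the two label spaces apart. The _!:xyz and _:!xyz forms are from the message; the regular expressions, the BNode class, and the choice to turn query bNodes into hidden variables named ?_bnode_<label> are assumptions for illustration.

```python
import re
from dataclasses import dataclass

# "_!:xyz" (or the variant "_:!xyz") names a bNode in the target graph;
# a plain "_:a" remains a query bNode. The label pattern \w+ is an assumption.
GRAPH_BNODE = re.compile(r"(?:_!:|_:!)(\w+)")
QUERY_BNODE = re.compile(r"_:(\w+)")


@dataclass(frozen=True)
class BNode:
    """A blank node in the target graph, known by its internal label."""
    label: str


def classify(token):
    """Turn one query token into a fixed graph bNode, a variable, or itself."""
    m = GRAPH_BNODE.fullmatch(token)
    if m:
        return BNode(m.group(1))  # fixed term: must be that store bNode
    m = QUERY_BNODE.fullmatch(token)
    if m:
        return "?_bnode_" + m.group(1)  # acts like a variable never returned
    return token


for token in ["_:a", "_!:xyz", "_:!xyz", "rdf:first"]:
    print(token, "->", classify(token))
```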

-----

Of these, the protocol approach would appear to fit your session paradigm
best.  I've used the local version for some time.

	Andy

 >
 >
 >
 > This seems like quite a burden to implement on each client.
 >
 >
 >
 > [0] Basic data:
 > @prefix : <http://example.com/food#> .
 > @prefix dc: <http://purl.org/dc/elements/1.1/> .
 >
 > :CornBread a :Recipe;
 >   dc:description "A simple corn bread recipe";
 >   :steps ( :AddMix :Shake :Bake :FingerCheckForDoneness :VisitER
 >            :ReturnHome :EatStaleHalfCookedCornBread ).
 >
 > [1]
 > PREFIX food: <http://example.com/food#>
 > PREFIX dc: <http://purl.org/dc/elements/1.1/>
 > SELECT ?recipe ?description ?steps
 > WHERE {
 >   ?recipe a food:Recipe;
 >      dc:description ?description;
 >      food:steps ?steps.
 > }
 >
 >
 > [2]
 > PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 > SELECT ?item1 ?item2 ?item3 ?item4 ?tail
 > WHERE {
 >   %HEAD rdf:first ?item1;
 >         rdf:rest ?list2 .
 >   OPTIONAL {
 >     ?list2 rdf:first ?item2;
 >            rdf:rest ?list3 .
 >     OPTIONAL {
 >       ?list3 rdf:first ?item3;
 >              rdf:rest ?list4 .
 >       OPTIONAL {
 >           ?list4 rdf:first ?item4;
 >                  rdf:rest ?tail
 >       }
 >     }
 >   }
 > }
 >
 >
 > [3]
 > PREFIX food: <http://example.com/food#>
 > PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 > SELECT ?item1 ?item2 ?item3 ?item4 ?tail
 > WHERE {
 >   food:CornBread food:steps ?steps.
 >   ?steps rdf:first ?item1;
 >          rdf:rest ?list2 .
 >   OPTIONAL {
 >     ?list2 rdf:first ?item2;
 >            rdf:rest ?list3 .
 >     OPTIONAL {
 >       ?list3 rdf:first ?item3;
 >              rdf:rest ?list4 .
 >       OPTIONAL {
 >           ?list4 rdf:first ?item4;
 >                  rdf:rest ?tail
 >       }
 >     }
 >   }
 >
 > }
 >
 > [4]
 > PREFIX food: <http://example.com/food#>
 > PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 > SELECT ?item1 ?item2 ?item3 ?item4 ?tail
 > WHERE {
 >   food:CornBread food:steps ?steps.
 >   ?steps rdf:rest ?steps2.
 >   ?steps2 rdf:rest ?steps3.
 >   ?steps3 rdf:rest ?steps4.
 >   ?steps4 rdf:rest ?head.
 >
 >   ?head rdf:first ?item1;
 >         rdf:rest ?list2 .
 >   OPTIONAL {
 >     ?list2 rdf:first ?item2;
 >            rdf:rest ?list3 .
 >     OPTIONAL {
 >       ?list3 rdf:first ?item3;
 >              rdf:rest ?list4 .
 >       OPTIONAL {
 >           ?list4 rdf:first ?item4;
 >                  rdf:rest ?tail
 >       }
 >     }
 >   }
 >
 > }
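Ron's option 1 loop (start from the list head returned by [1], then keep refilling template [2] with the last tail until the list ends) can be sketched in Python as below. It assumes the session support he describes, so a store bNode can be handed back as %HEAD; run_template is a hypothetical local stand-in for one round trip to the server, not a real client API, and the graph is a set of triple tuples. Option 2, where each round trip must carry the growing chain of [4], is not modelled.

```python
from dataclasses import dataclass

RDF_NIL = "rdf:nil"
PAGE = 4  # template [2] reads at most four items per request


@dataclass(frozen=True)
class BNode:
    """A blank node in the server's store, known by its internal label."""
    label: str


def make_list(items):
    """Encode items as an RDF collection of bNode cells; return (graph, head)."""
    graph, head = set(), RDF_NIL
    for i, item in reversed(list(enumerate(items))):
        cell = BNode(f"l{i}")
        graph.add((cell, "rdf:first", item))
        graph.add((cell, "rdf:rest", head))
        head = cell
    return graph, head


def run_template(graph, head):
    """Hypothetical stand-in for one round trip with %HEAD set to `head`.

    Returns up to PAGE items and the unread tail, or None at the list end.
    """
    first = {s: o for s, p, o in graph if p == "rdf:first"}
    rest = {s: o for s, p, o in graph if p == "rdf:rest"}
    items, node = [], head
    for _ in range(PAGE):
        if node == RDF_NIL:
            return items, None
        items.append(first[node])
        node = rest[node]
    return items, (None if node == RDF_NIL else node)


def fetch_list(graph, head):
    """Rinse and repeat: refill the template with the last tail received."""
    items, tail, requests = [], head, 0
    while tail is not None:
        page, tail = run_template(graph, tail)
        items.extend(page)
        requests += 1
    return items, requests


steps = [":AddMix", ":Shake", ":Bake", ":FingerCheckForDoneness",
         ":VisitER", ":ReturnHome", ":EatStaleHalfCookedCornBread"]
graph, head = make_list(steps)
items, requests = fetch_list(graph, head)
print(requests, items == steps)
```

Each request stays the same size because only the tail bNode is carried forward, which is the property the session approach is meant to buy.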
Received on Sunday, 3 July 2005 19:31:41 UTC