Re: Blank Nodes and SPARQL

Seaborne, Andy wrote:

[snipping quite a bit]

> The data provider could have chosen to provide a URI to a graph node.  By
> using a blank node, they are stopping clients directly addressing that
> node in
> the graph.  Maybe there is a reason for that.  There seems to be a tension
> between publisher and consumer of the data here.  Why did the data
> publishers
> choose that data model over, say an rdf:Seq?

This tension is being introduced ex post facto.  The use of bnodes has
always restricted linkability, and not accessibility.
As for why choose lists over a sequence, the answer has to do with
modeling.  It's impossible to express restrictions on an infinite number
of properties in the current ontology languages.


> The working group has decided to postpone the issue of accessing RDF
> collections:
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Jun/0016.html
>
>
> One of the reasons is because there are non-query ways of addressing the
> matter.  FOAF's approach is inverse functional properties; inference may
> also
> be used.

As I've said before, accessing lists is just one repercussion of not
being able to directly address bnodes.  Not only that, the proposed
solution has little use where order needs to be maintained.

There are also containers other than the RDF sanctified ones.  Take
OWL-S, for instance.  They define a shadow list class at
http://www.daml.org/services/owl-s/1.1/generic/ObjectList.owl
The rdf:member predicate won't help at all in these cases.

It's a separate issue, but you can read about why they had to do this
here: http://www.w3.org/Submission/OWL-S/#AppendixB
It boils down to an OWL-DL syntax issue.

As for inverse functional properties, I've already shown many cases
where bnodes are used as syntax in various rdf based languages.

Even in foaf, there are many cases where either
a) The IFPs just aren't available on a node.
b) The IFPs are not trustworthy (e.g., foaf:homepage and LiveJournal)

Inferencing doesn't help in any of these three cases, and actually hurts
in the second part of the last case.  Also, relying on inferencing is
rather painful, because the cases where bnodes are going to hurt worst
are aggregated data sets (search engines).  Adding an inferencing layer
on top of such large stores is going to cost.


> ==== Protocol
>
> RDQL/Jena has had for some while the ability to pass in values for
> variables at the start of query execution.  One use of this is to pass in
> programming language level objects, including bNodes, so that all
> solutions of the query have a fixed value for a variable.  It's a
> mechanism akin to SQL client templates but done by naming, not position.
>
> This can be extended to the SPARQL protocol:
>
>        ?query=SELECT...&varX=bNode:xyz&...
>
> Use
>   SELECT ?item ?tail WHERE { ?x rdf:first ?item ; rdf:rest ?tail }
> which becomes at the server:
>   SELECT ?item ?tail WHERE { <bnode(xyz)> rdf:first ?item ; rdf:rest ?tail }

Slick and useful beyond the scope of bnodes.
I see several benefits:
a) It's easy for the protocol layer of a sparql server to determine for
itself whether and how it allows bnode access.
b) Lets you have template queries that don't need to be munged.  There
are a wide variety of apps where it's just easier not to be doing string
substitutions.
c) Provides a limited hook for access control to a sparql store.  The
protocol layer can have a list of valid queries for a given access level
that must string match exactly, but can be parameterized.
d) Provides consistency in selecting bnodes, uris, and literals.
e) It's implementable, if not efficiently, on top of almost any sparql
query engine.  Just filter the results on the way out.
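
A minimal sketch of points (c) and (e), in Python purely for illustration.
Everything named here (ALLOWED_QUERIES, filter_solutions, run, the
bNode:xyz label form, the sample solutions) is made up for the example;
none of it is Jena's API or part of the SPARQL protocol.  It assumes the
engine can hand back the unprojected solutions, with ?x still visible, so
that the protocol layer can filter on it.

```python
# Hypothetical sketch; none of these names come from Jena or the SPARQL drafts.

TEMPLATE = "SELECT ?item ?tail WHERE { ?x rdf:first ?item ; rdf:rest ?tail }"

# Point (c): queries accepted at this access level, matched as exact strings.
ALLOWED_QUERIES = {TEMPLATE}

# Stand-in for what an unmodified engine returns for the pattern above,
# before projection, so that ?x is still visible to the protocol layer.
UNPROJECTED_SOLUTIONS = [
    {"x": "bNode:xyz", "item": "step 1", "tail": "bNode:abc"},
    {"x": "bNode:abc", "item": "step 2", "tail": "rdf:nil"},
]


def filter_solutions(solutions, prebound):
    """Point (e): keep solutions that agree with every pre-bound variable."""
    return [
        s for s in solutions
        if all(s.get(var) == value for var, value in prebound.items())
    ]


def run(query_text, prebound, select=("item", "tail")):
    """Reject unknown queries, filter on the pre-bound values, then project."""
    if query_text not in ALLOWED_QUERIES:
        raise PermissionError("query is not on the allowed list")
    kept = filter_solutions(UNPROJECTED_SOLUTIONS, prebound)
    return [{var: s[var] for var in select} for s in kept]


# Equivalent of ?query=SELECT...&x=bNode:xyz
print(run(TEMPLATE, {"x": "bNode:xyz"}))
```

The query text itself is never rewritten; only the parameter map changes
between requests, which is what lets the exact string match in (c) work.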


Are there any comments from query implementers and protocol people on this
solution?


> ==== SPARQL Extensibility
>
> SPARQL has two extension points: value functions and DESCRIBE.
>
> == SPARQL Function Extension
>
> (idea from Steve Harris)
>
> Have a custom function that tests the bNode label.  This isn't covered
> by the
> SPARQL value model - it's using the function extension point as a tunnel
> between client and server inside the SPARQL syntax.
>
>      FILTER ext:bNodeLabel(?x, "label")
>
> SELECT ?item ?tail WHERE { ?x rdf:first ?item ;
>                                rdf:rest ?tail .
>                             FILTER ext:bNodeLabel(?x, "xyz") }
>

There is a drastic inconsistency here between accessing bnodes, and
accessing literals and URIs.  This requires a fair amount of query
munging when doing substitutions.
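
To make the munging concrete, here is a rough Python sketch of what a
client would have to do under this proposal.  The function name and the
(kind, value) tuple are invented for the example, ext:bNodeLabel is the
hypothetical function from the quoted text, and no escaping of the
substituted values is attempted.

```python
# Hypothetical client-side helper; the function extension is only a proposal.

def build_list_query(node):
    """Build the list-cell query for a node given as (kind, value)."""
    kind, value = node
    if kind == "uri":
        # A URI is substituted straight into the subject position.
        return (
            "SELECT ?item ?tail WHERE { <%s> rdf:first ?item ; "
            "rdf:rest ?tail }" % value
        )
    if kind == "bnode":
        # A bnode cannot be substituted: the variable stays in place and
        # an extra FILTER clause has to be spliced into the pattern.
        return (
            "SELECT ?item ?tail WHERE { ?x rdf:first ?item ; "
            'rdf:rest ?tail . FILTER ext:bNodeLabel(?x, "%s") }' % value
        )
    raise ValueError("unsupported node kind: %r" % kind)


print(build_list_query(("uri", "http://example.com/list1")))
print(build_list_query(("bnode", "xyz")))
```

The two branches produce structurally different queries for what is, from
the client's point of view, the same operation.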


> == DESCRIBE
>
> Accessing list elements one by one isn't nice if the list is of any size so
> get it all at once.  Your use case is about a description of the whole
> recipe

In my use case I was accessing the instructions in chunks, which would
be perfectly fine in many cases (think of web tutorials that split
instructions across multiple pages).
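
As a rough illustration of chunked access, here is a Python sketch over an
in-memory stand-in for the list cells.  The labels, the step text, and the
read_chunk helper are all invented for the example.  The point is that the
value a client needs in order to ask for the next page is itself a bnode.

```python
# Hypothetical data: blank node label -> (rdf:first value, rdf:rest value).
RDF_NIL = "rdf:nil"

CELLS = {
    "_:c1": ("Preheat the oven", "_:c2"),
    "_:c2": ("Mix the batter", "_:c3"),
    "_:c3": ("Bake for 30 minutes", RDF_NIL),
}


def read_chunk(start, size):
    """Return up to `size` items from `start`, plus the node to resume at."""
    items = []
    node = start
    while node != RDF_NIL and len(items) < size:
        first, rest = CELLS[node]
        items.append(first)
        node = rest
    return items, node


page, resume_at = read_chunk("_:c1", 2)
print(page, resume_at)
```

Without some way to name resume_at in a later query, the client has to
walk the list from its head again on every request.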

> - this could be the CBD (Concise Bounded Description) of the thing and
> other
> similar schemes for the information provider to give an answer that the
> client
> can't completely determine.  In SPARQL, the DESCRIBE result form provides a
> hook for this.  It enables the server to return the whole recipe in a
> single call.
>
> CBD can be found at http://sw.nokia.com/uriqa/CBD.html

There are several problems with using DESCRIBE.

The most obvious is that it requires parsing and requerying the data on
the client side.  The query conditions end up just being a filter on the
data.

Also, unless you're just querying about a single uri, it's easy to lose
the context of the original query[1].

Thirdly, the DESCRIBE hook is somewhat limited, since the only way to
provide different result patterns is to provide different end points.
This is unfortunate, since unlike the function extensions, there is no
consistent name across stores for the client to specify.

> ==== Make nodes addressible
>
> == Dynamically assign identifiers
...
> Some may not like automatically assigning URIs to replace the bNodes.
> True.
> But you want to reference the blank nodes by their identity.  Exposing the
> labels is no different.

Then a generic client would have no way of knowing what was and was not
a URI.  This solution would break any rdf that was based off the
results of a query.


>
> == Split the label space of bNodes
>
> Use a different prefix to identify the two spaces of bNodes.
>
> _:a for ones that are query bNodes and
> _!:xyz for ones in the target graph.
>
> Pick marker characters to your heart's content.
>
> A variation is to in the space of labels: _:!xyz
>
> This is a bit like syntax support for the dynamically assigned identifiers.

Icky, ugly, and confusing, but workable.


> Of these, the protocol approach would appear to fit with your session
> paradigm best.  I've used the local version for some time.


These solutions are mostly orthogonal to implementing sessions.
Sessions are a means of assuring data stability across multiple queries.
This has more obvious effects on bnodes than on anything else,
but it's still important to have.

I know I'm not alone in needing to reference bnodes[2], and you seem to
have roughed out a workable solution.  It would be nice if this was
included in the spec.


-Ron



[1] Here's a simple example where without the context of the query, the
meaning of the results is lost.

Background graph:

@prefix : <http://example.com/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:Ron foaf:knows _:Amy.

_:Amy foaf:mbox <mailto:aloomis@glue.umd.edu>;
      foaf:knows _:John .
_:John foaf:knows _:Amy .


Query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.com/>

DESCRIBE ?person WHERE {
  ex:Ron foaf:knows ?person .
}

Results of CBD:
@prefix : <http://example.com/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:Amy foaf:mbox <mailto:aloomis@glue.umd.edu>;
      foaf:knows _:John .
_:John foaf:knows _:Amy .


The solution here is to put ex:Ron after DESCRIBE, but this could lead
to quite a bit more data than I needed.
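
For anyone who wants to check the result above, here is a small Python
sketch of a simplified CBD over the background graph.  It follows only
the "subject triples, then recurse into blank node objects" part of the
CBD definition and ignores reification; the tuple representation and the
function names are mine.

```python
# Simplified Concise Bounded Description; reification is not handled.

GRAPH = [
    (":Ron", "foaf:knows", "_:Amy"),
    ("_:Amy", "foaf:mbox", "<mailto:aloomis@glue.umd.edu>"),
    ("_:Amy", "foaf:knows", "_:John"),
    ("_:John", "foaf:knows", "_:Amy"),
]


def is_bnode(term):
    return term.startswith("_:")


def cbd(graph, start):
    """Collect triples whose subject is `start`, recursing into bnode objects."""
    result = []
    seen = set()
    pending = [start]
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in graph:
            if s == node:
                result.append((s, p, o))
                if is_bnode(o):
                    pending.append(o)
    return result


description = cbd(GRAPH, "_:Amy")
for triple in description:
    print(triple)

# The triple that tied the answer to the query is not in the description.
print((":Ron", "foaf:knows", "_:Amy") in description)
```

The last line prints False: nothing in the description says that _:Amy is
the person Ron knows.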


[2]
From #Swig this morning:
http://ilrt.org/discovery/chatlogs/swig/2005-07-05.html#T10-58-51
SeRQL Discussion:
http://www.openrdf.org/issues/browse/SES-40?page=all

Received on Tuesday, 5 July 2005 20:42:56 UTC