Re: Blank Nodes and SPARQL from Seaborne, Andy on 2005-06-27 (public-rdf-dawg-comments@w3.org from June 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 27 Jun 2005 16:24:41 +0100
To: Ron Alford <ronwalf@umd.edu>
CC: public-rdf-dawg-comments@w3.org, Amy Alford <aloomis@glue.umd.edu>
Message-ID: <42C01A39.7010504@hp.com>

Ron Alford wrote:
> == Suggested Change ==
> Strike the following section from the SPARQL Query spec:
> """It behaves as a variable, although it can not be mentioned in the
> query result form or anyplace else outside a graph pattern.
> 
> Blank nodes in queries are distinct from all blank nodes in the data. A
> blank node in a graph pattern does not match a blank node in the data by
> blank node label."""
> 
> Change "is only" to "might only be" in "An application or client
> receiving the results of a query can tell that two solutions or two
> variable bindings differ in blank nodes but this information is only
> scoped to the results as defined in "SPARQL Variable Binding Results XML
> Format" or the CONSTRUCT result form."
> 
> == Motivation ==
> 
> As the spec stands, it cannot be extended to allow bnodes to be used as
> temporary identifiers across queries within a session.  We would like to
> work on an extension to SPARQL that supports sessions in which bnode
> labels are persistent and may be referred to.

I understand the desire to identify blank nodes directly so that exactly the
right graph node can be found again by a subsequent query.  It would be very
helpful in scaling RDF graphs to span machines while still using the standard
serialization forms to exchange parts of the graph.  Exposing blank node labels
helps, but it is not a general solution we can apply to all systems.  For
example, if a server restarts and rereads a file, are the labels maintained?

So for your extension to SPARQL, I suggest that it include the handling of
blank nodes by whatever your system uses for identifiers.  As an extension, it
is not SPARQL - but then you were extending it anyway.

Exporting the label and giving it the characteristics needed for session-based
browsing or editing is very similar to assigning an identifying property, so
maybe assigning such a label is a better way to handle it.

> 
> 
> == Use Cases Supported by BNode Reference ==
> 
> 
> === RDF Browsing ===
> 
> Interactive browsing of RDF data requires sequential queries to the
> database based on user interaction and prior results.
> 
> === Lists ===
> 
> RDF Collections use bnodes to create a linked list.  OWL, among other
> languages, uses these lists as part of its syntax.  Without being able to
> reference bnodes, any query that wished to expand a list would have to
> iteratively expand its original query.

The WG postponed this - one reason was that it is not clear that query language
support is the best or only approach.  Similar to rdfs:member, there could be an
inferred property :listMember that relates a resource which is also a list to
the list members.
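
To sketch what that might look like (an assumption only: no such property is
defined anywhere, and the namespace, :ingredients and the recipe1 resource
below are made up for illustration), a list could then be expanded with a
single fixed pattern:

```sparql
# Hypothetical: assumes an inference layer that supplies :listMember
# for every element of an RDF collection. None of these terms exist.
PREFIX : <http://example.org/ns#>
SELECT ?member
WHERE { <http://example.org/data/recipe1> :ingredients ?list .
        ?list :listMember ?member .
      }
```

As with rdfs:member, this would give membership but not the position of each
member in the list.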

> 
> == Use Cases Supported by BNode Stability ==
> 
> === Multiple Arity Predicates ===
> 
> A query for multiple high-arity predicates leads to excessively large
> result sets, since the semantics of SPARQL state that the results
> contain all possible graph matches.
> 
> Take the following rough example:  A foaf description of a person
> includes 6 foaf:nicks, 4 foaf:mboxes, and 7 foaf:knows. The result set
> querying for these specific properties will return 168 rows.
> 
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> PREFIX owl: <http://www.w3.org/2002/07/owl#>
> SELECT ?nick ?mbox ?knows
> FROM <http://sarn.org/foaf.rdf>
> WHERE { ?person foaf:mbox <mailto:aloomis@sarn.org> .
>         ?person foaf:knows ?knows.
>         ?person foaf:mbox ?mbox .
>         ?person foaf:nick ?nick.
> }

As an aside, you can reduce the fan out by asking 3 queries or even:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?nick ?mbox ?knows
FROM <http://sarn.org/foaf.rdf>
WHERE { ?person foaf:mbox <mailto:aloomis@sarn.org> .
        {
          { ?person foaf:knows [ foaf:mbox ?knows ] }
          UNION
          { ?person foaf:mbox ?mbox }
          UNION
          { ?person foaf:nick ?nick . }
        }
      }

which gives me 17 rows: each UNION branch contributes its own matches, so the
counts add rather than multiply.

> 
> 
> The solution to this is to split the query into one query per property.
>  However, combining the results relies on bnode stability (especially in
> the case of foaf).

{ ?person foaf:mbox <mailto:aloomis@sarn.org> } uses a uniquely identifying
property, so there is no requirement for blank node labels to remain the same -
putting this pattern in each query will quickly find the right graph node.
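
For instance, one of the per-property queries could look roughly like this (a
sketch only, reusing the graph and mailbox from your example):

```sparql
# Nicknames only; the person is located by mailbox, not by blank node label.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?nick
FROM <http://sarn.org/foaf.rdf>
WHERE { ?person foaf:mbox <mailto:aloomis@sarn.org> .
        ?person foaf:nick ?nick .
      }
```

The other two queries differ only in the last triple pattern (foaf:mbox ?mbox
and foaf:knows ?knows), so the results can be combined without ever comparing
blank node labels.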

> 
> 
> === Limit / Offset ===
> 
> If limit and offset are going to be used as a cursor on data that makes
> use of bnodes, the bnodes need to be stable between the results.

Agreed - limit/offset are not a complete cursor mechanism.  SPARQL does not 
provide support for sessions.  Of course, any implementation is free to do a 
good job and provide stability of the results sequence across calls where it 
can.  But that isn't the same as guaranteeing it for all implementations (e.g. 
across server restarts or when <http://sarn.org/foaf.rdf> changes and is reread).
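
Where the data itself is unchanged, adding an ORDER BY at least makes the page
boundaries depend on the data rather than on how the store happens to enumerate
matches.  A sketch, with an arbitrary page size of 3 and using ?nick as the sort
key simply because it is a literal rather than a blank node:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?nick
FROM <http://sarn.org/foaf.rdf>
WHERE { ?person foaf:mbox <mailto:aloomis@sarn.org> .
        ?person foaf:nick ?nick .
      }
ORDER BY ?nick
LIMIT 3
OFFSET 3
```

This asks for the second page of three nicknames.  It still offers no guarantee
if the graph changes between the two calls.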

	Andy
Received on Monday, 27 June 2005 15:26:01 UTC