Re: SOURCE - Choosing what to query and querying the origin of statements

On Mon, 2004-11-08 at 16:28 +0000, Dave Beckett wrote:
> Here is a rough proposal to update to 
> sections 8 and 9 with respect to the SOURCE issue.

This looks reasonably clear and complete.

I don't understand "Any variable that is not bound must not match..."

Other than that, my comments should be seen as advice to the editor,
should he/we choose to incorporate this proposal.

> There are plenty of issues related to this discussed already but
> let's see how this does.
> A quick comparison to named graphs/named containers/earlier work
> - No access to individual named graphs (i.e. no SOURCE <uri>)
> - It does imply dynamic RDF-merging, but it could be de-emphasised
>   that it's not required at run time, but the result must be as-if
>   that had been done.
> - No bnode graph names (issue)
> - Left out DISTINCT for now, that's a result thing
> Dave
> -----
> 8 Choosing What to Query
> A SPARQL query is against a single RDF *Query Graph*.  This graph may
> be constructed through logical inference, and never materialized.  It
> can be arbitrarily large or infinite.  The Query Graph is a virtual
> RDF-merge operation over a set of *RDF Graphs*:
>   Definition: Query Graph
>   Given a set of RDF Graphs {RG1, ..., RGn}, the Query Graph QG is
>   an RDF graph formed from the RDF-merge of the set {RG1, ..., RGn}.
>   All of the graphs RG1...RGn have *Graph Names* GN1...GNn which are
>   URI References (URIrefs) 

I'd probably phrase that as a mapping from graph names
to RDF graphs, but your meaning is clear enough...
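
To make the "mapping from graph names to RDF graphs" reading concrete, here is a minimal Python sketch. It is only an illustration, not part of the proposal: the tuple representation of triples, the function name rdf_merge, and the blank-node renaming scheme are all assumptions of mine. It keeps the graphs in a dict keyed by graph name and forms the Query Graph by standardizing blank nodes apart before taking the union, which is what RDF-merge requires.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


def rename_bnode(term: str, index: int) -> str:
    """Give a blank node label a per-graph prefix; leave other terms alone."""
    if term.startswith("_:"):
        return f"_:g{index}_{term[2:]}"
    return term


def rdf_merge(named_graphs: Dict[str, Graph]) -> Graph:
    """Union of all graphs, with blank nodes standardized apart per graph.

    `named_graphs` is the hypothetical mapping GN1..GNn -> RG1..RGn.
    """
    merged: Graph = set()
    for index, name in enumerate(sorted(named_graphs)):
        for s, p, o in named_graphs[name]:
            merged.add((rename_bnode(s, index), p, rename_bnode(o, index)))
    return merged


# Two graphs that both use the label _:1 for different nodes.
graphs = {
    "g1": {("_:1", "p", "a")},
    "g2": {("_:1", "p", "b")},
}
query_graph = rdf_merge(graphs)
```

With this shape the graph names stay available alongside the merged graph, so nothing about the merge itself has to happen at run time as long as results are as if it had.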

> where RDF-merge is defined in RDF Semantics 0.3 Graph Definitions
> The Query Graph can be defined in the following ways:
> 1) In the SPARQL query language using the FROM clause
>    See below.
> 2) By the SPARQL protocol
>    ISSUE: Depends on protocol doc.  Probably works by giving the set
>    of URIrefs of the graphs? or giving a URIref for the query graph?
>    or query service?
> 3) Against a default query graph if neither 1) nor 2) is given.
>    This is application-specific.
> In the SPARQL query language the FROM clause can specify the set of
> graphs by either giving their names or giving the URIs for a resource
> that can be used to retrieve the graph.
> (Q8.1) The query
>   SELECT *
>   FROM <>
>   WHERE ( ?x ?y ?z )
> creates a Query Graph by using the resource at URI 
> to provide RDF triples, making an RDF graph RG1.  Graph RG1 is named
> by the URI and constructs a query graph from the set {RG1}.
> (Q8.2) The query
>   SELECT *
>   FROM <> NAMED <>
>   WHERE ( ?x ?y ?z )
> Constructs the same query graph but names the graph RG1 <>
> (Q8.3) The query
>   SELECT *
>   WHERE ( ?x ?y ?z )
> Creates a query graph from a set of 1 graph named <>
> The URI here is not for resource retrieval.
> When multiple graphs are given in FROM, the RDF-merge of the set of
> graphs is performed to create the query graph.
> The query
> (Q8.4)
>   SELECT *
>   FROM <uri1>, <uri2>
>   WHERE ( ?x ?y ?z )
> creates a query graph from the RDF-merge from the set of graphs {RG1, RG2} 
> where
>   RG1 is the RDF graph formed by retrieving the resource at uri1 and
>     named uri1
>   RG2 is the RDF graph formed by retrieving the resource at uri2 and
>     named uri2
> A SPARQL implementation MAY not support graph names in which case the
> queries that use only the NAMED keyword will fail - Q8.3

To date, the SPARQL spec hasn't defined a term
like "SPARQL implementation", and I don't recall
"fail" so far either.

I can't tell what Q8.3 refers to.

>   Possible extension:
>   Allow graphs with a local name (blank node label)
>   (Q8.5)
>     SELECT *
>     FROM NAMED _:a, NAMED _:b
>     WHERE ( ?x ?y ?z )
>   rather than relying on the application-specific choice 3) above.
>   However details below would have to be changed to forbid returning
>   the blank nodes of the names in results.
> 9 Querying the Origin of Statements
> While the RDF data model is limited to expressing triples with a
> subject, predicate and object, many RDF data stores augment this with
> a notion of the source of each triple.  Typically, implementations
> associate RDF triples or graphs with a URI specifying their real or
> virtual origin.  The SOURCE keyword allows you to query or constrain
> the source of the following triple pattern or nested graph
> pattern. The general form of the SOURCE query is:
>  SOURCE ?var (?s ?p ?o)
> When SOURCE ?var is given before a triple, the variable will be bound
> to all of the known *Graph Names* for that triple.

I gather that "known" refers to the Query Graph QG.

>   A data store that
> does not support graph names SHOULD provide no binding for the SOURCE
> variables.

Again the normative-looking reference to software. So far the editors
have kept that sort of thing to informative prose and kept the
definitions of things like query result independent of it.

I think you're suggesting that there are 2 query results for queries
that use SOURCE and that implementations are free to return
either. Is that right?

>   D9.1 Data:
>   Graph G1 named <aliceFoaf.n3>
>   @prefix  foaf:  <> .
>   _:1 foaf:mbox <mailto:alice@work.example>.
>   _:1 foaf:knows _:2.
>   _:2 foaf:mbox <mailto:bob@work.example>.
>   _:2 foaf:age 32.
>   Graph G2 named <bobFoaf.n3>
>   @prefix  foaf:  <> .
>   _:1 foaf:mbox <mailto:bob@work.example>.
>   _:1 foaf:PersonalProfileDocument <bobFoaf.n3>.
>   _:1 foaf:age 35.
>   The Query Graph is the RDF-merge of {G1, G2}
>   Q9.1 Query:
>   PREFIX foaf:    <>
>   SELECT ?mbox ?age ?ppd
>   WHERE       ( ?alice foaf:mbox <mailto:alice@work.example> )
>               ( ?alice foaf:knows ?whom )
>               ( ?whom foaf:mbox ?mbox )
>               ( ?whom foaf:PersonalProfileDocument ?ppd )
>   SOURCE ?ppd ( ?whom foaf:age ?age )
>   R9.1 Result:
>   mbox                      	age 	ppd
>   <mailto:bob@work.example> 	35 	<bobFoaf.n3>

There are two possible results, right? one with ppd unbound?

> This query returns the email addresses of people that Alice knows. It
> also returns their age according to their PersonalProfileDocument
> documents, as well as the URI of the graph. Alice's guess of Bob's
> age (32) is not returned.

The example is good.

> Any variable that is not bound must not match another variable that
> is not bound.

I don't understand that sentence, even after studying the example
a few times. Hmm.

>  Thus,
>   Query Q9.2:
>   PREFIX foaf:    <>
>   SELECT ?given ?family
>   WHERE SOURCE ?ppd ( ?whom foaf:given ?given )
>         SOURCE ?ppd ( ?whom foaf:family ?family )
> will match only if the sources of both triples are known and the same.
> A SPARQL implementation MAY not support graph names in which case the
> SOURCE ?var parts are ignored.

If I understand the proposal, that can be phrased without reference
to implementations by saying, as above, that there are two possible
results to any query that uses SOURCE.
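
As a rough illustration of that "two possible results" reading, here is a small Python sketch for a single SOURCE pattern. It reflects my understanding of the proposal, not its definition: the function names, the supports_graph_names flag, and the tuple encoding are hypothetical, and only the two foaf:age triples from D9.1 are included (with blank node labels already made distinct).

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# The foaf:age triples from D9.1, keyed by graph name.
NAMED_GRAPHS: Dict[str, Set[Triple]] = {
    "aliceFoaf.n3": {("_:a2", "foaf:age", "32")},
    "bobFoaf.n3": {("_:b1", "foaf:age", "35")},
}


def match_term(pattern_term: str, value: str, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that pattern_term matches value, or return None."""
    if pattern_term.startswith("?"):
        if pattern_term in binding and binding[pattern_term] != value:
            return None
        return {**binding, pattern_term: value}
    return binding if pattern_term == value else None


def match_source(source_var: str, pattern: Triple,
                 named_graphs: Dict[str, Set[Triple]],
                 supports_graph_names: bool) -> List[Binding]:
    """Solutions for `SOURCE source_var pattern` under either reading."""
    results: List[Binding] = []
    for name in sorted(named_graphs):
        for triple in sorted(named_graphs[name]):
            binding: Optional[Binding] = {}
            for pattern_term, value in zip(pattern, triple):
                binding = match_term(pattern_term, value, binding)
                if binding is None:
                    break
            if binding is None:
                continue
            if supports_graph_names:
                # Bind (or check) the source variable against the graph name.
                binding = match_term(source_var, name, binding)
                if binding is None:
                    continue
            # Without graph-name support the source variable stays unbound.
            results.append(binding)
    return results


pattern = ("?whom", "foaf:age", "?age")
with_names = match_source("?src", pattern, NAMED_GRAPHS, True)
without_names = match_source("?src", pattern, NAMED_GRAPHS, False)
```

Under these assumptions the same query has two admissible answers: one where ?src carries the graph name for each age triple, and one where ?src is simply left unbound.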

> -----------------
> References
> Named Containers
> Named Graphs and TriX
> Named Graphs, Provenance and Trust
> Carroll, Jeremy J.; Bizer, Christian; Hayes, Patrick; Stickler, Patrick
> HPL-2004-57, 20040513 

see also
Reaching out onto the Web

from the SWAP tutorial

> ...
Dan Connolly, W3C
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Received on Tuesday, 9 November 2004 14:28:35 UTC