SOURCE - Choosing what to query and querying the origin of statements

Here is a rough proposal for updating sections 8 and 9 of
  http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/
with respect to the SOURCE issue.

Plenty of related issues have been discussed already, but let's see
how this one does.

A quick comparison to named graphs/named containers/earlier work:
- No access to individual named graphs (i.e. no SOURCE <uri>)
- It does imply dynamic RDF-merging, but the text could make clear
  that the merge is not required at run time; the result must only
  be as if it had been done.
- No bnode graph names (issue)
- Left out DISTINCT for now; that's a result thing

Dave


-----


8 Choosing What to Query

A SPARQL query is evaluated against a single RDF *Query Graph*.  This
graph may be constructed through logical inference and need never be
materialized.  It can be arbitrarily large or infinite.  The Query
Graph is the result of a virtual RDF-merge over a set of *RDF Graphs*:

  Definition: Query Graph

  Given a set of RDF Graphs {RG1, ..., RGn}, the Query Graph QG is
  an RDF graph formed from the RDF-merge of the set {RG1, ..., RGn}.

  All of the graphs RG1...RGn have *Graph Names* GN1...GNn, which are
  URI References (URIrefs).

where RDF-merge is defined in RDF Semantics, section 0.3 "Graph Definitions":
  http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#graphdefs 
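
To make the definition concrete, here is a minimal Python sketch of an
RDF-merge over a set of named graphs. It is only an illustration, not
part of the proposal: triples are modelled as tuples of strings, blank
nodes are assumed to be written "_:label", and the renaming scheme
"_:gN_label" is a made-up way of keeping blank nodes from different
graphs apart before taking the union.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


def rename(term: str, index: int) -> str:
    # Assumption: blank nodes are written "_:label". Each graph gets its
    # own prefix so that labels from different graphs never collide.
    return f"_:g{index}_{term[2:]}" if term.startswith("_:") else term


def rdf_merge(graphs: Dict[str, Graph]) -> Graph:
    """Union the graphs after standardising blank nodes apart."""
    merged: Graph = set()
    for index, name in enumerate(sorted(graphs)):
        for s, p, o in graphs[name]:
            merged.add((rename(s, index), p, rename(o, index)))
    return merged


# Two graphs that happen to reuse the blank node label _:1.
rg1 = {("_:1", "foaf:mbox", "<mailto:alice@work.example>")}
rg2 = {("_:1", "foaf:mbox", "<mailto:bob@work.example>")}
query_graph = rdf_merge({"<aliceFoaf.n3>": rg1, "<bobFoaf.n3>": rg2})
subjects = {s for s, _, _ in query_graph}
print(len(query_graph), len(subjects))
```

The two subjects stay distinct in the merged graph even though both
source graphs use the label _:1.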


The Query Graph can be defined in the following ways:

1) In the SPARQL query language using the FROM clause

   See below.

2) By the SPARQL protocol

   ISSUE: Depends on protocol doc.  Probably works by giving the set
   of URIrefs of the graphs? or giving a URIref for the query graph?
   or query service?

3) Against a default query graph if neither 1) nor 2) is given.
   This is application-specific.


In the SPARQL query language the FROM clause can specify the set of
graphs either by giving their names or by giving the URIs of resources
from which the graphs can be retrieved.

(Q8.1) The query

  SELECT *
  FROM <http://www.w3.org/2000/08/w3c-synd/home.rss>
  WHERE ( ?x ?y ?z )

creates a Query Graph by using the resource at URI 
  http://www.w3.org/2000/08/w3c-synd/home.rss
to provide RDF triples, making an RDF graph RG1.  Graph RG1 is named
by that URI, and the query graph is constructed from the set {RG1}.


(Q8.2) The query

  SELECT *
  FROM <http://www.w3.org/2000/08/w3c-synd/home.rss> NAMED <http://example.org/>
  WHERE ( ?x ?y ?z )

constructs the same query graph but names the graph RG1 <http://example.org/>.

(Q8.3) The query

  SELECT *
  FROM NAMED <http://example.org/>
  WHERE ( ?x ?y ?z )

creates a query graph from a set containing one graph, named
<http://example.org/>.  The URI here is not used for resource retrieval.


When multiple graphs are given in FROM, the RDF-merge of the set of
graphs is performed to create the query graph.

(Q8.4) The query

  SELECT *
  FROM <uri1>, <uri2>
  WHERE ( ?x ?y ?z )

creates a query graph from the RDF-merge of the set of graphs {RG1, RG2} 
where
  RG1 is the RDF graph formed by retrieving the resource at uri1 and
    named uri1
  RG2 is the RDF graph formed by retrieving the resource at uri2 and
    named uri2


A SPARQL implementation MAY choose not to support graph names, in which
case queries that use only the NAMED keyword, such as Q8.3, will fail.
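
The FROM forms in Q8.1 to Q8.4 can be summarised with a short Python
sketch. Everything named here is hypothetical: FromItem is an invented
representation of one FROM entry, fetch stands in for whatever
retrieval an implementation does, and known stands in for graphs the
implementation already holds by name. The failure for an unknown
FROM NAMED graph is one possible reading of "will fail".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


@dataclass
class FromItem:
    """One entry of a FROM clause (hypothetical representation)."""
    retrieve: Optional[str] = None  # URI to fetch triples from, if any
    name: Optional[str] = None      # graph name given after NAMED, if any


def resolve_from(items: List[FromItem],
                 fetch: Callable[[str], Graph],
                 known: Dict[str, Graph]) -> Dict[str, Graph]:
    """Return the graphs to be RDF-merged, keyed by Graph Name."""
    graphs: Dict[str, Graph] = {}
    for item in items:
        if item.retrieve is not None:
            # FROM <uri> [NAMED <name>]: the name defaults to the URI.
            graphs[item.name or item.retrieve] = fetch(item.retrieve)
        elif item.name is not None:
            # FROM NAMED <name>: the name is not used for retrieval.
            if item.name not in known:
                raise LookupError(f"no graph named {item.name}")
            graphs[item.name] = known[item.name]
        else:
            raise ValueError("FROM item needs a URI or a NAMED graph name")
    return graphs


def fake_fetch(uri: str) -> Graph:
    # Stand-in for retrieval: one made-up triple per URI.
    return {(uri, "ex:retrievedFrom", uri)}


q81 = resolve_from([FromItem(retrieve="uri1")], fake_fetch, {})
q82 = resolve_from([FromItem(retrieve="uri1", name="http://example.org/")],
                   fake_fetch, {})
q84 = resolve_from([FromItem(retrieve="uri1"), FromItem(retrieve="uri2")],
                   fake_fetch, {})
print(sorted(q81), sorted(q82), sorted(q84))
```

The returned names are what a later SOURCE ?var could bind to; the
triples themselves would then go through the RDF-merge.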


  Possible extension:

  Allow graphs with a local name (blank node label)

  (Q8.5)
    SELECT *
    FROM NAMED _:a, NAMED _:b
    WHERE ( ?x ?y ?z )

  rather than relying on the application-specific choice 3) above.

  However, the details below would have to be changed to forbid
  returning the blank nodes used as names in results.


9 Querying the Origin of Statements

While the RDF data model is limited to expressing triples with a
subject, predicate and object, many RDF data stores augment this with
a notion of the source of each triple.  Typically, implementations
associate RDF triples or graphs with a URI specifying their real or
virtual origin.  The SOURCE keyword allows you to query or constrain
the source of the following triple pattern or nested graph
pattern. The general form of the SOURCE query is:

 SOURCE ?var (?s ?p ?o)

When SOURCE ?var is given before a triple pattern, the variable will be
bound to each of the known *Graph Names* for a matching triple.  A data
store that does not support graph names SHOULD provide no binding for
the SOURCE variables.

  D9.1 Data:

  Graph G1 named <aliceFoaf.n3>
  @prefix  foaf:  <http://xmlns.com/foaf/0.1/> .

  _:1 foaf:mbox <mailto:alice@work.example>.
  _:1 foaf:knows _:2.
  _:2 foaf:mbox <mailto:bob@work.example>.
  _:2 foaf:age 32.

  Graph G2 named <bobFoaf.n3>
  @prefix  foaf:  <http://xmlns.com/foaf/0.1/> .

  _:1 foaf:mbox <mailto:bob@work.example>.
  _:1 foaf:PersonalProfileDocument <bobFoaf.n3>.
  _:1 foaf:age 35.


  The Query Graph is the RDF-merge of {G1, G2}


  Q9.1 Query:

  PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
  SELECT ?mbox ?age ?ppd
  WHERE  ( ?alice foaf:mbox <mailto:alice@work.example> )
         ( ?alice foaf:knows ?whom )
         ( ?whom foaf:mbox ?mbox )
         ( ?whom foaf:PersonalProfileDocument ?ppd )
  SOURCE ?ppd ( ?whom foaf:age ?age )

  R9.1 Result:
  mbox                      	age 	ppd
  <mailto:bob@work.example> 	35 	<bobFoaf.n3>

This query returns the email addresses of people that Alice knows. It
also returns their age according to their PersonalProfileDocument, as
well as the URI of that graph. Alice's guess of Bob's age (32) is not
returned.
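
To isolate what SOURCE does, the following Python sketch evaluates
just one pattern, SOURCE ?src ( ?who foaf:age ?age ), over the
foaf:age triples of D9.1. It is an illustration under assumptions:
match and source are invented helper names, terms are plain strings,
and blank node labels are kept as written without any merging logic.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Match one triple pattern; terms starting with '?' are variables."""
    binding: Binding = {}
    for want, got in zip(pattern, triple):
        if want.startswith("?"):
            if binding.setdefault(want, got) != got:
                return None  # same variable, two different values
        elif want != got:
            return None
    return binding


def source(var: str, pattern: Triple,
           graphs: Dict[str, Set[Triple]]) -> List[Binding]:
    """Bind var to the name of every graph holding a matching triple."""
    rows: List[Binding] = []
    for name, graph in graphs.items():
        for triple in graph:
            binding = match(pattern, triple)
            # If var also occurs in the pattern, both uses must agree.
            if binding is not None and binding.setdefault(var, name) == name:
                rows.append(binding)
    return rows


graphs = {
    "<aliceFoaf.n3>": {("_:2", "foaf:age", "32")},
    "<bobFoaf.n3>": {("_:1", "foaf:age", "35")},
}
rows = source("?src", ("?who", "foaf:age", "?age"), graphs)
pairs = sorted((row["?src"], row["?age"]) for row in rows)
print(pairs)
```

Each age comes back paired with the name of the graph that states it.
In Q9.1 the SOURCE variable is ?ppd, which is already bound to
<bobFoaf.n3>, so only the age from that graph survives.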


An unbound variable must not match another unbound variable. Thus,

  Q9.2 Query:
  PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
  SELECT ?given ?family
  WHERE SOURCE ?ppd ( ?whom foaf:given ?given )
        SOURCE ?ppd ( ?whom foaf:family ?family )

will match only if the sources of both triples are known and are the same.
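
The effect of sharing ?ppd between the two SOURCE patterns can be
sketched in Python as a join on the common variables. This is a rough
illustration only: the two graphs <a.n3> and <b.n3> and their contents
are made up, the helper names are invented, and the first pattern is
assumed to bind ?given.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Match one triple pattern; terms starting with '?' are variables."""
    binding: Binding = {}
    for want, got in zip(pattern, triple):
        if want.startswith("?"):
            if binding.setdefault(want, got) != got:
                return None
        elif want != got:
            return None
    return binding


def source(var: str, pattern: Triple,
           graphs: Dict[str, Set[Triple]]) -> List[Binding]:
    """Bind var to the name of every graph holding a matching triple."""
    rows: List[Binding] = []
    for name, graph in graphs.items():
        for triple in graph:
            binding = match(pattern, triple)
            if binding is not None and binding.setdefault(var, name) == name:
                rows.append(binding)
    return rows


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Keep pairs of rows that agree on every variable they share."""
    return [{**l, **r} for l in left for r in right
            if all(l[k] == r[k] for k in l.keys() & r.keys())]


# Hypothetical data: <b.n3> disagrees with <a.n3> about the family name.
graphs = {
    "<a.n3>": {("<#p>", "foaf:given", "Alice"),
               ("<#p>", "foaf:family", "Smith")},
    "<b.n3>": {("<#p>", "foaf:family", "Jones")},
}
given = source("?ppd", ("?whom", "foaf:given", "?given"), graphs)
family = source("?ppd", ("?whom", "foaf:family", "?family"), graphs)
names = [(row["?given"], row["?family"]) for row in join(given, family)]
print(names)
```

Only the pair of names stated in the same graph is kept; the family
name from <b.n3> is dropped because its source differs.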

A SPARQL implementation MAY not support graph names in which case the
SOURCE ?var parts are ignored.

-----------------

References

Named Containers
http://lists.w3.org/Archives/Public/public-rdf-dawg/2004JulSep/0581.html

Named Graphs and TriX
http://www.w3.org/2004/03/trix/

Named Graphs, Provenance and Trust
Carroll, Jeremy J.; Bizer, Christian; Hayes, Patrick; Stickler, Patrick
HPL-2004-57, 20040513 
http://hpl.hp.com/techreports/2004/HPL-2004-57.html

...

Received on Monday, 8 November 2004 16:30:16 UTC