Named of graphs (was: Re: SPARQL Protocol for RDF) from Seaborne, Andy on 2005-06-06 (public-rdf-dawg-comments@w3.org from June 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 06 Jun 2005 13:11:00 +0100
To: Patrick Stickler <patrick.stickler@nokia.com>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <42A43D54.5070305@hp.com>
Patrick Stickler wrote:
> 
<snip/>
[Kendall has addressed the protocol parts]

> 
> 4. Related to the above, but actually a comment regarding the
> SPARQL spec itself, it seems there is a conflict between
> the FROM construct and the definition of a dataset, since, if
> the background graph is "unnamed", then how could one
> refer to it with a FROM construct? I think the problem here
> is simply with language, not an inherent flaw in SPARQL.
> 
> It is my understanding that, while not mandatory, the URIs
> specified using the FROM and FROM NAMED constructs are
> often expected/hoped to be resolvable at run time to a graph,
> by dereferencing such URIs, and that many SPARQL processors
> when encountering unknown graph names will attempt to retrieve
> those graphs via their URIs. That's fine, and demonstrates
> how well the OFWeb and SemWeb can be integrated on the basis
> of a shared set of URIs (let's just hope that everyone agrees
> that graphs are information resources ;-) but the bottom line
> is that a named graph is a named graph is a named graph, so
> if one can use FROM to specify the background graph of a dataset,
> then the background graph of a dataset can be a named graph (even
> if it need not be named for all queries/applications).

Not all graphs have names (natural names, if you like).

Examples include some of the simpler use cases:

UC1: A program reads from a file: URI (these are rarely global) that holds the
serialization of a graph.  That graph happens to be a copy from the web, stored
locally.  Here, naming the graph just because it was identified by a file: URI
is misleading.

UC2: A program reads two graphs, performing an RDF merge.

UC3: A program reads a graph, and wishes to query over the RDFS entailments.
[more on this elsewhere, as it is more applicable to a later message]
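
UC2 can be sketched in a few lines.  This is only an illustration, not how any
particular toolkit does it: graphs are modelled as Python sets of (s, p, o)
tuples, blank nodes as strings starting with "_:", and rdf_merge and is_blank
are hypothetical helper names.  The data is the two named graphs from the
dataset example quoted further down.

```python
from itertools import count

def is_blank(term):
    # Simplifying assumption: any string starting with "_:" is a blank node.
    return isinstance(term, str) and term.startswith("_:")

def rdf_merge(*graphs):
    """Union the graphs, giving each input graph's blank nodes fresh labels."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renamed = {}  # blank node label in this graph -> fresh label
        for triple in graph:
            new_triple = []
            for term in triple:
                if is_blank(term):
                    if term not in renamed:
                        renamed[term] = "_:b%d" % next(fresh)
                    term = renamed[term]
                new_triple.append(term)
            merged.add(tuple(new_triple))
    return merged

FOAF = "http://xmlns.com/foaf/0.1/"

bob = {
    ("_:a", FOAF + "name", "Bob"),
    ("_:a", FOAF + "mbox", "mailto:bob@oldcorp.example.org"),
}
alice = {
    ("_:a", FOAF + "name", "Alice"),
    ("_:a", FOAF + "mbox", "mailto:alice@work.example.org"),
}

# Both inputs use the label _:a, but the merge keeps the two people apart.
merged = rdf_merge(bob, alice)
```

The result is a perfectly good graph to query, yet neither input's URI is a
natural name for it.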

We could insist that every graph has a name, but I find that people aren't very
diligent about generating globally unique names when those names are used
purely within their own application.

> 
> I think that the definition of a dataset should not state that
> the background graph is necessarily unnamed, but rather that it is
> simply the background graph, such that any queries evaluated against
> that dataset, which do not specify any graph, are evaluated
> against that background graph. Now, how a given SPARQL processor
> knows which graph is the background graph for a given query is
> of course relevant, and I don't see that any major changes are
> needed to SPARQL to identify the background graph.
> 
> Namely, if no FROM clause is provided, then it is left up to the
> SPARQL processor to decide which is the background graph for a
> given query. If there is a FROM clause provided, then the graph
> thus specified is the background graph for the query. Thus, it
> is not essential to stipulate whether the background graph be
> either unnamed or named insofar as the definition of a dataset
> is concerned, only that it is clear to the processor which
> graph is the background graph of a dataset when evaluating a
> given query.
> 
> This can be fixed easily enough, I think, by changing the single word
> 'does' to 'need' in section 7 of the SPARQL spec.
> 
> I.e. change
> 
> [
>     There is one graph, the background graph, which does not
>     have a name, and zero or more named graphs, identified by
>     URI reference.
> ]
> 
> to
> 
> [
>     There is one graph, the background graph, which need not
>     have a name, and zero or more named graphs, identified by
>     URI reference.
> ]
> 
> and then later, add some statement such as
> 
> [
>     If a given query does not specify the background graph by
>     name, using the FROM operator, then the SPARQL processor
>     must decide which background graph is most appropriate
>     for evaluating the query. The SPARQL processor should
>     be consistent in the default background graph
>     used for all queries not specifying a background graph
>     explicitly.
> ]
> 
> Of course, serialization of a dataset introduces some additional
> issues, as to how to identify the background graph. My recommendation
> would be to use any generic RDF serialization which supports named
> graphs, and define a vocabulary to describe a dataset, which specifies
> the background and/or named graphs belonging to that particular dataset.
> 
> E.g. using TriG, the dataset from Example 1 in section 7.1 of
> the SPARQL spec could be unambiguously serialized as:
> 
> @prefix sparql: <http://www.w3.org/TR/rdf-sparql-query/> .
> @prefix dc:     <http://purl.org/dc/elements/1.1/> .
> @prefix foaf:   <http://xmlns.com/foaf/0.1/> .
> @prefix :       <http://example.com/myDatasetSerialization/> .
> 
> :ds a sparql:Dataset ;
>      sparql:BackgroundGraph :bg ;
>      sparql:NamedGraph      <http://example.org/bob> ;
>      sparql:NamedGraph      <http://example.org/alice>.
> 
> :bg
> {
>     <http://example.org/bob>    dc:publisher  "Bob" .
>     <http://example.org/alice>  dc:publisher  "Alice" .
> }

Couldn't that be:

:ds a sparql:Dataset ;
      sparql:BackgroundGraph
      {
         <http://example.org/bob>    dc:publisher  "Bob" .
         <http://example.org/alice>  dc:publisher  "Alice" .
      } ;
      sparql:NamedGraph      <http://example.org/bob> ;
      sparql:NamedGraph      <http://example.org/alice>.

The :bg is a way of making a syntactic connection between the
sparql:BackgroundGraph triple and the sub-serialization.

Using it as a name (externally visible) is one way of doing it.  Having
a locally scoped label like "=:bg" would also meet the serialization requirements.

	Andy

> 
> <http://example.org/bob>
> {
>     _:a foaf:name "Bob" .
>     _:a foaf:mbox <mailto:bob@oldcorp.example.org> .
> }
> 
> <http://example.org/alice>
> {
>     _:a foaf:name "Alice" .
>     _:a foaf:mbox <mailto:alice@work.example.org> .
> }
> 
> It's important to note that, in the case of serializing datasets with
> unnamed background graphs, it is necessary to give the background graph
> a name, but in doing so, it also means that by using this approach,
> serialization formats such as TriG and TriX can be used to serialize
> multiple datasets in a single TriG or TriX instance (if ever useful
> or necessary to do so) in addition to unambiguously serializing a
> single dataset.
> 
> (I've been generally uncomfortable with processors naming unnamed
> graphs, for the sake of round trip integrity and consistency, but I've
> come to see this approach as the least expensive and disruptive to
> existing tools and processes, and one which maximally exploits the RDF
> machinery. Earlier comments regarding serialization were also based in
> the understanding that background graphs must be unnamed, hence
> introducing a problem when directly parsing/syndicating a serialization
> where the background graph has been named -- but as this is actually not
> the case, and such a conflict would not arise, I feel much more
> comfortable with this approach)
> 
> Regards,
> 
> Patrick
> 
Received on Monday, 6 June 2005 12:11:21 UTC