Re: why I don't like named graph IRIs in the DATASET proposal from Pat Hayes on 2011-10-03 (public-rdf-wg@w3.org from October 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 2 Oct 2011 20:27:04 -0500
To: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Cc: Richard Cyganiak <richard@cyganiak.de>, "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-Id: <65435459-B3F1-4C63-8DEC-ABA8742C1686@ihmc.us>
On Sep 30, 2011, at 7:49 AM, Pierre-Antoine Champin wrote:

> Richard,
> 
> On 09/30/2011 12:02 PM, Richard Cyganiak wrote:
>> On 29 Sep 2011, at 17:31, Pierre-Antoine Champin wrote:
>>> SPARQL states that:
>>>> An RDF Dataset comprises one graph, the default graph, which does
>>>> not have a name, and zero or more named graphs, where each named
>>>> graph is identified by an IRI.
>> 
>> Well that's SPARQL. We are talking about RDF Concepts. It says [1]:
>> 
>> [[
>> Each named graph is a pair consisting of an IRI (the graph name), and an RDF graph. Graph names are unique within an RDF dataset.
>> ]]
> 
> sorry, I didn't notice that you rephrased it.
> 
>> It avoids words like “identify” and “denote”.
> 
> And with very good reasons.
> However, IRIs in RDF have traditionally been used to denote resources, so
> even if you refrain from using those words, it is very easy for the
> reader to see them anyway.
> 
> So I suggest that omitting those words is not sufficient. The definition
> should be followed by a warning, e.g.:
> 
>  Note that : graph names in a dataset are not used to denote the graph
>  in the way an IRI node denotes a resource.

As phrased, that does not make sense. There is no "way" of denoting something. To say that A denotes B is simply to say that A is being used as a name to refer to B. 

I have to say, this whole discussion seems to me to be off the rails, and wandering into fantasy. If an identifier of some kind is being used to refer to some thing, whatever the thing is, then it is being used to, and can correctly be said to, DENOTE that thing. That is what the word "denote" MEANS. And this is, believe it or not, a semantic idea. Now, I know many people in the WG and reading these messages have an agenda to eliminate "semantics" from RDF and purge it of all this academic semi-philosophical nonsense; and, you may be surprised to learn, I have some sympathy with that idea. I would be quite happy for us to declare that RDF is simply a handy notation, on a par with JSON, perhaps with a non-normative semantic sketch suggesting a best practice, but no formal model theory at all. (The linked data community will rejoice, and the OWL community will, after a brief period of howling, finally break its tenuous links to RDF and use some other base notation, to many people's great relief.)

But if we decide to do this, let us do it explicitly and openly. And if we do not decide to do this, then RDF will continue to be a notation with a normative model-theoretic semantics in which the IRIs in RDF triples are all treated as denoting names, and the truth of triples is defined in terms of what these names denote. And if we stick with this normative-semantics idea, it is simply not acceptable to split hairs and bullshit along the lines that, because we are using a different word than "denote", it is OK to run a truck through the semantic ideas on which RDF is currently based.

If a URI is being used within any RDF triple to be the name of a graph, then it denotes that graph (in the model-theoretic sense). If it does not denote the graph, then it is not a name for the graph and it does not identify the graph. End of story. This allows the fourth-field IRI in a quad store to be a "label" of a graph, but not to be used in any RDF triple to refer to the graph on the basis of this labeling (unless we somehow impose some extra semantic conditions to achieve the necessary lock on what the IRI denotes, as described in the original named-graph paper).
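
To make the distinction concrete, here is a minimal sketch in Python. It assumes a dataset is modelled as a plain mapping from graph-name IRIs to sets of triples, and an interpretation as a plain mapping from IRIs to whatever they denote. All names in it (named_graphs, interpretation_a, denotes_labeled_graph, the foaf:name triple) are hypothetical and chosen only for illustration; nothing here is taken from any RDF library or from the ED.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

BOB = "http://example.org/bob"

# The <IRI, graph> pairing of a dataset: this is the "label" in the
# fourth field of a quad store. (Hypothetical contents.)
named_graphs: Dict[str, Graph] = {
    BOB: frozenset({("_:a", "foaf:name", "Bob")}),
}

# A triple in the default graph that uses the same IRI as its subject.
default_graph: Graph = frozenset({(BOB, "dc:publisher", "Bob")})

# An interpretation maps IRIs to the things they denote. The pairing
# above places no constraint on it, so both of these are permitted.
some_person = object()
interpretation_a = {BOB: some_person}        # IRI denotes a person
interpretation_b = {BOB: named_graphs[BOB]}  # IRI denotes the labeled graph


def denotes_labeled_graph(interpretation: dict, iri: str) -> bool:
    """True only if the IRI denotes the very graph it is paired with."""
    return iri in named_graphs and interpretation.get(iri) is named_graphs[iri]
```

Under interpretation_a the triple in default_graph says something about a person, not about the graph labeled with the same IRI; only an extra condition that forces interpretation_b (the "lock" mentioned above) lets that triple be read as a statement about the graph.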

Pat







> 
>>> So I would argue that, at the end of the day, neither of the following
>>> sentences is accurate:
>>> 
>>> a named graph is identified by an IRI
>>> a named graph is labeled by an IRI
>>> 
>>> but in fact:
>>> 
>>> a named graph is labelled by a resource
>> 
>> That's not accurate at all.
> 
> Well, take example 1 from
> http://www.w3.org/TR/rdf-sparql-query/#exampleDatasets
> which is supposed "to have information in the default graph that
> includes provenance information about the named graphs"
> 
> The default graph contains:
> 
>  <http://example.org/bob>    dc:publisher  "Bob" .
> 
> which means that the *resource denoted* by <http://example.org/bob> is
> related to the string "Bob" by the relation denoted by predicate
> dc:publisher. It is *not* the IRI "http://example.org/bob" which is
> related to "Bob", but a *resource*. If I knew another IRI for that
> resource, I could rewrite that triple
> 
>  <http://example.other.com/bob>  dc:publisher  "Bob" .
> 
> without changing the meaning of that triple in any way.
> 
> 
> So the only way for this triple to provide information about a graph in
> the dataset is that the graph be in fact associated with the *resource*
> and not the IRI.
> 
> 
> Of course, all this derives from examples in the SPARQL document, not
> the Dataset definition in your ED. However, your argument in favor of
> the dataset proposal was to reuse something known to work, rather than
> reinventing it. My point is:
> 
> * the SPARQL definition has some theoretical caveats
> * rephrasing the definition as you did may in principle solve this, but
> does not remove the risk of confusion, because
>  * IRIs are used differently for resources and for graphs
>  * and SPARQL fuels the confusion by using the same syntax (<>
>    brackets) for both IRI nodes and graph names
> 
> pa
> 
> 
>> A named graph is an <IRI,graph> pair. The IRI is called the graph name.
>> 
>> As written in the ED, the relationship between the IRI and the graph is neither “identifies” nor “labels”; it is “is graph name of”. No relationship between the resource denoted by the IRI and the graph is implied by the wording in the ED.
>> 
>>> (imagine for example an owl:sameAs statement between two graph IRIs in a
>>> SPARQL engine supporting OWL inference; what would that mean?)
>> 
>> owl:sameAs means that two terms denote the same resource. As written in the ED, use of those terms as graph names is entirely orthogonal to that.
>> 
>> I think that's a good thing. Named graphs are key to trust and provenance. Trust and provenance must happen at a lower level in the stack, before reasoning and inference kick in. W3C's version of the layer cake, where trust sits above reasoning, cannot work. The moment you reason with OWL over untrusted data, you're fucked.
>> 
>> Best,
>> Richard
>> 
>> [1] http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-multigraph
> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 3 October 2011 01:27:36 UTC