Re: [GRAPH] graph deadlock? from Pat Hayes on 2011-12-23 (public-rdf-wg@w3.org from December 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 23 Dec 2011 12:00:54 -0600
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-Id: <044FD532-AD51-4344-B2BD-824483985406@ihmc.us>
On Dec 21, 2011, at 5:18 AM, Andy Seaborne wrote:

> On 21/12/11 08:53, Ivan Herman wrote:

And there are responses to both of them inline below. 

>> 
>> On Dec 20, 2011, at 19:45 , Pat Hayes wrote:
>> 
>>> 
>>> On Dec 20, 2011, at 2:29 AM, Ivan Herman wrote:
>>> 
>>>> Pat,
>>>> 
>>>> On Dec 20, 2011, at 05:45 , Pat Hayes wrote:
>>>> 
>>>> [skip]
>>>> 
>>>>> 
>>>>> Now, consider the case where a URI UUU is used as a graph
>>>>> label in a dataset, and also occurs in the RDF inside a graph
>>>>> in that same dataset, where it is interpreted as denoting, say,
>>>>> a human being or a mailbox. OK so far. Now, however, add to the
>>>>> dataset some more RDF (perhaps in the default graph used to
>>>>> express some metadata, for example) in which that same URI is
>>>>> intended to be used to refer to the graph that it labels. There
>>>>> are *no* RDF interpretations in which a single URIref can
>>>>> denote two different things. So this dataset as a whole has no
>>>>> satisfying interpretations. So it is formally inconsistent.
>>>>> Moreover, the inconsistency arises directly, and obviously,
>>>>> from this usage in which a URI is used to "name" something
>>>>> other than what everyone agrees it is in fact interpreted to
>>>>> mean (as, vividly, in Ivan's example using an email address).
>>>>> And this is, surely, *obviously* at odds with the basic
>>>>> assumption of the entire Web, that URIs, when considered as
>>>>> names, identify *one* thing.
>>>>> 
>>>> 
>>>> is 'labeling' and 'identifying' the same?
>>> 
>>> Well, maybe not. But I suspect that if we try to say this, nobody
>>> will take the slightest notice. They certainly sound like they
>>> ought to be very closely related, so closely that only philosophers
>>> could distinguish them, and then only when there is an R in the
>>> month.
>> 
>> :-)
>> 
>>> And by the way, SPARQL talks about these URIs *naming* the graph,
>>> which sounds even more like identifying.
>> 
>> So we indeed have a naming (sic!) issue. Indeed, SPARQL uses the term
>> 'naming' for what I referred to as datasets. That is a mess that,
>> unfortunately, we have to live with :-(
> 
> 
> Specifically, the SPARQL Query spec says about the FROM NAMED syntax
> """
> The FROM NAMED syntax suggests that the IRI identifies the corresponding
> graph, but the relationship between an IRI and a graph in an RDF dataset
> is indirect. The IRI identifies a resource, and the resource is
> represented by a graph (or, more precisely: by a document that
> serializes a graph). For further details see [WEBARCH].
> """

Sure sounds like it is saying that the IRI names a graph container. But I now think that this is in fact irrelevant to SPARQL and is a misleading paragraph. AFAIKS, all that SPARQL requires is that the IRI is paired with the graph in the dataset. It doesn't even need to mention any semantic relationship such as 'naming' between the IRI and the graph or graph container, nor does it require that the 'naming' IRI in this pair identify anything related to the graph, no matter how indirect this might be. (It might indeed have been better for everyone if SPARQL had simply shied away from using semantic terminology altogether.)

> 
> The RDF dataset definition is more general.
> 
> """
> Definition: RDF Dataset
> 
> An RDF dataset is a set:
> 
>   { G, (<u1>, G1), (<u2>, G2), ... (<un>, Gn) }
> 
> where G and each Gi are graphs, and each <ui> is an IRI.
> """
> 
> It adds:
> """
> Each <ui> is distinct. G is called the default graph. (<ui>, Gi) are called named graphs.

I would add that nowhere does it say that there is any relationship between Gi and <ui>, other than that they co-occur in a pair with a somewhat evocative name. It does not specify that <ui> denote or name or refer to Gi in any way, or indeed have any connection to it other than being in the same pair in this dataset. Which is exactly how people are using it, of course, as Richard and Antoine have been emphasizing.
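
To make the point concrete, here is a minimal sketch of the quoted definition as a data structure. It is only an illustration under stated assumptions: the class name Dataset, the string encoding of terms, and the example.org IRI are all hypothetical, and nothing here is taken from the SPARQL spec beyond the shape of the definition. The only constraint it enforces is that each <ui> is distinct; the IRI is a lookup key and nothing more.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Terms are plain strings here, purely for illustration.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class Dataset:
    """{ G, (<u1>, G1), ... (<un>, Gn) }: a default graph plus IRI/graph pairs."""

    def __init__(
        self,
        default_graph: Iterable[Triple],
        named: Iterable[Tuple[str, Iterable[Triple]]],
    ) -> None:
        self.default_graph: Graph = frozenset(default_graph)
        self.named: Dict[str, Graph] = {}
        for iri, triples in named:
            # "Each <ui> is distinct": the one constraint in the definition.
            if iri in self.named:
                raise ValueError(f"IRI {iri!r} is paired with more than one graph")
            # The pairing is all there is: no claim that iri denotes the graph.
            self.named[iri] = frozenset(triples)


# Hypothetical IRI, used both as a graph label and, inside the graph,
# as a term intended to denote a person.
UUU = "http://example.org/UUU"
ds = Dataset(
    default_graph=[],
    named=[(UUU, [(UUU, "http://xmlns.com/foaf/0.1/name", '"Alice"')])],
)
```

Nothing in this structure says, or needs to say, what UUU refers to when it occurs inside a triple.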

So – and perhaps this is what you, Ivan, have been advocating all along – we should distinguish actual referential naming of a graph (container) by an IRI, from the IRI/graph(container) relationship described or specified in a dataset, which is evidently not that of reference or naming (as the word is usually used) or what is usually called 'identification' of a resource by an IRI. 

However we are still left with the issue of what these IRIs are supposed to refer to when they are used in an RDF triple, as opposed to the 4th field of a quad store or in a SPARQL-defined RDF dataset 'named graph' pair. And here we have the central (and it seems to me the only important) issue, which is how to reconcile the obvious need to use the IRI to refer to the graph (in RDF metadata, and as several of us have been doing in these email threads) and the fact that they may also denote something else altogether, and the fact that they can't do both of these at the same time.
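
The "can't do both at the same time" constraint can be sketched as follows. This is a toy model under my own assumptions, not the RDF Semantics machinery: an interpretation is reduced to a mapping from IRIs to referents, and the class name, method names and example.org IRI are hypothetical.

```python
class Interpretation:
    """Toy interpretation: each IRI denotes exactly one thing."""

    def __init__(self) -> None:
        self._denotes = {}

    def assign(self, iri: str, referent: object) -> None:
        # A second, different referent for the same IRI cannot be satisfied.
        if iri in self._denotes and self._denotes[iri] != referent:
            raise ValueError(
                f"{iri!r} already denotes {self._denotes[iri]!r}; "
                f"it cannot also denote {referent!r}"
            )
        self._denotes[iri] = referent

    def denotation(self, iri: str) -> object:
        return self._denotes[iri]


UUU = "http://example.org/UUU"  # hypothetical IRI

interp = Interpretation()
interp.assign(UUU, "a person")  # use inside the graph

try:
    interp.assign(UUU, "the graph labelled by UUU")  # use in the metadata
    consistent = True
except ValueError:
    consistent = False
```

In this toy model the second assignment fails, which is the analogue of the dataset as a whole having no satisfying interpretation.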

The only way I can see around this (apart from choosing to ignore it – which I am presuming is a course of last resort – or making some aspect of it non-conformant and swallowing the resulting discomfort) is to allow IRIs in RDF to be treated as punning (AKA overloading), under some circumstances, with the syntactic context of use determining the resolution of the punning ambiguity. The simplest way would be to restrict this to this special case of metadata in the default graph of an RDF dataset, but I think we could try to come up with a more general framework that might be of wider utility. I will take up that task in another thread, maybe after Xmas.
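
As a rough sketch of what context-resolved punning might look like, assuming just two contexts of use (the enum values, function name and example.org IRI below are hypothetical, and this is not a worked-out proposal):

```python
from enum import Enum


class Context(Enum):
    ORDINARY = "ordinary"          # IRI used as a term in ordinary RDF
    GRAPH_METADATA = "graph-meta"  # IRI used in metadata about the graph it labels


def denote(iri, context, ordinary_denotation, labelled_graphs):
    """Resolve an IRI's referent, letting the context of use pick the reading.

    ordinary_denotation: maps an IRI to what it ordinarily denotes
    labelled_graphs:     maps an IRI to the graph it is paired with in the dataset
    """
    if context is Context.GRAPH_METADATA and iri in labelled_graphs:
        return labelled_graphs[iri]
    return ordinary_denotation[iri]


UUU = "http://example.org/UUU"  # hypothetical IRI
ordinary = {UUU: "a person"}
graphs = {UUU: frozenset({(UUU, "http://xmlns.com/foaf/0.1/name", '"Alice"')})}

as_term = denote(UUU, Context.ORDINARY, ordinary, graphs)
as_graph = denote(UUU, Context.GRAPH_METADATA, ordinary, graphs)
```

The IRI still has one referent per context, so each reading stays functional; the ambiguity is resolved by where the IRI is used, not by the IRI itself.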

> """
> which I'd prefer, in hindsight, to drop or at least move out of the definition.
> 
> It adds nothing that affects the rest of the definition of SPARQL.  I'd even argue that it was "editorial", not "substantial", to the definition of SPARQL so does not invalidate the last call.
> 
>> For the sake of the discussion we may have to use different terms,
>> and let us forget about SPARQL for a while.
> 
> Oops :-)
> 
>> Although I do not have
>> Sandro's talent for finding nice terms, let us say that we speak
>> about labelled graphs, i.e., datasets, and identified graphs. Let us
>> not use the term named graphs for a while...
>> 
>> - Labelled graphs are the minimal level, ie, just using URI-s
>> labeling graphs. We may have to add a restriction that, within a
>> dataset, labels are unique, ie, two different graphs must have two
>> different labels. No further assumptions are used. And, to refer to
>> an earlier quote of yours up there, the labels used that way would
>> take no part in any form of RDF interpretation whatsoever.
>> 
>> - Identified graphs are labelled graphs where there _is_ a relation,
>> through HTTP GET, between the label and the graph.

Neither of these establishes a semantic naming relation between the IRI and the graph. Labelled graphs are just graphs which are paired with an IRI in an RDF dataset. That has nothing, prima facie, to do with the IRI denoting or naming the graph. It certainly does not make the IRI into a name for the graph (or even a label for the graph, in any global sense: the 'labeling' is solely restricted to this dataset.) And the need to distinguish between 'identifying' in the REST/HTTP-Web sense, and naming in the semantic sense, has been notorious now for almost a decade: this is exactly what the http-range-14 debates have all been about. They are not the same idea: if we want them to coincide, we have to say so.

>> I fully understand your bad feeling about labelled graphs. I share
>> it. But we have to accept that, out there, lots of applications are
>> based on that notion, ie, they attach labels to graphs without any
>> sort of further check and assumption. What I am saying is that
>> documenting and making clear the limitation of those states has a
>> value. An application may choose to stay on that level. We may (we
>> probably should...) try to promote the notion of identified graphs as
>> being more 'webby', hence we need to define them properly, but that
>> is defining some sort of an ideal world...
> 
> This is a promising way forward.
> 
>>> 
>>>> My non-semantics dataset view talks about labeling only.
>>>> 'Indexing' may be another term.
>>>> 
>>>> I come back to the quad store example. I do not believe that quad
>>>> stores make any assumption, by default, to the behaviour of the
>>>> URI-s in the 4th column, they are just 'there'.
>>> 
>>> Fine. But when they also occur in the (say) 3rd position, do they
>>> or do they not then mean the same as they meant when they occur in
>>> the fourth position? (Or maybe: does what they mean in the 3rd
>>> position have any relationship at all to their role while
>>> being-there in the fourth position?)
>> 
>> If the quad store implements a labelled graph then the answer is no.
>> More exactly, an application should have no assumption that they do.

OK, but others seem to disagree. I doubt we can ever get widespread agreement on this. People *will* assume that there is a relationship; and moreover, this assumption is extremely natural. 

>> 
>>> The answer seems to be, sometimes they do (of course) and sometimes
>>> they don't (of course), but nothing records which case is which.
>> 
>> Right.

>> A quad store, or an application thereof, might declare that it
>> uses labelled or identified graphs. Well.. probably should/must and
>> not might.

How can it declare this? If we say it SHOULD, or even that it MAY, do this, we have to provide a standard way to do it, that everyone can recognize. 

>> 
>> 
>>> And I object to that situation, as it produces faux-RDF which is
>>> designed to be systematically ambiguous in meaning.
>> 
>> And I hear your objection. Today an application or a quad store has
>> no means to say which way it goes. Hence the mess... That is the
>> absolute minimal step that I would like to make to try to clarify
>> things.

Seems we agree on this :-)

Pat

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 23 December 2011 18:01:35 UTC