Re: Problem with auto-generated fragment IDs for graph names from Pat Hayes on 2013-02-18 (public-rdf-wg@w3.org from February 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 18 Feb 2013 08:27:49 -0600
To: Ivan Herman <ivan@w3.org>
Cc: Eric Prud'hommeaux <eric@w3.org>, Manu Sporny <msporny@digitalbazaar.com>, RDF WG <public-rdf-wg@w3.org>, Linked JSON <public-linked-json@w3.org>
Message-Id: <0671C39D-ADB3-4215-8ABA-BE365F4946E1@ihmc.us>
On Feb 16, 2013, at 6:07 PM, Ivan Herman wrote:

> 
> On Feb 16, 2013, at 11:39 , Pat Hayes <phayes@ihmc.us> wrote:
> 
>> 
>> On Feb 16, 2013, at 7:40 AM, Ivan Herman wrote:
>> 
>>> 
>>> On Feb 15, 2013, at 11:59 , Pat Hayes <phayes@ihmc.us> wrote:
>>>> 
>>>> OK, I am impressed. I wasn't aware that SPARQL allowed variables in graph name position.
>>>> 
>>>> But let me ask you about this example. You are assuming here that the _:doc1 in the triple in the default graph, and the _:doc1 used as a graph label, refer to the same thing, which is the moon-green-cheese graph, right? What is interesting here is that this assumption seems inevitable when we have a bnode involved, as here, but (the WG has decided) it cannot be assumed when an IRI is used. So this data:
>>>> 
>>>> {ex:doc1 :author "Bob" }
>>>> ex:doc1 {:TheMoon :madeOf :greenCheese }
>>>> 
>>>> does *not* entail that Bob is the author of the graph (since 'ex:doc1' might denote something else, which is what the default graph would be about, and not about the graph.) So this actually gives us a new, Manu-independent, reason to allow bnodes as graph labels in datasets: they provide exactly the missing expressivity that is needed to have the default graph act as genuine metadata.  
>>>> 
>>>> Hmm, I am now feeling like we should re-think our decision here. David, Guus, are you following this? Do I hear a groaning noise yet?
>>>> 
>>> 
>>> 
>>> First of all, I am not sure what 'our decision here' means in this case. I may infer that you want to re-think the 'can a bnode stand as a graph id' one
>> 
>> Yes, that is what I meant. The recent one.
> 
> Ok, thanks for the clarification.
> 
>> 
>>> , but I may also infer that you want to come back on 'graph label ... cannot be assumed when an IRI is used to refer to the graph'. I would be opposed to reopening the latter; that would mean another 1-2 years of discussion.
>> 
>> No, I take that to be closed. But my point is, the bnode case (if we allow it) provides a neat way to provide some currently missing expressivity, while not going back or re-opening that earlier decision. So it looks like a win/win.
>> 
>>> 
>>> As for the former: I must admit I do not really follow your reasoning. The way I see it, SPARQL does not say any more about referral in the case of blank nodes than it does for IRIs, i.e., there is no difference. The
>>> 
>>> {IRI-OR-BNODE :author "Bob" }
>>> IRI-OR-BNODE {:TheMoon :madeOf :greenCheese }
>>> 
>>> pattern also implying referral seems to me to be more of a social convention than anything else; I do not believe SPARQL makes any statement whatsoever about this. What am I missing?
>> 
>> I do not mean to refer to SPARQL here. My point is purely semantic. The old decision about IRI graph labels means that this pattern with an IRI **cannot** be presumed to be saying that Bob is the author of the graph. Maybe that is what the writer had in mind, but it's not semantically justified, and in fact it is actually called out in the spec as not semantically justified: so if you read this pattern somewhere, you have no semantic justification for believing that the first use of that IRI does in fact refer to the graph labelled by the second use.
> 
> Agreed. That is what I meant when I said that this is a 'social convention', which is probably not the right term, actually. One could rather say that this is a convention for a (specific class of) applications. And you are right that this is not a SPARQL issue, my bad.
> 
>> That is the consequence of our failure to provide a semantics for graph label IRIs in datasets. The source of that failure was the perceived need, by some members of the WG, to permit IRIs which denote things other than graphs to be used as graph labels.
> 
> Isn't it the backward compatibility with SPARQL that put us into this situation? Ie, that SPARQL does not make any kind of statement on referral?

The question of whether names refer to their graphs is irrelevant to SPARQL, but nothing in SPARQL requires naming to be dissociated from reference. SPARQL in fact calls them "named graphs" and refers to the original use of this term, in a paper by Carroll et al., which *did* specify that a graph name denotes the graph it names (well, actually the pair of the name and the graph, but that isn't relevant here). Nevertheless, several members of the WG felt very strongly that the term "named graph" should not be understood in this way, because use cases of quad stores required that we allow for the case where a graph name does not refer to the graph.

>> But, my point is, none of this - neither the original motivation nor the decision - applies to bnodes used as graph labels. So if we allow bnodes to be used as graph labels in a dataset, then we have the possibility to give them the 'missing' but obvious semantics which requires them to denote the graph they label. (What other meaning could that use possibly have?)
>> 
>>> 
>>> I must say that *if* there is a major semantic difference at this point between a bnode label and an IRI label, that may actually be an argument *not* to reopen this issue either; it would create an inconsistency that nobody would understand, let alone explain to third parties...
>> 
>> I think we already have this problem, in explaining to people why it is that a graph "label" might not denote the graph it labels. The fact that you and Eric and Manu all seem to be unaware of the consequences of this illustrates the problem rather acutely.
> 
> I am sorry Pat, but I do understand the consequences.

Then surely you must see that the example that you and Eric use does not in fact mean what it seems to mean. 

I have to confess to a certain sense of frustration here, because you guys seem to be adopting an attitude which is typical in the WG, that "formal" semantics is meaningless, so anyone can simply impose an intended meaning on RDF by a kind of personal force of will. Look, we have decided, and state explicitly in the Concepts document, that putting an IRI next to a graph in a dataset does NOT imply that the IRI denotes the graph. It then follows that nobody, on seeing a dataset, can have any confidence that any IRI used as a graph label is intended to denote the graph, and therefore that if that IRI is used on some RDF triples, that it is there intended to denote the graph. All talk of "social conventions" or "conventions for a class of applications" or "conventions" for anything is just that, talk. How are these conventions to be conveyed from one place to another? If I find some RDF on the Web, how am I to know what conventions were in the mind of the designer of the application which put them there? How do I know if my applications are the ones that the writer of the RDF had in mind? All I have is the RDF and what the RDF specs say about how it is to be interpreted. And, to repeat, those specs currently warn me explicitly to NOT make the assumption that you are calling a "convention". So I should not make that assumption: to do so would be at best reckless on my part, and probably an outright error. [1] (Imagine a programming language whose documentation warns you that calling a certain function is going to have a random output, or randomly throw an error. Would you use that function call to do arithmetic with?)

The whole point of having a semantics for an interchange language is to state meaning conventions in the specification, so that anyone can discover what they are. RDF is supposed to carry enough meaning that it can be rationally processed by any agent who reads it, without needing to go back to the (possibly unknown) source and ask them what they intended to mean by it. If the meaning of an RDF graph has to depend upon social or mode-of-use "conventions" then we might as well declare it to simply be a data structure whose meaning is not determined by the specification at all. 

> 
> What I was uneasy with, and I am still uneasy with, is that we would provide a different semantics for bNodes than for IRI-s.

They already have (and always have had) a different semantics. 

> My understanding of what you propose is: a bnode usually means "There is an IRI such that bla bla bla".

No. There is an explicit warning in the 2004 specs to not interpret it that way. It means "something exists such that ..." But that thing that exists might not have any IRI referring to it. 

> However, in this case we would have to say "There is an IRI such that bla bla bla AND that the IRI refers to the graph".

No IRI is involved in this case. Actually this would not change the semantics of IRIs or bnodes at all, but it would stipulate that the new case, of a bnode used as a graph label, should be understood as stating an equation of the form
<bnode> = [ the graph ].  I don't see any problem with this, since this case doesn't occur anywhere else, and it is quite unambiguous. And (unlike the case with an IRI label), there cannot be any other thing for the bnode to refer to, so why would it not denote the graph? I mean, this is the *normal* case. The one that needs special explaining is the one we have now, where an IRI label might not refer to the graph. 
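
As a rough illustration of the stipulation above, here is a minimal Python sketch. It is not normative RDF semantics and not any library's API: the tuple encoding of triples, the "_:" prefix test, and the function names is_bnode and graph_metadata are all assumptions made for this example. It returns the default-graph triples that may safely be read as statements about a labelled graph: those whose subject is a bnode graph label, and nothing at all for an IRI label.

```python
def is_bnode(term):
    """Treat any term written with the '_:' prefix as a blank node (an assumption)."""
    return term.startswith("_:")


def graph_metadata(default_graph, label):
    """Return the default-graph triples that may be read as being about the graph
    labelled `label`, under the rule proposed in this message.

    A bnode label is taken to denote the graph it labels, so triples whose subject
    is that bnode describe the graph. An IRI label carries no such guarantee, so
    nothing may be assumed and the result is empty.
    """
    if not is_bnode(label):
        return set()
    return {triple for triple in default_graph if triple[0] == label}


# Hypothetical default graph using both kinds of label from the thread's example.
default_graph = {
    ("_:doc1", ":author", "Bob"),
    ("ex:doc1", ":author", "Bob"),
}

bnode_case = graph_metadata(default_graph, "_:doc1")
iri_case = graph_metadata(default_graph, "ex:doc1")
```

Under these assumptions bnode_case holds the single authorship triple, while iri_case is empty, which mirrors the asymmetry being argued for here.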

BTW, as an aside, if we don't stipulate this, then why would anyone want to use a bnode as a graph label? If the bnode need not refer to the graph, then such a usage is quite literally meaningless. You could "label" any graph with the bnode without changing what the dataset means. Bear in mind that the semantics of bnodes is specified in the basic RDF semantics.  (This, BTW, is why I resisted Manu's proposal so strongly: it was meaningless, given the semantics for labels that we had been assuming.)

> Wouldn't this be the only place where we would attach an extra requirement on the "There is an IRI such that..."? I see that as a kind of inconsistency which seriously bothers me.

We don't need to mention any IRIs. This entire point is not about IRIs, it is about bnodes. About all you can possibly know about a bnode is that it is not an IRI.

> (Let alone the fact that most in the community look at bnodes as a convenient way of defining some sort of an anonymous URI for a resource and they do not look at the existential nature of things. That may lead to further mess.)

Seems to me that, if correct, this is about as bad a mess as anyone could wish for. I hope you are wrong.

Pat

[1] Consider the following scenario, which I think is not implausible. Suppose we are processing datasets like this example, in which default graphs contain statements fixing authorship of graphs:

<graph label> :author http://rdfweb.org/people/danbri/foaf.rdf .

and some script is putting all this together by extracting the author name, generating an :author triple and inserting it into the merged default graph. But now suppose that someone decides to represent this data in another dataset by cleverly using an IRI identifying a person to label the graph they authored:

{ http://rdfweb.org/people/danbri/foaf.rdf { :a :b :c .}}

Then we will get this:

http://rdfweb.org/people/danbri/foaf.rdf :author http://rdfweb.org/people/danbri/foaf.rdf . 

where the subject is supposed to identify a graph and the object is supposed to represent Dan Brickley, but these are the same IRI. This makes nonsense of RDF under anyone's semantic conventions. 
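
The collision in this scenario can be reproduced with a small Python sketch. Everything in it is hypothetical apart from the danbri IRI taken from the example: the helper names author_triple and subject_equals_object, the tuple encoding of triples, and the http://example.org/graph1 label are assumptions for illustration, not part of any real script.

```python
AUTHOR = ":author"
DANBRI = "http://rdfweb.org/people/danbri/foaf.rdf"


def author_triple(graph_label, author_iri):
    """Build the metadata triple the merging script would add to the default graph."""
    return (graph_label, AUTHOR, author_iri)


def subject_equals_object(triple):
    """True when the same term appears as both subject and object."""
    subject, _, obj = triple
    return subject == obj


# First dataset style: the label is a separate IRI meant to denote the graph.
# (http://example.org/graph1 is a made-up label for this sketch.)
first = author_triple("http://example.org/graph1", DANBRI)

# Second dataset style: the person's IRI is reused as the graph label,
# so the generated triple says the same IRI is its own author.
second = author_triple(DANBRI, DANBRI)

collision = subject_equals_object(second)
no_collision = subject_equals_object(first)
```

The script itself is unchanged between the two cases; only the labelling convention of the incoming dataset differs, and nothing in the data tells the script which convention was in use.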

> 
> 
> To be clear: I am not very happy with the non-referral nature of graph labels, personally, and with my W3C position's hat put down. But, as you say, that is water under the bridge. Having this extra bnode semantics would really look like a kludge to me.

> 
> Sorry...:-)
> 
> Ivan
> 
> 
>> At least, with the bnode case available, we could tell people that using bnode labels does carry the obvious intuitive meaning into the normative semantics, and it's a very easy rule to follow even if you don't quite understand why it doesn't work in the IRI case. (The fact is, there is no rational reason for that decision, IMO, but that is water under the bridge now.)
>> 
>> Pat
>> 
>>> 
>>> Ivan
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C Semantic Web Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>> 
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Monday, 18 February 2013 14:28:29 UTC