Re: Problem with auto-generated fragment IDs for graph names from Pat Hayes on 2013-02-20 (public-rdf-wg@w3.org from February 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 20 Feb 2013 13:55:59 -0600
To: Eric Prud'hommeaux <eric@w3.org>
Cc: Markus Lanthaler <markus.lanthaler@gmx.net>, 'Andy Seaborne' <andy.seaborne@epimorphics.com>, public-linked-json@w3.org, 'RDF-WG' <public-rdf-wg@w3.org>
Message-Id: <B50CBF86-F8CD-4DA3-B92C-EF4E63D408D7@ihmc.us>

On Feb 20, 2013, at 6:52 AM, Eric Prud'hommeaux wrote:

> * Pat Hayes <phayes@ihmc.us> [2013-02-19 00:01-0600]
>> 
>> On Feb 18, 2013, at 12:07 PM, Eric Prud'hommeaux wrote:
>> 
>>> * Markus Lanthaler <markus.lanthaler@gmx.net> [2013-02-18 17:58+0100]
>>>> On Monday, February 18, 2013 5:17 PM, Andy Seaborne wrote:
>>>>> _:0x1234 {
>>>>>     x:assertions x:expressedAs x:triples .
>>>>>   }
>>>>> 
>>>>> is a labelling of a graph (value).
>>>>> 
>>>>> So there is some relationship (not here defined) to the graph and that
>>>>> is in the dataset structure.  In your previous message you talked about
>>>>> "navigate" and "bnode identifiers".  I understood your description as
>>>>> structural navigation of a datastructure from parsing.  Was that right?
>>>> 
>>>> Yes.
>>>> 
>>>> 
>>>>> You would get from  _:0x1234 to the graph by looking in the dataset
>>>>> structure (which is a map) if bnodes were allowed.  At this level, of
>>>>> concrete graph structures, a bnode label or an IRI string would serve the
>>>>> same purpose using e.g. relative URIs (and a per-parse random base URI
>>>>> making it only findable locally).  It's a local structural identifier.
>>>> 
>>>> Yes. Would that also be the case if bNodes would *not* denote the graph they
>>>> label? As I understand it, if bNodes wouldn't denote the graph, you couldn't
>>>> look up a graph labeled with a bNode ID in a dataset because you wouldn't
>>>> know if that bNode ID denotes that graph or not. Is that correct?
>>> 
>>> Aha! Would "does not formally denote the graph" mean there's no usable
>>> mapping from label to graph?
>> 
>> No. It means that when the IRI is used in an RDF triple, what it refers to might not be what you get by following the label-to-graph mapping.
> 
> Let's see if I can rectify with the operational semantics to which I seem so attached:
> 
> In order to merge two sources of RDF data, I need to know that IRIs in that data refer to the same things, e.g. a person, an event, some process... In order to encourage responsible use of IRIs, we say "Any IRI or literal denotes some thing in the universe of discourse."
> 
> In order to maintain referential integrity within some named graph system, we need to assert that there's a mapping from graph label to graph. It would be nice to say that graph labels denote the associated graph, but:
>  1 graph names frequently denote something other than the graph:
>    <protein:p53> { <protein:p53> a :Protein . }
> 
>  2 different systems use the same name to label different graphs, sometimes differing by some inferential closure:
>    system1: <protein:p53> { <protein:p53> a :Protein . }
>    system2: <protein:p53> { <protein:p53> a :Protein, :NucleotideSequence . }
> 
> Doesn't punning already mean that IRIs have to denote a union of things in domain of discourse?

Well, several things, yes. But punning only works when the syntax determines which referent to 'select' in every case, e.g. aaa rdf:type bbb requires bbb to be (interpreted as) a class and aaa to be interpreted as an individual, so it still works when you write aaa rdf:type aaa. But in this case, we don't get that nice syntactic clue. If I see <protein:p53> in an RDF triple, how do I know whether it's supposed to mean the protein or the graph? And without that, punning is just plain old ambiguity.
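
The point about syntactic position can be illustrated with a small Python sketch. This is a toy model only: the PUNS table, the role names, and the referent() helper are all hypothetical and come from no RDF or OWL specification. It shows that position inside an rdf:type triple is enough to pick a referent, and that outside such a triple nothing does the picking.

```python
# Hypothetical toy model of punning; none of these names come from a spec.
RDF_TYPE = "rdf:type"

# One IRI, two possible referents, keyed by the role the syntax assigns.
PUNS = {
    "ex:aaa": {"individual": "aaa-as-individual", "class": "aaa-as-class"},
}

def referent(iri, triple, position):
    """Select a referent using only the position inside an rdf:type triple."""
    _, predicate, _ = triple
    if predicate != RDF_TYPE:
        return None  # no syntactic clue, so the pun is just ambiguity
    role = "individual" if position == "subject" else "class"
    return PUNS[iri][role]

typing = ("ex:aaa", RDF_TYPE, "ex:aaa")
subject_referent = referent("ex:aaa", typing, "subject")
object_referent = referent("ex:aaa", typing, "object")

# An ordinary triple gives no basis for choosing between the referents.
other = ("ex:aaa", "dc:author", "ex:bob")
unresolved = referent("ex:aaa", other, "subject")
```

In the <protein:p53> case every triple is like the last one: there is no position that reliably means "the graph" rather than "the protein".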

>  <X> a owl:Class; owl:ObjectProperty .
> Can the graph <protein:p53> and the protein itself be puns?

No, see above.

> How about two different graphs with materialized different inferences?
> 
> 
>>> I believe we can factor out whether
>>> bnodes are permitted as graph labels as this question arises in
>>> either case.
>>> 
>>> 
>>>> If you have the following dataset:
>>>> 
>>>> {
>>>> _:b1 x:signature "... signature ..." .
>>>> }
>>>> _:b1 {
>>>> ... some triples ...
>>>> }
>>>> 
>>>> Do the two _:b1 above refer to the same, i.e., the named graph? Does this
>>>> mean that "... signature ..." is the signature of the graph labeled with
>>>> _:b1? Or could it be that the signature is about something completely
>>>> different?
>>> 
>>> Yeah, it'd really be useless if the system were permitted to have _:b1
>>> (or even <http://a.example/graphs/b1>, for that matter) refer to
>>> something other than the graph which was paired with that signature.
>>> Operationally, I can't imagine this happening. I'm sure that every
>>> test case and every implementer will make sure that parsing that
>>> dataset as a Trig document or constructing it with SPARQL or RIF or
>>> SPIN or OWL will preserve "graph integrity".
>> 
>> ? What do you mean by "graph integrity"? That sounds like some kind of syntactic condition, but the issue isn't anything to do with syntax. It isn't something that any kind of parser could possibly fix. 
> 
> I just mean that when I say
>  { <g1> dc:author "Bob" . }
>  <g1> { :theMoon :madeOf :greenCheese }
> , <g1> doesn't end up labeling some *other* graph or no graph at all. I quoted "graph integrity" to mean the abstract notion of maintaining referential integrity across a graph of things which happened to already include the word "graph".

OK, got it. I have no problem with using one label to label two graphs, myself, but we decided that's illegal, which is also fine by me.
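
A minimal sketch of that referential-integrity reading, in Python. It assumes a dataset is a default graph plus a label-to-graph map, and it treats reuse of a label for a second graph as an error, following the decision mentioned above as read here; the function names are hypothetical, and this makes no claim about what any RDF document requires.

```python
# Toy dataset: a default graph plus a label -> graph map (illustrative names).
default_graph = {("<g1>", "dc:author", '"Bob"')}
named_graphs = {}

def add_named_graph(label, triples):
    """Pair a label with a graph, refusing to reuse the label for a second graph."""
    if label in named_graphs:
        raise ValueError(f"label {label} already labels a graph")
    named_graphs[label] = frozenset(triples)

def graph_labelled(label):
    """Follow the label-to-graph mapping; None means the label labels no graph."""
    return named_graphs.get(label)

add_named_graph("<g1>", [(":theMoon", ":madeOf", ":greenCheese")])

# Subjects of default-graph triples that are also graph labels in this dataset.
subjects_with_graph = {s for (s, _, _) in default_graph
                       if graph_labelled(s) is not None}
```

Note that this only checks the structural pairing. It says nothing about what <g1> denotes when it appears as the subject of the dc:author triple.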

> 
> 
>>> (Note that this "graph"
>>> is a superset of an RDF graph.)
>>> 
>>> I don't know how to utter that in the semantics doc 'cause I don't
>>> know what "denotes" means.
>> 
>> <knocks head repeatedly against wall>
>> 
>> A denotes B =
>> A refers to B (except we don't mean 'refer' as in "dereferencing a pointer") = 
>> A names B (except "name" as in "named graph" has been defined to mean something else) =
>> When you use A to talk about something, you are talking about B
>> 
>> Got it now, Eric? 
> 
> probably. the text above will have revealed that by now.

Still not sure... :-)

> 
> 
>>> The semantics that we're trying to avoid
>>> implying is that dataset1's graph <foo> is the same as dataset2's
>>> graph <foo>.
>> 
>> Are we? That is news to me. But this surely isn't the main issue here. 
> 
> I believe this was one of the sticking points. Perhaps my example above clarifies.

No doubt we both have selective memories. I have to say, if I had been aware that this was the central issue, I would have argued for bnodes used as labels from day one. It's bad enough that IRIs label without denoting, but when that labelling is contextual, using IRIs is even more problematic. The whole point of IRIs is that they are global, right? Isn't that one of the foundation posts of the entire Web? So why are we using IRIs for graph labels *at all*??
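
The tension between global IRIs and contextual labelling can be sketched in Python. The two systems below reuse the protein example from earlier in the thread; the dictionaries and the helper name are hypothetical. A single global interpretation assigns each IRI one referent, so it can agree with the label-to-graph pairing of at most one of these systems.

```python
# Two hypothetical systems pair the same IRI with different graphs.
system1 = {
    "<protein:p53>": frozenset({("<protein:p53>", "a", ":Protein")}),
}
system2 = {
    "<protein:p53>": frozenset({
        ("<protein:p53>", "a", ":Protein"),
        ("<protein:p53>", "a", ":NucleotideSequence"),
    }),
}

def one_graph_everywhere(iri, datasets):
    """True only if every dataset that uses iri as a label pairs it with the same graph."""
    graphs = {ds[iri] for ds in datasets if iri in ds}
    return len(graphs) <= 1

consistent_alone = one_graph_everywhere("<protein:p53>", [system1])
consistent_together = one_graph_everywhere("<protein:p53>", [system1, system2])
```

Taken alone, either system could let the IRI denote its graph; taken together, they cannot both do so under one interpretation.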

>>> (Some systems may make such a promise, but it's not
>>> generally required of e.g. deployed linked data or SPARQL systems.)
>>> All the semantics has to capture is that for a given dataset, there is
>>> a map from graph label to graph. I suspect we don't want to go a
>>> step further and say that the mapping is 1:1 because of:
>>> 
>>>   {
>>>     <b1> dc:author "Bob" .
>>>     <b2> dc:author "Bob" .
>>>     <b1> owl:sameAs <b2> .
>>>   }
>>>   <b1> { ... some triples ... }
>>>   <b2> { ... some triples ... }
>> 
>> How is this a problem? The IRIs in the first two triples have nothing to do with their use as graph labels immediately below. You could have written this:
>> 
>>  {
>>     <b1> dc:author "Bob" .
>>     <b2> dc:author "Bob" .
>>     <b1> owl:sameAs <b2> .
>>   }
>>   <c1> { ... some triples ... }
>>   <c2> { ... some triples ... }
>> 
>> and it would have the same effective meaning. There is nothing in the RDF specs which could lead any reader of your example to know that b1 refers to a graph. 
> 
> Fair point. What if the predicate were :graphAuthor, with a definition like "identifies an author of the graph labeled by the subject"?

The semantics requires that the truth of  :foo :graphAuthor :baz . depends upon what :foo and :baz denote, i.e. on I(:foo) and I(:baz). We have decided (and you repeated, above) that I(:foo) might not be the graph that is labelled by the subject. So, that "definition" is in violation of the semantics. For similar reasons, you can't define a class of all things whose IRI has an even number of characters, or the relation that holds between two things when their IRIs rhyme when pronounced in Icelandic. The property is between the things, not the IRIs. The only thing a subject is allowed to do in RDF is refer to something, and the RDF is then talking about that referent. 
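
A toy rendering of that truth condition in Python may help. It assumes a simplified interpretation with a denotation map I and a property-extension map IEXT; the IRIs :alias and the referent strings are invented for illustration. Because truth is computed from I(s), I(p) and I(o) alone, two IRIs with the same referent are interchangeable, so no property can depend on which IRI was written.

```python
# Toy RDF-style interpretation; every name and referent here is illustrative.
I = {
    ":foo": "thing-1",
    ":alias": "thing-1",        # a second IRI with the same referent as :foo
    ":baz": "bob",
    ":graphAuthor": "authorship",
}
# Property extensions hold pairs of referents, never pairs of IRIs.
IEXT = {"authorship": {("thing-1", "bob")}}

def is_true(triple):
    """A triple s p o is true when (I(s), I(o)) is in IEXT(I(p))."""
    s, p, o = triple
    return (I[s], I[o]) in IEXT.get(I[p], set())

foo_true = is_true((":foo", ":graphAuthor", ":baz"))
alias_true = is_true((":alias", ":graphAuthor", ":baz"))
```

The two results must agree, which is why a definition phrased in terms of "the graph labeled by the subject" has nothing to attach to unless I(:foo) is that graph.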

Pat


> 
> 
>>> I suspect that saying
>>> 
>>>   Within a dataset, a graph node label denotes a graph.
>>>   Graph node labels may appear as subjects or objects in graphs.
>>> 
>>> would do the trick
>> 
>> Not really, since there is no idea in RDF of denoting something "in a dataset" or indeed "in" anything. That would require RDF to be some kind of context logic, where the same name can refer to one thing in one context and something else in a different context. RDF just isn't designed that way. 
> 
> I thought we were driven here by the fact (if I recall correctly) that different systems used graph labels to identify graphs with different content. I remember noting that this meant that a clever SPARQL rewriter couldn't rewrite
>  WHERE {
>    GRAPH <X> { <s> <p> ?o }
>    SERVICE <someService> { ?o <x> ?y }
>  }
> 
> to push the unification to someService:
>  WHERE {
>    SERVICE <someService> {
>      GRAPH <X> { <s> <p> ?o }
>      ?o <x> ?y
>    }
>  }
> without some extra assurances that they had the same contents for <X>. I think I called these "high semantics" (assurance that graphs are the same) and "low semantics" (no promises) during a call.

I don't recall this. Maybe I wasn't on that call, or wasn't following the discussion very acutely at that point. Sorry :-(
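
For readers who also missed that call, the rewriting concern in the quoted text can be sketched in Python. This is a hand-rolled stand-in for the two query plans, not a SPARQL engine: the data, the helper names, and the assumption that the SERVICE pattern matches the remote default graph are all hypothetical.

```python
# Hypothetical data: the local system and the remote service both have a graph
# labelled <X>, but with different contents.
local_named = {"<X>": {("<s>", "<p>", "<o1>")}}
remote_named = {"<X>": {("<s>", "<p>", "<o2>")}}
remote_default = {("<o1>", "<x>", "<y1>"), ("<o2>", "<x>", "<y2>")}

def match_o(graph):
    """Bindings for ?o in the pattern  <s> <p> ?o ."""
    return {o for (s, p, o) in graph if s == "<s>" and p == "<p>"}

def match_y(graph, o_values):
    """Bindings for (?o, ?y) in the pattern  ?o <x> ?y , joined on ?o."""
    return {(s, y) for (s, p, y) in graph if p == "<x>" and s in o_values}

# Original plan: GRAPH <X> is evaluated locally, the SERVICE part remotely.
original = match_y(remote_default, match_o(local_named["<X>"]))

# Rewritten plan: GRAPH <X> is pushed into the service and evaluated there.
rewritten = match_y(remote_default, match_o(remote_named["<X>"]))
```

With these data the two plans return different bindings, so the rewrite is only safe under the "high semantics" assurance that both systems pair <X> with the same graph.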

Pat


> 
> 
>> Several people have suggested it should be. For example:
>> 
>> http://www.w3.org/2004/12/rules-ws/paper/98/ (1998)
>> http://www.ninebynine.org/RDFNotes/RDFContexts.html (2000)
>> http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-166/70.pdf
>> http://www.cs.uic.edu/~ifc/SWDB/papers/Tazari.pdf
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2012Mar/0086.html
>> http://www.slideshare.net/PatHayes/rdf-with-contexts (2012)
>> 
>> But the WG has not elected to take up this idea, so RDF right now does not have contexts. 
>> 
>>> , but again, I don't understand what drove us from
>>> "denotes" to "is paired with".
>> 
>> See previous comment. 
>> 
>> Pat
>> 
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>> 
>> 
>> 
>> 
>> 
> 
> -- 
> -ericP
> 
> 

Received on Wednesday, 20 February 2013 19:56:39 UTC