Re: rdfs:Graph ? comment on http://www.w3.org/TR/rdf11-concepts/#section-dataset and issue 35 from Sandro Hawke on 2013-09-17 (www-archive@w3.org from September 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 17 Sep 2013 09:54:28 -0400
To: Pat Hayes <phayes@ihmc.us>
CC: Jeremy Carroll <jjc@syapse.com>, www-archive <www-archive@w3.org>
Message-ID: <52385F14.3030000@w3.org>
On 09/17/2013 02:33 AM, Pat Hayes wrote:
> (Aside. If I just hit "reply to all" on these messages, it automatically includes  <public-rdf-comments@w3.org>, even though this is not listed as a recipient.  /Aside)

(It's not even listed as a CC?   That sounds like a serious mail client 
bug....)

> I think I understand what Jeremy is getting at. If I remember correctly, we had very much this discussion back when we were drafting the original "named graphs" paper. Let me have a stab at explaining it.

And you do a great job, thanks.   I wish I knew where to archive emails 
like this one (and your one a few days ago to David Booth) which are 
exactly what I'd want to see when I search for "named graph" (or, in the 
case of your email to David, "model theory").

Anyway, building on what you say, let me push back a bit, although I'm 
largely agreeing with you.

Some of your analogies are to digital things, not physical things, and 
those don't actually give us a handle to get out of our messy unwanted 
entailment.    If we serialize some RDF Graph as the turtle character 
string "<http://example.com/a> <http://example.com/b> 
<http://example.com/c>." those 70 characters (or the 70 bytes which 
represent them in UTF-8) are in a sense more concrete than the graph 
itself, but they are still abstract enough to have the property we're 
trying to get away from: anything you say about them relates to anything 
I say about them, because there is only one "them".
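
A minimal sketch of that problem in Python, assuming (purely for illustration) that a graph is modelled as a frozenset of triples and that statements about a graph are kept in a dictionary keyed by the graph itself. The names g1, g2 and annotations are hypothetical, not part of any RDF API.

```python
# Hypothetical model: an RDF graph is nothing more than the set of its triples.
triple = ("http://example.com/a", "http://example.com/b", "http://example.com/c")

g1 = frozenset({triple})  # "my" graph
g2 = frozenset({triple})  # "your" graph, built independently

# Equal sets are indistinguishable as dictionary keys: there is only one "them".
annotations = {}
annotations.setdefault(g1, {})["p"] = 1  # I say something about g1

# What was said about g1 can now be read back through g2.
shared = annotations[g2]["p"]
print(g1 == g2, shared, len(annotations))
```

Under this modelling choice there is no way to attach a property to one graph and not the other, which is the unwanted entailment in miniature.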

I think physical metaphors which appeal strongly to our spatial 
intuition allow us to straighten this out, since humans seem very clear 
that the same physical object can't be in two places at once. When that 
turtle string is on my hard drive and your hard drive at the same time, 
that must mean we have two copies (each of which can now have their own 
properties).     And my CPU has two copies, perhaps, at memory locations 
0x0400 and 0x0500.    (In many programming languages, we need to think 
about this a lot.)     And of course the copy in a file /tmp/demo1 is not 
the same thing as the copy in /tmp/demo2.  The bytes are the same, the 
characters are the same, the string is the same, but they are in 
different "files", and the files have different properties.
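
The two-files case can be sketched in Python as follows. This is only an illustration: it writes into a fresh temporary directory rather than /tmp itself, and the variable names are assumptions made for the example.

```python
import os
import tempfile

turtle = b"<http://example.com/a> <http://example.com/b> <http://example.com/c>."

# Use a fresh temporary directory so the sketch does not touch existing files.
workdir = tempfile.mkdtemp()
path1 = os.path.join(workdir, "demo1")
path2 = os.path.join(workdir, "demo2")

# Write the very same bytes into two separate files.
for path in (path1, path2):
    with open(path, "wb") as f:
        f.write(turtle)

with open(path1, "rb") as f1, open(path2, "rb") as f2:
    same_bytes = f1.read() == f2.read()

# Same contents, but two distinct files, each with its own properties.
same_file = os.path.samefile(path1, path2)
os.chmod(path1, 0o444)  # change a property of the first copy only
mode1 = os.stat(path1).st_mode
mode2 = os.stat(path2).st_mode

print(same_bytes, same_file, oct(mode1), oct(mode2))
```

The byte comparison succeeds while os.path.samefile reports two different files, and changing the permissions of one copy leaves the other untouched.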

Which brings me back to your notion of "surfaces"...    It seems to me 
the word "surface" has very strong physical connotations, so it brings 
with it strong intuitions which we can use to reach consensus on certain 
logical properties.  If we inscribe the same shape (e.g. the upper case 
Roman "A") on two different surfaces, clearly there are properties of 
the shape itself, which are properties of it on every surface, and those 
are clearly different from the properties of the surfaces themselves.   
   We also get an interesting third notion: the properties of 
inscriptions -- the markings of the shape on one surface vs another.    
I guess this is what you're getting at in the conclusion of your email.

So, I love the idea of thinking about each graph-label in a dataset as 
denoting a surface, and the triples in the graph associated with that 
label in that dataset as the shapes inscribed on that surface.   That's 
what you have in mind here, right?
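
A rough sketch of this reading in Python, assuming a hypothetical Surface class whose identity is independent of what is written on it. The labels :g1 and :g2 and the property :p follow the example quoted further down in this message; everything else is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # surfaces are compared by identity, not by contents
class Surface:
    inscriptions: frozenset                          # the triples written on it
    properties: dict = field(default_factory=dict)   # facts about the surface

abc = (":a", ":b", ":c")

# Each graph label in the dataset denotes its own surface.
dataset = {
    ":g1": Surface(frozenset({abc})),
    ":g2": Surface(frozenset({abc})),
}

# Say something about the surface labelled :g1.
dataset[":g1"].properties[":p"] = 1

same_shapes = dataset[":g1"].inscriptions == dataset[":g2"].inscriptions
same_surface = dataset[":g1"] is dataset[":g2"]
g2_has_p = ":p" in dataset[":g2"].properties

print(same_shapes, same_surface, g2_has_p)
```

Here the same shapes are inscribed on both surfaces, yet a property asserted of :g1 does not carry over to :g2, because the labels denote the surfaces and not the set of triples.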

Further, I think it's a very nice model of the web, to think of URLs as 
denoting surfaces.   When I do a GET on that URL, the Internet tells me 
what's written on that surface.

Are we in agreement on this?

When I picture surfaces in general, I picture a hard smooth chunk of 
material, perhaps 1-2 sq ft., maybe pottery.   Usually a fragment of a 
large sphere.   (Did you draw them curved in your 2009 ISWC keynote?)    
When I picture web surfaces, I find they've turned into CRTs (which have 
the same kind of curve), because I know full well web pages sometimes 
change in the blink of an eye.    This change doesn't violate any of my 
deep intuitions about surfaces -- the surface of a CRT is still a 
surface -- it's just a complicated, sometimes-changing surface.   
Similarly, some surfaces have privacy screens so you can only see them 
from some angles, and some even have complex privacy screens so they 
look different from different angles (like those toys where the horse 
appears to run as you change the viewing angle).

So, my "boxes" and your "surfaces" are very similar.   The difference 
suggests that perhaps you tend to think of information being written on 
pages and I think of it being stored in databases.   With boxes, it's 
more likely there's stuff hiding in the corner you're not going to see 
until you search for it or "dump" out everything in the box.   And 
surfaces can naturally be read-only (a page in a book) or read-write (a 
chalkboard); in contrast, it's a bit of a stretch to imagine a box that 
always has the same contents (what you called a 'fixed g-box' and I call 
a 'static g-box').    As a programmer, I'm quite comfortable with the 
idea that some memory locations are read-only (in fact, the first 
computer on which I did machine language programming, as a child, had 
RAM from 0x0000 to 0x9FFF and then ROM from 0xA000 to 0xFFFF, as I 
recall), but I'll admit that's not the mainstream notion of a "box".

So, I'll stop here and wait for feedback that we're on the same page 
about this, before thinking about what to do about it.

       -- Sandro



> There are (speaking intuitively) RDF graphs all over the internet, represented using RDF surface syntaxes: RDF/XML documents, TriG documents, quad stores, etc. But these things are not, strictly speaking, graphs, even if we ignore the fact that they can be modified. Let's assume that they are all cast in stone, so they are not g-boxes. Still, they aren't graphs, because two of them can be different and yet describe the very same graph. So what are they, exactly? They are things that bear the same relation to graphs that a token of the letter "A" bears to the first letter of the English alphabet. Or, they are things that bear the same relation to graphs that actual physical copies of Moby Dick bear to the novel written by Melville. Or, they are things that bear the same relation to graphs that RDF classes bear to the sets that are their extensions. To all intents and purposes, they are just like the more abstract things, but there can be many of them corresponding to each one of those. They are exemplars, tokens, concretions, intensions, representations, ... choose your favorite analogy... of graphs.
>
> When we wrote the named graph paper, we wanted the names to name these things rather than "abstract" graphs, because these are the things that one can store, transmit, copy and generally do processing on. These are the actual RDF data, and the RDF graphs are a kind of abstraction of them, something like the parse tree of a sentence as opposed to a copy of the sentence in an actual document. So we needed a way to define these things, but it wasn't easy to do that in a philosophically elegant way. So we used the quick and slightly dirty construction of pairing the graph with its name to represent the particular thing that the name names. This way, one can have two or more named graphs which are distinct, each has its own name distinct from the other names, but they are all copies of (tokens of, intensions of) the same actual graph. And this simple trick avoids the consequence that we wanted to avoid, and which your example illustrates, which is that if we really were naming the graphs, then my name for my graph would also become a name for your graph if you happened to have a copy of my graph. Which does not sit well with the idea of having dereferenceable names like URLs.
>
> So, to emphasize, this is not going all the way to g-boxes. The idea was not to give a name to a box which just kind of happens to have a graph in it, or something with a state which can be changed with time. It was to give a name to the actual graph-like things, but with the understanding that we are talking about something more like actual datastructures, or actual documents, than mathematical abstractions. Things that can be put on websites, stored in a file at an address, given copy protections and pointed at using IRIs. Those are the graph-things that the names of named graphs were supposed to be naming.
>
> Given the g-box discussion, we could identify these things with 'fixed' g-boxes whose state is not allowed to change, but I am less happy with this convention, as the g-box idea introduces the whole business of temporality, state change and so on, which is a huge can of worms that really is not relevant to the "intensional" notion that Jeremy is talking about here. So to introduce all this, then to immediately cancel it by saying the box is 'fixed', is confusing, and conceptual overkill. Personally, I like the letter-A analogy, and would be very happy to have the notion of a token of a graph, being any datastructure or document which encodes or parses to the graph. But not something with a state, not labile or dynamic, just as fixed and eternal as any other RDF notion. And if we do this, then we have a three-way relationship between a name, the graph token it names, and the graph exemplified by the graph token, and we can run your account of datasets without mentioning boxes or implying anything about change and time. Just replace "g-box" with "graph token" (or whatever we decide to call it. It is, of course, a named graph using the conventions from the original paper.) And then your g1/g2 example entailment does not hold, as I think it should not.
>
> Pat
>
> On Sep 16, 2013, at 5:19 PM, Jeremy J Carroll wrote:
>
>>
>>
>> On Sep 11, 2013, at 8:14 PM, Sandro Hawke <sandro@w3.org> wrote:
>>
>>> On 09/11/2013 06:21 PM, Jeremy J Carroll wrote:
>>>> This section defines a vocabulary item rdf:Graph in addition to those in [RDF-SCHEMA].
>>>> This is the class of resources that are RDF graphs. If a resource in this class is identified by an IRI, and that IRI is used to name a graph in a dataset, then within that dataset the resource SHOULD correspond to the named graph.
>>> Does it not follow from this definition that:
>>>
>>>     PREFIX : <http://example.org/#>
>>>     PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>     :g1 :p 1.
>>>     :g1 a rdf:Graph.
>>>     :g2 a rdf:Graph.
>>>     GRAPH :g1 { :a :b :c }
>>>     GRAPH :g2 { :a :b :c }
>>> entails:
>>>     :g2 :p 1.
>>>
>>> (assuming the "SHOULD" is taken as something we can count on) ?
>>
>> Hi Sandro
>>
>> this is an excellent question, and one that I take it motivates your discussion of box-model on the WG mailing list.
>>
>> I am not very comfortable with a YES, but, given the text I suggested, a YES it would be.
>>
>> In essence I think I want an intensional semantics rather than an extensional semantics, suggested text below; I start with philosophical discussion.
>>
>> In maths, we typically refer to Sets with intensional semantics, in RDF we refer to classes with extensional semantics.
> You have this exactly backwards, which is rather confusing :-)
>
>> So if I have a class
>>
>> jjc:Friends rdf:type rdfs:Class ;
>>        rdfs:comment "Jeremy's friends" .
>>
>> and also a class
>>
>> jjc:SandrosFriends rdf:type rdfs:Class ;
>>        rdfs:comment "Sandro's friends" .
>>
>> in the unlikely event that we have exactly the same friends, RDF semantics does not confuse the intent.
> Right. Classes in RDF might have the same members but still be distinct. So RDF classes are not mathematical sets. But what is your point here? Is this a problem? (Why?)
>
>> A view would be that RDF Semantics achieves this by moving the semantic intent more to the property rdf:type …
>>
>> So, we could scrub the idea of having a class, and instead define a property.
>>
>> An alternative proposed modification, which clarifies my desired NO to your entailment
>>
>> [[
>> 3.7 The rdf:namesGraph property
>>
>> This section defines a vocabulary item rdf:namesGraph in addition to those in [RDF-SCHEMA].
>>
>> rdf:namesGraph is an instance of rdf:Property that is used to state that a resource is a name for a graph.
>>
>> A triple of the form:
>>
>> R rdf:namesGraph G
>>
>> states that G is an RDF graph and R is a name for the graph G.
>> If R is an IRI, and that IRI is used to name a graph in a dataset, then within that dataset the resource G SHOULD correspond to the named graph.
>>
>> The rdfs:domain of rdf:namesGraph is rdfs:Resource. No rdfs:range is specified.
>> ]]
>>
>>
>> ===
>>
>> With this my particular use case to add metadata about the graph as an intensional as opposed to an extensional object would be addressed as follows.
>>
>>      PREFIX : <http://example.org/#>
>>      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>      
>>      GRAPH :g1 { :g1 rdf:namesGraph _:g ; rdfs:comment "An example graph" }
>>     
>>
>>
>>
>> Jeremy J Carroll
>> Principal Architect
>> Syapse, Inc.
>>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 home
> 40 South Alcaniz St.            (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile (preferred)
> phayes@ihmc.us       http://www.ihmc.us/users/phayes
>
>
>
>
>
>
>
Received on Tuesday, 17 September 2013 13:54:41 UTC