Re: [Graphs] Proposal for Named Graph Semantics from Nathan on 2011-04-08 (public-rdf-wg@w3.org from April 2011)

From: Nathan <nathan@webr3.org>
Date: Fri, 08 Apr 2011 23:07:30 +0100
To: Alex Hall <alexhall@revelytix.com>
CC: Eric Prud'hommeaux <eric@w3.org>, RDF WG <public-rdf-wg@w3.org>, Richard Cyganiak <richard@cyganiak.de>
Message-ID: <4D9F8722.9090606@webr3.org>

Alex Hall wrote:
> On Thu, Apr 7, 2011 at 5:06 PM, Nathan <nathan@webr3.org> wrote:
> 
>> Eric Prud'hommeaux wrote:
>>
>>> * Alex Hall <alexhall@revelytix.com> [2011-04-07 04:47-0400]
>>>
>>>> Here is a proposal of a semantics for named graphs in RDF.  My goals here
>>>> are to:
>>>> - Extend
>>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal to
>>>> go beyond (IRI, graph) tuples.
>>>> - Give something that is formally defined enough to serve as a starting
>>>> point for discussion.
>>>> - Specify common semantics for multi-graph serialization formats, or at
>>>> least a starting point.
>>>> - Specify something that is flexible enough to satisfy applications that
>>>> want to treat named graphs as either g-snaps or g-boxes.
>>>>
>>>> Regarding g-boxes, I specifically want to avoid incorporating anything
>>>> that
>>>> suggests time variance into the semantics, because specifying semantics
>>>> for
>>>> temporal changes is explicitly out of scope for the existing RDF
>>>> Semantics
>>>> document.
>>>>
>>>> Here goes...
>>>>
>>>> 1. Graph Identification
>>>> Let I be an IRI.  Define Graph(I) as a unary predicate such that Graph(I)
>>>> implies that the resource identified by I is an RDF graph.  If desired,
>>>> this
>>>> can be described easily enough in RDF by defining a new class rdfs:Graph
>>>> and
>>>> mapping Graph(I) to the triple I rdf:type rdfs:Graph.
>>>>
>>> Perhaps you didn't intend to take a position on this, but does
>>> "resource identified by I" imply HTTP dereferencing? I can think of
>>> three interpretations of Graph:
>>>
>>>  0a. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ RDFParse(HTTPGet(I)) == G
>>>  0b. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ m(I) == G, m a local map
>>>  0c. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ M(I) == G, M an RDF-wide map
>>>
>>> Of course these definitions presume a position on whether two graphs
>>> with the same triples == each other. Perhaps "I is an RDF graph" says
>>> to use "is" and not "==".
>>>
>> This is pretty much just like REST, and a bit unsurprising since we stuck
>> RDF on the web, started to dereference it, put the results in a quad store
>> where the IRI of the resource we dereferenced was said to be the graph
>> identifier and the triples extracted from the representation were put in the
>> remaining 3 slots. Quads born, SPARQL, QuadStores, RDF Datasets, full circle
>> and back to here and now.
>>
>>  RDFParse(HTTPGet(I)) == G
>>
>> doesn't entail that I is a Graph or that I == G, since:
>>
>> *- I is mapped to (or associated with) a set of values over time
>> - a successful HTTPGet(I) returns a single value from the set (not the
>> current value)
>> - the values in the set are representations
>> - each representation may have multiple properties and contain multiple
>> things (RDF triples being just one of those things, like in RDFa documents)
>> - RDFParse(HTTPGet(I)) == a g-snap.
>>
>> pretty much the most we can conclude, regardless of the interface (HTTP or
>> otherwise), is that the g-snap (representing a set of triples) is associated
>> with I.
>>
>> if we say any more than that, we're just creating use case specific stories
>> to aid comprehension for people, but at the same time we're creating
>> artificial limitations which will inevitably come back to bite us, or
>> somebody, in the ass later.
>>
>> side: I put the * beside the first item so that you can see the REST and
>> HTTP story falls apart when they try to speak of anything behind the
>> interface, because instinct and context here makes you think that the values
>> which I is associated with over time are rdf graphs / sets of triples; and
>> the context in HTTP/REST makes you think it's associated with a set of
>> representations over time, or some resource that has a mythical state. Any
>> attempt to say what is behind the interface or what the relationship is just
>> makes problems later when you zoom out and consider other concepts. So IMHO,
>> just best to keep it as simple as possible.
>>
>>  if RDFParse(HTTPGet(<I>)) results in a g-snap Z, then:
>>   g-snap Z is associated with <I>
>>   <I> is dereferenceable by HTTP
>>   an HTTP GET on <I> at time X resulted in representation Y
>>   representation Y is associated with <I>
>>   Y contains Z
>>

Hi Alex,

> No objections from me on this one.  To paraphrase, what you're saying is:
> lots of IRIs, which identify resources other than graphs, are
> dereferenceable on the web and will return a graph, or more precisely a
> document from which a graph may be extracted.  The graph represents that
> resource, but we can't say that the graph *is* the resource.
> 
> What follows is thinking out loud...
> 
> I suppose this is particularly relevant in the Linked Data community, where
> I might dereference the IRI  <I> = <http://example.com/people#Alice> and
> find a graph G = { <http://example.com/people#Alice> foaf:givenName "Alice";
> foaf:knows :Bob; ... }  If we want to turn around and associate <I> with G
> in a quad store or RDF dataset, we should be able to just do that without
> implying that the resource identified by <I> is that graph, or having to
> allocate some new IRI to represent the notion of "the graph that describes
> Alice".

Note that you can't dereference an IRI which contains a fragment; only 
absolute-IRIs (IRIs without a fragment) are dereferenceable. That is, 
you'd have to chop off the fragment above and dereference 
<http://example.com/people>.
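
A minimal sketch of that fragment-chopping step, assuming Python and its 
standard urllib.parse module (the helper name dereferenceable_part is 
purely illustrative, not taken from any spec):

```python
from urllib.parse import urldefrag


def dereferenceable_part(iri: str) -> str:
    """Return the IRI with any fragment identifier removed.

    This is the part a client would actually send in a request;
    the fragment is never sent to the server.
    """
    base, _fragment = urldefrag(iri)
    return base


# The fragment "#Alice" is dropped before any retrieval would happen.
print(dereferenceable_part("http://example.com/people#Alice"))
```

An IRI that has no fragment to begin with comes back unchanged.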

> That does raise a bunch of questions for me surrounding provenance (maybe
> they've already been answered, I'm not familiar with research in that area),
> like how to differentiate the description of <I> as a web document (was
> retrieved on this date, etc.) from the description of <I> as an abstract
> resource (the person Alice, in this case)?  Is it worthwhile inventing a
> vocabulary to define the notion of "the graph that describes Alice"?

RDF pretty much already has that, see 
http://www.w3.org/TR/rdf-concepts/#section-fragID

[[
we assume that the URI part (i.e. excluding fragment identifier) 
identifies a resource, which is presumed to have an RDF representation. 
So when eg:someurl#frag is used in an RDF document, eg:someurl is taken 
to designate some RDF document (even when no such document can be 
retrieved).
]]

Here "document" means "Information Resource", as opposed to a chunk of 
RDF/XML which you GET at a particular time.

Hence the 303 status code guidance and the httpRange-14 resolution, 
which provide some way of allowing non-fragment URIs to be 
dereferenceable.

That's a big issue though, which I'm pretty sure not many will 
appreciate me mentioning around here :p

Back to the terminology though, this is where the g-snap, g-text, g-box 
terminology comes in handy, rather than saying "graph". Terms like Graph 
and Document have taken on multiple meanings for different people over 
the years, and are very much conflated; they tend to lead people to 
infer the wrong things when used innocently in conversation.

Best,

Nathan

> These
> tend to lead down a path that I would describe as graph reification, and not
> having done the appropriate research there I think I'll refrain from further
> comment...
> 
> -Alex
> 
> 
> 
>> Best,
> 
> 
>> Nathan
>>
>>
>>>> Define G(I) as a function that returns the RDF graph identified by I.  In
>>>> our parlance, G(I) is a g-snap, invariant over time.  Due to the nature
>>>> of
>>>> RDF, it is difficult to express the relationship between I and G(I)
>>>> natively
>>>> in RDF.  Graph literals, which I understand to be the encoding of some
>>>> set
>>>> of triples as a single node in a graph, are one possible approach but
>>>> this
>>>> proposal does not attempt to define graph literals.  Furthermore, in the
>>>> open world it's not possible to have complete knowledge of all the
>>>> triples
>>>> in G(I) for any given I.
>>>>
>>> In order to make explicit the difference between the mapping and the
>>> graph name convention below, I'm using "GraphAt" for "G":
>>>
>>>  1a. GraphAt(I:IRI):RDFGraph = G ∣ RDFParse(HTTPGet(I)) == G
>>>  1b. GraphAt(I:IRI):RDFGraph = G ∣ m(I) == G, m a local map
>>>  1c. GraphAt(I:IRI):RDFGraph = G ∣ M(I) == G, M an RDF-wide map
>>>
>>> and define 0 in terms of 1:
>>>
>>>  0.  Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) == G
>>>
>>>
>>>> 2. Graph Assertion
>>>> Let I be an IRI and G be an RDF graph.  Define GA(I, G) as a binary
>>>> predicate such that GA(I, G) implies (a) Graph(I) and (b) G(I) entails G.
>>>>
>>>  3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ Graph(I) && GraphAt(I)
>>> ⊢ G
>>>
>>> or
>>>
>>>  3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) ⊢ G
>>>
>>>
>>>> The notion of graph assertion attempts to capture the semantics of what
>>>> happens when some set of triples is associated with a graph IRI in a
>>>> multi-graph serialization such as TriG.  So the TriG fragment:
>>>>
>>>> :G1 { :a :b :c } .
>>>>
>>>> would be understood to construct a graph G with a single triple :a :b :c
>>>> and
>>>> then make the assertion GA(:G1, G).
>>>>
>>>> The use of "entails" as opposed to "equals" here is what gives us our
>>>> flexibility.  Applications that want to treat named graphs as g-snaps,
>>>> completely described by the triples associated with the graph IRI, can do
>>>> so
>>>> by extending (b) to say G(I) equals G instead of entails.  Because every
>>>> graph entails itself, this extension is supported by these semantics, but
>>>> this would not be required behavior.  Indeed, this could lead to trouble
>>>> in
>>>> the open world where you can have GA(I, G1) and GA(I, G2) with G1 != G2.
>>>>
>>> This might be awkward for test cases 'cause
>>>  <X> log:implies { <pi> <numericValue> 3.14, 3.0 . } . # the Bolslough
>>> simplification
>>> is a valid parse of
>>>  { <pi> <numericValue> 3.14 . }
>>> I wonder if there's some way to move this beyond parsing (no suggestions
>>> yet).
>>>
>>>
>>>> Applications that want to treat named graphs as g-boxes would do so by
>>>> essentially maintaining a (time-sensitive) mapping of IRI I to graph G.
>>>>  This aligns pretty closely with my understanding of the notion of graph
>>>> store from SPARQL 1.1 Update.  Poking the g-box to obtain content (either
>>>> a
>>>> g-text serialization or query results) amounts to asserting GA(I, G) for
>>>> the
>>>> current value of G at some point in time.  Given a new graph assertion
>>>> for
>>>> an IRI that is already mapped in the store, an implementation could
>>>> replace
>>>> the currently mapped graph with the new one (effectively discarding all
>>>> prior graph assertions) or merge them at its discretion; either approach
>>>> would be supported by these semantics.
>>>>
>>> I guess this argues for 1b. GraphAt(I:IRI):RDFGraph = G ∣ m(I) == G, m a
>>> local map
>>> where we don't try to tell one person's GA to match another's.
>>>
>>>
>>>> Any vocabulary for specifying graph literals and attaching them to a
>>>> graph
>>>> IRI in RDF would be defined as making a graph assertion, not setting the
>>>> value of the identified graph.
>>>>
>>>> 3. RDF Datasets
>>>> I haven't thought this part through entirely, but I think these semantics
>>>> could be aligned with the existing notion of RDF datasets from SPARQL
>>>> (and
>>>> as proposed on the wiki) by simply mapping the (IRI, graph) tuples in the
>>>> dataset to the appropriate graph assertions.
>>>>
>>>> 4. Graph Equality
>>>> Because it is not the case that (G1 entails G and G2 entails G) implies
>>>> G1 =
>>>> G2, it is also not the case that (GA(I1, G) and GA(I2, G)) implies I1 and
>>>> I2
>>>> are the same graph.  Such a conclusion could be reached if you extend the
>>>> definition of GA to mean equals instead of entails as discussed before,
>>>> but
>>>> again that is an extension and not part of the proposed semantics.
>>>>
>>>> 5. Empty Graphs
>>>> Because every graph trivially entails the empty graph E, the assertion
>>>> GA(I,
>>>> E) is trivially true for every graph IRI I.  Making that assertion
>>>> doesn't
>>>> do anything beyond identify the resource denoted by I as a graph.
>>>>
>>>> 6. Graph Merges
>>>> It follows from the definition of GA (and the definition of entails) that
>>>> (GA(I, G1) and GA(I, G2)) implies GA(I, Merge(G1, G2)).  I think this
>>>> gives
>>>> us a pretty straightforward approach to merging of RDF datasets if this
>>>> is
>>>> required of the spec.
>>>>
>>>> Hope you find this useful...  or at least that this stirs up some
>>>> interesting debate.
>>>>
>>> thanks for moving this forward.
>>>
>>>
>>>> Regards,
>>>> Alex
>>>>
>>>
>
Received on Friday, 8 April 2011 22:08:40 UTC