Re: [Graphs] Proposal for Named Graph Semantics from Alex Hall on 2011-04-08 (public-rdf-wg@w3.org from April 2011)

From: Alex Hall <alexhall@revelytix.com>
Date: Fri, 8 Apr 2011 10:35:57 -0400
To: nathan@webr3.org
Cc: "Eric Prud'hommeaux" <eric@w3.org>, RDF WG <public-rdf-wg@w3.org>, Richard Cyganiak <richard@cyganiak.de>
Message-ID: <BANLkTimNCXjjrTS8OVY4w8NaRTTi9VQGkQ@mail.gmail.com>
On Thu, Apr 7, 2011 at 5:06 PM, Nathan <nathan@webr3.org> wrote:

> Eric Prud'hommeaux wrote:
>
>> * Alex Hall <alexhall@revelytix.com> [2011-04-07 04:47-0400]
>>
>>> Here is a proposal of a semantics for named graphs in RDF.  My goals here
>>> are to:
>>> - Extend
>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal to
>>> go beyond (IRI, graph) tuples.
>>> - Give something that is formally defined enough to serve as a starting
>>> point for discussion.
>>> - Specify common semantics for multi-graph serialization formats, or at
>>> least a starting point.
>>> - Specify something that is flexible enough to satisfy applications that
>>> want to treat named graphs as either g-snaps or g-boxes.
>>>
>>> Regarding g-boxes, I specifically want to avoid incorporating anything
>>> that suggests time variance into the semantics, because specifying
>>> semantics for temporal changes is explicitly out of scope for the
>>> existing RDF Semantics document.
>>>
>>> Here goes...
>>>
>>> 1. Graph Identification
>>> Let I be an IRI.  Define Graph(I) as a unary predicate such that Graph(I)
>>> implies that the resource identified by I is an RDF graph.  If desired,
>>> this can be described easily enough in RDF by defining a new class
>>> rdfs:Graph and mapping Graph(I) to the triple I rdf:type rdfs:Graph.
>>>
>>
>> Perhaps you didn't intend to take a position on this, but does
>> "resource identified by I" imply HTTP dereferencing? I can think of
>> three interpretations of Graph:
>>
>>  0a. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ RDFParse(HTTPGet(I)) == G
>>  0b. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ m(I) == G, m a local map
>>  0c. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ M(I) == G, M an RDF-wide map
>>
>> Of course these definitions presume a position on whether two graphs
>> with the same triples == each other. Perhaps "I is an RDF graph" says
>> to use "is" and not "==".
>>
>
> This is pretty much just like REST, and a bit unsurprising since we stuck
> RDF on the web, started to dereference it, put the results in a quad store
> where the IRI of resource we dereferenced was said to be the graph
> identifier and the triples extracted from the representation were put in the
> remaining 3 slots. Quads born, SPARQL, QuadStores, RDF Datasets, full circle
> and back to here and now.
>
>  RDFParse(HTTPGet(I)) == G
>
> doesn't entail that I is a Graph or that I == G, since:
>
> *- I is mapped to (or associated with) a set of values over time
> - a successful HTTPGet(I) returns a single value from the set (not the
> current value)
> - the values in the set are representations
> - each representation may have multiple properties and contain multiple
> things (RDF triples being just one of those things, like in RDFa documents)
> - RDFParse(HTTPGet(I)) == a g-snap.
>
> pretty much the most we can conclude, regardless of the interface (HTTP or
> otherwise), is that the g-snap (representing a set of triples) is associated
> with I.
>
> if we say any more than that, we're just creating use case specific stories
> to aid comprehension for people, but at the same time we're creating
> artificial limitations which will inevitably come back to bite us, or
> somebody, in the ass later.
>
> side: I put the * beside the first item so that you can see the REST and
> HTTP story falls apart when they try to speak of anything behind the
> interface, because instinct and context here makes you think that the values
> which I is associated with over time are rdf graphs / sets of triples; and
> the context in HTTP/REST makes you think it's associated with a set of
> representations over time, or some resource that has a mythical state. Any
> attempt to say what is behind the interface or what the relationship is just
> makes problems later when you zoom out and consider other concepts. So IMHO,
> just best to keep it as simple as possible.
>
>  if RDFParse(HTTPGet(<I>)) results in a g-snap Z, then:
>   g-snap Z is associated with <I>
>   <I> is dereferencable by HTTP
>   an HTTP GET on <I> at time X resulted in representation Y
>   representation Y is associated with <I>
>   Y contains Z
>

No objections from me on this one.  To paraphrase, what you're saying is:
lots of IRIs, which identify resources other than graphs, are
dereferenceable on the web and will return a graph, or more precisely a
document from which a graph may be extracted.  The graph represents that
resource, but we can't say that the graph *is* the resource.

What follows is thinking out loud...

I suppose this is particularly relevant in the Linked Data community, where
I might dereference the IRI <I> = <http://example.com/people#Alice> and
find a graph G = { <http://example.com/people#Alice> foaf:givenName "Alice";
foaf:knows :Bob; ... }.  If we want to turn around and associate <I> with G
in a quad store or RDF dataset, we should be able to just do that without
implying that the resource identified by <I> is that graph, or having to
allocate some new IRI to represent the notion of "the graph that describes
Alice".
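
As a rough illustration of that last point, here is a minimal Python sketch
of a dataset that associates <I> with G without ever saying that <I> is a
graph.  The names associate and typed_as, the expansion of :Bob, and the
rdfs:Graph IRI are assumptions made up for this example (rdfs:Graph is only
the class suggested in the proposal, not an existing term).

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, FrozenSet[Triple]]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.com/people#Alice"
BOB = "http://example.com/people#Bob"  # assumed expansion of :Bob
# Hypothetical class IRI; the proposal only suggests defining rdfs:Graph.
GRAPH_CLASS = "http://www.w3.org/2000/01/rdf-schema#Graph"


def associate(dataset: Dataset, iri: str, triples: Iterable[Triple]) -> Dataset:
    """Return a copy of `dataset` in which `triples` are associated with `iri`."""
    updated = dict(dataset)
    updated[iri] = updated.get(iri, frozenset()) | frozenset(triples)
    return updated


def typed_as(dataset: Dataset, iri: str, cls: str) -> bool:
    """True only if some stored triple explicitly states `iri rdf:type cls`."""
    return any((iri, RDF_TYPE, cls) in graph for graph in dataset.values())


# The graph extracted from the document retrieved at <I>.
g = [
    (ALICE, FOAF + "givenName", '"Alice"'),
    (ALICE, FOAF + "knows", BOB),
]
dataset = associate({}, ALICE, g)

# The association alone records nothing about what <I> denotes.
alice_is_graph = typed_as(dataset, ALICE, GRAPH_CLASS)
```

Here the IRI is used purely as a key; whether <I> denotes a person, a
document, or a graph is left to whatever triples are actually stored.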

That does raise a bunch of questions for me surrounding provenance (maybe
they've already been answered, I'm not familiar with research in that area),
like how to differentiate the description of <I> as a web document (was
retrieved on this date, etc.) from the description of <I> as an abstract
resource (the person Alice, in this case)?  Is it worthwhile inventing a
vocabulary to define the notion of "the graph that describes Alice"?  These
tend to lead down a path that I would describe as graph reification, and not
having done the appropriate research there I think I'll refrain from further
comment...

-Alex



>
> Best,
> Nathan
>
>
>>> Define G(I) as a function that returns the RDF graph identified by I.  In
>>> our parlance, G(I) is a g-snap, invariant over time.  Due to the nature
>>> of RDF, it is difficult to express the relationship between I and G(I)
>>> natively in RDF.  Graph literals, which I understand to be the encoding
>>> of some set of triples as a single node in a graph, are one possible
>>> approach but this proposal does not attempt to define graph literals.
>>> Furthermore, in the open world it's not possible to have complete
>>> knowledge of all the triples in G(I) for any given I.
>>>
>>
>> In order to make explicit the difference between the mapping and the
>> graph name convention below, I'm using "GraphAt" for "G":
>>
>>  1a. GraphAt(I:IRI):RDFGraph = G ∣ RDFParse(HTTPGet(I)) == G
>>  1b. GraphAt(I:IRI):RDFGraph = G ∣ m(I) == G, m a local map
>>  1c. GraphAt(I:IRI):RDFGraph = G ∣ M(I) == G, M an RDF-wide map
>>
>> and define 0 in terms of 1:
>>
>>  0.  Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) == G
>>
>>
>>> 2. Graph Assertion
>>> Let I be an IRI and G be an RDF graph.  Define GA(I, G) as a binary
>>> predicate such that GA(I, G) implies (a) Graph(I) and (b) G(I) entails G.
>>>
>>
>>  3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ Graph(I) && GraphAt(I)
>> ⊢ G
>>
>> or
>>
>>  3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) ⊢ G
>>
>>
>>> The notion of graph assertion attempts to capture the semantics of what
>>> happens when some set of triples is associated with a graph IRI in a
>>> multi-graph serialization such as TriG.  So the TriG fragment:
>>>
>>> :G1 { :a :b :c } .
>>>
>>> would be understood to construct a graph G with a single triple :a :b :c
>>> and then make the assertion GA(:G1, G).
>>>
>>> The use of "entails" as opposed to "equals" here is what gives us our
>>> flexibility.  Applications that want to treat named graphs as g-snaps,
>>> completely described by the triples associated with the graph IRI, can do
>>> so by extending (b) to say G(I) equals G instead of entails.  Because
>>> every graph entails itself, this extension is supported by these
>>> semantics, but this would not be required behavior.  Indeed, this could
>>> lead to trouble in the open world where you can have GA(I, G1) and
>>> GA(I, G2) with G1 != G2.
>>>
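
To make the "entails" versus "equals" distinction concrete, here is a small
Python sketch.  It assumes ground graphs only (no blank nodes), in which case
simple entailment reduces to a subset test; the function names are invented
for this illustration and are not part of the proposal.

```python
from typing import FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def entails(g: Graph, h: Graph) -> bool:
    """Simple entailment for ground graphs only: g entails h iff h is a subset of g."""
    return h <= g


def holds_under_entails(graph_at_i: Graph, asserted: Iterable[Graph]) -> bool:
    """Reading (b) as proposed: G(I) entails every asserted graph."""
    return all(entails(graph_at_i, g) for g in asserted)


def holds_under_equals(graph_at_i: Graph, asserted: Iterable[Graph]) -> bool:
    """The g-snap extension: G(I) equals every asserted graph."""
    return all(graph_at_i == g for g in asserted)


# Two different assertions about the same graph IRI, e.g. from two TriG files.
g1: Graph = frozenset({(":a", ":b", ":c")})
g2: Graph = frozenset({(":a", ":b", ":d")})
asserted = [g1, g2]

# Candidate values for G(:G1).
candidates = [g1, g2, g1 | g2]

entails_models = [c for c in candidates if holds_under_entails(c, asserted)]
equals_models = [c for c in candidates if holds_under_equals(c, asserted)]
```

Under the entails reading the union of the two graphs satisfies both
assertions; under the equals reading no candidate does, which is the
open-world trouble described above.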
>>
>> This might be awkward for test cases 'cause
>>  <X> log:implies { <pi> <numericValue> 3.14, 3.0 . } . # the Bolslough
>> simplification
>> is a valid parse of
>>  { <pi> <numericValue> 3.14 . }
>> I wonder if there's some way to move this beyond parsing (no suggestions
>> yet).
>>
>>
>>> Applications that want to treat named graphs as g-boxes would do so by
>>> essentially maintaining a (time-sensitive) mapping of IRI I to graph G.
>>> This aligns pretty closely with my understanding of the notion of graph
>>> store from SPARQL 1.1 Update.  Poking the g-box to obtain content (either
>>> a g-text serialization or query results) amounts to asserting GA(I, G)
>>> for the current value of G at some point in time.  Given a new graph
>>> assertion for an IRI that is already mapped in the store, an
>>> implementation could replace the currently mapped graph with the new one
>>> (effectively discarding all prior graph assertions) or merge them at its
>>> discretion; either approach would be supported by these semantics.
>>>
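
The two g-box behaviours described above (replace or merge) could be sketched
roughly as follows.  GraphStore, assert_graph, and poke are hypothetical names
for this example, not an API from SPARQL 1.1 Update, and merging by set union
is only adequate for ground graphs (a real merge has to keep blank nodes
apart).

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class GraphStore:
    """A g-box style store: a mutable mapping from graph IRI to a current graph."""

    def __init__(self, policy: str = "merge") -> None:
        if policy not in ("merge", "replace"):
            raise ValueError("policy must be 'merge' or 'replace'")
        self.policy = policy
        self._graphs: Dict[str, Graph] = {}

    def assert_graph(self, iri: str, triples: Iterable[Triple]) -> None:
        """Handle a new graph assertion for `iri` according to the policy."""
        new = frozenset(triples)
        if self.policy == "merge":
            # Keep earlier assertions; union is a valid merge for ground graphs only.
            new = self._graphs.get(iri, frozenset()) | new
        # Under "replace", prior assertions for this IRI are discarded.
        self._graphs[iri] = new

    def poke(self, iri: str) -> Graph:
        """Return the current g-snap for `iri` (empty if nothing is mapped)."""
        return self._graphs.get(iri, frozenset())


merging = GraphStore("merge")
replacing = GraphStore("replace")
for store in (merging, replacing):
    store.assert_graph(":G1", [(":a", ":b", ":c")])
    store.assert_graph(":G1", [(":a", ":b", ":d")])

merged_snapshot = merging.poke(":G1")
replaced_snapshot = replacing.poke(":G1")
```

In both stores the value returned by poke is a graph that the store is
prepared to assert for :G1 at that moment; the stores differ only in how much
of the earlier assertions they retain.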
>>
>> I guess this argues for
>>  1b. GraphAt(I:IRI):RDFGraph = G ∣ m(I) == G, m a local map
>> where we don't try to tell one person's GA to match another's.
>>
>>
>>> Any vocabulary for specifying graph literals and attaching them to a
>>> graph IRI in RDF would be defined as making a graph assertion, not
>>> setting the value of the identified graph.
>>>
>>> 3. RDF Datasets
>>> I haven't thought this part through entirely, but I think these semantics
>>> could be aligned with the existing notion of RDF datasets from SPARQL
>>> (and as proposed on the wiki) by simply mapping the (IRI, graph) tuples
>>> in the dataset to the appropriate graph assertions.
>>>
>>> 4. Graph Equality
>>> Because it is not the case that (G1 entails G and G2 entails G) implies
>>> G1 = G2, it is also not the case that (GA(I1, G) and GA(I2, G)) implies
>>> I1 and I2 are the same graph.  Such a conclusion could be reached if you
>>> extend the definition of GA to mean equals instead of entails as
>>> discussed before, but again that is an extension and not part of the
>>> proposed semantics.
>>>
>>> 5. Empty Graphs
>>> Because every graph trivially entails the empty graph E, the assertion
>>> GA(I, E) is trivially true for every graph IRI I.  Making that assertion
>>> doesn't do anything beyond identify the resource denoted by I as a graph.
>>>
>>> 6. Graph Merges
>>> It follows from the definition of GA (and the definition of entails) that
>>> (GA(I, G1) and GA(I, G2)) implies GA(I, Merge(G1, G2)).  I think this
>>> gives us a pretty straightforward approach to merging of RDF datasets if
>>> this is required of the spec.
>>>
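
The claims in sections 5 and 6 can be spot-checked with a few lines of
Python.  This is a sketch restricted to ground graphs, where entailment is a
subset test and Merge is plain union; with blank nodes, Merge would first
have to standardize them apart.  The names entails and merge and the sample
triples are assumptions for the example.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

EMPTY: Graph = frozenset()


def entails(g: Graph, h: Graph) -> bool:
    """Simple entailment for ground graphs only: g entails h iff h is a subset of g."""
    return h <= g


def merge(g1: Graph, g2: Graph) -> Graph:
    """Merge of two ground graphs; with blank nodes this would need renaming first."""
    return g1 | g2


# Some value of G(I) that we only partially know about.
graph_at_i: Graph = frozenset({
    (":a", ":b", ":c"),
    (":a", ":b", ":d"),
    (":x", ":y", ":z"),
})
g1: Graph = frozenset({(":a", ":b", ":c")})
g2: Graph = frozenset({(":a", ":b", ":d")})

# Section 5: GA(I, E) holds for the empty graph E, whatever G(I) is.
empty_assertion_holds = entails(graph_at_i, EMPTY)

# Section 6: if G(I) entails G1 and G2, it also entails Merge(G1, G2).
both_hold = entails(graph_at_i, g1) and entails(graph_at_i, g2)
merge_holds = entails(graph_at_i, merge(g1, g2))
```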
>>> Hope you find this useful...  or at least that this stirs up some
>>> interesting debate.
>>>
>>
>> thanks for moving this forward.
>>
>>
>>> Regards,
>>> Alex
>>>
>>
>>
>
Received on Friday, 8 April 2011 14:36:25 UTC