Re: [Graphs] Proposal for Named Graph Semantics from Alex Hall on 2011-04-07 (public-rdf-wg@w3.org from April 2011)

From: Alex Hall <alexhall@revelytix.com>
Date: Thu, 7 Apr 2011 13:05:47 -0400
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-ID: <BANLkTi=1BpXimcKU9gV3siZtYZ5JG1tYow@mail.gmail.com>
On Thu, Apr 7, 2011 at 10:28 AM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * Alex Hall <alexhall@revelytix.com> [2011-04-07 04:47-0400]
> > Here is a proposal of a semantics for named graphs in RDF.  My goals here
> > are to:
> > - Extend
> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal to
> > go beyond (IRI, graph) tuples.
> > - Give something that is formally defined enough to serve as a starting
> > point for discussion.
> > - Specify common semantics for multi-graph serialization formats, or at
> > least a starting point.
> > - Specify something that is flexible enough to satisfy applications that
> > want to treat named graphs as either g-snaps or g-boxes.
> >
> > Regarding g-boxes, I specifically want to avoid incorporating anything
> that
> > suggests time variance into the semantics, because specifying semantics
> for
> > temporal changes is explicitly out of scope for the existing RDF
> Semantics
> > document.
> >
> > Here goes...
> >
> > 1. Graph Identification
> > Let I be an IRI.  Define Graph(I) as a unary predicate such that Graph(I)
> > implies that the resource identified by I is an RDF graph.  If desired,
> this
> > can be described easily enough in RDF by defining a new class rdfs:Graph
> and
> > mapping Graph(I) to the triple I rdf:type rdfs:Graph.
>
> Perhaps you didn't intend to take a position on this, but does
> "resource identified by I" imply http dererencing? I can think of
> three interpretations of Graph:
>
>  0a. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ RDFParse(HTTPGet(I)) == G
>  0b. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ m(I) == G, m a local map
>  0c. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ M(I) == G, M an RDF-wide map
>

You're correct that I did not intend to take a position on HTTP
dereferencing.  My personal preference is against including that as part of
the spec.  It would be easy enough to specify HTTP dereferencing in 0a. as
an extension of the general mapping in 0b. by simply defining m(I) =
RDFParse(HTTPGet(I)).

I don't have a strong preference between 0b. and 0c. at the moment, so I'll
defer taking a position on this.


>
> Of course these definitions presume a position on whether two graphs
> with the same triples == each other. Perhaps "I is an RDF graph" says
> to use "is" and not "==".
>

Specifying graphs as g-snaps seems to imply that two graphs with the same
triples == each other -- as expressed in [
http://www.w3.org/2011/rdf-wg/wiki/GraphConceptTerminology]: "If two g-snaps
have the same nodes/arcs, they are really the same g-snap."  I'll admit that
I haven't fully thought through what the implications of this would be.
 Given a notion of RDF node equality and a global mapping M as in 0c. I
suppose you could infer I1 = I2 based on M(I1) == M(I2).


>
>
> > Define G(I) as a function that returns the RDF graph identified by I.  In
> > our parlance, G(I) is a g-snap, invariant over time.  Due to the nature
> of
> > RDF, it is difficult to express the relationship between I and G(I)
> natively
> > in RDF.  Graph literals, which I understand to be the encoding of some
> set
> > of triples as a single node in a graph, are one possible approach but
> this
> > proposal does not attempt to define graph literals.  Furthermore, in the
> > open world it's not possible to have complete knowledge of all the
> triples
> > in G(I) for any given I.
>
> In order to make explicit the difference between the mapping and the
> graph name convention below, I'm using "GraphAt" for "G":
>
>  1a. GraphAt(I:IRI):RDFGraph = G ∣ RDFParse(HTTPGet(I)) == G
>  1b. GraphAt(I:IRI):Boolean = G ∣ m(I) == G, m a local map
>  1c. GraphAt(I:IRI):Boolean = G ∣ M(I) == G, M an RDF-wide map
>

Should 1b and 1c read "GraphAt(I:IRI):RDFGraph ..." (rather than Boolean)?


>
> and define 0 in terms of 1:
>
>  0.  Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) == G
>
>
> > 2. Graph Assertion
> > Let I be an IRI and G be an RDF graph.  Define GA(I, G) as a binary
> > predicate such that GA(I, G) implies (a) Graph(I) and (b) G(I) entails G.
>
>   3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ Graph(I) && GraphAt(I)
> ⊢ G
>
> or
>
>  3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) ⊢ G
>

Is the existential quantifier necessary here?  Isn't G bound by GA(I:IRI,
G:RDFGraph)?


>
>
> > The notion of graph assertion attempts to capture the semantics of what
> > happens when some set of triples is associated with a graph IRI in a
> > multi-graph serialization such as TriG.  So the TriG fragment:
> >
> > :G1 { :a :b :c } .
> >
> > would be understood to construct a graph G with a single triple :a :b :c
> and
> > then make the assertion GA(:G1 G).
> >
> > The use of "entails" as opposed to "equals" here is what gives us our
> > flexibility.  Applications that want to treat named graphs as g-snaps,
> > completely described by the triples associated with the graph IRI, can do
> so
> > by extending (b) to say G(I) equals G instead of entails.  Because every
> > graph entails itself, this extension is supported by these semantics, but
> > this would not be required behavior.  Indeed, this could lead to trouble
> in
> > the open world where you can have GA(I, G1) and GA(I, G2) with G1 != G2.
>
> This might be awkward for tests cases 'cause
>  <X> log:implies { <pi> <numericValue> 3.14, 3.0 . } . # the Bolslough
> simplification
> is a valid parse of
>  { <pi> <numericValue> 3.14 . }
> I wonder if there's some way to move this beyond parsing (no suggestions
> yet).
>
>
> > Applications that want to treat named graphs as g-boxes would to so by
> > essentially maintaining a (time-sensitive) mapping of IRI I to graph G.
> >  This aligns pretty closely with my understanding of the notion of graph
> > store from SPARQL 1.1 Update.  Poking the g-box to obtain content (either
> a
> > g-text serialization or query results) amounts to asserting GA(I, G) for
> the
> > current value of G at some point in time.  Given a new graph assertion
> for
> > an IRI that is already mapped in the store, an implementation could
> replace
> > the currently mapped graph with the new one (effectively discarding all
> > prior graph assertions) or merge them at its discretion; either approach
> > would be supported by these semantics.
>
> I guess this argues for
>  1b. GraphAt(I:IRI):Boolean = G ∣ m(I) == G, m a local map
> where we don't try to tell one person's GA to match another's.
>

I see your point here, although again I'll defer taking a position on the
matter.  Could we write that:

1b. GraphAt(I:IRI):Boolean = G ∣ ∃ m:GraphMap | m(I) == G

Is this correct?  Does this just confuse the matter?

BTW, thanks for introducing the logical notation.  This is my first stab at
writing such a document, I haven't yet located those keys on my keyboard :-)

-Alex
Received on Thursday, 7 April 2011 17:06:16 UTC