- From: Alex Hall <alexhall@revelytix.com>
- Date: Fri, 8 Apr 2011 10:35:57 -0400
- To: nathan@webr3.org
- Cc: "Eric Prud'hommeaux" <eric@w3.org>, RDF WG <public-rdf-wg@w3.org>, Richard Cyganiak <richard@cyganiak.de>
- Message-ID: <BANLkTimNCXjjrTS8OVY4w8NaRTTi9VQGkQ@mail.gmail.com>
On Thu, Apr 7, 2011 at 5:06 PM, Nathan <nathan@webr3.org> wrote: > Eric Prud'hommeaux wrote: > >> * Alex Hall <alexhall@revelytix.com> [2011-04-07 04:47-0400] >> >>> Here is a proposal of a semantics for named graphs in RDF. My goals here >>> are to: >>> - Extend >>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal to >>> go beyond (IRI, graph) tuples. >>> - Give something that is formally defined enough to serve as a starting >>> point for discussion. >>> - Specify common semantics for multi-graph serialization formats, or at >>> least a starting point. >>> - Specify something that is flexible enough to satisfy applications that >>> want to treat named graphs as either g-snaps or g-boxes. >>> >>> Regarding g-boxes, I specifically want to avoid incorporating anything >>> that >>> suggests time variance into the semantics, because specifying semantics >>> for >>> temporal changes is explicitly out of scope for the existing RDF >>> Semantics >>> document. >>> >>> Here goes... >>> >>> 1. Graph Identification >>> Let I be an IRI. Define Graph(I) as a unary predicate such that Graph(I) >>> implies that the resource identified by I is an RDF graph. If desired, >>> this >>> can be described easily enough in RDF by defining a new class rdfs:Graph >>> and >>> mapping Graph(I) to the triple I rdf:type rdfs:Graph. >>> >> >> Perhaps you didn't intend to take a position on this, but does >> "resource identified by I" imply http dererencing? I can think of >> three interpretations of Graph: >> >> 0a. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ RDFParse(HTTPGet(I)) == G >> 0b. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ m(I) == G, m a local map >> 0c. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ M(I) == G, M an RDF-wide map >> >> Of course these definitions presume a position on whether two graphs >> with the same triples == each other. Perhaps "I is an RDF graph" says >> to use "is" and not "==". >> > > This is pretty much just like REST, and a bit unsurprising since we stuck > RDF on the web, started to dereference it, put the results in a quad store > where the IRI of resource we dereferenced was said to be the graph > identifier and the triples extracted from the representation were put in the > remaining 3 slots. Quads born, SPARQL, QuadStores, RDF Datasets, full circle > and back to here and now. > > RDFParse(HTTPGet(I)) == G > > doesn't entail that I is a Graph or that I == G, since: > > *- I is mapped to (or associated with) a set of values over time > - a successful HTTPGet(I) returns a single value from the set (not the > current value) > - the values in the set are representations > - each representation may have multiple properties and contain multiple > things (RDF triples being just one of those things, like in RDFa documents) > - RDFParse(HTTPGet(I)) == a g-snap. > > pretty much the most we can conclude, regardless of the interface (HTTP or > otherwise), is that the g-snap (representing a set of triples) is associated > with I. > > if we say any more than that, we're just creating use case specific stories > to aid comprehension for people, but at the same time we're creating > artificial limitations which will inevitably come back to bite us, or > somebody, in the ass later. > > side: I put the * beside the first item so that you can see the REST and > HTTP story falls apart when they try to speak of anything behind the > interface, because instinct and context here makes you think that the values > which I is associated with over time are rdf graphs / sets of triples; and > the context in HTTP/REST makes you think it's associated with a set of > representations over time, or some resource that has a mythical state. Any > attempt to say what is behind the interface or what the relationship is just > makes problems later when you zoom out and consider other concepts. So IMHO, > just best to keep it as simple as possible. > > if RDFParse(HTTPGet(<I>)) results in a g-snap Z, then: > g-snap Z is associated with <I> > <I> is dereferencable by HTTP > an HTTP GET on <I> at time X resulted in representation Y > representation Y is associated with <I> > Y contains Z > No objections from me on this one. To paraphrase, what you're saying is: lots of IRIs, which identify resources other than graphs, are dereferenceable on the web and will return a graph, or more precisely a document from which a graph may be extracted. The graph represents that resource, but we can't say that the graph *is* the resource. What follows is thinking out loud... I suppose this is particularly relevant in the Linked Data community, where I might dereference the IRI <I> = <http://example.com/people#Alice> and find a graph G = { <http://example.com/people#Alice> foaf:givenName "Alice"; foaf:knows :Bob; ... } If we want to turn around and associate <I> with G in a quad store or RDF dataset, we should be able to just do that without implicating that the resource identified by <I> is that graph, or having to allocate some new IRI to represent the notion of "the graph that describes Alice". That does raise a bunch of questions for me surrounding provenance (maybe they've already been answered, I'm not familiar with research in that area), like how to differentiate the description of <I> as a web document (was retrieved on this date, etc.) from the description of <I> as an abstract resource (the person Alice, in this case)? Is it worthwhile inventing a vocabulary to define the notion of "the graph that describes Alice"? These tend to lead down a path that I would describe as graph reification, and not having done the appropriate research there I think I'll refrain from further comment... -Alex > > Best, > Nathan > > > Define G(I) as a function that returns the RDF graph identified by I. In >>> our parlance, G(I) is a g-snap, invariant over time. Due to the nature >>> of >>> RDF, it is difficult to express the relationship between I and G(I) >>> natively >>> in RDF. Graph literals, which I understand to be the encoding of some >>> set >>> of triples as a single node in a graph, are one possible approach but >>> this >>> proposal does not attempt to define graph literals. Furthermore, in the >>> open world it's not possible to have complete knowledge of all the >>> triples >>> in G(I) for any given I. >>> >> >> In order to make explicit the difference between the mapping and the >> graph name convention below, I'm using "GraphAt" for "G": >> >> 1a. GraphAt(I:IRI):RDFGraph = G ∣ RDFParse(HTTPGet(I)) == G >> 1b. GraphAt(I:IRI):Boolean = G ∣ m(I) == G, m a local map >> 1c. GraphAt(I:IRI):Boolean = G ∣ M(I) == G, M an RDF-wide map >> >> and define 0 in terms of 1: >> >> 0. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) == G >> >> >> 2. Graph Assertion >>> Let I be an IRI and G be an RDF graph. Define GA(I, G) as a binary >>> predicate such that GA(I, G) implies (a) Graph(I) and (b) G(I) entails G. >>> >> >> 3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ Graph(I) && GraphAt(I) >> ⊢ G >> >> or >> >> 3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) ⊢ G >> >> >> The notion of graph assertion attempts to capture the semantics of what >>> happens when some set of triples is associated with a graph IRI in a >>> multi-graph serialization such as TriG. So the TriG fragment: >>> >>> :G1 { :a :b :c } . >>> >>> would be understood to construct a graph G with a single triple :a :b :c >>> and >>> then make the assertion GA(:G1 G). >>> >>> The use of "entails" as opposed to "equals" here is what gives us our >>> flexibility. Applications that want to treat named graphs as g-snaps, >>> completely described by the triples associated with the graph IRI, can do >>> so >>> by extending (b) to say G(I) equals G instead of entails. Because every >>> graph entails itself, this extension is supported by these semantics, but >>> this would not be required behavior. Indeed, this could lead to trouble >>> in >>> the open world where you can have GA(I, G1) and GA(I, G2) with G1 != G2. >>> >> >> This might be awkward for tests cases 'cause >> <X> log:implies { <pi> <numericValue> 3.14, 3.0 . } . # the Bolslough >> simplification >> is a valid parse of >> { <pi> <numericValue> 3.14 . } >> I wonder if there's some way to move this beyond parsing (no suggestions >> yet). >> >> >> Applications that want to treat named graphs as g-boxes would to so by >>> essentially maintaining a (time-sensitive) mapping of IRI I to graph G. >>> This aligns pretty closely with my understanding of the notion of graph >>> store from SPARQL 1.1 Update. Poking the g-box to obtain content (either >>> a >>> g-text serialization or query results) amounts to asserting GA(I, G) for >>> the >>> current value of G at some point in time. Given a new graph assertion >>> for >>> an IRI that is already mapped in the store, an implementation could >>> replace >>> the currently mapped graph with the new one (effectively discarding all >>> prior graph assertions) or merge them at its discretion; either approach >>> would be supported by these semantics. >>> >> >> I guess this argues for 1b. GraphAt(I:IRI):Boolean = G ∣ m(I) == G, m a >> local map >> where we don't try to tell one person's GA to match another's. >> >> >> Any vocabulary for specifying graph literals and attaching them to a >>> graph >>> IRI in RDF would be defined as making a graph assertion, not setting the >>> value of the identified graph. >>> >>> 3. RDF Datasets >>> I haven't thought this part through entirely, but I think these semantics >>> could be aligned with the existing notion of RDF datasets from SPARQL >>> (and >>> as proposed on the wiki) by simply mapping the (IRI, graph) tuples in the >>> dataset to the appropriate graph assertions. >>> >>> 4. Graph Equality >>> Because it is not the case that (G1 entails G and G2 entails G) implies >>> G1 = >>> G2, it is also not the case that (GA(I1, G) and GA(I2, G)) implies I1 and >>> I2 >>> are the same graph. Such a conclusion could be reached if you extend the >>> definition of GA to mean equals instead of entails as discussed before, >>> but >>> again that is an extension and not part of the proposed semantics. >>> >>> 5. Empty Graphs >>> Because every graph trivially entails the empty graph E, the assertion >>> GA(I, >>> E) is trivially true for every graph IRI I. Making that assertion >>> doesn't >>> do anything beyond identify the resource denoted by I as a graph. >>> >>> 6. Graph Merges >>> It follows from the definition of GA (and the definition of entails) that >>> (GA(I, G1) and GA(I, G2)) implies GA(I, Merge(G1, G2)). I think this >>> gives >>> us a pretty straightforward approach to merging of RDF datasets if this >>> is >>> required of the spec. >>> >>> Hope you find this useful... or at least that this stirs up some >>> interesting debate. >>> >> >> thanks for moving this forward. >> >> >> Regards, >>> Alex >>> >> >> >
Received on Friday, 8 April 2011 14:36:25 UTC