- From: Nathan <nathan@webr3.org>
- Date: Fri, 08 Apr 2011 23:07:30 +0100
- To: Alex Hall <alexhall@revelytix.com>
- CC: Eric Prud'hommeaux <eric@w3.org>, RDF WG <public-rdf-wg@w3.org>, Richard Cyganiak <richard@cyganiak.de>
Alex Hall wrote: > On Thu, Apr 7, 2011 at 5:06 PM, Nathan <nathan@webr3.org> wrote: > >> Eric Prud'hommeaux wrote: >> >>> * Alex Hall <alexhall@revelytix.com> [2011-04-07 04:47-0400] >>> >>>> Here is a proposal of a semantics for named graphs in RDF. My goals here >>>> are to: >>>> - Extend >>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal to >>>> go beyond (IRI, graph) tuples. >>>> - Give something that is formally defined enough to serve as a starting >>>> point for discussion. >>>> - Specify common semantics for multi-graph serialization formats, or at >>>> least a starting point. >>>> - Specify something that is flexible enough to satisfy applications that >>>> want to treat named graphs as either g-snaps or g-boxes. >>>> >>>> Regarding g-boxes, I specifically want to avoid incorporating anything >>>> that >>>> suggests time variance into the semantics, because specifying semantics >>>> for >>>> temporal changes is explicitly out of scope for the existing RDF >>>> Semantics >>>> document. >>>> >>>> Here goes... >>>> >>>> 1. Graph Identification >>>> Let I be an IRI. Define Graph(I) as a unary predicate such that Graph(I) >>>> implies that the resource identified by I is an RDF graph. If desired, >>>> this >>>> can be described easily enough in RDF by defining a new class rdfs:Graph >>>> and >>>> mapping Graph(I) to the triple I rdf:type rdfs:Graph. >>>> >>> Perhaps you didn't intend to take a position on this, but does >>> "resource identified by I" imply http dererencing? I can think of >>> three interpretations of Graph: >>> >>> 0a. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ RDFParse(HTTPGet(I)) == G >>> 0b. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ m(I) == G, m a local map >>> 0c. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ M(I) == G, M an RDF-wide map >>> >>> Of course these definitions presume a position on whether two graphs >>> with the same triples == each other. Perhaps "I is an RDF graph" says >>> to use "is" and not "==". >>> >> This is pretty much just like REST, and a bit unsurprising since we stuck >> RDF on the web, started to dereference it, put the results in a quad store >> where the IRI of resource we dereferenced was said to be the graph >> identifier and the triples extracted from the representation were put in the >> remaining 3 slots. Quads born, SPARQL, QuadStores, RDF Datasets, full circle >> and back to here and now. >> >> RDFParse(HTTPGet(I)) == G >> >> doesn't entail that I is a Graph or that I == G, since: >> >> *- I is mapped to (or associated with) a set of values over time >> - a successful HTTPGet(I) returns a single value from the set (not the >> current value) >> - the values in the set are representations >> - each representation may have multiple properties and contain multiple >> things (RDF triples being just one of those things, like in RDFa documents) >> - RDFParse(HTTPGet(I)) == a g-snap. >> >> pretty much the most we can conclude, regardless of the interface (HTTP or >> otherwise), is that the g-snap (representing a set of triples) is associated >> with I. >> >> if we say any more than that, we're just creating use case specific stories >> to aid comprehension for people, but at the same time we're creating >> artificial limitations which will inevitably come back to bite us, or >> somebody, in the ass later. >> >> side: I put the * beside the first item so that you can see the REST and >> HTTP story falls apart when they try to speak of anything behind the >> interface, because instinct and context here makes you think that the values >> which I is associated with over time are rdf graphs / sets of triples; and >> the context in HTTP/REST makes you think it's associated with a set of >> representations over time, or some resource that has a mythical state. Any >> attempt to say what is behind the interface or what the relationship is just >> makes problems later when you zoom out and consider other concepts. So IMHO, >> just best to keep it as simple as possible. >> >> if RDFParse(HTTPGet(<I>)) results in a g-snap Z, then: >> g-snap Z is associated with <I> >> <I> is dereferencable by HTTP >> an HTTP GET on <I> at time X resulted in representation Y >> representation Y is associated with <I> >> Y contains Z >> Hi Alex, > No objections from me on this one. To paraphrase, what you're saying is: > lots of IRIs, which identify resources other than graphs, are > dereferenceable on the web and will return a graph, or more precisely a > document from which a graph may be extracted. The graph represents that > resource, but we can't say that the graph *is* the resource. > > What follows is thinking out loud... > > I suppose this is particularly relevant in the Linked Data community, where > I might dereference the IRI <I> = <http://example.com/people#Alice> and > find a graph G = { <http://example.com/people#Alice> foaf:givenName "Alice"; > foaf:knows :Bob; ... } If we want to turn around and associate <I> with G > in a quad store or RDF dataset, we should be able to just do that without > implicating that the resource identified by <I> is that graph, or having to > allocate some new IRI to represent the notion of "the graph that describes > Alice". Note, you can't dereference an IRI which contains a fragment, only absolute-IRIs are dereferencable. (As in, you'd have to chop of the fragment above and dereference <http://example.com/people>). > That does raise a bunch of questions for me surrounding provenance (maybe > they've already been answered, I'm not familiar with research in that area), > like how to differentiate the description of <I> as a web document (was > retrieved on this date, etc.) from the description of <I> as an abstract > resource (the person Alice, in this case)? Is it worthwhile inventing a > vocabulary to define the notion of "the graph that describes Alice"? RDF pretty much already has that, see http://www.w3.org/TR/rdf-concepts/#section-fragID [[ we assume that the URI part (i.e. excluding fragment identifier) identifies a resource, which is presumed to have an RDF representation. So when eg:someurl#frag is used in an RDF document, eg:someurl is taken to designate some RDF document (even when no such document can be retrieved). ]] document, meaning "Information Resource" as opposed to a chunk of RDF/XML which you GET at a particular time. Hence why we have the 303 status code guidance and httpRange-14 resolution to provide some way of allowing non fragment URIs to be dereferencable. That's a big issue though, which I'm pretty sure not many will appreciate me mentioning around here :p Back to the terminology though, this is where the g-snap, g-text, g-box terminology comes in handy, rather than saying "graph". Terms like Graph and Document have taken on multiple meanings for different people over the years, and are v much conflated / tend to lead people to inferring the wrong things when used innocently in conversation. Best, Nathan > These > tend to lead down a path that I would describe as graph reification, and not > having done the appropriate research there I think I'll refrain from further > comment... > > -Alex > > > >> Best, > > >> Nathan >> >> >> Define G(I) as a function that returns the RDF graph identified by I. In >>>> our parlance, G(I) is a g-snap, invariant over time. Due to the nature >>>> of >>>> RDF, it is difficult to express the relationship between I and G(I) >>>> natively >>>> in RDF. Graph literals, which I understand to be the encoding of some >>>> set >>>> of triples as a single node in a graph, are one possible approach but >>>> this >>>> proposal does not attempt to define graph literals. Furthermore, in the >>>> open world it's not possible to have complete knowledge of all the >>>> triples >>>> in G(I) for any given I. >>>> >>> In order to make explicit the difference between the mapping and the >>> graph name convention below, I'm using "GraphAt" for "G": >>> >>> 1a. GraphAt(I:IRI):RDFGraph = G ∣ RDFParse(HTTPGet(I)) == G >>> 1b. GraphAt(I:IRI):Boolean = G ∣ m(I) == G, m a local map >>> 1c. GraphAt(I:IRI):Boolean = G ∣ M(I) == G, M an RDF-wide map >>> >>> and define 0 in terms of 1: >>> >>> 0. Graph(I:IRI):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) == G >>> >>> >>> 2. Graph Assertion >>>> Let I be an IRI and G be an RDF graph. Define GA(I, G) as a binary >>>> predicate such that GA(I, G) implies (a) Graph(I) and (b) G(I) entails G. >>>> >>> 3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ Graph(I) && GraphAt(I) >>> ⊢ G >>> >>> or >>> >>> 3. GA(I:IRI, G:RDFGraph):Boolean = ∃ G:RDFGraph ∣ GraphAt(I) ⊢ G >>> >>> >>> The notion of graph assertion attempts to capture the semantics of what >>>> happens when some set of triples is associated with a graph IRI in a >>>> multi-graph serialization such as TriG. So the TriG fragment: >>>> >>>> :G1 { :a :b :c } . >>>> >>>> would be understood to construct a graph G with a single triple :a :b :c >>>> and >>>> then make the assertion GA(:G1 G). >>>> >>>> The use of "entails" as opposed to "equals" here is what gives us our >>>> flexibility. Applications that want to treat named graphs as g-snaps, >>>> completely described by the triples associated with the graph IRI, can do >>>> so >>>> by extending (b) to say G(I) equals G instead of entails. Because every >>>> graph entails itself, this extension is supported by these semantics, but >>>> this would not be required behavior. Indeed, this could lead to trouble >>>> in >>>> the open world where you can have GA(I, G1) and GA(I, G2) with G1 != G2. >>>> >>> This might be awkward for tests cases 'cause >>> <X> log:implies { <pi> <numericValue> 3.14, 3.0 . } . # the Bolslough >>> simplification >>> is a valid parse of >>> { <pi> <numericValue> 3.14 . } >>> I wonder if there's some way to move this beyond parsing (no suggestions >>> yet). >>> >>> >>> Applications that want to treat named graphs as g-boxes would to so by >>>> essentially maintaining a (time-sensitive) mapping of IRI I to graph G. >>>> This aligns pretty closely with my understanding of the notion of graph >>>> store from SPARQL 1.1 Update. Poking the g-box to obtain content (either >>>> a >>>> g-text serialization or query results) amounts to asserting GA(I, G) for >>>> the >>>> current value of G at some point in time. Given a new graph assertion >>>> for >>>> an IRI that is already mapped in the store, an implementation could >>>> replace >>>> the currently mapped graph with the new one (effectively discarding all >>>> prior graph assertions) or merge them at its discretion; either approach >>>> would be supported by these semantics. >>>> >>> I guess this argues for 1b. GraphAt(I:IRI):Boolean = G ∣ m(I) == G, m a >>> local map >>> where we don't try to tell one person's GA to match another's. >>> >>> >>> Any vocabulary for specifying graph literals and attaching them to a >>>> graph >>>> IRI in RDF would be defined as making a graph assertion, not setting the >>>> value of the identified graph. >>>> >>>> 3. RDF Datasets >>>> I haven't thought this part through entirely, but I think these semantics >>>> could be aligned with the existing notion of RDF datasets from SPARQL >>>> (and >>>> as proposed on the wiki) by simply mapping the (IRI, graph) tuples in the >>>> dataset to the appropriate graph assertions. >>>> >>>> 4. Graph Equality >>>> Because it is not the case that (G1 entails G and G2 entails G) implies >>>> G1 = >>>> G2, it is also not the case that (GA(I1, G) and GA(I2, G)) implies I1 and >>>> I2 >>>> are the same graph. Such a conclusion could be reached if you extend the >>>> definition of GA to mean equals instead of entails as discussed before, >>>> but >>>> again that is an extension and not part of the proposed semantics. >>>> >>>> 5. Empty Graphs >>>> Because every graph trivially entails the empty graph E, the assertion >>>> GA(I, >>>> E) is trivially true for every graph IRI I. Making that assertion >>>> doesn't >>>> do anything beyond identify the resource denoted by I as a graph. >>>> >>>> 6. Graph Merges >>>> It follows from the definition of GA (and the definition of entails) that >>>> (GA(I, G1) and GA(I, G2)) implies GA(I, Merge(G1, G2)). I think this >>>> gives >>>> us a pretty straightforward approach to merging of RDF datasets if this >>>> is >>>> required of the spec. >>>> >>>> Hope you find this useful... or at least that this stirs up some >>>> interesting debate. >>>> >>> thanks for moving this forward. >>> >>> >>> Regards, >>>> Alex >>>> >>> >
Received on Friday, 8 April 2011 22:08:40 UTC