- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Mon, 10 Oct 2011 16:07:09 +0100
- To: Sandro Hawke <sandro@w3.org>
- CC: public-rdf-wg@w3.org
On 10/10/11 03:23, Sandro Hawke wrote:
> On Sat, 2011-10-08 at 17:31 +0100, Andy Seaborne wrote:
>>
>> On 07/10/11 15:35, Sandro Hawke wrote:
>>> On Fri, 2011-10-07 at 13:48 +0100, Andy Seaborne wrote:
>>>>> Okay, that's enough for now. Give me a +1 if you think this is headed
>>>>> in a useful direction.
>>>>
>>>> I like something like this as a pattern of good practice (well, 2
>>>> patterns). I don't agree with forcing the 4th column to have a specific
>>>> meaning given all the other deployed uses we have now collected.
>>>
>>> Yeah.... There is a middle ground where some datasets use Web
>>> semantics and some don't. I see your point that we can't just force
>>> people to change -- we can't say the things they've been saying now mean
>>> something else.
>>>
>>> Maybe we can have a way to flag which datasets are using Web semantics,
>>> and allow market pressures to work? Like, where we do a new mime type
>>> for a multigraph syntax, we could add this. And maybe it's something
>>> we can flag in SPARQL service description.
>>>
>>>> On one point:
>>>>
>>>> I don't see why
>>>>
>>>>    <http://example.org> {<s> <p> <o> . }
>>>>
>>>> should mean it is ONLY that triple rather than CONTAINS that triple. If
>>>> the data publisher wants to say "and that's all" then they should say so
>>>> as an additional fact. With the converse, "it's closed by default", it is
>>>> harder to see how to allow it to be open sometimes.
>>>>
>>>> For a large graph, where you only need to talk about a small subset,
>>>> there are deployment issues. Consider dbpedia.
>>>>
>>>> (I also want to see the same change in TriG for concatenation of files)
>>>
>>> It seems to me that it's easy to go from complete to incomplete, just
>>> using a subgraph predicate. Let's say we want to say G1 is the graph
>>> with only <s> <p> <o> and G2 is a graph with that triple and maybe other
>>> stuff. I'd say:
>>>
>>>    G1 {<s> <p> <o>. }
>>>    { G1 r:subgraphOf G2.
>>>    }
>>>
>>> But I don't see how to communicate G1 the way you're talking about. How
>>> do you say "and that's all"?
>>
>>    G1 {<s> <p> <o>. }
>>    { G1 r:representationOf G2. }
>
> I don't understand. Can you expand those out in English? I read
> that, with your proposed subgraph semantics as:
>
>    G1 is a graph which contains at least the triple <s> <p> <o>.
>
>    G1 is a representation of G2.
>
> I don't know what "representation" means in this sense, but in any case,
> how can we know from those statements that G2 contains only that one
> triple? The only connection between that triple and G2 is via G1, and
> we've made that connection so loose that it can't serve this purpose.

You were asking how to say "and that's all" if G1 { } is only some
triples from the graph. I am suggesting a statement that the {}
corresponds to the AWWW representation.

>
>> Mindful of DanBri's comments on dereference being non-global
>
> Which I disposed of.

Only by saying "don't do it" and coming up with the idea of "true
datasets". But the reality is that it can happen without the client
knowing.

If a "true dataset" is one particular case, then fine - it's a pattern
which is, to me, the best way forward. If "true dataset" is the only case
then I disagree with the approach and think not reflecting current usage
is a pointless exercise in spec-ery.

> The cases of dereference being non-global are not
> useful to this purpose, so we're not using them (in my proposal).
> (Note that non-global dereference, while useful for some things, is
> anathema to much of the Web. Search engines can't really handle it,
> etc.)
>
>> maybe all that
>> can ever be said is "subgraph" so making the
>>
>>    G1 {<s> <p> <o>. }
>>
>> case the subgraph case may be where AWWW leads us.
>
> I'm quite confident it doesn't.
>
>> Stronger statements
>> need additional triples to make them and this reflects the fact that
>> additional knowledge over and above AWWW deref is being used.
>
> No.
> When I load a Web page over the web, it's very clear to me, at a
> protocol level, when I've gotten the full
> "representation" (serialization) of the page, of an image, of a video,
> of a stylesheet, etc.
>
> Also, when I download an RDF/XML or Turtle file, I can tell when I'm
> done. We want to be able to support merging of graphs but that doesn't
> mean we have to pretend the boundaries between the graphs, pre-merging,
> don't exist. And the whole utility of "named graphs" for some folks
> (eg Tim Lebo) is that it lets you draw boundaries around graphs and
> point to them.
>
> -- Sandro
>
>> Andy
>>
>>>
>>> -- Sandro
>>>
>>>> Andy
>>>>
>>>> On 07/10/11 03:04, Sandro Hawke wrote:
>>>>> Here's a proposal for what the fourth column should mean. It's kind of
>>>>> obvious, and I think it's how many of us just assumed Named Graphs were
>>>>> supposed to work. But I don't think it's been written down in a form
>>>>> we can use, so here it is, in a first draft.
>>>>>
>>>>> I haven't really tried to motivate this, but one thing it does is allow
>>>>> folks to refer to a graph using just one URI. As [1] points out rather
>>>>> painfully, as things stand now, you need multiple URIs just to identify
>>>>> each g-box (and thus g-snap). (That is, you need to say which sparql
>>>>> endpoint you're talking about, and then which graph within its
>>>>> dataset.)
>>>>>
>>>>> My starting question was: what is the relationship between the IRI (the
>>>>> "graph name") and its associated g-snap in an RDF Dataset? This
>>>>> applies to the dataset backing any SPARQL end point, as well as the
>>>>> dataset serialized in any multigraph syntax, like TriG or N-Quads.
>>>>> Another way to look at it: what does it mean to assert a TriG
>>>>> document? If you send me the TriG Document "<a> {<s> <p> <o> }", and
>>>>> I trust you, what do I now know?
>>>>>
>>>>> Richard, I think, has been arguing for a minimalist position,
>>>>> answering "nothing", or "it depends on out-of-band agreements".
>>>>> This "Web Semantics" proposal is an alternative.
>>>>>
>>>>> === Proposal
>>>>>
>>>>> The idea here is to make the relationship between the URI and the
>>>>> graph be the standard Web naming relationship, similar to what we all
>>>>> use for Web pages. When you dereference the URI, you get the graph.
>>>>>
>>>>> This has the feature of being, to some extent, observable. Just like
>>>>> triples are claims about some domain of discourse, quads become claims
>>>>> about idealized Web dereference behavior.
>>>>>
>>>>> Specifically: Consider a "graph naming" to be the association of a
>>>>> graph name N with a graph G. For the graph naming to hold, every
>>>>> successful dereference of N yielding an RDF graph must yield G. For a
>>>>> dataset D to hold, its default graph must hold (as normal in RDF) and
>>>>> every graph naming pair in D must hold.
>>>>>
>>>>> Example 1: This dataset
>>>>>
>>>>>    <http://example.org> {<s> <p> <o>. }
>>>>>
>>>>> means that if anyone is able to dereference "http://example.org"
>>>>> and obtain an RDF graph serialization, the serialized graph will
>>>>> consist of the single triple, <s> <p> <o>. Failure to dereference
>>>>> does not make the graph naming untrue, but a successful dereference
>>>>> yielding a different graph does.
>>>>>
>>>>> Example 2: This dataset can never be true:
>>>>>
>>>>>    <http://example.org> {<s> <p> 1. }
>>>>>    <HTTP://example.org> {<s> <p> 2. }
>>>>>
>>>>> ... since one cannot get different results dereferencing URIs that
>>>>> differ only in the case of the scheme component (as per RFC 3986).
>>>>>
>>>>> Example 3: This dataset:
>>>>>
>>>>>    <tag:hawke.org,2010-10-06:eg1> {<s> <p> <o>. }
>>>>>
>>>>> cannot be tested using Web protocols, since the "tag" URI scheme is
>>>>> (by design) not dereferenceable. Whether it is true or false cannot
>>>>> be determined experimentally.
>>>>>
>>>>> ==== Temporal Context
>>>>>
>>>>> How can we say:
>>>>>
>>>>>    <http://example.org> {<s> <p> <o>.
>>>>>    }
>>>>>
>>>>> if we suspect that "http://example.org" might serve some other content
>>>>> tomorrow?
>>>>>
>>>>> The answer is that datasets often need temporal qualification just
>>>>> like RDF graphs do. It's just like saying in RDF:
>>>>>
>>>>>    <http://example.org/Alice> foaf:age 25.
>>>>>
>>>>> One solution for foaf:age triples is to include triples like:
>>>>>
>>>>>    <> dc:temporal "2011-10-06"^^xs:dateTime.
>>>>>
>>>>> and that can be done in datasets as well, using the default graph.
>>>>> More work is needed on this, but I'm pretty sure this proposal can use
>>>>> whatever solution people come up with for RDF and doesn't make matters
>>>>> much worse than they are already.
>>>>>
>>>>> ==== Practical Deployment Choices
>>>>>
>>>>> Any system which maintains a dataset (eg a sparql endpoint) or
>>>>> generates multigraph documents like TriG has to do one (or more) of
>>>>> the following:
>>>>>
>>>>> 1. Use new non-dereferenceable graph names. These could be tag or
>>>>>    uuid URIs, or http URIs in your own name space which you choose to
>>>>>    leave 404.
>>>>>
>>>>> 2. Use your own dereferenceable graph names, perhaps relative to the
>>>>>    endpoint or TriG document URI. If you do serve RDF content at
>>>>>    those URIs, it MUST be the same content (give or take stated time
>>>>>    lag).
>>>>>
>>>>> 3. Use someone else's graph names. Here, the key thing is temporal
>>>>>    metadata. You have to decide what you want (copy once vs
>>>>>    synchronize with what accuracy) and (somehow) share that temporal
>>>>>    metadata.
>>>>>
>>>>> ...
>>>>>
>>>>> Okay, that's enough for now. Give me a +1 if you think this is headed
>>>>> in a useful direction.
>>>>>
>>>>> -- Sandro
>>>>>
>>>>> [1] http://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts
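
The two readings debated in this thread (a named graph is ONLY the listed triples versus CONTAINS the listed triples) can be compared with a small sketch. This is only an illustration under stated assumptions: graphs are modelled as plain sets of string triples, the `web` dictionary stands in for real dereferencing, and the function names `consistent_exact`, `consistent_subgraph`, and `dataset_consistent` are hypothetical, not part of any RDF library or of either proposal.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
# Assumed stand-in for Web dereference: returns a graph, or None on failure.
Deref = Callable[[str], Optional[Graph]]


def consistent_exact(name: str, graph: Graph, deref: Deref) -> bool:
    """'ONLY' reading: a successful dereference must yield exactly `graph`.
    A failed dereference does not refute the naming."""
    fetched = deref(name)
    return fetched is None or fetched == graph


def consistent_subgraph(name: str, graph: Graph, deref: Deref) -> bool:
    """'CONTAINS' reading: a successful dereference must include every
    triple of `graph`, and may include others."""
    fetched = deref(name)
    return fetched is None or graph <= fetched


def dataset_consistent(namings: Dict[str, Graph], deref: Deref,
                       check: Callable[[str, Graph, Deref], bool]) -> bool:
    """A dataset is consistent when every graph naming pair passes `check`.
    (The default graph is left out of this sketch.)"""
    return all(check(n, g, deref) for n, g in namings.items())


# Hypothetical Web state: the server holds one more triple than the dataset lists.
web: Dict[str, Graph] = {
    "http://example.org": frozenset({("s", "p", "o"), ("s", "p", "o2")}),
}
claimed: Graph = frozenset({("s", "p", "o")})
dataset = {
    "http://example.org": claimed,
    "tag:hawke.org,2010-10-06:eg1": claimed,  # never dereferenceable here
}

exact_ok = dataset_consistent(dataset, web.get, consistent_exact)
subgraph_ok = dataset_consistent(dataset, web.get, consistent_subgraph)
```

With this simulated Web state the exact reading is refuted by the extra triple while the subgraph reading is not, and the "tag" name can never be refuted under either reading because it never dereferences.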
Received on Monday, 10 October 2011 15:07:48 UTC