- From: Sandro Hawke <sandro@w3.org>
- Date: Thu, 05 Apr 2012 07:31:48 -0400
- To: Arnaud Le Hors <lehors@us.ibm.com>
- Cc: public-rdf-wg <public-rdf-wg@w3.org>
On Wed, 2012-04-04 at 15:05 -0700, Arnaud Le Hors wrote:
> Hearing people arguing over whether the statement <u> {<a> <b> <c>} defines a complete graph or not leads me to wonder whether we shouldn't recognize that the answer ought to be: "it depends". :-)
>
> I think Lee explained very effectively how the same statement can be interpreted differently depending on whether you're doing a GET, PUT, POST or something similar.
>
> I already noted that the response Sandro expects from his question "According to this query, how many triples are in the graph known to that endpoint as 'http://g1.example.org' ?" is actually based on additional information he is providing in his question, specifically: the query is limited to the graph known to that particular endpoint.
>
> If all you had were the following triples:
> >>> <a> <b> 1.
> >>> <a> <b> 2.
> >>> <a> <b> 3.
> without giving any other information about how or where you got them from and you'd ask: "how many triples are associated with <a>?" I think the answer would have to be: "it depends".

FWIW, I really think you do need to keep the notion of a graph in the question, since SPARQL has the keyword "GRAPH". As in, "How many triples are in the graph associated with <http://g1.example.org>?" But since we're not really doing this survey anyway, it probably doesn't matter.

> I heard Sandro say that when he dereferences <u> he expects to get all the triples in <u>. I agree but I think that's a Linked Data view of the world and it comes from the meaning of GET rather than what you receive. In another context, retrieved in a different way, what you receive might mean something else. In the case of a SPARQL query it could mean "these are all the triples in <u> that this endpoint knows about".
>
> So, do we really need to choose one way or the other? Can't we just leave it to the application to decide whether it defines a complete graph or not?

I see three ways to do this. Which are you suggesting?

1. have two syntaxes, e.g. trig means complete graphs and n-quads means partial graphs.

2. have two constructs in trig, e.g.:
     <u1> { <a> <b> <c> } to mean full graphs and
     <u2> < <a> <b> <c> ... } to mean partial graphs

3. use the same syntax, but let the consumer decide which was meant.

I don't like 3 at all, because it doesn't solve the use cases. For instance, you couldn't have a shared crawler if the apps using the crawl happened to need complete graphs.

The second one is cute, but I think it would be very hard to implement; it would force every consumer to deal with both complete and incomplete graphs, at least a bit.

Both 1 and 2 raise the issue of how you reflect this difference in the dataset, or in SPARQL. How could you do that?

    -- Sandro

> --
> Arnaud Le Hors - Software Standards Architect - IBM Software Group
>
> From: Lee Feigenbaum <lee@thefigtrees.net>
> To: Ivan Herman <ivan@w3.org>,
> Cc: Arnaud Le Hors/Cupertino/IBM@IBMUS, Sandro Hawke <sandro@w3.org>, public-rdf-wg <public-rdf-wg@w3.org>
> Date: 04/04/2012 05:16 AM
> Subject: Re: New Proposal (6.1) for GRAPHS
> Sent by: Lee Feigenbaum <figtree@gmail.com>
>
> ______________________________________________________________________
>
> How do people use TriG in practice today? For us, the choice between these semantics today is determined externally to a TriG file. That is, given foo.trig which contains u1 { a b c }, whether this is all of u1 or a subgraph of u1 is determined based on which API or which command-line command is used. For example:
>
> > anzo import foo.trig
>
> ...interprets what's in the trig file as subgraphs that get added to any existing contents of the graphs. So "a b c" would be added to u1.
>
> > anzo replace foo.trig
>
> ...interprets what's in the trig file as complete graphs, and sets the contents of the graphs in the repository, overwriting whatever might already be in the repository as the contents of the graphs.
> So after this operation u1 ends up with exactly { a b c } as its contents.
>
> (Aside: So for us, "import" is basically like doing a POST of the triples in the trig file to the associated graphs via the SPARQL Graph Store Protocol, and "replace" is like doing a PUT.)
>
> (Aside 2: there are other operations as well, such as "anzo update --remove" which uses the subgraph semantics and also means that the triples in question should be removed from the associated graphs in the repository.)
>
> All of which is to say, there are plenty of use cases in our experience for both of these semantics. If the standard supported a way to make these semantics explicit, we would probably support that via some sort of generic command ("anzo process"? who knows), but would still let these existing command line commands override the semantics. We have plenty of cases in which we export some bit of trig, and then later on either use "anzo import" or "anzo replace" based on the situation -- and we wouldn't want to have to produce two different trig files for that situation! (This would be roughly analogous to the way in which the SPARQL Protocol lets the RDF dataset definition override what's in the query, so that queries can easily be reused in different contexts.)
>
> Lee
>
> On 4/4/2012 3:18 AM, Ivan Herman wrote:
> >
> > On Apr 4, 2012, at 07:09 , Arnaud Le Hors wrote:
> >
> >> Hi Sandro,
> >> I have to say that my expectation was similar to Charles's. I guess it's a matter of deciding whether <u1> {<a> <b> <c> } defines the <u1> graph in its entirety, as containing one triple, or merely states that the triple <a> <b> <c> is part of graph <u1>.
> >>
> >> I'm not saying it should be the latter rather than the former, just that it's not obvious.
> >> See below for more on that.
> >
> > So let me give my typical W3C answer, ie, trying to find a compromise:-)
> >
> > More seriously. The structure offered by Sandro relies on the fact that the
> >
> >     <u> {<a> <b> <c> }
> >
> > syntax gets its more precise meaning through a possible
> >
> >     <u> rdf:type rdf:SOMECLASSHERE .
> >
> > Sandro offered two such classes; isn't it possible to have three, one that makes the graph THE graph, the other that makes it PART OF the graph?
> >
> > We can of course have long discussions on which the default is. But that is a lighter discussion I believe.
> >
> > Ivan
> >
> >>
> >> Sandro Hawke <sandro@w3.org> wrote on 04/02/2012 05:57:13 PM:
> >>
> >>> From: Sandro Hawke <sandro@w3.org>
> >>> To: Charles Greer <cgreer@marklogic.com>,
> >>> Cc: Charles Greer <Charles.Greer@marklogic.com>, public-rdf-wg <public-rdf-wg@w3.org>
> >>> Date: 04/02/2012 05:57 PM
> >>> Subject: Re: New Proposal (6.1) for GRAPHS
> >>>
> >>> On Mon, 2012-04-02 at 14:00 -0700, Charles Greer wrote:
> >>>> Thanks for responding Sandro. I think that what I'm finding difficult, or at least a significant departure from RDF as I have understood it in the past, is that this TRIG document
> >>>>
> >>>>     <u1> {<a> <b> <c> . <d> <e> <f> }
> >>>>
> >>>> is not equivalent to these n-quads:
> >>>>
> >>>>     <a> <b> <c> <u1>.
> >>>>     <d> <e> <f> <u1>.
> >>>>
> >>>> Or rather, you now need a document structure around n-quads as well in order to provide the context in which rdf knows that these triples, and only these triples, constitute the graph <u1>.
> >>>>
> >>>> I had previously thought that RDF was a data model that didn't need any notion of 'document' to work. I'm not sure how another assertion that
> >>>>
> >>>>     {<u1> a rdf:Graph }
> >>>>
> >>>> can assert the boundaries of <u1> unless either the { } syntax does more than it appears to, or the document is a harder scope boundary than I would have expected. If the document has some relationship to scope, I think that should be made explicit.
> >>>
> >>> Two main points:
> >>>
> >>> 1. That rdf:Graph declaration is a different thing. It changes how <u1> relates to the graph, but in a semantic (not syntactic) way. It can be in a different document, or deduced by the use of some predicates, or known a priori by a data consumer. Knowing it entitles the consumer to see that <u1> actually identifies the graph directly, rather than just being a label for the graph. This might matter if we also know <u1> dc:licence ...SomeLicensingTerms.... Is it the graph that's licensed, or something else? There are some use cases that suggest this distinction is important, but if it turns out not to be, it's not bad, people will just not use rdf:Graph declarations much.
> >>>
> >>> 2. Whether or not your trig example and your n-quads example are equivalent depends on your reading of n-quads. This extends to your reading of SPARQL as well. My understanding is people are somewhat informal about this, but they generally do expect that once they've seen the whole trig file, or the whole n-quads file, or searched the whole SPARQL end point, that they've seen all the triples in the graph with that name/label.
> >>>
> >>> As a social test case, we could tell people this SPARQL query is run:
> >>>
> >>>     SELECT ?s ?p ?o
> >>>     WHERE { GRAPH <http://g1.example.org> { ?s ?p ?o } }
> >>>
> >>> and that we got three result bindings back:
> >>>
> >>>     ?s   ?p   ?o
> >>>     ===  ===  ===
> >>>     <a>  <b>  1.
> >>>     <a>  <b>  2.
> >>>     <a>  <b>  3.
> >>>
> >>> Then we ask them: "According to this query, how many triples are in the graph known to that endpoint as 'http://g1.example.org' ?"
> >>>
> >>> What do you think they'll say?
> >>>
> >>> I think most folks will say, "Three", even if you ask them to think again and be pedantically precise.
> >>>
> >>
> >> I agree that's what they would say but primarily because you said: "in the graph known to that endpoint"
> >> This is a critical element which isn't apparent in a mere statement like:
> >>
> >>     <u1> {<a> <b> <c> . <d> <e> <f> }
> >>
> >> Which doesn't say anything about where it comes from and whether it's complete or not.
> >>
> >> This being said, I can get used to having it the way you suggest. Especially when the graph name comes first. If we had: {<a> <b> <c> . <d> <e> <f> } <u1> I would think differently.
> >> --
> >> Arnaud Le Hors - Software Standards Architect - IBM Software Group
> >>
> >>
> >>> I think that means they're using the complete-graph semantics I'm suggesting. If they were using partial-graph semantics, they'd have to say, "Three or more".
> >>>
> >>> You see what I'm saying? When we have a complete protocol interaction, via SPARQL, or transmitting trig or n-quads files, I think the usual assumption is that *all* the triples in the named graph are being sent, not just some of them.
> >>>
> >>> I understand sometimes it would be nice to store/transmit just part of some named graph. But, as I discussed in a message a couple of minutes ago, I think we have to pick one or the other, and I think the complete-graph approach is better. It's pretty easy to convey partial graphs if we define the complete approach.
> >>>
> >>> (I suppose if we defined the partial-graph approach we could transmit complete graphs by transmitting partial graphs and including a triple-count as metadata, so you know it's complete. I guess that would work, but it seems to me to be optimizing for the much-less-common case.)
> >>>
> >>> Coming back to:
> >>>
> >>>> I had previously thought that RDF was a data model that didn't need any notion of 'document' to work.
> >>>
> >>> Yeah, it depends what you're doing with it.
> >>> There's a lot you can do with RDF without paying any attention to what documents particular bits of RDF were found in, but I think most of the Graphs use cases involve situations where you do need to pay attention to these document boundaries.
> >>>
> >>>> Thanks for your willingness to understand my points --- I'm sure that my formal language will improve over time.
> >>>
> >>> It's a long process. :-) Interestingly, it seems to be helped by arguing.
> >>>
> >>>     -- Sandro
> >>>
> >>>>
> >>>> Charles
> >>>>
> >>>> On 04/02/2012 08:36 AM, Sandro Hawke wrote:
> >>>>> On Thu, 2012-03-29 at 09:25 -0700, Charles Greer wrote:
> >>>>>> I really like this solution and it seems to satisfy the use cases familiar to me from when I actually worked a lot with RDF in the wild.
> >>>>>>
> >>>>>> One thing I'm tripping over though -- The scope of a TRIG document or RDF dataset in effect 'closes the world.' Is the idea of "merge" only within a TRIG document/dataset?
> >>>>>>
> >>>>>> I can only see two ways to really assert a graph literal -- either by sanctifying the boundaries of a dataset, thereby making merges with external data problematic, or by signing bytes. Am I missing something, as usual?
> >>>>> There's some misunderstanding here, yes. Maybe you can talk through some particular thing you imagine doing, involving merging and TriG, and I'll be able to pick it up. From what you've written, I'm confused.
> >>>>>
> >>>>> Maybe I can clarify by translating this TriG document:
> >>>>>
> >>>>>     <u1> {<a> <b> <c> }
> >>>>>
> >>>>> into this English declaration:
> >>>>>
> >>>>>     The URI 'u1' denotes something, and that thing has exactly one associated RDF Graph. That associated RDF graph consists of one RDF triple, which we can write in turtle as "<a> <b> <c>".
> >>>>>
> >>>>> So, perhaps it's more clear, now.
> >>>>> If you merged that with another TriG document:
> >>>>>
> >>>>>     <u1> {<a> <b> <d> }
> >>>>>
> >>>>> Then, trying to accept both documents at once, you'd be saying:
> >>>>>
> >>>>>     The URI 'u1' denotes something, and that thing has exactly one associated RDF graph. In one document that associated graph is claimed to be the RDF triple "<a> <b> <c>", but in another document that graph is claimed to be the RDF triple "<a> <b> <d>".
> >>>>>
> >>>>> So, in this case, you can try to merge the documents, but when you do, you find there is a contradiction, since there is only allowed to be one associated graph, but in this case there are two different ones.
> >>>>>
> >>>>>     -- Sandro
> >>>>>
> >>>>>> Charles
> >>>>>>
> >>>>>> On 03/27/2012 07:23 PM, Sandro Hawke wrote:
> >>>>>>> I've written up design 6 (originally suggested by Andy) in more detail. I've called it 6.1 since I've changed/added a few details that Andy might not agree with. Eric has started writing up how the use cases are addressed by this proposal.
> >>>>>>>
> >>>>>>> This proposal addresses all 15 of our old open issues concerning graphs. (I'm sure it will have its own issues, though.)
> >>>>>>>
> >>>>>>> The basic idea is to use trig syntax, and to support the different desired relationships between labels and their graphs via class information on the labels. In particular, according to this proposal, in this trig document:
> >>>>>>>
> >>>>>>>     <u1> {<a> <b> <c> }
> >>>>>>>
> >>>>>>> ... we only know that <u1> is some kind of label for the RDF Graph <a> <b> <c>, like today. However, in this trig document:
> >>>>>>>
> >>>>>>>     {<u2> a rdf:Graph }
> >>>>>>>     <u2> {<a> <b> <c> }
> >>>>>>>
> >>>>>>> we know that <u2> is an rdf:Graph and, what's more, we know that <u2> actually is the RDF Graph {<a> <b> <c> }.
> >>>>>>> That is, in this case, we know that URL "u2" is a name we can use in RDF to refer to that g-snap.
> >>>>>>>
> >>>>>>> Details are here: http://www.w3.org/2011/rdf-wg/wiki/Graphs_Design_6.1
> >>>>>>>
> >>>>>>> That page includes answers to all the current GRAPHS issues, including ISSUE-5, ISSUE-14, etc.
> >>>>>>>
> >>>>>>> Eric has started going through Why Graphs and adding the examples as addressed by Proposal 6.1:
> >>>>>>> http://www.w3.org/2011/rdf-wg/wiki/Why_Graphs_6.1
> >>>>>>>
> >>>>>>>     -- Sandro (with Eric nearby)
> >>>>>>>
> >
> > ----
> > Ivan Herman, W3C Semantic Web Activity Lead
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +31-641044153
> > FOAF: http://www.ivan-herman.net/foaf.rdf
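
The readings debated in this thread can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: a dataset is modelled as a plain dict from graph label to a set of triples, and the function names `merge_partial`, `replace_graphs`, and `merge_complete` are hypothetical. It does not reflect how Anzo, any TriG parser, or any SPARQL endpoint is actually implemented.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, FrozenSet[Triple]]  # graph label -> triples (assumed model)


def merge_partial(left: Dataset, right: Dataset) -> Dataset:
    """Partial-graph reading: a document lists *some* triples of each graph,
    so merging is a per-label union (comparable to an import / POST)."""
    merged = dict(left)
    for label, triples in right.items():
        merged[label] = merged.get(label, frozenset()) | triples
    return merged


def replace_graphs(store: Dataset, doc: Dataset) -> Dataset:
    """Overwrite reading: the document states the full contents of each graph
    it mentions, replacing what the store held (comparable to a replace / PUT)."""
    updated = dict(store)
    updated.update(doc)
    return updated


def merge_complete(left: Dataset, right: Dataset) -> Dataset:
    """Complete-graph reading: each label has exactly one associated graph,
    so two different graphs claimed for one label is a contradiction."""
    merged = dict(left)
    for label, triples in right.items():
        if label in merged and merged[label] != triples:
            raise ValueError(f"contradiction: two different graphs for {label}")
        merged[label] = triples
    return merged


# The two TriG documents from the thread: <u1> {<a> <b> <c>} and <u1> {<a> <b> <d>}
doc1: Dataset = {"u1": frozenset({("a", "b", "c")})}
doc2: Dataset = {"u1": frozenset({("a", "b", "d")})}

partial = merge_partial(doc1, doc2)     # u1 now holds both triples
replaced = replace_graphs(doc1, doc2)   # u1 holds only <a> <b> <d>
try:
    merge_complete(doc1, doc2)
    conflict = False
except ValueError:
    conflict = True                     # the documents disagree about u1
```

Under the partial reading, a consumer that sees three triples in a graph can only conclude "three or more"; under the complete reading the count is exact, which is why merging two documents that disagree about `u1` is reported as a contradiction rather than silently unioned.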
Received on Thursday, 5 April 2012 11:31:59 UTC