- From: Sandro Hawke <sandro@w3.org>
- Date: Thu, 05 Apr 2012 07:31:48 -0400
- To: Arnaud Le Hors <lehors@us.ibm.com>
- Cc: public-rdf-wg <public-rdf-wg@w3.org>
On Wed, 2012-04-04 at 15:05 -0700, Arnaud Le Hors wrote:
> Hearing people arguing over whether the statement <u> {<a> <b> <c>}
> defines a complete graph or not leads me to wonder whether we
> shouldn't recognize that the answer ought to be: "it depends". :-)
>
> I think Lee explained very effectively how the same statement can be
> interpreted differently depending on whether you're doing a GET, PUT,
> POST or something similar.
>
> I already noted that the response Sandro expects from his question
> "According to this query, how many triples are in the graph known to
> that endpoint as 'http://g1.example.org' ?" is actually based on additional information he
> is providing in his question, specifically: the query is limited to
> the graph known to that particular endpoint.
>
> If all you had were the following triples:
> >>> <a> <b> 1.
> >>> <a> <b> 2.
> >>> <a> <b> 3.
> without giving any other information about how or where you got them
> from and you'd ask: "how many triples are associated with <a>?" I
> think the answer would have to be: "it depends".
FWIW, I really think you do need to keep the notion of a graph in the
question, since SPARQL has the keyword "GRAPH". As in, "How many
triples are in the graph associated with <http://g1.example.org>?"
But since we're not really doing this survey anyway, it probably doesn't
matter.
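
To make the question concrete, here is a rough Python sketch (not from any spec or library; the in-memory store and the names load_quads and count_triples are made up for illustration). It counts the triples a store holds under one graph label, roughly what a GRAPH pattern over { ?s ?p ?o } would match. The count is only "what this store knows under that label"; whether that is the whole graph is exactly the open question.

```python
from collections import defaultdict

# Hypothetical in-memory quad store: graph label -> set of (s, p, o) triples.
def load_quads(quads):
    store = defaultdict(set)
    for s, p, o, g in quads:
        store[g].add((s, p, o))
    return store

def count_triples(store, graph_label):
    # Number of triples this particular store holds under the label.
    # It says nothing about triples some other store may hold for it.
    return len(store.get(graph_label, set()))

QUADS = [
    ("<a>", "<b>", "1", "<http://g1.example.org>"),
    ("<a>", "<b>", "2", "<http://g1.example.org>"),
    ("<a>", "<b>", "3", "<http://g1.example.org>"),
]

store = load_quads(QUADS)
known = count_triples(store, "<http://g1.example.org>")
print(known)
```

Under complete-graph semantics the printed number is the size of the graph; under partial-graph semantics it is only a lower bound.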
> I heard Sandro say that when he dereferences <u> he expects to get all
> the triples in <u>. I agree but I think that's a Linked Data view of
> the world and it comes from the meaning of GET rather than what you
> receive. In another context, retrieved in a different way, what you
> receive might mean something else. In the case of a SPARQL query it
> could mean "these are all the triples in <u> that this endpoint knows
> about".
>
> So, do we really need to choose one way or the other? Can't we just
> leave it to the application to decide whether it defines a complete
> graph or not?
I see three ways to do this. Which are you suggesting?
1. have two syntaxes, e.g. trig means complete graphs and n-quads
   means partial graphs.
2. have two constructs in trig, e.g.:
     <u1> { <a> <b> <c> } to mean full graphs and
     <u2> { <a> <b> <c> ... } to mean partial graphs
3. use the same syntax, but let the consumer decide which was meant.
I don't like 3 at all, because it doesn't solve the use cases. For
instance, you couldn't have a shared crawler, if the apps using the
crawl happened to need complete graphs.
The second one is cute, but I think it would be very hard to implement; it
would force every consumer to deal with both complete and incomplete
graphs, at least a bit.
Both 1 and 2 raise the issue of how you reflect this difference in the
dataset, or in SPARQL. How could you do that?
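
As a rough illustration of what a consumer would have to do differently under the two readings, here is a Python sketch. It is an assumption-laden toy, not a proposal: documents are modelled as dicts from graph label to a set of triples, and merge_partial, merge_complete and GraphConflict are hypothetical names. Under the partial reading, merging two documents unions the triples per label; under the complete reading, two documents that give different contents for the same label contradict each other.

```python
class GraphConflict(Exception):
    """Two documents claim different complete contents for one label."""

def merge_partial(doc_a, doc_b):
    # Partial-graph reading: each document lists some of the triples,
    # so merging is a per-label union.
    merged = {label: set(triples) for label, triples in doc_a.items()}
    for label, triples in doc_b.items():
        merged.setdefault(label, set()).update(triples)
    return merged

def merge_complete(doc_a, doc_b):
    # Complete-graph reading: each document states the whole graph,
    # so differing contents for one label are a contradiction.
    merged = {label: set(triples) for label, triples in doc_a.items()}
    for label, triples in doc_b.items():
        if label in merged and merged[label] != set(triples):
            raise GraphConflict(label)
        merged[label] = set(triples)
    return merged

doc1 = {"<u1>": {("<a>", "<b>", "<c>")}}
doc2 = {"<u1>": {("<a>", "<b>", "<d>")}}

partial = merge_partial(doc1, doc2)  # <u1> now holds both triples

try:
    merge_complete(doc1, doc2)
    conflict = False
except GraphConflict:
    conflict = True
```

The point of the sketch is only that the same bytes lead to different behaviour, so the consumer needs to learn from somewhere which reading applies.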
-- Sandro
> --
> Arnaud Le Hors - Software Standards Architect - IBM Software Group
>
>
>
>
> From: Lee Feigenbaum <lee@thefigtrees.net>
> To: Ivan Herman <ivan@w3.org>,
> Cc: Arnaud Le Hors/Cupertino/IBM@IBMUS, Sandro Hawke
> <sandro@w3.org>, public-rdf-wg <public-rdf-wg@w3.org>
> Date: 04/04/2012 05:16 AM
> Subject: Re: New Proposal (6.1) for GRAPHS
> Sent by: Lee Feigenbaum <figtree@gmail.com>
>
> ______________________________________________________________________
>
>
>
> How do people use TriG in practice today? For us, the choice between
> these semantics today is determined externally to a TriG file. That is,
> given foo.trig which contains u1 { a b c }, whether this is all of u1 or
> a subgraph of u1 is determined based on which API or which command-line
> command is used. For example:
>
> > anzo import foo.trig
>
> ...interprets what's in the trig file as subgraphs that get added to
> any existing contents of the graphs. So "a b c" would be added to u1.
>
> > anzo replace foo.trig
>
> ...interprets what's in the trig file as complete graphs, and sets the
> contents of the graphs in the repository, overwriting whatever might
> already be in the repository as the contents of the graphs. So after
> this operation u1 ends up with exactly { a b c } as its contents.
>
> (Aside: So for us, "import" is basically like doing a POST of the
> triples in the trig file to the associated graphs via the SPARQL Graph
> Store Protocol, and "replace" is like doing a PUT.)
>
> (Aside 2: there are other operations as well, such as "anzo update
> --remove" which uses the subgraph semantics and also means that the
> triples in question should be removed from the associated graphs in the
> repository.)
>
> All of which is to say, there are plenty of use cases in our experience
> for both of these semantics. If the standard supported a way to make
> these semantics explicit, we would probably support that via some sort
> of generic command ("anzo process"? who knows), but would still let
> these existing command line commands override the semantics. We have
> plenty of cases in which we export some bit of trig, and then later on
> either use "anzo import" or "anzo replace" based on the situation -- and
> we wouldn't want to have to produce two different trig files for that
> situation! (This would be roughly analogous to the way in which the
> SPARQL Protocol lets the RDF dataset definition override what's in the
> query, so that queries can easily be reused in different contexts.)
>
> Lee
>
> On 4/4/2012 3:18 AM, Ivan Herman wrote:
> >
> > On Apr 4, 2012, at 07:09 , Arnaud Le Hors wrote:
> >
> >> Hi Sandro,
> >> I have to say that my expectation was similar to Charles's. I guess
> >> it's a matter of deciding whether <u1> {<a> <b> <c> } defines
> >> the <u1> graph in its entirety, as containing one triple, or merely
> >> states that the triple <a> <b> <c> is part of graph <u1>.
> >>
> >> I'm not saying it should be the latter rather than the former, just
> >> that it's not obvious.
> >> See below for more on that.
> >
> > So let me give my typical W3C answer, ie, trying to find a
> > compromise :-)
> >
> > More seriously. The structure offered by Sandro relies on the fact
> > that the
> >
> > <u> {<a> <b> <c> }
> >
> > syntax gets its more precise meaning through a possible
> >
> > <u> rdf:type rdf:SOMECLASSHERE .
> >
> > Sandro offered two such classes; isn't it possible to have three, one
> > that makes the graph THE graph, the other that makes it PART OF the
> > graph?
> >
> > We can of course have long discussions on which the default is. But
> > that is a lighter discussion I believe.
> >
> > Ivan
> >
> >
> >
> >>
> >> Sandro Hawke<sandro@w3.org> wrote on 04/02/2012 05:57:13 PM:
> >>
> >>> From: Sandro Hawke<sandro@w3.org>
> >>> To: Charles Greer<cgreer@marklogic.com>,
> >>> Cc: Charles Greer<Charles.Greer@marklogic.com>, public-rdf-wg
> >>> <public-rdf-wg@w3.org>
> >>> Date: 04/02/2012 05:57 PM
> >>> Subject: Re: New Proposal (6.1) for GRAPHS
> >>>
> >>> On Mon, 2012-04-02 at 14:00 -0700, Charles Greer wrote:
> >>>> Thanks for responding Sandro. I think that what I'm finding difficult,
> >>>> or at least a significant departure from RDF as I have understood it in
> >>>> the past, is that this TRIG document
> >>>>
> >>>> <u1> {<a> <b> <c> .<d> <e> <f> }
> >>>>
> >>>> is not equivalent to these n-quads:
> >>>>
> >>>> <a> <b> <c> <u1>.
> >>>> <d> <e> <f> <u1>.
> >>>>
> >>>> Or rather, you now need a document structure around n-quads as well in
> >>>> order to provide the context in which rdf knows that these triples, and
> >>>> only these triples, constitute the graph <u1>.
> >>>>
> >>>> I had previously thought that RDF was a data model that didn't need any
> >>>> notion of 'document' to work. I'm not sure how another assertion that
> >>>>
> >>>> {<u1> a rdf:Graph }
> >>>>
> >>>> can assert the boundaries of <u1> unless either the { } syntax does more
> >>>> than it appears to, or the document is a harder scope boundary than I
> >>>> would have expected. If the document has some relationship to scope, I
> >>>> think that should be made explicit.
> >>>
> >>> Two main points:
> >>>
> >>> 1. That rdf:Graph declaration is a different thing. It changes how <u1>
> >>> relates to the graph, but in a semantic (not syntactic) way. It can be
> >>> in a different document, or deduced by the use of some predicates, or
> >>> known a priori by a data consumer. Knowing it entitles the consumer to
> >>> see that <u1> actually identifies the graph directly, rather than just
> >>> being a label for the graph. This might matter if we also know <u1>
> >>> dc:licence ...SomeLicensingTerms.... Is it the graph that's licensed,
> >>> or something else? There are some use cases that suggest this
> >>> distinction is important, but if it turns out not to be, it's not bad,
> >>> people will just not use rdf:Graph declarations much.
> >>>
> >>> 2. Whether or not your trig example and your n-quads example are
> >>> equivalent depends on your reading of n-quads. This extends to your
> >>> reading of SPARQL as well. My understanding is people are somewhat
> >>> informal about this, but they generally do expect that once they've seen
> >>> the whole trig file, or the whole n-quads file, or searched the whole
> >>> SPARQL end point, that they've seen all the triples in the graph with
> >>> that name/label.
> >>>
> >>> As a social test case, we could tell people this SPARQL query is
> run:
> >>>
> >>> SELECT ?s ?p ?o
> >>> WHERE { GRAPH <http://g1.example.org> { ?s ?p ?o } }
> >>>
> >>> and that we got three result bindings back:
> >>>
> >>> ?s ?p ?o
> >>> === === ===
> >>> <a> <b> 1.
> >>> <a> <b> 2.
> >>> <a> <b> 3.
> >>>
> >>> Then we ask them: "According to this query, how many triples are in the
> >>> graph known to that endpoint as 'http://g1.example.org' ?"
> >>>
> >>> What do you think they'll say?
> >>>
> >>> I think most folks will say, "Three", even if you ask them to think
> >>> again and be pedantically precise.
> >>>
> >>
> >> I agree that's what they would say but primarily because you said:
> >> "in the graph known to that endpoint"
> >> This is a critical element which isn't apparent in a mere statement
> >> like:
> >>
> >> <u1> {<a> <b> <c> .<d> <e> <f> }
> >>
> >> Which doesn't say anything about where it comes from and whether
> >> it's complete or not.
> >>
> >> This being said, I can get used to having it the way you suggest.
> >> Especially when the graph name comes first. If we had:
> >> { <a> <b> <c> . <d> <e> <f> } <u1>
> >> I would think differently.
> >> --
> >> Arnaud Le Hors - Software Standards Architect - IBM Software Group
> >>
> >>
> >>> I think that means they're using the complete-graph semantics I'm
> >>> suggesting. If they were using partial-graph semantics, they'd have to
> >>> say, "Three or more".
> >>>
> >>> You see what I'm saying? When we have a complete protocol interaction,
> >>> via SPARQL, or transmitting trig or n-quads files, I think the usual
> >>> assumption is that *all* the triples in the named graph are being sent,
> >>> not just some of them.
> >>>
> >>> I understand sometimes it would be nice to store/transmit just part of
> >>> some named graph. But, as I discussed in a message a couple of minutes
> >>> ago, I think we have to pick one or the other, and I think the
> >>> complete-graph approach is better. It's pretty easy to convey partial
> >>> graphs if we define the complete approach.
> >>>
> >>> (I suppose if we defined the partial-graph approach we could transmit
> >>> complete graphs by transmitting partial graphs and including a
> >>> triple-count as metadata, so you know it's complete. I guess that
> >>> would work, but it seems to me to be optimizing for the much-less-common
> >>> case.)
> >>>
> >>> Coming back to:
> >>>
> >>>> I had previously thought that RDF was a data model that didn't need any
> >>>> notion of 'document' to work.
> >>>
> >>> Yeah, it depends what you're doing with it. There's a lot you can do
> >>> with RDF without paying any attention to what documents particular bits
> >>> of RDF were found in, but I think most of the Graphs use cases involve
> >>> situations where you do need to pay attention to these document
> >>> boundaries.
> >>>
> >>>> Thanks for your willingness to understand my points --- I'm sure that my
> >>>> formal language will improve over time.
> >>>
> >>> It's a long process. :-) Interesting, it seems to be helped by
> >>> arguing.
> >>>
> >>> -- Sandro
> >>>
> >>>>
> >>>> Charles
> >>>>
> >>>>
> >>>>
> >>>> On 04/02/2012 08:36 AM, Sandro Hawke wrote:
> >>>>> On Thu, 2012-03-29 at 09:25 -0700, Charles Greer wrote:
> >>>>>> I really like this solution and it seems to satisfy the use cases
> >>>>>> familiar to me from when I actually worked a lot with RDF in the wild.
> >>>>>>
> >>>>>> One thing I'm tripping over though -- The scope of a TRIG document or
> >>>>>> RDF dataset in effect 'closes the world.' Is the idea of "merge" only
> >>>>>> within a TRIG document/dataset?
> >>>>>>
> >>>>>> I can only see two ways to really assert a graph literal -- either by
> >>>>>> sanctifying the boundaries of a dataset, thereby making merges with
> >>>>>> external data problematic, or by signing bytes. Am I missing something,
> >>>>>> as usual?
> >>>>> There's some misunderstanding here, yes. Maybe you can talk through
> >>>>> some particular thing you imagine doing, involving merging and TriG, and
> >>>>> I'll be able to pick it up. From what you've written, I'm confused.
> >>>>>
> >>>>> Maybe I can clarify by translating this TriG document:
> >>>>>
> >>>>> <u1> {<a> <b> <c> }
> >>>>>
> >>>>> into this English declaration:
> >>>>>
> >>>>>     The URI 'u1' denotes something, and that thing has exactly one
> >>>>>     associated RDF Graph. That associated RDF graph consists of
> >>>>>     one RDF triple, which we can write in turtle as "<a> <b> <c>".
> >>>>>
> >>>>> So, perhaps it's more clear, now. If you merged that with another TriG
> >>>>> document:
> >>>>>
> >>>>> <u1> {<a> <b> <d> }
> >>>>>
> >>>>> Then, trying to accept both documents at once, you'd be saying:
> >>>>>
> >>>>>     The URI 'u1' denotes something, and that thing has exactly one
> >>>>>     associated RDF graph. In one document that associated graph is
> >>>>>     claimed to be the RDF triple "<a> <b> <c>", but in another
> >>>>>     document that graph is claimed to be the RDF triple
> >>>>>     "<a> <b> <d>".
> >>>>>
> >>>>> So, in this case, you can try to merge the documents, but when you do,
> >>>>> you find there is a contradiction, since there is only allowed to be one
> >>>>> associated graph, but in this case there are two different ones.
> >>>>>
> >>>>> -- Sandro
> >>>>>
> >>>>>> Charles
> >>>>>>
> >>>>>>
> >>>>>> On 03/27/2012 07:23 PM, Sandro Hawke wrote:
> >>>>>>> I've written up design 6 (originally suggested by Andy) in more
> >>>>>>> detail. I've called it 6.1 since I've changed/added a few details that
> >>>>>>> Andy might not agree with. Eric has started writing up how the use
> >>>>>>> cases are addressed by this proposal.
> >>>>>>>
> >>>>>>> This proposal addresses all 15 of our old open issues concerning graphs.
> >>>>>>> (I'm sure it will have its own issues, though.)
> >>>>>>>
> >>>>>>> The basic idea is to use trig syntax, and to support the different
> >>>>>>> desired relationships between labels and their graphs via class
> >>>>>>> information on the labels. In particular, according to this proposal,
> >>>>>>> in this trig document:
> >>>>>>>
> >>>>>>> <u1> {<a> <b> <c> }
> >>>>>>>
> >>>>>>> ... we only know that <u1> is some kind of label for the RDF Graph <a>
> >>>>>>> <b> <c>, like today. However, in this trig document:
> >>>>>>>
> >>>>>>> {<u2> a rdf:Graph }
> >>>>>>> <u2> {<a> <b> <c> }
> >>>>>>>
> >>>>>>> we know that <u2> is an rdf:Graph and, what's more, we know that <u2>
> >>>>>>> actually is the RDF Graph {<a> <b> <c> }. That is, in this case, we
> >>>>>>> know that URL "u2" is a name we can use in RDF to refer to that g-snap.
> >>>>>>>
> >>>>>>> Details are here: http://www.w3.org/2011/rdf-wg/wiki/Graphs_Design_6.1
> >>>>>>>
> >>>>>>> That page includes answers to all the current GRAPHS issues, including
> >>>>>>> ISSUE-5, ISSUE-14, etc.
> >>>>>>>
> >>>>>>> Eric has started going through Why Graphs and adding the examples as
> >>>>>>> addressed by Proposal 6.1:
> >>>>>>> http://www.w3.org/2011/rdf-wg/wiki/Why_Graphs_6.1
> >>>>>>>
> >>>>>>> -- Sandro (with Eric nearby)
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >
> >
> > ----
> > Ivan Herman, W3C Semantic Web Activity Lead
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +31-641044153
> > FOAF: http://www.ivan-herman.net/foaf.rdf
> >
> >
> >
> >
> >
>
>
Received on Thursday, 5 April 2012 11:31:59 UTC