Re: New Proposal (6.1) for GRAPHS from Sandro Hawke on 2012-04-05 (public-rdf-wg@w3.org from April 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Thu, 05 Apr 2012 07:31:48 -0400
To: Arnaud Le Hors <lehors@us.ibm.com>
Cc: public-rdf-wg <public-rdf-wg@w3.org>
Message-ID: <1333625508.24423.27.camel@waldron>
On Wed, 2012-04-04 at 15:05 -0700, Arnaud Le Hors wrote:
> Hearing people arguing over whether the statement  <u> {<a> <b> <c>}
> defines a complete graph or not leads me to wonder whether we
> shouldn't recognize that the answer ought to be: "it depends". :-) 
> 
> I think Lee explained very effectively how the same statement can be
> interpreted differently depending on whether you're doing a GET, PUT,
> POST or something similar. 
> 
> I already noted that the response Sandro expects from his question
> "According to this query, how many triples are in the graph known to
> that endpoint as 'http://g1.example.org' ?" is actually based on
> additional information he is providing in his question, specifically:
> the query is limited to the graph known to that particular endpoint. 
> 
> If all you had were the following triples: 
> >>>      <a>  <b>  1.
> >>>      <a>  <b>  2.
> >>>      <a>  <b>  3.
> without giving any other information about how or where you got them
> from and you'd ask: "how many triples are associated with <a>?"  I
> think the answer would have to be: "it depends". 

FWIW, I really think you do need to keep the notion of a graph in the
question, since SPARQL has the keyword "GRAPH".   As in, "How many
triples are in the graph associated with <http://g1.example.org>?"

But since we're not really doing this survey anyway, it probably doesn't
matter.

> I heard Sandro say that when he dereferences <u> he expects to get all
> the triples in <u>. I agree but I think that's a Linked Data view of
> the world and it comes from the meaning of GET rather than what you
> receive. In another context, retrieved in a different way, what you
> receive might mean something else. In the case of a SPARQL query it
> could mean "these are all the triples in <u> that this endpoint knows
> about". 
> 
> So, do we really need to choose one way or the other? Can't we just
> leave it to the application to decide whether it defines a complete
> graph or not?

I see three ways to do this.   Which are you suggesting?

  1.  have two syntaxes.   eg trig means complete graphs and n-quads 
      means partial graphs.

  2.  have two constructs in trig, eg: 
        <u1> { <a> <b> <c> }         to mean full graphs and
        <u2> { <a> <b> <c> ... }     to mean partial graphs

  3.  use the same syntax, but let the consumer decide which was meant.

I don't like 3 at all, because it doesn't solve the use cases.   For
instance, you couldn't have a shared crawler, if the apps using the
crawl happened to need complete graphs.

The second one is cute, but I think it would be very hard to implement;
it would force every consumer to deal with both complete and incomplete
graphs, at least a bit.

Both 1 and 2 raise the issue of how you reflect this difference in the
dataset, or in SPARQL.   How could you do that?
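
To make the difference concrete, here is a minimal Python sketch of what
happens under each reading when two documents mention the same graph
name.  It is not taken from any spec or implementation: the dict-of-sets
dataset model and the names merge_partial and merge_complete are
assumptions made purely for illustration.

```python
# Hypothetical model: a dataset maps a graph name to a frozenset of triples.

def merge_partial(d1, d2):
    """Partial-graph reading: each document lists *some* triples of a graph,
    so merging takes the union of the triples per graph name."""
    merged = {name: set(triples) for name, triples in d1.items()}
    for name, triples in d2.items():
        merged.setdefault(name, set()).update(triples)
    return {name: frozenset(t) for name, t in merged.items()}


def merge_complete(d1, d2):
    """Complete-graph reading: each document lists *all* triples of a graph,
    so two documents that disagree about a graph contradict each other."""
    merged = dict(d1)
    for name, triples in d2.items():
        if name in merged and merged[name] != triples:
            raise ValueError(f"contradiction: two different graphs for {name}")
        merged[name] = triples
    return merged


# Two documents that both describe the graph labelled u1.
doc1 = {"u1": frozenset({("a", "b", "c")})}
doc2 = {"u1": frozenset({("a", "b", "d")})}

# Partial reading: u1 simply ends up with both triples.
print(len(merge_partial(doc1, doc2)["u1"]))

# Complete reading: the two claims about u1 cannot both hold.
try:
    merge_complete(doc1, doc2)
except ValueError as err:
    print(err)
```

Under the partial reading the merge always succeeds; under the complete
reading a consumer can detect the conflict, which is what a shared
crawler serving complete-graph apps would need.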

    -- Sandro


> --
> Arnaud  Le Hors - Software Standards Architect - IBM Software Group
> 
> 
> 
> 
> From:        Lee Feigenbaum <lee@thefigtrees.net> 
> To:        Ivan Herman <ivan@w3.org>, 
> Cc:        Arnaud Le Hors/Cupertino/IBM@IBMUS, Sandro Hawke
> <sandro@w3.org>, public-rdf-wg <public-rdf-wg@w3.org> 
> Date:        04/04/2012 05:16 AM 
> Subject:        Re: New Proposal (6.1) for GRAPHS 
> Sent by:        Lee Feigenbaum <figtree@gmail.com> 
> 
> ______________________________________________________________________
> 
> 
> 
> How do people use TriG in practice today? For us, the choice between 
> these semantics today is determined externally to a TriG file. That is, 
> given foo.trig which contains u1 { a b c }, whether this is all of u1 or 
> a subgraph of u1 is determined based on which API or which command-line 
> command is used. For example:
> 
> > anzo import foo.trig
> 
> ...interprets what's in the trig file as subgraphs that get added to 
> any existing contents of the graphs. So "a b c" would be added to u1.
> 
> > anzo replace foo.trig
> 
> ...interprets what's in the trig file as complete graphs, and sets the 
> contents of the graphs in the repository, overwriting whatever might 
> already be in the repository as the contents of the graphs. So after 
> this operation u1 ends up with exactly { a b c } as its contents.
> 
> (Aside: So for us, "import" is basically like doing a POST of the 
> triples in the trig file to the associated graphs via the SPARQL Graph 
> Store Protocol, and "replace" is like doing a PUT.)
> 
> (Aside 2: there are other operations as well, such as "anzo update 
> --remove" which uses the subgraph semantics and also means that the 
> triples in question should be removed from the associated graphs in the 
> repository.)
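
A rough Python sketch of the three operations Lee describes, applied to
the same input.  This is only an illustration under assumptions: the
in-memory dict-of-sets store and the function names are hypothetical and
merely echo the command names above; it is not how Anzo or the SPARQL
Graph Store Protocol is implemented.

```python
# Hypothetical in-memory store: graph name -> set of triples.

def import_graphs(store, doc):
    """Subgraph semantics (POST-like): add the triples to existing contents."""
    for name, triples in doc.items():
        store.setdefault(name, set()).update(triples)


def replace_graphs(store, doc):
    """Complete-graph semantics (PUT-like): overwrite each named graph."""
    for name, triples in doc.items():
        store[name] = set(triples)


def remove_triples(store, doc):
    """Subgraph semantics, but the listed triples are removed from each graph."""
    for name, triples in doc.items():
        store.get(name, set()).difference_update(triples)


# The same document, standing in for foo.trig containing u1 { a b c }.
foo = {"u1": {("a", "b", "c")}}
store = {"u1": {("x", "y", "z")}}

import_graphs(store, foo)        # u1 keeps x y z and gains a b c
after_import = set(store["u1"])

replace_graphs(store, foo)       # u1 is now exactly { a b c }
after_replace = set(store["u1"])

remove_triples(store, foo)       # a b c is taken out of u1 again
after_remove = set(store["u1"])
```

The point of the sketch is that one unchanged document drives all three
outcomes; the semantics live in the operation, not in the file.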
> 
> All of which is to say, there are plenty of use cases in our experience 
> for both of these semantics. If the standard supported a way to make 
> these semantics explicit, we would probably support that via some sort 
> of generic command ("anzo process"? who knows), but would still let 
> these existing command line commands override the semantics. We have 
> plenty of cases in which we export some bit of trig, and then later on 
> either use "anzo import" or "anzo replace" based on the situation -- and 
> we wouldn't want to have to produce two different trig files for that 
> situation! (This would be roughly analogous to the way in which the 
> SPARQL Protocol lets the RDF dataset definition override what's in the 
> query, so that queries can easily be reused in different contexts.)
> 
> Lee
> 
> On 4/4/2012 3:18 AM, Ivan Herman wrote:
> >
> > On Apr 4, 2012, at 07:09 , Arnaud Le Hors wrote:
> >
> >> Hi Sandro,
> >> I have to say that my expectation was similar to Charles's. I guess
> >> it's a matter of deciding whether <u1>  {<a>  <b>  <c>   } defines
> >> the <u1>  graph in its entirety, as containing one triple, or merely
> >> states that the triple <a>  <b>  <c>   is part of graph <u1>.
> >>
> >> I'm not saying it should be the latter rather than the former, just
> >> that it's not obvious.
> >> See below for more on that.
> >
> > So let me give my typical W3C answer, ie, trying to find a
> > compromise:-)
> >
> > More seriously. The structure offered by Sandro relies on the fact
> > that the
> >
> > <u>  {<a>  <b>  <c>  }
> >
> > syntax gets its more precise meaning through a possible
> >
> > <u>  rdf:type rdf:SOMECLASSHERE .
> >
> > Sandro offered two such classes; isn't it possible to have three, one
> > that makes the graph THE graph, the other that makes it PART OF the
> > graph?
> >
> > We can of course have long discussions on which the default is. But
> > that is a lighter discussion I believe.
> >
> > Ivan
> >
> >
> >
> >>
> >> Sandro Hawke<sandro@w3.org>  wrote on 04/02/2012 05:57:13 PM:
> >>
> >>> From: Sandro Hawke<sandro@w3.org>
> >>> To: Charles Greer<cgreer@marklogic.com>,
> >>> Cc: Charles Greer<Charles.Greer@marklogic.com>, public-rdf-wg
> >>> <public-rdf-wg@w3.org>
> >>> Date: 04/02/2012 05:57 PM
> >>> Subject: Re: New Proposal (6.1) for GRAPHS
> >>>
> >>> On Mon, 2012-04-02 at 14:00 -0700, Charles Greer wrote:
> >>>> Thanks for responding Sandro.  I think that what I'm finding difficult,
> >>>> or at least a significant departure from RDF as I have understood it in
> >>>> the past, is that this TRIG document
> >>>>
> >>>> <u1>  {<a>  <b>  <c>  .<d>  <e>  <f>  }
> >>>>
> >>>> is not equivalent to these n-quads:
> >>>>
> >>>> <a>  <b>  <c>  <u1>.
> >>>> <d>  <e>  <f>  <u1>.
> >>>>
> >>>> Or rather, you now need a document structure around n-quads as well in
> >>>> order to provide the context in which rdf knows that these triples, and
> >>>> only these triples, constitute the graph <u1>.
> >>>>
> >>>> I had previously thought that RDF was a data model that didn't need any
> >>>> notion of 'document' to work.  I'm not sure how another assertion that
> >>>>
> >>>> {<u1>  a rdf:Graph }
> >>>>
> >>>> can assert the boundaries of <u1>  unless either the { } syntax does more
> >>>> than it appears to, or the document is a harder scope boundary than I
> >>>> would have expected.  If the document has some relationship to scope, I
> >>>> think that should be made explicit.
> >>>
> >>> Two main points:
> >>>
> >>> 1.  That rdf:Graph declaration is a different thing.  It changes how <u1>
> >>> relates to the graph, but in a semantic (not syntactic) way.  It can be
> >>> in a different document, or deduced by the use of some predicates, or
> >>> known a priori by a data consumer.  Knowing it entitles the consumer to
> >>> see that <u1>  actually identifies the graph directly, rather than just
> >>> being a label for the graph.     This might matter if we also know <u1>
> >>> dc:licence ...SomeLicensingTerms....   Is it the graph that's licensed,
> >>> or something else?     There are some use cases that suggest this
> >>> distinction is important, but if it turns out not to be, it's not bad,
> >>> people will just not use rdf:Graph declarations much.
> >>>
> >>> 2.  Whether or not your trig example and your n-quads example are
> >>> equivalent depends on your reading of n-quads.   This extends to your
> >>> reading of SPARQL as well.     My understanding is people are somewhat
> >>> informal about this, but they generally do expect that once they've seen
> >>> the whole trig file, or the whole n-quads file, or searched the whole
> >>> SPARQL end point, that they've seen all the triples in the graph with
> >>> that name/label.
> >>>
> >>> As a social test case, we could tell people this SPARQL query is run:
> >>>
> >>>      SELECT ?s ?p ?o
> >>>      WHERE { GRAPH <http://g1.example.org> { ?s ?p ?o } }
> >>>
> >>> and that we got three result bindings back:
> >>>
> >>>      ?s  ?p  ?o
> >>>      === === ===
> >>>      <a>  <b>  1.
> >>>      <a>  <b>  2.
> >>>      <a>  <b>  3.
> >>>
> >>> Then we ask them: "According to this query, how many triples are in the
> >>> graph known to that endpoint as 'http://g1.example.org' ?"
> >>>
> >>> What do you think they'll say?
> >>>
> >>> I think most folks will say, "Three", even if you ask them to think
> >>> again and be pedantically precise.
> >>>
> >>
> >> I agree that's what they would say but primarily because you said:
> >> "in the graph known to that endpoint"
> >> This is a critical element which isn't apparent in a mere statement
> >> like:
> >>
> >> <u1>  {<a>  <b>  <c>  . <d>  <e>  <f>  }
> >>
> >> Which doesn't say anything about where it comes from and whether
> >> it's complete or not.
> >>
> >> This being said, I can get used to having it the way you suggest.
> >> Especially when the graph name comes first. If we had: {<a>  <b>  <c>
> >> . <d>  <e>  <f>  } <u1>  I would think differently.
> >> --
> >> Arnaud  Le Hors - Software Standards Architect - IBM Software Group
> >>
> >>
> >>> I think that means they're using the complete-graph semantics I'm
> >>> suggesting.  If they were using partial-graph semantics, they'd have to
> >>> say, "Three or more".
> >>>
> >>> You see what I'm saying?   When we have a complete protocol interaction,
> >>> via SPARQL, or transmitting a trig or n-quads file, I think the usual
> >>> assumption is that *all* the triples in the named graph are being sent,
> >>> not just some of them.
> >>>
> >>> I understand sometimes it would be nice to store/transmit just part of
> >>> some named graph.   But, as I discussed in a message a couple of minutes
> >>> ago, I think we have to pick one or the other, and I think the
> >>> complete-graph approach is better.  It's pretty easy to convey partial
> >>> graphs if we define the complete approach.
> >>>
> >>> (I suppose if we defined the partial-graph approach we could transmit
> >>> complete graphs by transmitting partial graphs and including a
> >>> triple-count as metadata, so you know it's complete.   I guess that
> >>> would work, but it seems to me to be optimizing for the much-less-common
> >>> case.)
> >>>
> >>> Coming back to:
> >>>
> >>>> I had previously thought that RDF was a data model that didn't need any
> >>>> notion of 'document' to work.
> >>>
> >>> Yeah, it depends what you're doing with it.   There's a lot you can do
> >>> with RDF without paying any attention to what documents particular bits
> >>> of RDF were found in, but I think most of the Graphs use cases involve
> >>> situations where you do need to pay attention to these document
> >>> boundaries.
> >>>
> >>>> Thanks for your willingness to understand my points --- I'm sure that my
> >>>> formal language will improve over time.
> >>>
> >>> It's a long process.   :-)    Interesting, it seems to be helped by
> >>> arguing.
> >>>
> >>>      -- Sandro
> >>>
> >>>>
> >>>> Charles
> >>>>
> >>>>
> >>>>
> >>>> On 04/02/2012 08:36 AM, Sandro Hawke wrote:
> >>>>> On Thu, 2012-03-29 at 09:25 -0700, Charles Greer wrote:
> >>>>>> I really like this solution and it seems to satisfy the use cases
> >>>>>> familiar to me from when I actually worked a lot with RDF in the wild.
> >>>>>>
> >>>>>> One thing I'm tripping over though --  The scope of a TRIG document or
> >>>>>> RDF dataset in effect 'closes the world.'  Is the idea of "merge" only
> >>>>>> within a TRIG document/dataset?
> >>>>>>
> >>>>>> I can only see two ways to really assert a graph literal -- either by
> >>>>>> sanctifying the boundaries of  a dataset, thereby making merges with
> >>>>>> external data problematic, or by signing bytes.  Am I missing something,
> >>>>>> as usual?
> >>>>> There's some misunderstanding here, yes.   Maybe you can talk through
> >>>>> some particular thing you imagine doing, involving merging and TriG, and
> >>>>> I'll be able to pick it up.   From what you've written, I'm confused.
> >>>>>
> >>>>> Maybe I can clarify by translating this TriG document:
> >>>>>
> >>>>>           <u1>    {<a>    <b>    <c>   }
> >>>>>
> >>>>> into this English declaration:
> >>>>>
> >>>>>           The URI 'u1' denotes something, and that thing has exactly one
> >>>>>           associated RDF Graph.   That associated RDF graph consists of
> >>>>>           one RDF triple, which we can write in turtle as "<a>   <b>   <c>".
> >>>>>
> >>>>> So, perhaps it's more clear, now.  If you merged that with another TriG
> >>>>> document:
> >>>>>
> >>>>>           <u1>    {<a>    <b>    <d>   }
> >>>>>
> >>>>> Then, trying to accept both documents at once, you'd be saying:
> >>>>>
> >>>>>           The URI 'u1' denotes something, and that thing has exactly one
> >>>>>           associated RDF graph.  In one document that associated graph is
> >>>>>           claimed to be the RDF triple "<a>   <b>   <c>", but in another
> >>>>>           document that graph is claimed to be the RDF triple "<a>   <b>
> >>>>>           <d>".
> >>>>>
> >>>>> So, in this case, you can try to merge the documents, but when you do,
> >>>>> you find there is a contradiction, since there is only allowed to be one
> >>>>> associated graph, but in this case there are two different ones.
> >>>>>
> >>>>>          -- Sandro
> >>>>>
> >>>>>> Charles
> >>>>>>
> >>>>>>
> >>>>>> On 03/27/2012 07:23 PM, Sandro Hawke wrote:
> >>>>>>> I've written up design 6 (originally suggested by Andy) in more
> >>>>>>> detail.  I've called it 6.1 since I've changed/added a few details that
> >>>>>>> Andy might not agree with.  Eric has started writing up how the use
> >>>>>>> cases are addressed by this proposal.
> >>>>>>>
> >>>>>>> This proposal addresses all 15 of our old open issues concerning graphs.
> >>>>>>> (I'm sure it will have its own issues, though.)
> >>>>>>>
> >>>>>>> The basic idea is to use trig syntax, and to support the different
> >>>>>>> desired relationships between labels and their graphs via class
> >>>>>>> information on the labels.  In particular, according to this proposal,
> >>>>>>> in this trig document:
> >>>>>>>
> >>>>>>>       <u1>    {<a>    <b>    <c>    }
> >>>>>>>
> >>>>>>> ... we only know that <u1>    is some kind of label for the RDF Graph <a>
> >>>>>>> <b>    <c>, like today.  However, in this trig document:
> >>>>>>>
> >>>>>>>       {<u2>    a rdf:Graph }
> >>>>>>>       <u2>    {<a>    <b>    <c>    }
> >>>>>>>
> >>>>>>> we know that <u2>    is an rdf:Graph and, what's more, we know that <u2>
> >>>>>>> actually is the RDF Graph {<a>    <b>    <c>    }.  That is, in this case, we
> >>>>>>> know that URL "u2" is a name we can use in RDF to refer to that g-snap.
> >>>>>>>
> >>>>>>> Details are here: http://www.w3.org/2011/rdf-wg/wiki/Graphs_Design_6.1
> >>>>>>>
> >>>>>>> That page includes answers to all the current GRAPHS issues, including
> >>>>>>> ISSUE-5, ISSUE-14, etc.
> >>>>>>>
> >>>>>>> Eric has started going through Why Graphs and adding the examples as
> >>>>>>> addressed by Proposal 6.1:
> >>>>>>> http://www.w3.org/2011/rdf-wg/wiki/Why_Graphs_6.1
> >>>>>>>
> >>>>>>>         -- Sandro (with Eric nearby)
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >
> >
> > ----
> > Ivan Herman, W3C Semantic Web Activity Lead
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +31-641044153
> > FOAF: http://www.ivan-herman.net/foaf.rdf
> >
> >
> >
> >
> >
> 
>
Received on Thursday, 5 April 2012 11:31:59 UTC