Re: New Proposal (6.1) for GRAPHS from Arnaud Le Hors on 2012-04-04 (public-rdf-wg@w3.org from April 2012)

From: Arnaud Le Hors <lehors@us.ibm.com>
Date: Wed, 4 Apr 2012 15:05:14 -0700
To: public-rdf-wg <public-rdf-wg@w3.org>
Message-ID: <OFA9EA8657.854C4A5C-ON882579D6.007220CE-882579D6.007954F7@us.ibm.com>
Hearing people arguing over whether the statement  <u> {<a> <b> <c>} 
defines a complete graph or not leads me to wonder whether we shouldn't 
recognize that the answer ought to be: "it depends". :-)

I think Lee explained very effectively how the same statement can be 
interpreted differently depending on whether you're doing a GET, PUT, POST 
or something similar.

I already noted that the response Sandro expects from his question 
"According to this query, how many triples are in the graph known to that 
endpoint as 'http://g1.example.org' ?" is actually based on additional 
information he is providing in his question, specifically: the query is 
limited to the graph known to that particular endpoint.

If all you had were the following triples:
>>>      <a>  <b>  1.
>>>      <a>  <b>  2.
>>>      <a>  <b>  3.
without giving any other information about how or where you got them from 
and you'd ask: "how many triples are associated with <a>?"  I think the 
answer would have to be: "it depends".

I heard Sandro say that when he dereferences <u> he expects to get all the 
triples in <u>. I agree but I think that's a Linked Data view of the world 
and it comes from the meaning of GET rather than what you receive. In 
another context, retrieved in a different way, what you receive might mean 
something else. In the case of a SPARQL query it could mean "these are all 
the triples in <u> that this endpoint knows about".
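
The "it depends" above can be made concrete with a minimal Python sketch. The function name and the `complete` flag are made up for illustration; the flag stands for whatever the surrounding context (a GET, a SPARQL endpoint, an application convention) says about completeness, which the triples themselves do not carry.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def describe_triple_count(triples: Iterable[Triple], complete: bool) -> str:
    """Say how many triples the graph has, given what the context tells us.

    `complete` is context supplied from outside the data (hypothetical flag):
    True  -> the received triples are the whole graph
    False -> the received triples are only known to be part of the graph
    """
    n = len(set(triples))  # duplicates do not count twice in an RDF graph
    return f"exactly {n}" if complete else f"at least {n}"


received = [("<a>", "<b>", "1"), ("<a>", "<b>", "2"), ("<a>", "<b>", "3")]
print(describe_triple_count(received, complete=True))
print(describe_triple_count(received, complete=False))
```

The same three triples give two different answers, and only the flag changes.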

So, do we really need to choose one way or the other? Can't we just leave 
it to the application to decide whether it defines a complete graph or 
not?
--
Arnaud  Le Hors - Software Standards Architect - IBM Software Group




From:   Lee Feigenbaum <lee@thefigtrees.net>
To:     Ivan Herman <ivan@w3.org>, 
Cc:     Arnaud Le Hors/Cupertino/IBM@IBMUS, Sandro Hawke <sandro@w3.org>, 
public-rdf-wg <public-rdf-wg@w3.org>
Date:   04/04/2012 05:16 AM
Subject:        Re: New Proposal (6.1) for GRAPHS
Sent by:        Lee Feigenbaum <figtree@gmail.com>



How do people use TriG in practice today? For us, the choice between 
these semantics today is determined externally to a TriG file. That is, 
given foo.trig which contains u1 { a b c }, whether this is all of u1 or 
a subgraph of u1 is determined based on which API or which command-line 
command is used. For example:

 > anzo import foo.trig

...interprets what's in the trig file as subgraphs that get added to 
any existing contents of the graphs. So "a b c" would be added to u1.

 > anzo replace foo.trig

...interprets what's in the trig file as complete graphs, and sets the 
contents of the graphs in the repository, overwriting whatever might 
already be in the repository as the contents of the graphs. So after 
this operation u1 ends up with exactly { a b c } as its contents.

(Aside: So for us, "import" is basically like doing a POST of the 
triples in the trig file to the associated graphs via the SPARQL Graph 
Store Protocol, and "replace" is like doing a PUT.)

(Aside 2: there are other operations as well, such as "anzo update 
--remove" which uses the subgraph semantics and also means that the 
triples in question should be removed from the associated graphs in the 
repository.)
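
The three operations described above can be sketched in Python as follows. This is only an illustration under assumed names, not Anzo's actual implementation: the dataset is modelled as a dict from graph name to a set of triples, and `import_graphs`, `replace_graphs` and `remove_triples` are hypothetical function names.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]  # graph name -> triples in that graph


def import_graphs(store: Dataset, incoming: Dataset) -> None:
    """Subgraph semantics: add the incoming triples to each named graph (like POST)."""
    for name, triples in incoming.items():
        store.setdefault(name, set()).update(triples)


def replace_graphs(store: Dataset, incoming: Dataset) -> None:
    """Complete-graph semantics: overwrite each named graph's contents (like PUT)."""
    for name, triples in incoming.items():
        store[name] = set(triples)


def remove_triples(store: Dataset, incoming: Dataset) -> None:
    """Subgraph semantics: remove the incoming triples from each named graph."""
    for name, triples in incoming.items():
        if name in store:
            store[name] -= set(triples)


repo: Dataset = {"u1": {("x", "y", "z")}}
trig: Dataset = {"u1": {("a", "b", "c")}}  # stands in for foo.trig

import_graphs(repo, trig)   # u1 now holds both triples
replace_graphs(repo, trig)  # u1 now holds exactly { a b c }
```

The same `trig` value feeds every operation, so the choice of semantics lives in the call and not in the data.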

All of which is to say, there are plenty of use cases in our experience 
for both of these semantics. If the standard supported a way to make 
these semantics explicit, we would probably support that via some sort 
of generic command ("anzo process"? who knows), but would still let 
these existing command line commands override the semantics. We have 
plenty of cases in which we export some bit of trig, and then later on 
either use "anzo import" or "anzo replace" based on the situation -- and 
we wouldn't want to have to produce two different trig files for that 
situation! (This would be roughly analogous to the way in which the 
SPARQL Protocol lets the RDF dataset definition override what's in the 
query, so that queries can easily be reused in different contexts.)
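
The override order described above can be sketched in Python. All names here are hypothetical, and the fallback value is an arbitrary assumption, since the thread leaves the default open.

```python
from typing import Optional


def effective_semantics(
    command_override: Optional[str],
    declared_in_file: Optional[str],
    default: str = "subgraph",  # assumed fallback; the thread does not settle a default
) -> str:
    """Pick the graph semantics to apply, most specific source first.

    command_override: semantics implied by the command used, if any
    declared_in_file: semantics the file itself declares, if the standard allowed it
    """
    for choice in (command_override, declared_in_file):
        if choice is not None:
            return choice
    return default


# A command that implies complete-graph semantics wins over the file's declaration.
print(effective_semantics("complete", "subgraph"))
# A generic command with no opinion falls back to what the file declares.
print(effective_semantics(None, "complete"))
```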

Lee

On 4/4/2012 3:18 AM, Ivan Herman wrote:
>
> On Apr 4, 2012, at 07:09 , Arnaud Le Hors wrote:
>
>> Hi Sandro,
>> I have to say that my expectation was similar to Charles's. I guess
>> it's a matter of deciding whether <u1> { <a> <b> <c> } defines the <u1>
>> graph in its entirety, as containing one triple, or merely states that the
>> triple <a> <b> <c> is part of graph <u1>.
>>
>> I'm not saying it should be the latter rather than the former, just
>> that it's not obvious.
>> See below for more on that.
>
> So let me give my typical W3C answer, ie, trying to find a compromise:-)
>
> More seriously. The structure offered by Sandro relies on the fact that the
>
> <u>  {<a>  <b>  <c>  }
>
> syntax gets its more precise meaning through a possible
>
> <u>  rdf:type rdf:SOMECLASSHERE .
>
> Sandro offered two such classes; isn't it possible to have three, one that
> makes the graph THE graph, the other that makes it PART OF the graph?
>
> We can of course have long discussions on which the default is. But that
> is a lighter discussion I believe.
>
> Ivan
>
>
>
>>
>> Sandro Hawke<sandro@w3.org>  wrote on 04/02/2012 05:57:13 PM:
>>
>>> From: Sandro Hawke<sandro@w3.org>
>>> To: Charles Greer<cgreer@marklogic.com>,
>>> Cc: Charles Greer<Charles.Greer@marklogic.com>, public-rdf-wg
>>> <public-rdf-wg@w3.org>
>>> Date: 04/02/2012 05:57 PM
>>> Subject: Re: New Proposal (6.1) for GRAPHS
>>>
>>> On Mon, 2012-04-02 at 14:00 -0700, Charles Greer wrote:
>>>> Thanks for responding Sandro.  I think that what I'm finding difficult,
>>>> or at least a significant departure from RDF as I have understood it in
>>>> the past, is that this TRIG document
>>>>
>>>> <u1>  {<a>  <b>  <c>  .<d>  <e>  <f>  }
>>>>
>>>> is not equivalent to these n-quads:
>>>>
>>>> <a>  <b>  <c>  <u1>.
>>>> <d>  <e>  <f>  <u1>.
>>>>
>>>> Or rather, you now need a document structure around n-quads as well in
>>>> order to provide the context in which rdf knows that these triples, and
>>>> only these triples, constitute the graph <u1>.
>>>>
>>>> I had previously thought that RDF was a data model that didn't need any
>>>> notion of 'document' to work.  I'm not sure how another assertion that
>>>>
>>>> {<u1>  a rdf:Graph }
>>>>
>>>> can assert the boundaries of <u1> unless either the { } syntax does more
>>>> than it appears to, or the document is a harder scope boundary than I
>>>> would have expected.  If the document has some relationship to scope, I
>>>> think that should be made explicit.
>>>
>>> Two main points:
>>>
>>> 1.  That rdf:Graph declaration is a different thing.  It changes how <u1>
>>> relates to the graph, but in a semantic (not syntactic) way.  It can be
>>> in a different document, or deduced by the use of some predicates, or
>>> known a priori by a data consumer.  Knowing it entitles the consumer to
>>> see that <u1> actually identifies the graph directly, rather than just
>>> being a label for the graph.  This might matter if we also know <u1>
>>> dc:licence ...SomeLicensingTerms....   Is it the graph that's licensed,
>>> or something else?  There are some use cases that suggest this
>>> distinction is important, but if it turns out not to be, it's not bad,
>>> people will just not use rdf:Graph declarations much.
>>>
>>> 2.  Whether or not your trig example and your n-quads example are
>>> equivalent depends on your reading of n-quads.   This extends to your
>>> reading of SPARQL as well.     My understanding is people are somewhat
>>> informal about this, but they generally do expect that once they've seen
>>> the whole trig file, or the whole n-quads file, or searched the whole
>>> SPARQL end point, that they've seen all the triples in the graph with
>>> that name/label.
>>>
>>> As a social test case, we could tell people this SPARQL query is run:
>>>
>>>      SELECT ?s ?p ?o
>>>      WHERE { GRAPH <http://g1.example.org> { ?s ?p ?o } }
>>>
>>> and that we got three result bindings back:
>>>
>>>      ?s  ?p  ?o
>>>      === === ===
>>>      <a>  <b>  1.
>>>      <a>  <b>  2.
>>>      <a>  <b>  3.
>>>
>>> Then we ask them: "According to this query, how many triples are in the
>>> graph known to that endpoint as 'http://g1.example.org' ?"
>>>
>>> What do you think they'll say?
>>>
>>> I think most folks will say, "Three", even if you ask them to think
>>> again and be pedantically precise.
>>>
>>
>> I agree that's what they would say but primarily because you said: "in
>> the graph known to that endpoint"
>> This is a critical element which isn't apparent in a mere statement like:
>>
>> <u1>  {<a>  <b>  <c>  .<d>  <e>  <f>  }
>>
>> Which doesn't say anything about where it comes from and whether it's
>> complete or not.
>>
>> This being said, I can get used to having it the way you suggest.
>> Especially when the graph name comes first. If we had: { <a> <b> <c> . <d>
>> <e> <f> } <u1> I would think differently.
>> --
>> Arnaud  Le Hors - Software Standards Architect - IBM Software Group
>>
>>
>>> I think that means they're using the complete-graph semantics I'm
>>> suggesting.  If they were using partial-graph semantics, they'd have to
>>> say, "Three or more".
>>>
>>> You see what I'm saying?   When we have a complete protocol interaction,
>>> via SPARQL, or transmitting a trig or n-quads file, I think the usual
>>> assumption is that *all* the triples in the named graph are being sent,
>>> not just some of them.
>>>
>>> I understand sometimes it would be nice to store/transmit just part of
>>> some named graph.   But, as I discussed in a message a couple of minutes
>>> ago, I think we have to pick one or the other, and I think the
>>> complete-graph approach is better.  It's pretty easy to convey partial
>>> graphs if we define the complete approach.
>>>
>>> (I suppose if we defined the partial-graph approach we could transmit
>>> complete graphs by transmitting partial graphs and including a
>>> triple-count as metadata, so you know it's complete.   I guess that
>>> would work, but it seems to me to be optimizing for the much-less-common
>>> case.)
>>>
>>> Coming back to:
>>>
>>>> I had previously thought that RDF was a data model that didn't need any
>>>> notion of 'document' to work.
>>>
>>> Yeah, it depends what you're doing with it.   There's a lot you can do
>>> with RDF without paying any attention to what documents particular bits
>>> of RDF were found in, but I think most of the Graphs use cases involve
>>> situations where you do need to pay attention to these document
>>> boundaries.
>>>
>>>> Thanks for your willingness to understand my points --- I'm sure that my
>>>> formal language will improve over time.
>>>
>>> It's a long process.   :-)    Interesting, it seems to be helped by
>>> arguing.
>>>
>>>      -- Sandro
>>>
>>>>
>>>> Charles
>>>>
>>>>
>>>>
>>>> On 04/02/2012 08:36 AM, Sandro Hawke wrote:
>>>>> On Thu, 2012-03-29 at 09:25 -0700, Charles Greer wrote:
>>>>>> I really like this solution and it seems to satisfy the use cases
>>>>>> familiar to me from when I actually worked a lot with RDF in the wild.
>>>>>>
>>>>>> One thing I'm tripping over though --  The scope of a TRIG document or
>>>>>> RDF dataset in effect 'closes the world.'  Is the idea of "merge" only
>>>>>> within a TRIG document/dataset?
>>>>>>
>>>>>> I can only see two ways to really assert a graph literal -- either by
>>>>>> sanctifying the boundaries of  a dataset, thereby making merges with
>>>>>> external data problematic, or by signing bytes.  Am I missing something,
>>>>>> as usual?
>>>>> There's some misunderstanding here, yes.   Maybe you can talk through
>>>>> some particular thing you imagine doing, involving merging and TriG, and
>>>>> I'll be able to pick it up.   From what you've written, I'm confused.
>>>>>
>>>>> Maybe I can clarify by translating this TriG document:
>>>>>
>>>>>           <u1>    {<a>    <b>    <c>   }
>>>>>
>>>>> into this English declaration:
>>>>>
>>>>>           The URI 'u1' denotes something, and that thing has exactly one
>>>>>           associated RDF Graph.   That associated RDF graph consists of
>>>>>           one RDF triple, which we can write in turtle as "<a>   <b>   <c>".
>>>>>
>>>>> So, perhaps it's more clear, now.  If you merged that with another TriG
>>>>> document:
>>>>>
>>>>>           <u1>    {<a>    <b>    <d>   }
>>>>>
>>>>> Then, trying to accept both documents at once, you'd be saying:
>>>>>
>>>>>           The URI 'u1' denotes something, and that thing has exactly one
>>>>>           associated RDF graph.  In one document that associated graph is
>>>>>           claimed to be the RDF triple "<a>   <b>   <c>", but in another
>>>>>           document that graph is claimed to be the RDF triple "<a>   <b>
>>>>>           <d>".
>>>>>
>>>>> So, in this case, you can try to merge the documents, but when you do,
>>>>> you find there is a contradiction, since there is only allowed to be one
>>>>> associated graph, but in this case there are two different ones.
>>>>>
>>>>>          -- Sandro
>>>>>
>>>>>> Charles
>>>>>>
>>>>>>
>>>>>> On 03/27/2012 07:23 PM, Sandro Hawke wrote:
>>>>>>> I've written up design 6 (originally suggested by Andy) in more
>>>>>>> detail.  I've called it 6.1 since I've changed/added a few details that
>>>>>>> Andy might not agree with.  Eric has started writing up how the use
>>>>>>> cases are addressed by this proposal.
>>>>>>>
>>>>>>> This proposal addresses all 15 of our old open issues concerning graphs.
>>>>>>> (I'm sure it will have its own issues, though.)
>>>>>>>
>>>>>>> The basic idea is to use trig syntax, and to support the different
>>>>>>> desired relationships between labels and their graphs via class
>>>>>>> information on the labels.  In particular, according to this proposal,
>>>>>>> in this trig document:
>>>>>>>
>>>>>>>       <u1>    {<a>    <b>    <c>    }
>>>>>>>
>>>>>>> ... we only know that <u1> is some kind of label for the RDF Graph <a>
>>>>>>> <b> <c>, like today.  However, in this trig document:
>>>>>>>
>>>>>>>       {<u2>    a rdf:Graph }
>>>>>>>       <u2>    {<a>    <b>    <c>    }
>>>>>>>
>>>>>>> we know that <u2> is an rdf:Graph and, what's more, we know that <u2>
>>>>>>> actually is the RDF Graph { <a> <b> <c> }.  That is, in this case, we
>>>>>>> know that URL "u2" is a name we can use in RDF to refer to that g-snap.
>>>>>>>
>>>>>>> Details are here: http://www.w3.org/2011/rdf-wg/wiki/Graphs_Design_6.1
>>>>>>>
>>>>>>> That page includes answers to all the current GRAPHS issues, including
>>>>>>> ISSUE-5, ISSUE-14, etc.
>>>>>>>
>>>>>>> Eric has started going through Why Graphs and adding the examples as
>>>>>>> addressed by Proposal 6.1:
>>>>>>> http://www.w3.org/2011/rdf-wg/wiki/Why_Graphs_6.1
>>>>>>>
>>>>>>>         -- Sandro (with Eric nearby)
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
>
>
>
>
Received on Wednesday, 4 April 2012 22:06:47 UTC