Re: RSP Data Model

On 6/15/15 10:28 AM, Daniele Dell'Aglio wrote:
> Hi Tara,
>
> Regarding this:
> > Elsewhere in the document, "triple" appears to be used as meaning
> > "RDF triple". But in this case "triple" must mean simply a tuple of
> > size three, because RDF does not allow a graph to be the subject of
> > an RDF triple.
>
> Given that a graph is identified with a URI, it is possible to 
> predicate on it, as in [1], example 10 (lines 27--29). It follows that 
> (g, p, t) can be an RDF statement... do you see any problem with that?
>
> Daniele
>
> [1] http://www.w3.org/TR/rdf11-primer/#section-trig

Hi Daniele,

First, I question this assumption: "Given that a graph is identified 
with a URI". I don't think every time-stamped graph should be required 
to have the graph identified with an IRI, as in some cases, a blank node 
is sufficient. Requiring an additional IRI when it is not necessary puts 
a burden on fast transmission.

Second, you are using 'g' as if it is an IRI but the semantics document 
says that 'g' *is* a graph. Let's be precise. To make an RDF triple, the 
subject must be an IRI or a blank node. It is possible to construct a 
triple (n, p, t) where "n" is an IRI or blank node that is the name of a 
named RDF graph pair (n, g). Such an RDF triple is not by itself an RDF 
Dataset, because it is also required to associate the name with the 
graph. However, this RDF triple could be the complete contents of the 
default graph of an RDF Dataset d, where d also contains one named graph 
(n, g).

It is mathematically sound to talk about a tuple (g, p, t) where g is 
actually a graph, rather than the name of a graph. This structure can be 
in correspondence to an RDF Dataset, as above. But this tuple cannot 
itself be called an RDF Triple.
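
To make the distinction concrete, here is a minimal Python sketch. It is my own illustration, not something taken from the semantics document: the class names, the `to_dataset` helper, the predicate `generatedAt` and the `http://example.org/` IRIs are all hypothetical. The only thing it assumes from RDF 1.1 is that the subject of a triple is an IRI or a blank node. It shows that a graph is rejected as a subject, while the tuple (g, p, t) can still be put in correspondence with a dataset whose default graph holds (n, p, t) and whose single named graph is (n, g).

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


@dataclass(frozen=True)
class Triple:
    subject: object
    predicate: object
    obj: object

    def __post_init__(self):
        # RDF 1.1: the subject must be an IRI or a blank node.
        if not isinstance(self.subject, (IRI, BlankNode)):
            raise TypeError("subject must be an IRI or a blank node")
        if not isinstance(self.predicate, IRI):
            raise TypeError("predicate must be an IRI")


@dataclass
class Dataset:
    default_graph: frozenset  # set of Triple
    named_graphs: dict        # name (IRI or BlankNode) -> graph


_fresh = count()  # source of fresh blank node labels (hypothetical scheme)


def to_dataset(g, p, t):
    """Map the mathematical tuple (g, p, t), with g a graph, to a dataset."""
    n = BlankNode(f"b{next(_fresh)}")
    return Dataset(frozenset({Triple(n, p, t)}), {n: g})


EX = "http://example.org/"
g = frozenset({Triple(IRI(EX + "sensor1"), IRI(EX + "reading"), Literal("21.5"))})
p = IRI(EX + "generatedAt")
t = Literal("2015-06-15T10:28:00Z")

# A graph cannot be the subject of an RDF triple.
try:
    Triple(g, p, t)
    graph_as_subject_ok = True
except TypeError:
    graph_as_subject_ok = False

# The tuple (g, p, t) corresponds to a dataset instead.
d = to_dataset(g, p, t)
```

Here `d.default_graph` contains exactly one triple, and its subject is the blank node that names `g` in `d.named_graphs`.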

Now if the intention is really that (g, p, t) is to be an RDF Triple, 
then g *denotes* a graph, rather than *being* a graph -- or it could 
denote a name-graph pair, depending on which variant of the semantics 
http://www.w3.org/TR/rdf11-datasets/#each-named-graph-defines-its-own-context 
is adopted. It is conventional to use "n" as the variable in that case 
rather than "g". Then the question arises: is a time-stamped graph 
really just this triple? Or is it an RDF Dataset where this triple is in 
the default graph? In the former case, then the IRI must be dereferenced 
or otherwise resolved to obtain the graph for querying. In the latter 
case, the named graph must be included in the stream in order to fully 
define the RDF Dataset.
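
The two readings can be sketched in Python as follows. This is only an illustration under stated assumptions: terms are plain strings, `resolve` is a hypothetical stand-in for dereferencing an IRI (here backed by an in-memory dictionary), and the example IRIs and function names are invented for the purpose.

```python
def graph_from_bare_triple(triple, resolve):
    """Reading 1: the stream element is only the triple (n, p, t).

    The graph is not in the stream, so the name n must be resolved
    (for example by dereferencing an IRI) before querying.
    """
    n, _p, _t = triple
    return resolve(n)


def graph_from_dataset(default_graph, named_graphs):
    """Reading 2: the stream element is a dataset.

    The default graph holds the single triple (n, p, t), and the named
    graph (n, g) travels with it, so nothing has to be resolved.
    """
    if len(default_graph) != 1:
        raise ValueError("expected exactly one time-stamp triple")
    ((n, _p, _t),) = tuple(default_graph)
    return named_graphs[n]


g = frozenset({
    ("http://example.org/sensor1", "http://example.org/reading", '"21.5"'),
})
n = "http://example.org/graphs/42"
stamp = (n, "http://example.org/generatedAt", '"2015-06-15T10:28:00Z"')

# Reading 1: the consumer needs an external lookup to obtain g.
remote_store = {n: g}  # stands in for whatever dereferencing returns
g1 = graph_from_bare_triple(stamp, remote_store.__getitem__)

# Reading 2: the element is self-contained.
g2 = graph_from_dataset(frozenset({stamp}), {n: g})
```

Note that the first reading only works when n is an IRI that can actually be resolved; a blank node name would force the second reading.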

Finally, the RDF Primer is not intended to ever be normative. As such, 
it can contain statements that are not entirely precise, because it is 
about introducing newcomers to the ideas, not defining them. The TriG
specification (http://www.w3.org/TR/2014/REC-trig-20140225/) is a 
normative specification that defines a concrete syntax for RDF Datasets, 
and (rightly) says nothing about the semantics of RDF Datasets. But when 
we are defining the abstract model for a time-stamped graph, we should 
not be referring to any particular concrete syntax. The RDF Dataset
semantics is not yet an accepted recommendation, but it is the document 
most likely to progress to a recommendation, and I think it is the best 
choice for the foundation of time-stamped graph semantics.

This whole discussion is relevant to the issue 
https://github.com/streamreasoning/RSP-QL/issues/10, but perhaps 
qualifies for an issue of its own.

Tara
>
> On Mon, 15 Jun 2015 at 15:16, Tara Athan <taraathan@gmail.com> wrote:
>
>     On 6/15/15 5:30 AM, Gray, Alasdair J G wrote:
>>     Hi Tara,
>>
>>     On 14 June 2015 at 12:00:09, Tara Athan (taraathan@gmail.com)
>>     wrote:
>>
>>>     Dear Abraham, and all -
>>>     Please excuse me if this point has already been discussed in the
>>>     group, as I am late joining the discussion.
>>
>>     Welcome to the discussion, the more the merrier.
>>
>>>
>>>     It seems to me that there is an existing basis on which to build
>>>     such a data model - the RDF 1.1 dataset. The semantics for a set
>>>     of time-stamped graphs (g_i, p_i, t_i) that seems most
>>>     appropriate to me is the one defined here:
>>>     http://www.w3.org/TR/rdf11-datasets/#each-named-graph-defines-its-own-context
>>>     and the name of each graph would be an implicit blank node that
>>>     is also the subject of a triple in the default graph. This
>>>     triple has predicate p_i and object t_i .
>>
>>     The discussion of the streaming graph data model came up at our
>>     recent face-to-face meeting which is where we came up with the
>>     current data model described in
>>
>>     https://github.com/streamreasoning/RSP-QL/blob/master/Semantics.md
>>
>>     As you will see in that document, we have exactly the semantics
>>     you are suggesting.
>>
>     I see this now - in the section "Timestamped Graph" - it is
>     somewhat hidden as RDF Datasets are not explicitly mentioned here.
>     There is one point especially about this definition that I find
>     confusing:  '(g, p, t)' is called a triple.
>
>     Elsewhere in the document, "triple" appears to be used as meaning
>     "RDF triple". But in this case "triple" must mean simply a tuple
>     of size three, because RDF does not allow a graph to be the
>     subject of an RDF triple.
>
>     I have put this as well as a few other clarifications into a pull
>     request (https://github.com/streamreasoning/RSP-QL/pull/12) for
>     the purpose of discussion. I noticed that the pull request is
>     already merged - I probably should have made clear in my pull
>     request comment that this was requested for discussion, rather
>     than immediate merge.
>
>     Best regards, Tara
>
>>     Alasdair
>>
>>>
>>>
>>>     Tara
>>>
>>>     On 6/14/15 3:59 AM, Abraham Bernstein wrote:
>>>>     Dear Emanuele, dear all
>>>>
>>>>     I wonder whether we are mixing two issues here. One is the data
>>>>     model of time-annotated graphs. The other is a system model
>>>>     that, as you indicate, is much easier to define if you can make
>>>>     some assumptions about how the triples (or graph fragments)
>>>>     arrive (in order, monotonically increasing, etc.).
>>>>
>>>>     I would propose to disentangle the two. In other words, I would
>>>>     propose a well-founded time-based data model combined with a
>>>>     set of assertions that we expect to hold on streams.
>>>>
>>>>     Best
>>>>
>>>>     Avi
>>>>
>>>>
>>>>
>>>>>     On 12.06.2015, at 18:16, Emanuele Della Valle
>>>>>     <emanuele.dellavalle@polimi.it> wrote:
>>>>>
>>>>>     Dear Alasdair,
>>>>>
>>>>>     a problem I ran into when I implemented the timestamped model
>>>>>     in real use cases is that you need to wait for all
>>>>>     contemporaneous triples with the same timestamp before
>>>>>     processing them. They arrive at the RSP engine one after
>>>>>     another, so the arrival time is always increasing, but they all
>>>>>     carry the same timestamp. If you assume that timestamps are
>>>>>     non-decreasing, an RSP engine knows it can start processing as
>>>>>     soon as a triple with a larger timestamp arrives, but what if
>>>>>     the stream stays silent? How does the RSP engine distinguish
>>>>>     the case of a delayed triple (still contemporaneous with those
>>>>>     it has already received) from the case where it is waiting
>>>>>     because nothing is being transmitted on the stream? In the
>>>>>     C-SPARQL engine we decided to give up on handling application
>>>>>     time and to rely only on the receiving time. This is also what
>>>>>     STREAM does. It is known as the best-effort approach. Esper can
>>>>>     work in best-effort mode, but you can also send an event to say
>>>>>     that the time has passed. This is called external time control.
>>>>>     This time-keeping event is a form of punctuation. It means: I
>>>>>     have told you all I have to say at this point in time.
>>>>>
>>>>>     If graphs are timestamped with a strictly increasing
>>>>>     timestamp, then as soon as the RSP engine gets the entire
>>>>>     graph, it can process it. In other words, the boundary of the
>>>>>     graph is a form of punctuation. If another graph with the same
>>>>>     timestamp can follow, then you are back to the problem that you
>>>>>     cannot distinguish waiting for a delayed graph with the same
>>>>>     timestamp from the case where the stream is silent.
>>>>>
>>>>>     I hope I expressed myself in a clearer way this time.
>>>>>
>>>>>     Best Regards,
>>>>>
>>>>>     Emanuele
>>>>>
>>>>>     PS I’m in favour of multiple time annotations and I agree that
>>>>>     interval-based semantics matters.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>     On 12 Jun 2015, at 18:31, Gray, Alasdair J G
>>>>>>     <A.J.G.Gray@hw.ac.uk> wrote:
>>>>>>
>>>>>>     Dear Emanuele,
>>>>>>
>>>>>>     I don’t quite follow the punctuation argument meaning that we
>>>>>>     can only have one graph at any given time point.
>>>>>>     (Unfortunately I’m on the train home and cannot access the
>>>>>>     article that you linked.)
>>>>>>
>>>>>>     We still have the gain over the traditional streaming RDF
>>>>>>     model in that all triples conforming to a given observation
>>>>>>     will be contained in the graph. So why does having more than
>>>>>>     one graph at a given time point cause a problem?
>>>>>>     (Sorry if I am missing something obvious)
>>>>>>
>>>>>>     Best regards,
>>>>>>
>>>>>>     Alasdair
>>>>>>
>>>>>>     On 12 June 2015 at 08:49:40, Emanuele Della Valle
>>>>>>     (emanuele.dellavalle@polimi.it) wrote:
>>>>>>
>>>>>>>     Dear Alasdair, and all
>>>>>>>
>>>>>>>     thanks for the report. I would like to point out that the
>>>>>>>     sentence “There can be multiple graphs with the same
>>>>>>>     timestamp” is, in my opinion, a bad choice. It will prevent
>>>>>>>     graphs from being interpreted as a form of punctuation [1],
>>>>>>>     and this was one of the most important gains of the version
>>>>>>>     of the RSP Data Model discussed in Berlin (i.e., graphs with
>>>>>>>     strictly increasing timestamps). The lack of punctuation is
>>>>>>>     a problem of the “traditional” timestamped triples data
>>>>>>>     model, where contemporaneous triples must be admitted.
>>>>>>>
>>>>>>>     Best regards,
>>>>>>>
>>>>>>>     Emanuele
>>>>>>>
>>>>>>>     [1]
>>>>>>>     http://link.springer.com/referenceworkentry/10.1007%2F978-0-387-39940-9_285
>>>>>>>
>>>>>>>
>>>>>>>>     On 11 Jun 2015, at 18:37, Gray, Alasdair J G
>>>>>>>>     <A.J.G.Gray@hw.ac.uk> wrote:
>>>>>>>>
>>>>>>>>     Hi All,
>>>>>>>>
>>>>>>>>     During the ESWC RSP Workshop we had a breakout group focus
>>>>>>>>     on defining the RSP data model. I was charged with the
>>>>>>>>     action of updating the semantics document with the agreed
>>>>>>>>     model.
>>>>>>>>
>>>>>>>>     You can find the updated data model at
>>>>>>>>     https://github.com/streamreasoning/RSP-QL/blob/master/Semantics.md
>>>>>>>>
>>>>>>>>     Best regards,
>>>>>>>>
>>>>>>>>     Alasdair
>>>>>>>>
>>>>>>>>     -- 
>>>>>>>>     Alasdair J G Gray
>>>>>>>>     Lecturer, Heriot-Watt University
>>>>>>>>     Web: http://www.alasdairjggray.co.uk
>>>>>>>>     ORCID: http://orcid.org/0000-0002-5711-4872
>>>>>>>>     Twitter: @gray_alasdair
>>>>>>>>     Telephone: +44 131 451 3429
>>>>>>>>     Office: EM 1.39
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>     We invite research leaders and ambitious early career
>>>>>>>>     researchers to join us in leading and driving research in
>>>>>>>>     key inter-disciplinary themes. Please see
>>>>>>>>     www.hw.ac.uk/researchleaders for further
>>>>>>>>     information and how to apply.
>>>>>>>>
>>>>>>>>     Heriot-Watt University is a Scottish charity registered
>>>>>>>>     under charity number SC000278.
>>>>>>>
>>>>>
>>>>
>>>>     -----------------------------------------------------------------
>>>>     |  Professor Abraham Bernstein, PhD
>>>>     |  University of Zürich, Department of Informatics
>>>>     |  web:http://www.ifi.uzh.ch/ddis/bernstein.html
>>>>
>>>
>>
>>
>

Received on Wednesday, 17 June 2015 13:09:44 UTC