Re: Example of filtering an RDF stream

good afternoon;

> On 2015-12-04, at 12:00, Robin Keskisärkkä <robin.keskisarkka@liu.se> wrote:
> 
> Hi all!
> For the way I interpret 2) I propose that the average results be placed in "new" graphs, since reusing the old graph URIs may not make sense. Now, this particular query is slightly more complex and uses a nested select query to find the average temperature and the most recent timestamp for each city. The query assumes that subtraction of timestamps is supported and that it returns xsd:duration (this is, e.g., the way it is implemented in Jena). Perhaps there are other interpretations that should be covered as well.

if you generate a result in which the graph is identified by a blank node, and that result is other than ephemeral, then, by standard rdf semantics, you will need to add metadata if you ever intend to distinguish it from other results with analogous content.
if, on the other hand, it is ephemeral, it is not clear why one would need to isolate it into its own graph: the graph identifier would be gratuitous and a triple field would suffice.

the benefit of reiterating the graph identifier is that the same characterization which generated it would suffice to locate it again.

> 
> Input stream:
> { _:1 ex:observedAt '2015-01-01'^^xsd:date. }
> _:1 { :Berlin :hasTempC '19.8'^^xsd:decimal .
>          :Paris :hasTempC '17.3'^^xsd:decimal . }
> 
> { _:2 ex:observedAt '2015-01-02'^^xsd:date. }
> _:2 { :Berlin :hasTempC '20.6'^^xsd:decimal .
>          :Paris :hasTempC '17.1'^^xsd:decimal . }
> 
> The expected output:
> { _:x ex:observedAt '2015-01-02'^^xsd:date. }
> _:x { :Berlin :hasDailyAverageTempC '20.2'^^xsd:decimal .
>         :Paris :hasDailyAverageTempC '17.2'^^xsd:decimal .}
> 
> Query:
> CONSTRUCT {
>    _:x ex:observedAt ?time .
>    GRAPH _:x { ?city :hasDailyAverageTempC ?avgTemp . }
> }
> WHERE {
>   { SELECT ?city (AVG(?temp) AS ?avgTemp) (MAX(?t) AS ?time)
>     WHERE {
>        ?g ex:observedAt ?t .
>        GRAPH ?g { ?city :hasTempC ?temp . }
>        FILTER( NOW() - ?t <= "PT5M00.000S"^^xsd:duration )
>     }
>     GROUP BY ?city
>   }
> }
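
to make the arithmetic concrete, here is a minimal python sketch of what the nested select computes for the sample stream above. it is an illustration only, not a replacement for the query: the tuple representation, the names STREAM and window_average, and the choice of "now" are assumptions of mine. the sample graphs are dated a day apart, so the window is a parameter; with a literal five minute window only the most recent graph would contribute.

```python
from datetime import date, timedelta
from decimal import Decimal

# Hypothetical in-memory form of the quoted input stream:
# one (timestamp, {city: temperature}) pair per timestamped graph.
STREAM = [
    (date(2015, 1, 1), {"Berlin": Decimal("19.8"), "Paris": Decimal("17.3")}),
    (date(2015, 1, 2), {"Berlin": Decimal("20.6"), "Paris": Decimal("17.1")}),
]


def window_average(stream, now, window):
    """Return {city: (average temperature, latest timestamp)} for the window.

    Mirrors the nested SELECT: the window test stands in for
    FILTER(NOW() - ?t <= window), and the grouping is per city.
    """
    readings = {}  # city -> temperatures inside the window
    latest = {}    # city -> most recent timestamp, i.e. MAX(?t)
    for t, graph in stream:
        if now - t > window:
            continue  # element is older than the window
        for city, temp in graph.items():
            readings.setdefault(city, []).append(temp)
            latest[city] = max(latest.get(city, t), t)
    return {
        city: (sum(temps) / len(temps), latest[city])
        for city, temps in readings.items()
    }


if __name__ == "__main__":
    result = window_average(STREAM, now=date(2015, 1, 2), window=timedelta(days=1))
    for city, (avg, t) in sorted(result.items()):
        print(city, avg, t)
```

with a one day window this gives 20.2 for berlin and 17.2 for paris, both stamped 2015-01-02, which agrees with the expected output above. note also that, as i read sparql 1.1, a blank node in a construct template is instantiated afresh for each solution, so the two cities may well land in two graphs rather than in the single _:x shown.
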
> 
> 
> Best regards,
> 
> Robin Keskisärkkä
> PhD Student
> 
> 
> Department
> 581 83 Linköping
> Mobile: +46 (0)70 49 09 179
> Please visit us at www.liu.se
> 
> From: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
> Date: Friday 4 December 2015 11:13
> To: Tara Athan <taraathan@gmail.com>
> Cc: "public-rsp@w3.org" <public-rsp@w3.org>
> Subject: Re: Example of filtering an RDF stream
> Resent-From: <public-rsp@w3.org>
> Resent-Date: Friday 4 December 2015 11:14
> 
> Tara,
> 
> Thanks for working this through. Your solution initially had me confused by changing the predicate in the output graph, but in reading the whole email it became apparent.
> 
> The next two issues we’ll need to tackle are:
> 1. What is the output type if we just have a SELECT clause?

if you intend to produce a graph, "just a select clause" is not adequate.

> 2. What is the answer to the query that returns the average over the last 5 minutes?
> 
> For (2) we may need to amend Tara’s example to give individual temperature readings rather than average readings (just so the math makes sense).
> 
> Alasdair
> 
> 
>> On 3 Dec 2015, at 22:28, Tara Athan <taraathan@gmail.com> wrote:
>> 
>> In the example proposed by Alasdair in the last telecon, we would have an RDF stream that contains timestamped observations of temperature (Celsius) at a variety of locations. The desired output is a substream, where (exactly) the observations that are less than 20 are included.
>> 
>> For brevity, I will make use of the following prefixes, without being concerned at this point about the details of prefix definitions within the RDF stream.
>> @prefix ex: <http://www.example.org/timestamp-vocabulary#> .
>> @prefix : <http://www.example.org/data-vocabulary#> .
>> 
>> Suppose the stream contains (at least) the following elements:
>> 
>> {_:1 ex:observedAt '2015-01-01'^^xsd:date.}
>> _:1 {:Berlin :hasDailyAverageTempC '19.8'^^xsd:decimal .
>>       :Paris :hasDailyAverageTempC '17.3'^^xsd:decimal .}
>> 
>> {_:2 ex:observedAt '2015-02-01'^^xsd:date.}
>> _:2 {:Berlin :hasDailyAverageTempC '20.8'^^xsd:decimal .
>>       :Paris :hasDailyAverageTempC '19.8'^^xsd:decimal .}
>> 
>> 
>> The expected output should be
>> 
>> {_:1 ex:observedAt '2015-01-01'^^xsd:date.}
>> _:1 {:Berlin :hasDailyAverageTempCLessThan '20'^^xsd:decimal .
>>       :Paris :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}
>> 
>> {_:2 ex:observedAt '2015-02-01'^^xsd:date.}
>> _:2 {:Paris :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}
>> 
>> 
>> I would like to propose an option that is SPARQL-based. It relies on a few assumptions:
>> 
>> 1. The input and output RDF streams can be viewed as RDF datasets, with preservation of semantics.
>> 
>> 2. We accept errata-query-15 at http://www.w3.org/2013/sparql-errata#sparql11-query, which Andy Seaborne was kind enough to record after a discussion on the sparql-dev mailing list about the discrepancy between the definition of an RDF Dataset in SPARQL 1.1 (http://www.w3.org/TR/sparql11-query/#rdfDataset) and that of RDF (http://www.w3.org/TR/rdf11-concepts/#managing-graphs). He confirms the view that the RDF 1.1 definition of RDF Dataset should take precedence over the SPARQL 1.1 definition. This allows us to use blank nodes as graph names.
>> 
>> 3. We accept an extension of the SPARQL 1.1 language that allows the template of a CONSTRUCT to specify an RDF dataset, following https://jena.apache.org/documentation/query/construct-quad.html. I'm not convinced their proposed modification to the EBNF is optimal, but the actual syntax is a natural extension of the existing CONSTRUCT syntax.
>> 
>> Assumption #1 holds for this example, because there is only one timestamp predicate used, and the timestamp temporal entities are instants that are distinct (no repetition of the same time instant in the stream).
>> 
>> Given these assumptions, we can apply the following (extended) SPARQL query
>> 
>> CONSTRUCT {
>>   {?g ex:observedAt ?t}
>>   GRAPH ?g {?s :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}
>> }
>> WHERE {
>>   {?g ex:observedAt ?t}
>>    GRAPH ?g {?s :hasDailyAverageTempC ?o .}
>>    FILTER ( ?o < '20'^^xsd:decimal ) .
>> }
>> 
>> We would apply this query to the "unified RDF dataset" of the input RDF stream.
>> The result of this query would be again an RDF dataset, which could then be viewed as the unified RDF dataset of the output RDF stream.
>> 
>> An alternate way to view this is that the SPARQL query is applied to each element of the RDF stream individually.
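
the per-element reading can be sketched in a few lines of python. this is only an illustration under assumptions: each stream element is modelled as a (timestamp, {city: temperature}) pair, and the names STREAM, filter_element and filter_stream are made up for the purpose. an element in which nothing is below the threshold is dropped entirely, since the query would then have no solution for it and so would emit neither the timestamp triple nor the graph.

```python
from decimal import Decimal

# Hypothetical in-memory form of the quoted sample stream.
STREAM = [
    ("2015-01-01", {"Berlin": Decimal("19.8"), "Paris": Decimal("17.3")}),
    ("2015-02-01", {"Berlin": Decimal("20.8"), "Paris": Decimal("19.8")}),
]


def filter_element(element, threshold=Decimal("20")):
    """Apply the filter to a single element, keeping its timestamp.

    Each city whose reading is below the threshold is mapped to the
    threshold itself, mirroring :hasDailyAverageTempCLessThan '20'.
    """
    timestamp, graph = element
    kept = {city: threshold for city, temp in graph.items() if temp < threshold}
    return timestamp, kept


def filter_stream(stream, threshold=Decimal("20")):
    """Lazily filter a stream one element at a time, preserving order."""
    for element in stream:
        timestamp, kept = filter_element(element, threshold)
        if kept:  # no matching observation means no output element
            yield timestamp, kept


if __name__ == "__main__":
    for timestamp, graph in filter_stream(STREAM):
        print(timestamp, sorted(graph))
```

because each element is handled on its own and in arrival order, the output can be produced as the input arrives, which is the property the per-element view is meant to capture.
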
>> 
>> I believe that most of the use cases that have been raised can be handled (in part) by such extended-SPARQL queries, but I would like to have more worked-out examples, so it is possible to see where there might be a hole in the approach.
>> 
>> On the other hand, I don't think we can allow an arbitrary CONSTRUCT form to be used to query an RDF stream, because the result dataset might not correspond to the RDF dataset of an RDF stream (e.g. inappropriate triples in the default graph), or it may be impossible to evaluate the query asynchronously (e.g. output timestamp temporal entities inversely related to stream order).
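
the two failure modes named here can be phrased as checks on a candidate output. a rough python sketch, assuming (my assumptions, not anything the group has agreed) that an element is given as its default-graph triples plus the set of its graph names, that ex:observedAt is the only admissible default-graph predicate, and that timestamps must not decrease in stream order. the names element_problems and is_time_ordered are hypothetical.

```python
OBSERVED_AT = "http://www.example.org/timestamp-vocabulary#observedAt"


def element_problems(default_triples, graph_names):
    """List the default-graph triples that do not fit a stream element.

    default_triples: iterable of (subject, predicate, object) strings.
    graph_names: set of graph names present in the same element.
    """
    problems = []
    for s, p, o in default_triples:
        if p != OBSERVED_AT:
            problems.append(f"unexpected predicate in default graph: {p}")
        elif s not in graph_names:
            problems.append(f"timestamp attached to unknown graph: {s}")
    return problems


def is_time_ordered(timestamps):
    """True when the timestamps never decrease in stream order.

    ISO 8601 date strings compare correctly as plain strings.
    """
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


if __name__ == "__main__":
    print(element_problems([("_:1", OBSERVED_AT, "2015-01-01")], {"_:1"}))
    print(is_time_ordered(["2015-01-01", "2015-02-01"]))
```

a construct whose result fails either check would fall outside what can be treated as an rdf stream under these assumptions.
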
>> 
>> Tara
> 
> Alasdair J G Gray
> Fellow of the Higher Education Academy
> Assistant Professor in Computer Science, 
> School of Mathematical and Computer Sciences 
> (Athena SWAN Bronze Award)
> Heriot-Watt University, Edinburgh UK.
> 
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.alasdairjggray.co.uk
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
> 


best regards, from berlin,
---
james anderson | james@dydra.com | http://dydra.com

Received on Friday, 4 December 2015 11:33:03 UTC