Re: Example of filtering an RDF stream from Gray, Alasdair J G on 2015-12-04 (public-rsp@w3.org from December 2015)

From: Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
Date: Fri, 4 Dec 2015 09:13:47 +0000
To: Tara Athan <taraathan@gmail.com>
CC: "public-rsp@w3.org" <public-rsp@w3.org>
Message-ID: <ED70CA7F-A56F-4AAC-85D6-F67F31ED7BE5@hw.ac.uk>
Tara,

Thanks for working this through. Your solution initially confused me because it changes the predicate in the output graph, but on reading the whole email the reason became apparent.

The next two issues we’ll need to tackle are:

  1.  What is the output type if we just have a SELECT clause?
  2.  What is the answer to the query that returns the average over the last 5 minutes?

For (2) we may need to amend Tara’s example to give individual temperature readings rather than average readings (just so the math makes sense).

Alasdair


On 3 Dec 2015, at 22:28, Tara Athan <taraathan@gmail.com> wrote:

In the example proposed by Alasdair in the last telecon, we would have an RDF stream that contains timestamped observations of temperature (Celsius) at a variety of locations. The desired output is a substream that includes exactly those observations whose value is less than 20.

For brevity, I will make use of the following prefixes, without being concerned at this point about the details of prefix definitions within the RDF stream.
@prefix ex: <http://www.example.org/timestamp-vocabulary#> .
@prefix : <http://www.example.org/data-vocabulary#> .

Suppose the stream contains (at least) the following elements:

{_1 ex:observedAt '2015-01-01'^^xsd:date.}
_1 {:Berlin :hasDailyAverageTempC '19.8'^^xsd:decimal .
     :Paris :hasDailyAverageTempC '17.3'^^xsd:decimal .}

{_2 ex:observedAt '2015-02-01'^^xsd:date.}
_2 {:Berlin :hasDailyAverageTempC '20.8'^^xsd:decimal .
     :Paris :hasDailyAverageTempC '19.8'^^xsd:decimal .}


The expected output should be

{_1 ex:observedAt '2015-01-01'^^xsd:date.}
_1 {:Berlin :hasDailyAverageTempCLessThan '20'^^xsd:decimal .
     :Paris :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}

{_2 ex:observedAt '2015-02-01'^^xsd:date.}
_2 {:Paris :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}


I would like to propose an option that is SPARQL-based. It relies on a few assumptions:

1. The input and output RDF streams can be viewed as RDF datasets, with preservation of semantics.

2. We accept the errata-query-15 at http://www.w3.org/2013/sparql-errata#sparql11-query, which Andy Seaborne was kind enough to record after a discussion on the sparql-dev mailing list about the discrepancy between the definition of an RDF Dataset in SPARQL 1.1 (http://www.w3.org/TR/sparql11-query/#rdfDataset) and that in RDF 1.1 (http://www.w3.org/TR/rdf11-concepts/#managing-graphs). He confirms the view that the RDF 1.1 definition of RDF Dataset should take precedence over the SPARQL 1.1 definition. This allows us to use blank nodes as graph names.

3. We accept an extension of the SPARQL 1.1 language that allows the template of a CONSTRUCT to specify an RDF dataset, following https://jena.apache.org/documentation/query/construct-quad.html . I'm not convinced their proposed modification to the EBNF is optimal, but the actual syntax is a natural extension of the existing CONSTRUCT syntax.

Assumption #1 holds for this example, because only one timestamp predicate is used, and the timestamp temporal entities are distinct instants (no time instant is repeated in the stream).

Given these assumptions, we can apply the following (extended) SPARQL query

CONSTRUCT {
  {?g ex:observedAt ?t}
  GRAPH ?g {?s :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}
}
WHERE {
  {?g ex:observedAt ?t}
  GRAPH ?g {?s :hasDailyAverageTempC ?o .}
  FILTER ( ?o < '20'^^xsd:decimal ) .
}

We would apply this query to the "unified RDF dataset" of the input RDF stream.
The result of this query would again be an RDF dataset, which could then be viewed as the unified RDF dataset of the output RDF stream.

An alternate way to view this is that the SPARQL query is applied to each element of the RDF stream individually.
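
To make the per-element view concrete, here is a minimal sketch in plain Python (no RDF library). It models each stream element as a (graph name, timestamp, triples) tuple, which is a simplification of the timestamped named graphs above, and the function names are hypothetical. It is only meant to mirror the FILTER and CONSTRUCT steps of the query on the example data, not to stand in for a SPARQL engine.

```python
from decimal import Decimal

# Each stream element is modelled as (graph_name, timestamp, triples).
# This tuple layout is an assumption made for illustration only.
stream = [
    ("_1", "2015-01-01", [
        (":Berlin", ":hasDailyAverageTempC", Decimal("19.8")),
        (":Paris", ":hasDailyAverageTempC", Decimal("17.3")),
    ]),
    ("_2", "2015-02-01", [
        (":Berlin", ":hasDailyAverageTempC", Decimal("20.8")),
        (":Paris", ":hasDailyAverageTempC", Decimal("19.8")),
    ]),
]

THRESHOLD = Decimal("20")


def filter_element(element):
    """Apply the filter and the predicate rewrite to a single stream element."""
    graph_name, timestamp, triples = element
    kept = [
        (s, ":hasDailyAverageTempCLessThan", THRESHOLD)
        for (s, p, o) in triples
        if p == ":hasDailyAverageTempC" and o < THRESHOLD
    ]
    # The query has no solution for an element without a matching triple,
    # so such an element contributes nothing to the output stream.
    return (graph_name, timestamp, kept) if kept else None


def filter_stream(elements):
    """Process elements one at a time, in stream order."""
    for element in elements:
        result = filter_element(element)
        if result is not None:
            yield result


output = list(filter_stream(stream))
```

On the two example elements this yields both observations for _1 and only the Paris observation for _2, matching the expected output given earlier.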

I believe that most of the use cases that have been raised can be handled (in part) by such extended-SPARQL queries, but I would like to have more worked-out examples, so that it is possible to see where there might be a hole in the approach.

On the other hand, I don't think we can allow an arbitrary CONSTRUCT form to be used to query an RDF stream, because the result dataset might not correspond to the RDF dataset of an RDF stream (e.g. inappropriate triples in the default graph), or it may be impossible to evaluate the query asynchronously (e.g. output timestamp temporal entities inversely related to stream order).
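
As a rough illustration of the two failure modes just mentioned, the following Python sketch checks a result dataset for them. The data layout (a list of default-graph triples in output order, plus a dict of named graphs) and the function name are assumptions for this sketch, and the two checks are only one possible reading of the conditions; a real definition would need to be agreed on.

```python
from datetime import date

# The only timestamp predicate used in the example above.
TIMESTAMP_PREDICATE = "ex:observedAt"


def could_be_stream_dataset(default_graph, named_graphs):
    """Return False if the dataset shows either failure mode described above.

    default_graph: list of (subject, predicate, object) triples, in output order.
    named_graphs:  dict mapping a graph name to its list of triples.
    """
    previous = None
    for subject, predicate, value in default_graph:
        # Failure mode 1: a default-graph triple that is not a timestamp
        # statement about one of the named graphs.
        if predicate != TIMESTAMP_PREDICATE or subject not in named_graphs:
            return False
        # Failure mode 2: timestamps running against the stream order.
        if previous is not None and value < previous:
            return False
        previous = value
    return True


named = {
    "_1": [(":Berlin", ":hasDailyAverageTempCLessThan", "20")],
    "_2": [(":Paris", ":hasDailyAverageTempCLessThan", "20")],
}
in_order = [
    ("_1", TIMESTAMP_PREDICATE, date(2015, 1, 1)),
    ("_2", TIMESTAMP_PREDICATE, date(2015, 2, 1)),
]
with_stray_triple = in_order + [(":Berlin", ":isCapitalOf", ":Germany")]
out_of_order = list(reversed(in_order))
```

Here in_order passes the check, while with_stray_triple and out_of_order are rejected.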

Tara

Alasdair J G Gray
Fellow of the Higher Education Academy
Assistant Professor in Computer Science,
School of Mathematical and Computer Sciences
(Athena SWAN Bronze Award)
Heriot-Watt University, Edinburgh UK.

Email: A.J.G.Gray@hw.ac.uk
Web: http://www.alasdairjggray.co.uk

ORCID: http://orcid.org/0000-0002-5711-4872

Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair

Received on Friday, 4 December 2015 09:14:54 UTC