Example of filtering an RDF stream

In the example proposed by Alasdair in the last telecon, we would have 
an RDF stream that contains timestamped observations of temperature 
(Celsius) at a variety of locations. The desired output is a substream 
that includes exactly those observations with a temperature below 20.

For brevity, I will make use of the following prefixes, without being 
concerned at this point about the details of prefix definitions within 
the RDF stream.

@prefix ex: <http://www.example.org/timestamp-vocabulary#> .
@prefix : <http://www.example.org/data-vocabulary#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

Suppose the stream contains (at least) the following elements:

{_1 ex:observedAt '2015-01-01'^^xsd:date.}
_1 {:Berlin :hasDailyAverageTempC '19.8'^^xsd:decimal .
      :Paris :hasDailyAverageTempC '17.3'^^xsd:decimal .}

{_2 ex:observedAt '2015-02-01'^^xsd:date.}
_2 {:Berlin :hasDailyAverageTempC '20.8'^^xsd:decimal .
      :Paris :hasDailyAverageTempC '19.8'^^xsd:decimal .}


The expected output should be

{_1 ex:observedAt '2015-01-01'^^xsd:date.}
_1 {:Berlin :hasDailyAverageTempCLessThan '20'^^xsd:decimal .
      :Paris :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}

{_2 ex:observedAt '2015-02-01'^^xsd:date.}
_2 {:Paris :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}


I would like to propose an option that is SPARQL-based. It relies on a 
few assumptions:

1. The input and output RDF streams can be viewed as RDF datasets, with 
preservation of semantics.

2. We accept errata-query-15 at 
http://www.w3.org/2013/sparql-errata#sparql11-query, which Andy Seaborne 
was kind enough to record after a discussion on the sparql-dev mailing 
list about the discrepancy between the definition of an RDF Dataset in 
SPARQL 1.1 (http://www.w3.org/TR/sparql11-query/#rdfDataset) and that 
of RDF 1.1 (http://www.w3.org/TR/rdf11-concepts/#managing-graphs). He 
confirms the view that the RDF 1.1 definition of RDF Dataset should take 
precedence over the SPARQL 1.1 definition. This allows us to use blank 
nodes as graph names.

3. We accept an extension of the SPARQL 1.1 language that allows the 
template of a CONSTRUCT to specify an RDF dataset, following 
https://jena.apache.org/documentation/query/construct-quad.html . I'm 
not convinced their proposed modification to the EBNF is optimal, but 
the actual syntax is a natural extension of the existing CONSTRUCT syntax.

Assumption #1 holds for this example because only one timestamp 
predicate is used, and the timestamp temporal entities are distinct 
instants (the same time instant is never repeated in the stream).
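
As a rough illustration, the two conditions just named can be checked 
mechanically. The Python sketch below is not part of the proposal: it 
models the default graph as plain tuples, and the function name 
satisfies_assumption_1 is hypothetical. It tests only these two 
conditions, not preservation of semantics in general.

```python
from datetime import date

OBSERVED_AT = "ex:observedAt"  # the single timestamp predicate used in the example

# Default-graph triples of the example stream: (graph name, predicate, timestamp).
default_graph = [
    ("_1", OBSERVED_AT, date(2015, 1, 1)),
    ("_2", OBSERVED_AT, date(2015, 2, 1)),
]

def satisfies_assumption_1(triples):
    """Return True if exactly one timestamp predicate is used and no instant repeats."""
    predicates = {p for _, p, _ in triples}
    instants = [t for _, _, t in triples]
    return len(predicates) == 1 and len(instants) == len(set(instants))

print(satisfies_assumption_1(default_graph))
```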

Given these assumptions, we can apply the following (extended) SPARQL query:

CONSTRUCT {
   {?g ex:observedAt ?t}
   GRAPH ?g {?s :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}
}
WHERE {
   {?g ex:observedAt ?t}
   GRAPH ?g {?s :hasDailyAverageTempC ?o .}
   FILTER ( ?o < '20'^^xsd:decimal ) .
}

We would apply this query to the "unified RDF dataset" of the input RDF 
stream. The result of this query would again be an RDF dataset, which 
could then be viewed as the unified RDF dataset of the output RDF stream.

An alternate way to view this is that the SPARQL query is applied to 
each element of the RDF stream individually.
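
The element-by-element view can be sketched in plain Python. This is 
only an approximation of what the query above does, not a SPARQL 
engine: each stream element is modelled as a hypothetical 
(graph name, timestamp, observations) tuple, and the names 
filter_element and filter_stream are made up for illustration. An 
element with no matching observation is dropped entirely, since the 
WHERE clause would yield no solution for it.

```python
from datetime import date
from decimal import Decimal

THRESHOLD = Decimal("20")

# Each stream element: (graph name, timestamp, {location: daily average temp in C}).
stream = [
    ("_1", date(2015, 1, 1), {":Berlin": Decimal("19.8"), ":Paris": Decimal("17.3")}),
    ("_2", date(2015, 2, 1), {":Berlin": Decimal("20.8"), ":Paris": Decimal("19.8")}),
]

def filter_element(element):
    """Mimic the CONSTRUCT on one element: keep subjects whose value is below 20."""
    name, timestamp, observations = element
    kept = [s for s, temp in observations.items() if temp < THRESHOLD]
    return (name, timestamp, kept)

def filter_stream(elements):
    """Apply the filter element by element, preserving stream order."""
    for element in elements:
        result = filter_element(element)
        if result[2]:  # no matching observation means no output for this element
            yield result

output = list(filter_stream(stream))
for name, timestamp, subjects in output:
    print(name, timestamp, subjects)
```

For the two elements above, this keeps :Berlin and :Paris for _1 and 
only :Paris for _2, which matches the expected output given earlier.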

I believe that most of the use cases that have been raised can be handled 
(in part) by such extended-SPARQL queries, but I would like to have more 
worked-out examples, so it is possible to see where there might be a 
hole in the approach.

On the other hand, I don't think we can allow an arbitrary CONSTRUCT 
form to be used to query an RDF stream, because the result dataset might 
not correspond to the RDF dataset of an RDF stream (e.g. inappropriate 
triples in the default graph), or it may be impossible to evaluate the 
query asynchronously (e.g. output timestamp temporal entities inversely 
related to stream order).
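
One possible reading of these two concerns is sketched below in Python. 
The criteria are assumptions made for illustration only: that every 
default-graph triple must timestamp one of the named graphs, and that 
output timestamps must never decrease in emission order. The function 
names are hypothetical.

```python
from datetime import date

OBSERVED_AT = "ex:observedAt"

def default_graph_ok(default_graph, graph_names):
    """Every default-graph triple must timestamp one of the named graphs."""
    return all(p == OBSERVED_AT and s in graph_names for s, p, _ in default_graph)

def timestamps_in_order(timestamps):
    """Output timestamps, listed in emission order, must never decrease."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))

good_default = [
    ("_1", OBSERVED_AT, date(2015, 1, 1)),
    ("_2", OBSERVED_AT, date(2015, 2, 1)),
]
# An observation triple leaking into the default graph is "inappropriate".
bad_default = good_default + [(":Berlin", ":hasDailyAverageTempC", "19.8")]

print(default_graph_ok(good_default, {"_1", "_2"}))
print(default_graph_ok(bad_default, {"_1", "_2"}))
print(timestamps_in_order([date(2015, 2, 1), date(2015, 1, 1)]))
```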

Tara

Received on Thursday, 3 December 2015 22:27:11 UTC