- From: Tara Athan <taraathan@gmail.com>
- Date: Thu, 3 Dec 2015 17:28:47 -0500
- To: "public-rsp@w3.org" <public-rsp@w3.org>
- Message-ID: <5660C21F.9010603@gmail.com>
In the example proposed by Alasdair in the last telecon, we would have
an RDF stream that contains timestamped observations of temperature
(Celsius) at a variety of locations. The desired output is a substream
containing exactly those observations whose value is less than 20.
For brevity, I will make use of the following prefixes, without being
concerned at this point about the details of prefix definitions within
the RDF stream.
@prefix ex: <http://www.example.org/timestamp-vocabulary#> .
@prefix : <http://www.example.org/data-vocabulary#> .
Suppose the stream contains (at least) the following elements:
{_1 ex:observedAt '2015-01-01'^^xsd:date.}
_1 {:Berlin :hasDailyAverageTempC '19.8'^^xsd:decimal .
:Paris :hasDailyAverageTempC '17.3'^^xsd:decimal .}
{_2 ex:observedAt '2015-02-01'^^xsd:date.}
_2 {:Berlin :hasDailyAverageTempC '20.8'^^xsd:decimal .
:Paris :hasDailyAverageTempC '19.8'^^xsd:decimal .}
The expected output is:
{_1 ex:observedAt '2015-01-01'^^xsd:date.}
_1 {:Berlin :hasDailyAverageTempCLessThan '20'^^xsd:decimal .
:Paris :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}
{_2 ex:observedAt '2015-02-01'^^xsd:date.}
_2 {:Paris :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}
I would like to propose an option that is SPARQL-based. It relies on a
few assumptions:
1. The input and output RDF streams can be viewed as RDF datasets, with
preservation of semantics.
2. We accept the errata-query-15 at
http://www.w3.org/2013/sparql-errata#sparql11-query, which Andy Seaborne
was kind enough to record after a discussion on the sparql-dev mailing
list about the discrepancy between the definition of an RDF Dataset in
SPARQL 1.1 (http://www.w3.org/TR/sparql11-query/#rdfDataset) and that
of RDF 1.1 (http://www.w3.org/TR/rdf11-concepts/#managing-graphs). He
confirms the view that the RDF 1.1 definition of RDF Dataset should take
precedence over the SPARQL 1.1 definition. This allows us to use blank
nodes as graph names.
3. We accept an extension of the SPARQL 1.1 language that allows the
template of a CONSTRUCT to specify an RDF dataset, following
https://jena.apache.org/documentation/query/construct-quad.html . I'm
not convinced their proposed modification to the EBNF is optimal, but
the actual syntax is a natural extension of the existing CONSTRUCT syntax.
Assumption #1 holds for this example, because there is only one
timestamp predicate used, and the timestamp temporal entities are
instants that are distinct (no repetition of the same time instant in
the stream).
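The two conditions just mentioned can be checked mechanically. The following
is only a minimal Python sketch, not part of any RSP proposal: the function
name satisfies_assumption_1 and the flattening of the default-graph triples
into (graph name, predicate, timestamp) tuples are assumptions made for
illustration, and it does not check that the temporal entities are instants.

```python
def satisfies_assumption_1(timestamp_triples):
    """Return True if one timestamp predicate is used and no instant repeats.

    timestamp_triples: iterable of (graph_name, predicate, timestamp) tuples,
    a hypothetical flattening of the stream's default-graph triples.
    """
    triples = list(timestamp_triples)
    predicates = {predicate for _, predicate, _ in triples}
    instants = [instant for _, _, instant in triples]
    # At most one timestamp predicate, and every instant occurs only once.
    return len(predicates) <= 1 and len(instants) == len(set(instants))


# The two timestamp triples of the example stream.
example = [
    ("_1", "ex:observedAt", "2015-01-01"),
    ("_2", "ex:observedAt", "2015-02-01"),
]
print(satisfies_assumption_1(example))
```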
Given these assumptions, we can apply the following (extended) SPARQL query:
CONSTRUCT {
  {?g ex:observedAt ?t}
  GRAPH ?g {?s :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}
}
WHERE {
  {?g ex:observedAt ?t}
  GRAPH ?g {?s :hasDailyAverageTempC ?o .}
  FILTER ( ?o < '20'^^xsd:decimal ) .
}
We would apply this query to the "unified RDF dataset" of the input RDF
stream.
The result of this query would be again an RDF dataset, which could then
be viewed as the unified RDF dataset of the output RDF stream.
An alternate way to view this is that the SPARQL query is applied to
each element of the RDF stream individually.
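To make the element-by-element view concrete, here is a rough Python sketch
that mimics the effect of the query on the example data without any RDF
library. The representation of a stream element as a (graph name, timestamp,
observations) tuple and the names filter_element and filter_stream are
assumptions made for illustration only; a real implementation would evaluate
the extended SPARQL query itself.

```python
from decimal import Decimal

THRESHOLD = Decimal("20")

# Each stream element: (graph name, timestamp, {subject: temperature}).
# This plain-Python layout is an assumption for illustration only.
stream = [
    ("_1", "2015-01-01", {":Berlin": Decimal("19.8"), ":Paris": Decimal("17.3")}),
    ("_2", "2015-02-01", {":Berlin": Decimal("20.8"), ":Paris": Decimal("19.8")}),
]


def filter_element(element, threshold=THRESHOLD):
    """Mimic the CONSTRUCT/WHERE pair on a single stream element."""
    name, timestamp, observations = element
    # FILTER ( ?o < threshold ): keep subjects whose value is below it.
    below = [subject for subject, temp in observations.items() if temp < threshold]
    if not below:
        # No solutions for this element, so nothing is constructed for it.
        return None
    # Output graph: subject :hasDailyAverageTempCLessThan threshold.
    return (name, timestamp, {subject: threshold for subject in below})


def filter_stream(elements, threshold=THRESHOLD):
    """Apply filter_element to each element in stream order."""
    for element in elements:
        result = filter_element(element, threshold)
        if result is not None:
            yield result


output = list(filter_stream(stream))
for name, timestamp, graph in output:
    print(name, timestamp, sorted(graph))
```

Because each element is processed on its own and in arrival order, the output
elements keep the names and timestamps of the input elements they came from.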
I believe that most of the use cases that have been raised can be handled
(in part) by such extended-SPARQL queries, but I would like to have more
worked-out examples, so that it is possible to see where there might be a
hole in the approach.
On the other hand, I don't think we can allow an arbitrary CONSTRUCT
form to be used to query an RDF stream, because the result dataset might
not correspond to the RDF dataset of an RDF stream (e.g. inappropriate
triples in the default graph), or it may be impossible to evaluate the
query asynchronously (e.g. output timestamp temporal entities inversely
related to stream order).
Tara
Received on Thursday, 3 December 2015 22:27:11 UTC