- From: Tara Athan <taraathan@gmail.com>
- Date: Thu, 3 Dec 2015 17:28:47 -0500
- To: "public-rsp@w3.org" <public-rsp@w3.org>
- Message-ID: <5660C21F.9010603@gmail.com>
In the example proposed by Alasdair in the last telecon, we would have an RDF stream that contains timestamped observations of temperature (Celsius) at a variety of locations. The desired output is a substream that includes exactly those observations that are less than 20.

For brevity, I will make use of the following prefixes, without being concerned at this point about the details of prefix definitions within the RDF stream.

    @prefix ex: <http://www.example.org/timestamp-vocabulary#> .
    @prefix : <http://www.example.org/data-vocabulary#> .

Suppose the stream contains (at least) the following elements:

    {_1 ex:observedAt '2015-01-01'^^xsd:date.}
    _1 {:Berlin :hasDailyAverageTempC '19.8'^^xsd:decimal .
        :Paris :hasDailyAverageTempC '17.3'^^xsd:decimal .}

    {_2 ex:observedAt '2015-02-01'^^xsd:date.}
    _2 {:Berlin :hasDailyAverageTempC '20.8'^^xsd:decimal .
        :Paris :hasDailyAverageTempC '19.8'^^xsd:decimal .}

The expected output should be

    {_1 ex:observedAt '2015-01-01'^^xsd:date.}
    _1 {:Berlin :hasDailyAverageTempCLessThan '20'^^xsd:decimal .
        :Paris :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}

    {_2 ex:observedAt '2015-02-01'^^xsd:date.}
    _2 {:Paris :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}

I would like to propose an option that is SPARQL-based. It relies on a few assumptions:

1. The input and output RDF streams can be viewed as RDF datasets, with preservation of semantics.

2. We accept the errata-query-15 at http://www.w3.org/2013/sparql-errata#sparql11-query, which Andy Seaborne was kind enough to record after a discussion on the sparql-dev mailing list about the discrepancy between the definition of an RDF Dataset in SPARQL 1.1 (http://www.w3.org/TR/sparql11-query/#rdfDataset) and that of RDF (http://www.w3.org/TR/rdf11-concepts/#managing-graphs). He confirms the view that the RDF 1.1 definition of RDF Dataset should take precedence over the SPARQL 1.1 definition. This allows us to use blank nodes as graph names.

3. We accept an extension of the SPARQL 1.1 language that allows the template of a CONSTRUCT to specify an RDF dataset, following https://jena.apache.org/documentation/query/construct-quad.html . I'm not convinced their proposed modification to the EBNF is optimal, but the actual syntax is a natural extension of the existing CONSTRUCT syntax.

Assumption 1 holds for this example, because there is only one timestamp predicate used, and the timestamp temporal entities are instants that are distinct (no repetition of the same time instant in the stream).

Given these assumptions, we can apply the following (extended) SPARQL query:

    CONSTRUCT {
      {?g ex:observedAt ?t}
      GRAPH ?g {?s :hasDailyAverageTempCLessThan '20'^^xsd:decimal .}
    }
    WHERE {
      {?g ex:observedAt ?t}
      GRAPH ?g {?s :hasDailyAverageTempC ?o .}
      FILTER ( ?o < '20'^^xsd:decimal ) .
    }

We would apply this query to the "unified RDF dataset" of the input RDF stream. The result of this query would again be an RDF dataset, which could then be viewed as the unified RDF dataset of the output RDF stream.

An alternate way to view this is that the SPARQL query is applied to each element of the RDF stream individually.

I believe that most of the use cases that have been raised can be handled (in part) by such extended-SPARQL queries, but I would like to have more worked-out examples, so that it is possible to see where there might be a hole in the approach.

On the other hand, I don't think we can allow an arbitrary CONSTRUCT form to be used to query an RDF stream, because the result dataset might not correspond to the RDF dataset of an RDF stream (e.g. inappropriate triples in the default graph), or it may be impossible to evaluate the query asynchronously (e.g. output timestamp temporal entities inversely related to stream order).

Tara
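
The element-by-element reading of the query can be illustrated with a minimal sketch in plain Python. This is not a SPARQL engine and not part of the proposal itself: it assumes each stream element is represented as a hypothetical tuple (graph name, timestamp, list of triples), and the names filter_element and filter_stream are made up for illustration. An element with no observation below the threshold produces no output at all, which mirrors the query having no solutions for that graph.

```python
from decimal import Decimal

# Predicate names taken from the example; the tuple layout is an assumption.
TEMP = ":hasDailyAverageTempC"
BELOW = ":hasDailyAverageTempCLessThan"


def filter_element(element, threshold=Decimal("20")):
    """Apply the filter to one stream element.

    element is (graph_name, timestamp, triples), where triples is a list of
    (subject, predicate, object) with Decimal objects for temperatures.
    Returns the rewritten element, or None if nothing matched.
    """
    graph_name, timestamp, triples = element
    kept = [
        (s, BELOW, threshold)
        for (s, p, o) in triples
        if p == TEMP and o < threshold
    ]
    return (graph_name, timestamp, kept) if kept else None


def filter_stream(elements, threshold=Decimal("20")):
    """Lazily filter a stream, preserving the order of its elements."""
    for element in elements:
        result = filter_element(element, threshold)
        if result is not None:
            yield result


# The two elements from the example above.
stream = [
    ("_1", "2015-01-01", [(":Berlin", TEMP, Decimal("19.8")),
                          (":Paris", TEMP, Decimal("17.3"))]),
    ("_2", "2015-02-01", [(":Berlin", TEMP, Decimal("20.8")),
                          (":Paris", TEMP, Decimal("19.8"))]),
]

output = list(filter_stream(stream))
for graph_name, timestamp, triples in output:
    print(graph_name, timestamp, [s for (s, _, _) in triples])
```

Because each element is processed independently and in stream order, this kind of filter can be evaluated as elements arrive, which is the property that an arbitrary CONSTRUCT form would not guarantee.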
Received on Thursday, 3 December 2015 22:27:11 UTC