- From: Holger Knublauch <holger@topquadrant.com>
- Date: Fri, 10 Apr 2015 09:32:04 +1000
- To: public-data-shapes-wg <public-data-shapes-wg@w3.org>
On 4/10/2015 0:33, Dimitris Kontokostas wrote:
> What is the point of supporting SPARQL if we cannot support SPARQL
> endpoints? I thought one of the goals of having SPARQL as a syntax in
> SHACL was to be able to move away from the RAM limitations and get the
> benefits of the SPARQL query optimizations.
> Jena is a great tool and I use it a lot but when the dataset gets big
> (a few GB) the validation speed gets exponentially slower compared to
> Virtuoso.

This depends on how a library such as Jena is used. Jena does have its own SPARQL engine (ARQ) built in, which by default works on basically every database by opening triple iterators and then doing all the FILTER processing in "client" memory. However, Jena can also be used in a way that sends complete queries to a remote database, e.g. to a SPARQL endpoint (by creating custom Query and QueryExecution objects; there are further hooks on the Algebra layer).

From a specification's point of view this seems to be just an implementation detail. The spec only requires that there is a dataset; when a query is executed on a given named graph, it is up to the implementation to decide how to execute that query - it may simply pass it on to the database in a single transaction and thus use all the native goodies of that database.

> There is already user story 34 that captures this need and I could add
> many others if needed.

I am not doubting this, and of course supporting databases is very important!

> Nevertheless, a SPARQL endpoint can be considered as an RDF dataset
> and named graphs can indeed be used to separate constraints and data.
>
> What needs to be defined by the WG is not the support of SPARQL
> endpoints but whether the constraints and the data MUST be in the same
> dataset or whether they can exist in separate datasets.
> The fact that Jena can merge two datasets in memory is just an
> implementation optimization IMHO.

Agreed. And I currently don't see how we could support multiple datasets.
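To make the "pass the whole query to the database" point concrete, a constraint check scoped to a named graph can be shipped to the endpoint as a single SELECT query; the graph name and properties below are made-up examples, not anything from the draft spec:

```sparql
# Hypothetical constraint check sent to the endpoint as one query;
# ex:dataGraph, ex:Person and ex:age are illustrative names only.
PREFIX ex: <http://example.org/>
SELECT ?this ?age
WHERE {
  GRAPH ex:dataGraph {      # the named graph the query is executed on
    ?this a ex:Person ;
          ex:age ?age .
    FILTER (?age < 0)       # violation: a negative age
  }
}
```

Because the whole pattern, including the FILTER, is inside one query, the database can plan and optimize it natively instead of streaming triples to the client.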
I am currently experimenting with a design based on named graphs, similar to what Richard proposed - either via a variable (GRAPH ?shapesGraph) or via a dedicated special URI (GRAPH sh:ShapesGraph). Some queries currently do indeed need to jump back and forth between the default graph and that "shapes graph". There are multiple options for addressing these scenarios, but it is not yet clear to me which design will work best overall.

Holger
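As a sketch of the named-graph design mentioned above, a query that jumps between the data graph and the shapes graph might look like the following; sh:ShapesGraph is the special URI discussed, while ex:StatusShape and ex:allowedValue are hypothetical names used purely for illustration:

```sparql
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX ex: <http://example.org/>
SELECT ?this ?value
WHERE {
  ?this ex:status ?value .           # matched in the default (data) graph
  GRAPH sh:ShapesGraph {             # jump into the shapes graph
    ex:StatusShape ex:allowedValue ?allowed .   # hypothetical shape property
  }
  FILTER (?value != ?allowed)        # flag values the shape does not allow
}
```

A query engine that understands the special URI would bind GRAPH sh:ShapesGraph to whatever graph holds the shapes, which is exactly the back-and-forth between graphs described above.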
Received on Thursday, 9 April 2015 23:33:29 UTC