- From: Holger Knublauch <holger@topquadrant.com>
- Date: Mon, 13 Apr 2015 10:01:21 +1000
- To: public-data-shapes-wg <public-data-shapes-wg@w3.org>
- Message-ID: <552B0751.2030600@topquadrant.com>
One of the main selling points of RDF technology has always been the fact that instance and schema are represented uniformly. RDF Schema and OWL class definitions are themselves instances (of metaclasses). This means that such data can not only be stored and shared together, but also be queried uniformly. In general, SPARQL queries can freely walk between meta-levels. Many other formalisms, such as XML and SQL databases, have a stricter separation between those levels. If we agree on a similarly strict separation by making it impossible to query the shapes graph from the instances graph (and vice versa), then we may throw away a unique advantage of RDF technology.

I am generally not in favor of selecting the lowest common denominator for all use cases only because certain cases may not have the best performance. I understand that we need to maintain good performance, including the ability to use native query optimizations at the database level where possible. There are also cases where the shapes model is truly separate from the database. Yet I believe there are also cases where being able to access the shape definitions at runtime is beneficial.

In this discussion, I believe we should distinguish between what we use in the SPARQL queries of the specification and what optimized implementations may do. It should be doable to assume that - in the context of the spec - the shapes graph can be in the same dataset as the actual data. So by default we would have a single dataset, and validation gets two parameters:

- the URI of the "instances" data graph (default graph)
- the URI of the shapes graph

An example of how this would work, with a single query, is the body of sh:allowedValues:

    SELECT ?this (?this AS ?subject) ?predicate ?object
    WHERE {
        ?this ?predicate ?object .
        FILTER NOT EXISTS {
            GRAPH ?shapesGraph {
                ?allowedValues (rdf:rest*)/rdf:first ?object .
            }
        }
    }

If the instances graph is in fact a remote database, then there are two ways to access it:

a) via a proxy graph API (as Jena would do it by default)
b) generate queries and send them to the endpoint directly

In case b), queries could no longer access the shapes graph, so they would need to include enough information to be self-contained. For all built-in core elements, this should be easy: just replace the GRAPH ?shapesGraph above with a FILTER NOT IN ..., and for sh:shape create a large nested query; same for OrConstraint and closed shapes (if we support these). But these things could be regarded as optimizations that any engine can implement itself, just like most engines may optimize certain recurring patterns and hard-code them instead of relying on the provided official SPARQL queries of their templates. Any other custom constraint that needs to access the shapes graph can be executed via mechanism a). This may mean that its performance is not ideal, yet at least we have a simpler job of writing the spec while maintaining improved flexibility for those (many) users that have shapes and data graphs in the same database.

Summary: generally allow use of ?shapesGraph at runtime, while making sure that optimizations remain possible for the majority of use cases.

Dimitris, would this help as an approximation? I can elaborate if you like, or we could talk off-list (I am easy to find on Skype).

Regards,
Holger

On 4/10/2015 17:20, Dimitris Kontokostas wrote:
>
>
> On Fri, Apr 10, 2015 at 10:01 AM, Holger Knublauch
> <holger@topquadrant.com <mailto:holger@topquadrant.com>> wrote:
>
> BTW another example of a constraint where the WHERE clause would
> benefit from querying the shapes graph itself is Closed Shapes.
> These could be modeled using
>
> ex:MyShape
>     sh:property [
>         ...
>     ] ;
>     sh:constraint [
>         a sh:ClosedShapeConstraint .
>     ]
>
> where sh:ClosedShapeConstraint would walk the definition of
> sh:MyShape (and possibly its super-shapes) to collect all
> sh:predicates that are used, then check that the instance has no
> property that is not among those predicates.
>
>
> Again, this is an implementation optimization. The engine could
> prebuild an additional query based on the shape definition in advance.
> Of course this also depends on the semantics of the closed shapes;
> see an example in
> https://lists.w3.org/Archives/Public/public-data-shapes-wg/2015Mar/0080.html
>
> I believe the opportunities here are great and we shouldn't limit
> such scenarios from emerging, one way or another. With a generic
> solution anyone could define variations of things like Closed
> Shapes themselves in their own macro library.
>
>
> For me it is fine to have a generic solution as long as this solution
> works in all cases.
> Revised proposed resolution: Shapes and data are expected to exist in
> different graphs unless specified otherwise, and access from
> the shapes graph to the data graph and vice versa is not required.
>
> Would anyone object to this?
>
> Best,
> Dimitris
>
>
>
> Holger
>
>
> On 4/10/15 4:35 PM, Dimitris Kontokostas wrote:
>>
>>
>> On Fri, Apr 10, 2015 at 8:19 AM, Holger Knublauch
>> <holger@topquadrant.com <mailto:holger@topquadrant.com>> wrote:
>>
>> On 4/10/2015 15:12, Dimitris Kontokostas wrote:
>>
>>
>> I think you are referring to sh:valueShape and the
>> sh:hasShape(?shape) function, right? I don't see any other
>> case that could be problematic.
>>
>>
>> Also sh:OrConstraint (or any similar template that we or
>> users may want to add, such as negation and intersection).
>>
>>
>> Why can't we move these into the validation engine? e.g. (SPARQL
>> Q1) or/xor/...
(SPARQL Q2)
>>
>> And sh:allowedValues (which takes a list or set of values, and
>> those must reside somewhere; I guess they should reside with
>> the shapes) - more generally, any template that takes rdf:List
>> arguments that need to be walked at runtime.
>>
>>
>> These should indeed reside in the shapes graph(s).
>> Implementations could either pre-build the queries or build them
>> at run-time.
>> When we are working on immutable datasets (i.e. endpoints),
>> pre-building the values into the queries would be the only option.
>> Implementations with other use cases could optimize this.
>>
>>
>> In this case, I was waiting for some clear definition of
>> recursion in order to make a proposal, but I think we have
>> many options to go with.
>> For example: if the data and the constraints are in the
>> same graph, we can use the sh:hasShape() function you
>> propose; otherwise use algorithm X to execute the ShEx
>> validation in multiple steps, or algorithm Y to convert
>> the ShEx shape into a (giant) SPARQL query similar to the
>> ShEx 2 SPARQL [1].
>>
>>
>> I don't think we should limit ourselves to the hard-coded
>> built-ins of "ShEx" here - this should work with any
>> user-defined template/macro too.
>>
>> If recursion is forbidden, things get much simpler, and
>> maybe - I need to work on this first to say for sure -
>> ShEx shapes could be just treated as class shapes with an
>> extra SPARQL filter.
>>
>> We need to have a clear definition of the ShEx shapes to
>> see our options, and we shouldn't limit the language
>> design in advance.
>>
>> Proposed resolution: Shapes and data are expected to exist
>> in different graphs unless specified otherwise
>>
>>
>> Agreed. In some cases the graph called the shapes graph could
>> be identical with the data graph, though - it would just be
>> accessed via a magic named graph name or GRAPH ?variable.
>>
>>
>> Indeed, the user could specify that they are identical in many
>> cases, and implementations can optimize execution in these cases.
>> But I think 'GRAPH ?variable' is an implementation detail; the
>> spec should assume that the data graph cannot access the shapes
>> graph - or provide alternative(s).
>>
>>
>>
>> Holger
>>
>>
>>
>>
>>
>> --
>> Dimitris Kontokostas
>> Department of Computer Science, University of Leipzig & DBpedia Association
>> Projects: http://dbpedia.org, http://aligned-project.eu
>> Homepage: http://aksw.org/DimitrisKontokostas
>> Research Group: http://aksw.org
>>
>
>
>
>
> --
> Dimitris Kontokostas
> Department of Computer Science, University of Leipzig & DBpedia Association
> Projects: http://dbpedia.org <http://dbpedia.org>,
> http://aligned-project.eu <http://aligned-project.eu>
> Homepage: http://aksw.org/DimitrisKontokostas
> <http://aksw.org/DimitrisKontokostas>
> Research Group: http://aksw.org <http://aksw.org>
>
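[Editorial sketches, not part of the original thread.] To make the two mechanisms discussed above concrete, here are two illustrative SPARQL sketches. The ex: names and the concrete value list (ex:Red, ex:Green, ex:Blue) are hypothetical and do not appear in the thread; they merely stand in for whatever a shape's definition provides.

A sketch of case b) - the pre-built, self-contained variant of the sh:allowedValues query, where the rdf:List walk over the shapes graph has been replaced by an inlined FILTER ... NOT IN so the query can be sent to a remote endpoint as-is:

```sparql
PREFIX ex: <http://example.org/ns#>

# Flags every value of ?this that is outside the (hypothetical)
# allowed list inlined at query-build time from the shapes graph.
SELECT ?this (?this AS ?subject) ?predicate ?object
WHERE {
    ?this ?predicate ?object .
    FILTER (?object NOT IN (ex:Red, ex:Green, ex:Blue))
}
```

And a sketch of mechanism a) for the closed-shapes idea - a query that walks the shapes graph at runtime via GRAPH ?shapesGraph, reporting any triple whose predicate is not declared through sh:property/sh:predicate on the shape (system properties such as rdf:type would need special-casing in practice, and the sh: namespace used here reflects the draft vocabulary discussed in the thread):

```sparql
PREFIX sh: <http://www.w3.org/ns/shacl#>

# Flags properties of ?this that the (hypothetical) shape ?shape
# does not declare; ?shapesGraph is bound by the validation engine.
SELECT ?this (?this AS ?subject) ?predicate ?object
WHERE {
    ?this ?predicate ?object .
    FILTER NOT EXISTS {
        GRAPH ?shapesGraph {
            ?shape sh:property/sh:predicate ?predicate .
        }
    }
}
```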
Received on Monday, 13 April 2015 00:02:52 UTC