- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Sat, 6 Apr 2013 19:41:42 -0400
- To: David Booth <david@dbooth.org>
- Cc: Dave Reynolds <dave.e.reynolds@gmail.com>, semantic-web@w3.org
* David Booth <david@dbooth.org> [2013-04-06 17:18-0400]
> On 04/06/2013 01:21 PM, Eric Prud'hommeaux wrote:
> > What we'd like for "validation" is for JJC to label his notion of ice cream
> > flavors and someone else to extend it in a way that a 3rd party can
> > accept amanda:Chocolate but reject jjc:Choco999late. Any candidates
> > or starting points?
>
> I favor the approach of providing validation tests as a set of
> SPARQL queries against the RDF data:
>
> - The simplest form would be to use an ASK query, which returns a
> true/false value, to indicate whether the test passed or failed.
> ASK is good for verifying the presence of expected data.
>
> - For constraint checking, a better form is to use a CONSTRUCT
> query, using the SPIN constraint checking style:
> http://spinrdf.org/spin.html#spin-constraint-construct
> CONSTRUCT is better for this because it can return information about
> the reason why the test failed, which is very helpful for debugging
> purposes. If the CONSTRUCT query returns nothing

You can persuade SELECTs to do the same
<http://www.w3.org/2012/12/rdf-val/SOTA#sparql>
but it's a real pain to be thorough. On the bright side, the results
<http://www.w3.org/2012/12/rdf-val/SOTA#sparqlValidRes>
are quite tabular and easy to read when validating lots of resources.
I'd like to include SPIN examples in the document above if you want to
submit some.

> There are big benefits in using RDF and SPARQL for this purpose:
>
> - The tests are resilient to the presence of extra information.
> This means that additional data, vocabularies and ontologies can be
> mixed in, without affecting existing information access or tests.
>
> - All tests are written in the same, common language, regardless of
> the underlying data model that they test. This makes it very easy
> to share and deploy new tests.
>
> - Different constraints can be defined for different purposes, and
> kept separate from the data.
> It is helpful to break validation into
> two kinds, depending on one's role as data producer or data
> consumer. Quoting from "RDF and SOA", these two kinds of validation
> are:
> http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm#data-validation
> [[
> - Model integrity (defined by the producer). This is to ensure
> that the instance makes sense: that it conforms to the producer's
> intent, which in part may be constrained by contractual obligations
> to consumers. Since a data producer is responsible for generating
> the data it sends, it should supply a way to check model integrity.
> This validator may be useful to both producers and consumers.
> However, because the model may change over time (as it is
> versioned), the consumer must be sure to use the correct model
> integrity validator for the instance data at hand -- not a validator
> intended for some other version -- which means that the instance
> data should indicate the model-integrity validator under which it
> was created.
>
> - Suitability for use (defined by the consumer). This depends on
> the consuming application, so it will differ between producer and
> consumer and between different consumers. Since only the data
> consumer really knows how it will use the data it receives, it
> should supply a way to check suitability for use. This may also
> include integrity checks that are essential to this consumer, but to
> avoid unnecessary coupling it should avoid any other checks.
> ]]
>
> Thus, different suitability-for-use checks can be defined by
> different data consumers.
>
> To my mind, this SPARQL-based approach is much more flexible than an
> OWL-like approach.

I believe that it's easier to ensure completeness if you have a
declarative description like DC's Application Profile or IBM Resource
Shapes.
<http://www.w3.org/2012/12/rdf-val/SOTA#shapes>
A tool can enforce the rules by writing SPARQL or SPIN rules.
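The idea of having a tool generate the SPARQL from a declarative description can be sketched roughly as follows. This is a minimal, hypothetical illustration in Python. It is not DC Application Profiles or the Resource Shapes vocabulary themselves. The `Shape`/`PropertyShape` classes, the `ex:`, `jjc:` and `amanda:` namespace IRIs, and the use of an allowed-values list to reject jjc:Choco999late are all assumptions made for the example. Each constraint compiles into one CONSTRUCT query in the SPIN constraint-checking style quoted above. The template uses spin:ConstraintViolation, spin:violationRoot and spin:violationPath, so an empty result from every query means the data passed.

```python
from dataclasses import dataclass, field

# Hypothetical namespaces: ex:, jjc: and amanda: stand in for the real vocabularies.
PREFIXES = {
    "spin": "http://spinrdf.org/spin#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/icecream#",
    "jjc": "http://example.org/jjc#",
    "amanda": "http://example.org/amanda#",
}


@dataclass
class PropertyShape:
    """Hypothetical declarative constraint on one property."""
    path: str                                         # prefixed property name, e.g. "ex:flavor"
    required: bool = False                            # at least one value must be present
    allowed: list[str] = field(default_factory=list)  # permitted IRIs; empty means "any value"


@dataclass
class Shape:
    target_class: str                 # prefixed class name whose instances are checked
    properties: list[PropertyShape]


def _prologue() -> str:
    # One PREFIX declaration per entry in PREFIXES.
    return "".join(f"PREFIX {p}: <{iri}>\n" for p, iri in PREFIXES.items())


def _violation(path: str, message: str) -> str:
    # SPIN-style CONSTRUCT template: a fresh blank node per failing solution,
    # pointing at the offending resource (?this) and property.
    return (
        "CONSTRUCT {\n"
        "  _:v a spin:ConstraintViolation ;\n"
        "      spin:violationRoot ?this ;\n"
        f"      spin:violationPath {path} ;\n"
        f'      rdfs:label "{message}" .\n'
        "}\n"
    )


def compile_shape(shape: Shape) -> list[str]:
    """Compile each constraint of the shape into one SPARQL CONSTRUCT query."""
    queries = []
    for prop in shape.properties:
        if prop.required:
            # Instances of the target class with no value at all for the property.
            queries.append(
                _prologue()
                + _violation(prop.path, f"Missing value for {prop.path}")
                + "WHERE {\n"
                f"  ?this a {shape.target_class} .\n"
                f"  FILTER NOT EXISTS {{ ?this {prop.path} ?value }}\n"
                "}\n"
            )
        if prop.allowed:
            # Any value outside the closed list of accepted IRIs.
            values = ", ".join(prop.allowed)
            queries.append(
                _prologue()
                + _violation(prop.path, f"Value not allowed for {prop.path}")
                + "WHERE {\n"
                f"  ?this a {shape.target_class} ;\n"
                f"        {prop.path} ?value .\n"
                f"  FILTER (?value NOT IN ({values}))\n"
                "}\n"
            )
    return queries


if __name__ == "__main__":
    # A consumer's shape that accepts amanda:Chocolate but not jjc:Choco999late.
    flavor_shape = Shape(
        target_class="ex:IceCream",
        properties=[PropertyShape("ex:flavor", required=True,
                                  allowed=["jjc:Vanilla", "jjc:Chocolate", "amanda:Chocolate"])],
    )
    for query in compile_shape(flavor_shape):
        print(query)
```

Because the shape is data rather than hand-written queries, a consumer could compile its own, stricter shape for suitability-for-use checks while the producer publishes a separate one for model integrity.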
You can avoid a lot of opportunities for mistakes if you find the right
expressivity for the constraints language.

>
> David Booth
>

--
-ericP
Received on Saturday, 6 April 2013 23:42:16 UTC