- From: David Booth <david@dbooth.org>
- Date: Sat, 06 Apr 2013 17:18:35 -0400
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: Dave Reynolds <dave.e.reynolds@gmail.com>, semantic-web@w3.org
On 04/06/2013 01:21 PM, Eric Prud'hommeaux wrote:
> What we'd like for "validation" is for JJC to label his notion of
> ice cream flavors and someone else to extend it in a way that a 3rd
> party can accept amanda:Chocolate but reject jjc:Choco999late. Any
> candidates or starting points?

I favor the approach of providing validation tests as a set of SPARQL queries against the RDF data:

- The simplest form would be to use an ASK query, which returns a true/false value, to indicate whether the test passed or failed. ASK is good for verifying the presence of expected data.

- For constraint checking, a better form is to use a CONSTRUCT query, using the SPIN constraint-checking style: http://spinrdf.org/spin.html#spin-constraint-construct CONSTRUCT is better for this because it can return information about the reason why the test failed, which is very helpful for debugging purposes. If the CONSTRUCT query returns nothing, the constraint is satisfied; any constructed triples describe the violations.

There are big benefits in using RDF and SPARQL for this purpose:

- The tests are resilient to the presence of extra information. This means that additional data, vocabularies and ontologies can be mixed in without affecting existing information access or tests.

- All tests are written in the same, common language, regardless of the underlying data model that they test. This makes it very easy to share and deploy new tests.

- Different constraints can be defined for different purposes, and kept separate from the data.

It is helpful to break validation into two kinds, depending on one's role as data producer or data consumer. Quoting from "RDF and SOA" (http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm#data-validation), these two kinds of validation are:

[[
- Model integrity (defined by the producer). This is to ensure that the instance makes sense: that it conforms to the producer's intent, which in part may be constrained by contractual obligations to consumers.
Since a data producer is responsible for generating the data it sends, it should supply a way to check model integrity. This validator may be useful to both producers and consumers. However, because the model may change over time (as it is versioned), the consumer must be sure to use the correct model-integrity validator for the instance data at hand -- not a validator intended for some other version -- which means that the instance data should indicate the model-integrity validator under which it was created.

- Suitability for use (defined by the consumer). This depends on the consuming application, so it will differ between producer and consumer and between different consumers. Since only the data consumer really knows how it will use the data it receives, it should supply a way to check suitability for use. This may also include integrity checks that are essential to this consumer, but to avoid unnecessary coupling it should avoid any other checks.
]]

Thus, different suitability-for-use checks can be defined by different data consumers. To my mind, this SPARQL-based approach is much more flexible than an OWL-like approach.

David Booth
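[A minimal sketch of the two query styles described above. The `flavor:` vocabulary and the specific class and property names are illustrative assumptions, not from the original message; the SPIN terms are from the vocabulary at http://spinrdf.org/spin#.]

```sparql
# Illustrative prefixes -- assumed for this sketch.
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX spin:   <http://spinrdf.org/spin#>
PREFIX flavor: <http://example.org/flavors#>

# Style 1: ASK -- returns true if the expected data is present,
# false otherwise.
ASK {
  ?order flavor:hasFlavor ?f .
  ?f rdf:type flavor:ApprovedFlavor .
}

# Style 2: SPIN-style CONSTRUCT -- builds one violation description
# per order whose flavor is not an approved flavor.  An empty result
# means the constraint is satisfied; any constructed triples say
# which resource failed, which helps debugging.
CONSTRUCT {
  _:v rdf:type spin:ConstraintViolation ;
      spin:violationRoot ?order .
}
WHERE {
  ?order flavor:hasFlavor ?f .
  FILTER NOT EXISTS { ?f rdf:type flavor:ApprovedFlavor . }
}
```

Run against data containing amanda:Chocolate (typed as an approved flavor) and jjc:Choco999late (not so typed), the CONSTRUCT query would report a violation only for the latter.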
Received on Saturday, 6 April 2013 21:19:07 UTC