Re: SKOS, controlled vocab, and open world assumption

* David Booth <david@dbooth.org> [2013-04-06 17:18-0400]
> On 04/06/2013 01:21 PM, Eric Prud'hommeaux wrote:
> >What
> >we'd like for "validation" is for JJC to label his notion of ice cream
> >flavors and someone else to extend it in a way that a 3rd party can
> >accept amanda:Chocolate but reject jjc:Choco999late. Any candidates
> >or starting points?
> 
> I favor the approach of providing validation tests as a set of
> SPARQL queries against the RDF data:
> 
>  - The simplest form would be to use an ASK query, which returns a
> true/false value, to indicate whether the test passed or failed.
> ASK is good for verifying the presence of expected data.
> 
>  - For constraint checking, a better form is to use a CONSTRUCT
> query, using the SPIN constraint checking style:
> http://spinrdf.org/spin.html#spin-constraint-construct
> CONSTRUCT is better for this because it can return information about
> the reason why the test failed, which is very helpful for debugging
> purposes.  If the CONSTRUCT query returns nothing

You can persuade SELECTs to do the same
<http://www.w3.org/2012/12/rdf-val/SOTA#sparql>
but it's a real pain to be thorough. On the bright side, the results
<http://www.w3.org/2012/12/rdf-val/SOTA#sparqlValidRes>
are quite tabular and easy to read when validating lots of resources.
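
A rough sketch of the three result styles, for comparison. It uses plain Python over a set of triples as a stand-in for a SPARQL engine, so it is not SPARQL or SPIN itself. The ex: names, jjc:Vanilla, and the allowed-flavor list are made up for illustration.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# Stand-in for an RDF graph; a real setup would run SPARQL against it.
DATA = {
    ("ex:order1", "ex:flavor", "amanda:Chocolate"),
    ("ex:order2", "ex:flavor", "jjc:Choco999late"),
    ("ex:order2", "ex:note", "extra data, ignored by the checks"),
}

# Hypothetical controlled vocabulary (an assumption, not from the thread).
ALLOWED_FLAVORS = {"jjc:Vanilla", "amanda:Chocolate"}


def ask_all_flavors_known(triples):
    """ASK style: a bare true/false answer, no explanation."""
    return all(o in ALLOWED_FLAVORS
               for (_, p, o) in triples if p == "ex:flavor")


def construct_violations(triples):
    """CONSTRUCT style: build new triples that describe each failure."""
    out = set()
    for s, p, o in triples:
        if p == "ex:flavor" and o not in ALLOWED_FLAVORS:
            out.add((s, "ex:violation", "unknown flavor " + o))
    return out


def select_report(triples):
    """SELECT style: one row per checked resource, readable as a table."""
    rows = []
    for s, p, o in sorted(triples):
        if p == "ex:flavor":
            rows.append((s, o, "ok" if o in ALLOWED_FLAVORS else "FAIL"))
    return rows
```

The ASK-style check only says that something is wrong. The CONSTRUCT-style check says what is wrong and where. The SELECT-style report lists every checked resource, passing or failing, which is what makes it readable when there are many of them.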

I'd like to include SPIN examples in the document above if you want to
submit some.


> There are big benefits in using RDF and SPARQL for this purpose:
> 
>  - The tests are resilient to the presence of extra information.
> This means that additional data, vocabularies and ontologies can be
> mixed in, without affecting existing information access or tests.
> 
>  - All tests are written in the same, common language, regardless of
> the underlying data model that they test.  This makes it very easy
> to share and deploy new tests.
> 
>  - Different constraints can be defined for different purposes, and
> kept separate from the data.  It is helpful to break validation into
> two kinds, depending on one's role as data producer or data
> consumer. Quoting from "RDF and SOA", these two kinds of validation
> are:
> http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm#data-validation
> [[
>  - Model integrity (defined by the producer).  This is to ensure
> that the instance makes sense: that it conforms to the producer's
> intent, which in part may be constrained by contractual obligations
> to consumers.  Since a data producer is responsible for generating
> the data it sends, it should supply a way to check model integrity.
> This validator may be useful to both producers and consumers.
> However, because the model may change over time (as it is
> versioned), the consumer must be sure to use the correct model
> integrity validator for the instance data at hand -- not a validator
> intended for some other version -- which means that the instance
> data should indicate the model-integrity validator under which it
> was created.
> 
>  - Suitability for use (defined by the consumer).  This depends on
> the consuming application, so it will differ between producer and
> consumer and between different consumers.  Since only the data
> consumer really knows how it will use the data it receives, it
> should supply a way to check suitability for use.  This may also
> include integrity checks that are essential to this consumer, but to
> avoid unnecessary coupling it should avoid any other checks.
> ]]
> 
> Thus, different suitability-for-use checks can be defined by
> different data consumers.
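
To make the producer/consumer split concrete, here is a minimal sketch in plain Python (again a stand-in for SPARQL queries, not an actual SPARQL or SPIN implementation). The data, the ex: property names, and both check functions are hypothetical.

```python
# Instance data as (subject, predicate, object) tuples; all names are made up.
DATA = {
    ("ex:cone1", "ex:flavor", "amanda:Chocolate"),
    ("ex:cone1", "ex:price", "2.50"),
    ("ex:cone2", "ex:flavor", "jjc:Vanilla"),
}


def subjects(triples):
    """All distinct subjects in the data."""
    return {s for (s, _, _) in triples}


def has(triples, s, p):
    """True if subject s has at least one value for predicate p."""
    return any(s2 == s and p2 == p for (s2, p2, _) in triples)


def model_integrity(triples):
    """Producer's check: every resource must carry a flavor.
    Returns the offending subjects (empty list means the data passes)."""
    return sorted(s for s in subjects(triples)
                  if not has(triples, s, "ex:flavor"))


def suitable_for_billing(triples):
    """One consumer's check: this consumer needs a price and nothing else.
    Returns the subjects it cannot use."""
    return sorted(s for s in subjects(triples)
                  if not has(triples, s, "ex:price"))
```

Here the data passes the producer's check, while the billing consumer rejects ex:cone2 for lacking a price. Adding unrelated triples (say, an ex:topping) changes neither result, since each check only looks at the patterns it cares about.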
> 
> To my mind, this SPARQL-based approach is much more flexible than an
> OWL-like approach.

I believe that it's easier to ensure completeness if you have a
declarative description like DC's Application Profile or IBM Resource
Shapes.
<http://www.w3.org/2012/12/rdf-val/SOTA#shapes>
A tool can enforce the rules by writing SPARQL or SPIN rules. You can
avoid a lot of opportunities for mistakes if you find the right
expressivity for the constraints language.
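
As an illustration of a tool writing the queries for you, here is a minimal sketch that turns a declarative description into SPARQL ASK queries. The SHAPE dictionary layout and the example.org IRIs are invented for this example. It is not the Resource Shapes or Application Profile format, only a required-property list.

```python
# Hypothetical declarative description: required properties for one class.
SHAPE = {
    "class": "http://example.org/IceCream",
    "required": [
        "http://example.org/flavor",
        "http://example.org/price",
    ],
}


def shape_to_ask_queries(shape):
    """Emit one SPARQL 1.1 ASK query per required property.

    Each query returns true when some instance of the class LACKS the
    property, i.e. true means the constraint is violated.
    """
    queries = []
    for prop in shape["required"]:
        queries.append(
            "ASK {\n"
            f"  ?s a <{shape['class']}> .\n"
            f"  FILTER NOT EXISTS {{ ?s <{prop}> ?o }}\n"
            "}"
        )
    return queries


if __name__ == "__main__":
    for q in shape_to_ask_queries(SHAPE):
        print(q, end="\n\n")
```

Because every query comes from the same template, a forgotten property or a mistyped pattern shows up as a gap in the description rather than as a subtle bug in one hand-written query.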


> David Booth
> 
>  -

-- 
-ericP

Received on Saturday, 6 April 2013 23:42:16 UTC