Re: SKOS, controlled vocab, and open world assumption

Thanks for the input.

My actual use case is with biomedical vocabularies, where, for instance:

+ we install a copy of the current version of a web-based vocabulary, say of diseases, into our application
+ a customer using their copy comes across a disease that is not yet listed and has to add it themselves
+ later, the web-based vocabulary is updated with the new disease; we update our copy and the customer application, and we want some managed process by which the customer's local term is dropped in favor of the new standard way of referring to the same disease (a sketch of one such process follows below)

While I suspect one could do the same with ice cream flavors, it is perhaps more realistic to ground the discussion in biomedical terminology.
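
For concreteness, a minimal sketch of that migration step, with all
ex:, cust:, and app: names hypothetical. When the standard vocabulary
catches up, the customer-minted term is marked as replaced and
deprecated:

    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix cust: <http://example.com/customer/terms#> .
    @prefix ex:   <http://example.org/diseases#> .

    # The customer's local term now points at its standard successor
    cust:NewDisease123 dct:isReplacedBy ex:Disease456 ;
                       owl:deprecated  true .

A SPARQL UPDATE can then rewrite any records that still reference the
deprecated term:

    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX app: <http://example.com/app#>

    # Replace each use of a deprecated local term with its successor
    DELETE { ?record app:diagnosis ?old }
    INSERT { ?record app:diagnosis ?new }
    WHERE  {
      ?old dct:isReplacedBy ?new .
      ?record app:diagnosis ?old .
    }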



Jeremy J Carroll
Principal Architect
Syapse, Inc.



On Apr 6, 2013, at 4:41 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * David Booth <david@dbooth.org> [2013-04-06 17:18-0400]
>> On 04/06/2013 01:21 PM, Eric Prud'hommeaux wrote:
>>> What
>>> we'd like for "validation" is for JJC to label his notion of ice cream
>>> flavors and someone else to extend it in a way that a 3rd party can
>>> accept amanda:Chocolate but reject jjc:Choco999late. Any candidates
>>> or starting points?
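
One way to phrase exactly that check in SPARQL, as a sketch only,
assuming the agreed flavors are published as concepts in a hypothetical
scheme ex:flavours; any binding reported is a term to reject:

    # Flag flavor values that are not in the agreed concept scheme
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX ex:   <http://example.org/icecream#>
    SELECT ?item ?flavour
    WHERE {
      ?item ex:flavour ?flavour .
      FILTER NOT EXISTS { ?flavour skos:inScheme ex:flavours }
    }

Under this check amanda:Chocolate passes as soon as it is declared
skos:inScheme ex:flavours, while jjc:Choco999late, never having been so
declared, is rejected.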
>> 
>> I favor the approach of providing validation tests as a set of
>> SPARQL queries against the RDF data:
>> 
>> - The simplest form would be to use an ASK query, which returns a
>> true/false value, to indicate whether the test passed or failed.
>> ASK is good for verifying the presence of expected data (see the
>> first sketch after this list).
>> 
>> - For constraint checking, a better form is to use a CONSTRUCT
>> query, using the SPIN constraint checking style:
>> http://spinrdf.org/spin.html#spin-constraint-construct
>> CONSTRUCT is better for this because it can return information about
>> the reason why the test failed, which is very helpful for debugging
>> purposes.  If the CONSTRUCT query returns nothing, the test passed
>> (see the second sketch below).
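
To illustrate the first, ASK-based style: a minimal sketch, assuming a
hypothetical ex: disease vocabulary; true means the expected data is
present.

    # Verify the presence of expected data: the concept has a label
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX ex:   <http://example.org/diseases#>
    ASK { ex:Disease456 skos:prefLabel ?label }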
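
And a sketch of the second, CONSTRUCT-based style, loosely following
the SPIN convention of constructing one spin:ConstraintViolation per
problem found; an empty result means the data passed.

    # Report every concept that lacks a preferred label
    PREFIX spin: <http://spinrdf.org/spin#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    CONSTRUCT {
      _:v a spin:ConstraintViolation ;
          spin:violationRoot ?concept ;
          rdfs:label "Concept is missing a skos:prefLabel" .
    }
    WHERE {
      ?concept a skos:Concept .
      FILTER NOT EXISTS { ?concept skos:prefLabel ?label }
    }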
> 
> You can persuade SELECTs to do the same
> <http://www.w3.org/2012/12/rdf-val/SOTA#sparql>
> but it's a real pain to be thorough. On the bright side, the results
> <http://www.w3.org/2012/12/rdf-val/SOTA#sparqlValidRes>
> are quite tabular and easy to read when validating lots of resources.
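
For comparison, a SELECT in the tabular style Eric describes might look
like this sketch (same hypothetical data as above); each row names one
failing resource and the reason:

    # One row per failing concept, with a human-readable reason
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?concept ("missing skos:prefLabel" AS ?reason)
    WHERE {
      ?concept a skos:Concept .
      FILTER NOT EXISTS { ?concept skos:prefLabel ?label }
    }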
> 
> I'd like to include SPIN examples in the document above if you want
> to submit some.
> 
> 
>> There are big benefits in using RDF and SPARQL for this purpose:
>> 
>> - The tests are resilient to the presence of extra information.
>> This means that additional data, vocabularies and ontologies can be
>> mixed in, without affecting existing information access or tests.
>> 
>> - All tests are written in the same, common language, regardless of
>> the underlying data model that they test.  This makes it very easy
>> to share and deploy new tests.
>> 
>> - Different constraints can be defined for different purposes, and
>> kept separate from the data.  It is helpful to break validation into
>> two kinds, depending on one's role as data producer or data
>> consumer. Quoting from "RDF and SOA", these two kinds of validation
>> are:
>> http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm#data-validation
>> [[
>> - Model integrity (defined by the producer).  This is to ensure
>> that the instance makes sense: that it conforms to the producer's
>> intent, which in part may be constrained by contractual obligations
>> to consumers.  Since a data producer is responsible for generating
>> the data it sends, it should supply a way to check model integrity.
>> This validator may be useful to both producers and consumers.
>> However, because the model may change over time (as it is
>> versioned), the consumer must be sure to use the correct model
>> integrity validator for the instance data at hand -- not a validator
>> intended for some other version -- which means that the instance
>> data should indicate the model-integrity validator under which it
>> was created.
>> 
>> - Suitability for use (defined by the consumer).  This depends on
>> the consuming application, so it will differ between producer and
>> consumer and between different consumers.  Since only the data
>> consumer really knows how it will use the data it receives, it
>> should supply a way to check suitability for use.  This may also
>> include integrity checks that are essential to this consumer, but to
>> avoid unnecessary coupling it should avoid any other checks.
>> ]]
>> 
>> Thus, different suitability-for-use checks can be defined by
>> different data consumers.
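
As one concrete example of a suitability-for-use check: a consumer
whose application cannot work without a code on each diagnosis could
ship its own constraint (hypothetical ex: terms again), leaving other
consumers unaffected:

    # This consumer requires a code on every diagnosis it ingests
    PREFIX ex: <http://example.org/clinic#>
    SELECT ?diag ("missing ex:code required by this consumer" AS ?why)
    WHERE {
      ?diag a ex:Diagnosis .
      FILTER NOT EXISTS { ?diag ex:code ?code }
    }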
>> 
>> To my mind, this SPARQL-based approach is much more flexible than an
>> OWL-like approach.
> 
> I believe that it's easier to ensure completeness if you have a
> declarative description like DC's Application Profiles or IBM's
> Resource Shapes.
> <http://www.w3.org/2012/12/rdf-val/SOTA#shapes>
> A tool can enforce the rules by generating SPARQL or SPIN queries;
> you can avoid a lot of opportunities for mistakes if you find the
> right expressivity for the constraint language.
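
For instance, a minimal IBM/OSLC Resource Shape for the disease case
might look like this sketch (the ex: names are hypothetical; oslc: is
the OSLC core namespace):

    @prefix oslc: <http://open-services.net/ns/core#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix ex:   <http://example.org/diseases#> .

    # Every ex:Disease must carry exactly one skos:prefLabel
    ex:DiseaseShape a oslc:ResourceShape ;
      oslc:describes ex:Disease ;
      oslc:property [
        a oslc:Property ;
        oslc:propertyDefinition skos:prefLabel ;
        oslc:occurs oslc:Exactly-one
      ] .

A tool could compile such a shape into queries like those above, rather
than each being written and maintained by hand.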
> 
> 
>> David Booth
>> 
> 
> -- 
> -ericP
> 

Received on Monday, 8 April 2013 20:53:06 UTC