Re: SKOS, controlled vocab, and open world assumption

Here is my echo of what you say, so that we can see if I understand you correctly:

There are two main URI bases here:
A) The controlled vocabulary is on base DA;
B) The application provider is on base DB.
So we have DA/disease1, DA/disease2,…
The app uses DA URIs.
By using the app, a user defines a new disease, DB/diseaseA
Prompted by this (although that actually doesn't matter), the controlled vocabulary later gains a new DA/disease42 for the same disease.

Now, the deal:
"Someone" decides that DB/diseaseA owl:sameAs DA/disease42;
and would like it if DA/disease42 became the accepted URI, so that DB/diseaseA slowly (or in fact quickly) dies off.
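For concreteness, the alignment itself is just one triple. As a SPARQL Update against whichever store adopts it, it might look like the sketch below (the example.org hostnames are placeholders standing in for the DA and DB URI bases):

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Record the alignment in whichever store adopts it
# (placeholder URIs standing in for the DA and DB bases).
INSERT DATA {
  <http://db.example.org/diseaseA> owl:sameAs <http://da.example.org/disease42> .
}
```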

The two "someone"s in this world are the owner of the controlled vocabulary and the owner of the app.
Assuming that the technology works OK, they can each decide for themselves whether they put the sameAs triple into their stores.
Of course, if the vocab-provider agrees to publish the triple, then all that has to happen is that the app-provider reloads the vocab.

I am guessing this does most of what you want.
But you also want the DB/diseaseA to die off.

This is how we do it in my world; your mileage may vary, and it may not apply in yours.
Providers who are publishing this sort of thing keep an associated identity management KB (which we call a sameAs store).
When they find these sameAses, they put them in the sameAs store for the associated KB.
Anyone else can choose to also put their triple in their sameAs store, of course.
(This is good, because this knowledge is very different in character, licence, provenance, etc from what they are actually trying to publish.)
So let's say that the vocab-provider puts the triple in their sameAs store.
If they actually want the old DB/diseaseA to die off, then they "deprecate" the DB/diseaseA URI;
this is a facility of the sameAs store that means the URI can still be used for look-up, but will no longer be published in responses (for example, a look-up of DA/disease42 will not list DB/diseaseA among its equivalents).
Now the app-provider can still query with DB/diseaseA, but will find that it has been deprecated in favour of DA/disease42.
At their convenience they can move to using DA/disease42 in their app.
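A sketch of the app-provider's look-up, assuming (this is an assumption, not a description of any particular product) that the sameAs store is queryable with SPARQL, stores the symmetric closure of its sameAs links, and marks deprecation with a hypothetical sas:deprecated flag:

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX sas: <http://sameas.example.org/terms#>  # hypothetical store vocabulary

# Look up the equivalents of the old URI; a deprecation flag on a
# result tells the app-provider which URI to migrate away from.
SELECT ?equiv ?deprecated WHERE {
  <http://db.example.org/diseaseA> (owl:sameAs|^owl:sameAs)* ?equiv .
  OPTIONAL { ?equiv sas:deprecated ?deprecated }
}
```

In this sketch DB/diseaseA would come back flagged as deprecated, with DA/disease42 as the live equivalent to move to.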

There are other scenarios with the app-provider's sameAs store, but you get the picture.

This is the way that the knowledge of sameAs-ness can spread in an anarchic fashion, without the need for synchronisation and authority (Web-like).

Best
Hugh

On 8 Apr 2013, at 21:52, Jeremy J Carroll <jjc@syapse.com> wrote:

> 
> Thanks for the input
> 
> my actual use case is with bio vocabularies, where for instance:
> 
> + we install a copy of a current version of a web-based vocabulary of diseases say, into our application
> + a customer using their copy, comes across another disease and has to add it in
> + later the web-based vocab is updated with the new disease, we update our copy, and the customer application, and we want some sort of managed process by which the customer copy of the disease term gets dropped in favor of the new standard way of referring to the same disease
> 
> While I suspect one can do the same with ice cream flavors, it is perhaps more realistic to ground the discussion in biomedical terminology.
> 
> 
> 
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
> 
> 
> 
> On Apr 6, 2013, at 4:41 PM, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
>> * David Booth <david@dbooth.org> [2013-04-06 17:18-0400]
>>> On 04/06/2013 01:21 PM, Eric Prud'hommeaux wrote:
>>>> What
>>>> we'd like for "validation" is for JJC to label his notion of ice cream
>>>> flavors and someone else to extend it in a way that a 3rd party can
>>>> accept amanda:Chocolate but reject jjc:Choco999late. Any candidates
>>>> or starting points?
>>> 
>>> I favor the approach of providing validation tests as a set of
>>> SPARQL queries against the RDF data:
>>> 
>>> - The simplest form would be to use an ASK query, which returns a
>>> true/false value, to indicate whether the test passed or failed.
>>> ASK is good for verifying the presence of expected data.
>>> 
>>> - For constraint checking, a better form is to use a CONSTRUCT
>>> query, using the SPIN constraint checking style:
>>> http://spinrdf.org/spin.html#spin-constraint-construct
>>> CONSTRUCT is better for this because it can return information about
>>> the reason why the test failed, which is very helpful for debugging
>>> purposes.  If the CONSTRUCT query returns nothing, the constraint is satisfied.
>> 
>> You can persuade SELECTs to do the same
>> <http://www.w3.org/2012/12/rdf-val/SOTA#sparql>
>> but it's a real pain to be thorough. On the bright side, the results
>> <http://www.w3.org/2012/12/rdf-val/SOTA#sparqlValidRes>
>> are quite tabular and easy to read when validating lots of resources.
>> 
>> I'd like to include SPIN examples in the document above if you want
>> to submit some.
>> 
>> 
>>> There are big benefits in using RDF and SPARQL for this purpose:
>>> 
>>> - The tests are resilient to the presence of extra information.
>>> This means that additional data, vocabularies and ontologies can be
>>> mixed in, without affecting existing information access or tests.
>>> 
>>> - All tests are written in the same, common language, regardless of
>>> the underlying data model that they test.  This makes it very easy
>>> to share and deploy new tests.
>>> 
>>> - Different constraints can be defined for different purposes, and
>>> kept separate from the data.  It is helpful to break validation into
>>> two kinds, depending on one's role as data producer or data
>>> consumer. Quoting from "RDF and SOA", these two kinds of validation
>>> are:
>>> http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm#data-validation
>>> [[
>>> - Model integrity (defined by the producer).  This is to ensure
>>> that the instance makes sense: that it conforms to the producer's
>>> intent, which in part may be constrained by contractual obligations
>>> to consumers.  Since a data producer is responsible for generating
>>> the data it sends, it should supply a way to check model integrity.
>>> This validator may be useful to both producers and consumers.
>>> However, because the model may change over time (as it is
>>> versioned), the consumer must be sure to use the correct model
>>> integrity validator for the instance data at hand -- not a validator
>>> intended for some other version -- which means that the instance
>>> data should indicate the model-integrity validator under which it
>>> was created.
>>> 
>>> - Suitability for use (defined by the consumer).  This depends on
>>> the consuming application, so it will differ between producer and
>>> consumer and between different consumers.  Since only the data
>>> consumer really knows how it will use the data it receives, it
>>> should supply a way to check suitability for use.  This may also
>>> include integrity checks that are essential to this consumer, but to
>>> avoid unnecessary coupling it should avoid any other checks.
>>> ]]
>>> 
>>> Thus, different suitability-for-use checks can be defined by
>>> different data consumers.
>>> 
>>> To my mind, this SPARQL-based approach is much more flexible than an
>>> OWL-like approach.
>> 
>> I believe that it's easier to ensure completeness if you have a
>> declarative description like DC's Application Profile or IBM Resource
>> Shapes.
>> <http://www.w3.org/2012/12/rdf-val/SOTA#shapes>
>> A tool can enforce the rules by generating SPARQL or SPIN queries. You can
>> avoid a lot of opportunities for mistakes if you find the right
>> expressivity for the constraints language.
>> 
>> 
>>> David Booth
>>> 
>> 
>> -- 
>> -ericP
>> 
> 
> 

Received on Tuesday, 16 April 2013 18:22:35 UTC