Re: RDF vocabulary scope guidelines from Xiaoshu Wang on 2009-02-09 (semantic-web@w3.org from February 2009)

From: Xiaoshu Wang <wangxiao@musc.edu>
Date: Mon, 09 Feb 2009 22:16:37 +0000
To: Harry Halpin <hhalpin@ibiblio.org>
CC: "semantic-web@w3.org" <semantic-web@w3.org>, rnewman@twinql.com, ojirio@gmail.com
Message-ID: <4990AB45.9080102@musc.edu>
Harry Halpin wrote:
> Richard Newman wrote:
>   
>> Hi Jiri,
>>
>> As the author of that ontology, I am in the unique position of being
>> able to explain my modelling choices!
>>
>> I took that approach for two reasons:
>>
>> 1: precision. By creating my own term, I can define precisely what is
>> meant by (for example) "creation" --  is it the moment I choose to add
>> a tag, or the time that tag reached some server? Another way of
>> phrasing this is that coining a new property or class allows for
>> "minimal enforced ambiguity".
>>     
> In favor of the other opinion, I might add, creating a new term when a
> well-known existing term exists creates semantic islands rather than
> linked data IMHO.  Also, by not re-using URIs, you lose the ability to
> do URI-directed graph merges, which is the real advantage of RDF over
> XML or JSON based formats. Otherwise, to be honest, I'd rather work in
> JSON or my favorite programming language than RDF.
>   
I agree with Harry.  In fact, the better principle is "minimal 
ontological commitment", which is described in Gruber's article [1] and 
which I have also discussed in [2].  I think "minimal enforced 
ambiguity" contradicts that principle.

1. Gruber, T.: A translation approach to portable ontology 
specifications. Knowledge Acquisition 5 (1993) 199-220.
2. Wang, X. et al.: Ontology Design Principles and Normalization 
Techniques in the Web. Lecture Notes in Bioinformatics (DILS08, Evry, 
France), Vol. 5109, 28-43. (A PDF copy can be found at 
http://www.inesc-id.pt/ficheiros/publicacoes/4799.pdf)
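
To make Harry's point about URI-directed merges concrete, here is a 
minimal sketch in Python. It treats a graph as a plain set of 
(subject, predicate, object) tuples and ignores blank nodes. All URIs 
in it are hypothetical and exist only for illustration; this is not 
how any particular RDF library implements merging.

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
# Every URI below is a made-up example, not a real vocabulary term.
SHARED_DATE = "http://example.org/shared#created"   # assumed well-known term
COINED_DATE = "http://example.org/mytags#taggedOn"  # assumed newly coined term

graph_a = {("http://example.com/blog/post/1", SHARED_DATE, "2005-03-29")}
graph_b_reuse = {("http://example.com/blog/post/2", SHARED_DATE, "2005-03-30")}
graph_b_coined = {("http://example.com/blog/post/2", COINED_DATE, "2005-03-30")}


def merge(*graphs):
    """Merge graphs by set union; identical URIs line up with no extra work."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def subjects_with(graph, predicate):
    """Return every subject that has at least one value for the predicate."""
    return {s for s, p, _ in graph if p == predicate}


# Both sources reuse the same property URI: one query sees both posts.
dated_reuse = subjects_with(merge(graph_a, graph_b_reuse), SHARED_DATE)

# The second source coined its own term: the same query misses post/2
# unless a mapping or inference step is added on top of the merge.
dated_coined = subjects_with(merge(graph_a, graph_b_coined), SHARED_DATE)

print(sorted(dated_reuse))
print(sorted(dated_coined))
```

With the reused URI the merge alone is enough; with the coined URI the 
union still succeeds, but the data stays on its own island until 
someone supplies the mapping.
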
> Now, it's possible sub-class/sub-propertying can somehow get you to do a
> one-two-step to infer and then merge. In theory, sounds great. In
> practice, I've rarely seen it done.
>
> The *real* problem is, short of using Sindice and poking around, it's
> virtually impossible to find URIs for "similar" things on the Semantic
> Web, and so people have no idea if they are duplicating URIs or not. So,
> I can't blame everyone for creating new URIs. However, I would recommend
> at least looking for well-deployed URIs before creating your own.
> DBPedia, SKOS, Dublin Core or FOAF come to mind as virtually the only
> widely deployed RDF vocabularies, at least for general-purpose
> subject matter, although poking around Sindice can't hurt.
>   
>> 2: a related point: by deliberately using a new term, it can be
>> specifically and accurately related to other terms in other ontologies
>> -- e.g., my taggedOn might be an equivalentProperty to John Smith's
>> tagdate, and a subproperty of a generic date property. Under
>> inference, all desired knowledge is apparent, without being corralled
>> into a not-quite-compatible ontological framework.
>>     
> See point about inference above :) Sounds good in theory, rarely seen
> done usefully, although it features highly in academic papers done by
> ontologists often. Anyone got any non-academic cases where this has
> actually been done with RDF(S) or OWL?
>   
>> The expense of reasoning is a slight discouragement to this approach,
>> but I think in general it stands up.
>>     
I do not think that the expense of reasoning is a *slight* 
discouragement.  It is slight only if the ontology is used for a 
specific task that its designer has enforced and is not intended to be 
shared (at least not widely).  If a user intends to share multiple 
ontologies, each of which carries a *slight* discouragement, the end 
result won't be *slight* any more, because inference complexity is not 
linearly related to the number of assertions.  In addition, it 
increases the likelihood that merging the ontologies will lead to a 
contradiction, which makes sharing impossible.
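
As a rough illustration of the extra step involved, the sketch below 
applies two RDFS entailment rules (rdfs5, transitivity of 
rdfs:subPropertyOf, and rdfs7, subproperty inheritance) until nothing 
new is derived. The prefixed names such as my:taggedOn, john:tagdate 
and generic:date are hypothetical stand-ins for the terms in Richard's 
example, and the naive loop is only an assumption about how one might 
do this, not a description of any real reasoner.

```python
SUBPROP = "rdfs:subPropertyOf"


def rdfs_closure(graph):
    """Apply rdfs5 and rdfs7 repeatedly until a fixpoint is reached."""
    closed = set(graph)
    while True:
        # All (p, q) pairs such that p rdfs:subPropertyOf q is asserted or derived.
        sub_pairs = {(s, o) for s, p, o in closed if p == SUBPROP}
        derived = set()
        for p, q in sub_pairs:
            # rdfs5: p subPropertyOf q and q subPropertyOf r => p subPropertyOf r
            for q2, r in sub_pairs:
                if q == q2:
                    derived.add((p, SUBPROP, r))
            # rdfs7: p subPropertyOf q and (s p o) => (s q o)
            for s, pred, o in closed:
                if pred == p:
                    derived.add((s, q, o))
        if derived <= closed:
            return closed
        closed |= derived


# Hypothetical mappings: two coined date properties under one generic property.
schema = {
    ("my:taggedOn", SUBPROP, "generic:date"),
    ("john:tagdate", SUBPROP, "generic:date"),
}
data = {
    ("post:1", "my:taggedOn", "2005-03-29"),
    ("post:2", "john:tagdate", "2005-03-30"),
}

asserted = schema | data
closed = rdfs_closure(asserted)
dated = {s for s, p, _ in closed if p == "generic:date"}

print(sorted(dated))
print(len(closed) - len(asserted), "triples had to be inferred")
```

Even in this tiny case the generic query only works after the closure 
has been computed, and the naive loop rescans the whole graph for every 
subproperty pair on every pass. Each additional ontology brought into 
the merge adds more such mappings to process.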

This is not to say that developing ontologies with more rigorous clarity 
is bad.  It is good as long as it serves its purpose.  Rather, the issue 
is about carefully modularizing ontologies according to their semantic 
granularity and domain orthogonality.  It is about removing and managing 
the semantic dependencies of URIs.  This is no different from what we do 
in software engineering.  The more dependencies your code (assertions) 
has, the less it can be shared and extended.

Thus, if you intend to develop an ontology and don't care if it will be 
shared by a user in some other contexts, then your "minimal enforced 
ambiguity" is good. Otherwise, the principle of "minimal ontological 
commitment" should be followed. 

Xiaoshu 
>> HTH,
>> -Richard
>>
>>
>> -- 
>> Sent from my iPhone.
>>
>> On Feb 6, 2009, at 15:24, Jiri Prochazka <ojirio@gmail.com> wrote:
>>
>>     
>>> Hi,
>>> I am sure I am not the first one to notice, but I think there is a
>>> problem with determining scope when designing an RDF vocabulary. Reuse of
>>> well designed, loosely coupled, high cohesion, more general vocabularies
>>> versus domain specific vocabularies.
>>>
>>> Typical example is date of creation. I am writing this largely thanks to
>>> this vocabulary: http://www.holygoat.co.uk/projects/tags/
>>> It defines class Tagging, which uses properties taggedBy and taggedOn.
>>> This is the domain specific approach. The example is:
>>>  <http://example.com/blog/post/1> :tag
>>>    [ a :Tagging ;
>>>      :associatedTag tag:blog, tag:chimpanzee ;
>>>      :taggedBy <http://example.com/People/Jim> ;
>>>      :taggedOn "2005-03-29T15:24:10Z"^^xsd:dateTime ] .
>>>    tag:blog :tagName "blog" .
>>>    tag:chimpanzee :tagName "chimpanzee" .
>>>
>>> But as another alternative I imagine:
>>>  { <http://example.com/blog/post/1> :tag tag:blog, tag:chimpanzee . }
>>>    time_vocab:createdOn "2005-03-29T15:24:10Z"^^xsd:dateTime ;
>>>    author_vocab:author <http://example.com/People/Jim> .
>>>  tag:blog :tagName "blog" .
>>>  tag:chimpanzee :tagName "chimpanzee" .
>>>
>>> Where time_vocab and author_vocab talk about RDF resources (graphs in
>>> fact) and could be defined in just one RDF resource description
>>> vocabulary instead of two.
>>> Or another alternative in which time_vocab:createdOn and
>>> author_vocab:author have domain rdfs:Class:
>>>  <http://example.com/blog/post/1> :tag tag:blog, tag:chimpanzee ;
>>>    time_vocab:createdOn "2005-03-29T15:24:10Z"^^xsd:dateTime ;
>>>    author_vocab:author <http://example.com/People/Jim> .
>>>  tag:blog :tagName "blog" .
>>>  tag:chimpanzee :tagName "chimpanzee" .
>>>
>>> Which of these approaches is recommended and why?
>>>
>>> I tend to agree more with the more general vocabulary approach. Like you
>>> should ask yourself when designing RDF properties "Shouldn't the
>>> domain/range of it be some parent class? If yes, does the property fit
>>> the scope of this vocabulary? Shouldn't it be in some more general
>>> one?", focusing on reuse rather than relying on later linking of
>>> vocabularies.
>>>
>>> If there were any past discussions on this topic, what were their
>>> results?
>>> Is there any vocabulary for rating resources in terms of authenticity
>>> (trust) and agreement (truthfulness)? Vocabulary(ies) covering other
>>> resource description aspects would be helpful too... (POWDER is so
>>> cumbersome)
>>>
>>> Best regards,
>>> Jiri
>>>
>>>       
>
>
>
Received on Monday, 9 February 2009 22:17:40 UTC