RE: [OPEN] and/or [PORT] : a practical question $swpbd from Uschold, Michael F on 2004-03-29 (public-swbp-wg@w3.org from March 2004)

From: Uschold, Michael F <michael.f.uschold@boeing.com>
Date: Sun, 28 Mar 2004 23:27:36 -0800
To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, "Bernard Vatant" <bernard.vatant@mondeca.com>
Cc: "SWBPD" <public-swbp-wg@w3.org>, "Ian Horrocks" <horrocks@cs.man.ac.uk>
Message-ID: <823043AB1B52784D97754D186877B6CF04894DA0@xch-nw-12.nw.nos.boeing.com>
See inserted comments with "MFU" 

 -----Original Message-----
From: 	public-swbp-wg-request@w3.org [mailto:public-swbp-wg-request@w3.org]  On Behalf Of Jeremy Carroll
Sent:	Wednesday, March 24, 2004 1:24 AM
To:	Bernard Vatant
Cc:	SWBPD; Ian Horrocks
Subject:	Re: [OPEN] and/or [PORT] : a practical question


Yes, like Bernard, I have been thinking more about this, and about Ian's 
insistence in WebOnt that classes-and-instances was almost always raised by 
people wanting to mismodel their world. (cc Ian, wondering if I have learnt 
my lessons well!, or misrepresented him)


The class hierarchy in RDFS/OWL is there to describe hierarchies of classes 
of resources. Just because you have a hierarchy of subject descriptors 
doesn't make it a class hierarchy.

MFU> Well actually, YES it DOES (or at least, it makes perfect sense for it to)! However, the classes are different beasts. The class corresponding to 'subject-foo' denotes the set of all things whose subject is 'subject-foo'. For example, if the term 'PhDThesis' is a subject descriptor, then the class corresponding to it denotes the set of all things whose subject is 'PhDThesis', which is a very different class from the set of all PhD theses. The latter is the more common interpretation of a class named "PhDThesis". These are two very different things. The question to ask is whether you need to explicitly model both kinds of classes, and if you do, whether you need to explicitly model the connection between the two. I believe that this is the heart of the matter, more than the concern about whether classes can be instances. Of course, when you try to model it, you find that treating classes as instances may make sense, just as it makes sense to model the class of cats as an instance of 'species'. I have not thought it through, but I think that the species example is not a good analogy for what is going on with subjects, which is far more complicated, as described in Welty's papers from a few years ago.
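
A minimal sketch of the two readings of a class named 'PhDThesis', written in plain Python rather than OWL. The resource names and the "type"/"subject" keys are hypothetical and only illustrate the distinction; nothing here is taken from a real dataset.

```python
# Hypothetical resources, each with a type and a single subject descriptor.
resources = {
    "eg:jeremysThesis": {"type": "PhDThesis", "subject": "MachineTranslation"},
    "eg:studyOfTheses": {"type": "JournalArticle", "subject": "PhDThesis"},
}

# Reading 1: the set of all things whose subject is 'PhDThesis'.
about_phd_theses = {
    name for name, desc in resources.items() if desc["subject"] == "PhDThesis"
}

# Reading 2: the set of all things that are PhD theses.
are_phd_theses = {
    name for name, desc in resources.items() if desc["type"] == "PhDThesis"
}
```

In this toy data the two sets share no members, which is the sense in which the two classes are different beasts.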

It seems to be confusing the human way of thinking of analogy and metaphor 
(any hierarchy can act as a metaphor for any other hierarchy) with what is 
a logical and implementation issue about how to say what we want to say 
about our knowledge of our world in a way that machines can process it.

Thus if PhDThesis is an owl:Class what are the resources that we intend to 
belong to it? Probably my PhD Thesis with title "Graph Grammars: an 
approach to transfer based MT; exemplified by a Turkish-English system" is 
one such resource, but the copy sitting on my bookshelf is probably not.

MFU> this brings up another quite different distinction... instances of a work, vs. the work itself.

Then if that is the case what would we mean by dc:subject linking the 
resource of my thesis with this class .... hmmm ... we mean my thesis 
belongs to that class, i.e. rdf:type.
So if we want to treat this subject hierarchy as classes we really also want

dc:subject rdfs:subPropertyOf rdf:type .

or perhaps

eg:subject rdfs:subPropertyOf rdf:type .
eg:subject rdfs:subPropertyOf dc:subject .
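
A rough sketch of what making dc:subject a subproperty of rdf:type would entail, assuming the standard RDFS entailment rule rdfs7 (if p is a subproperty of q and "s p o" holds, then "s q o" holds). Triples are modelled as plain Python tuples of prefixed names; eg:myThesis and eg:PhDThesis are hypothetical names, and this is not a full RDFS reasoner.

```python
SUBPROP = "rdfs:subPropertyOf"

def rdfs7_closure(triples):
    """Apply rule rdfs7 repeatedly until no new triples appear."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        # Collect every (subproperty, superproperty) pair currently known.
        subprops = {(s, o) for s, p, o in closed if p == SUBPROP}
        for s, p, o in list(closed):
            for sub, sup in subprops:
                if p == sub and (s, sup, o) not in closed:
                    closed.add((s, sup, o))
                    changed = True
    return closed

graph = {
    ("dc:subject", SUBPROP, "rdf:type"),
    ("eg:myThesis", "dc:subject", "eg:PhDThesis"),  # hypothetical names
}
inferred = rdfs7_closure(graph)
```

Under that assumption, every dc:subject statement also asserts class membership: the closure contains eg:myThesis rdf:type eg:PhDThesis.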

But if we click on dc:subject we get to:
http://purl.org/dc/elements/1.1/subject

<rdf:Property rdf:about="http://purl.org/dc/elements/1.1/subject">
<rdfs:label xml:lang="en-US">Subject and Keywords</rdfs:label>
<rdfs:comment xml:lang="en-US">The topic of the content of the 
resource.</rdfs:comment>
<dc:description xml:lang="en-US">
Typically, a Subject will be expressed as keywords,
key phrases or classification codes that describe a topic
of the resource.  Recommended best practice is to select
a value from a controlled vocabulary or formal
classification scheme.</dc:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/elements/1.1/"/>
<dcterms:issued>1999-07-02</dcterms:issued>
<dcterms:modified>2002-10-04</dcterms:modified>
<dc:type 
rdf:resource="http://dublincore.org/usage/documents/principles/#element"/>
<dcterms:hasVersion 
rdf:resource="http://dublincore.org/usage/terms/history/#subject-004"/>
</rdf:Property>

and we see that dc:subject should typically be a string from a controlled 
vocabulary. Thus it seems particularly poor practice to deviate from the 
preferred usage of dc:subject in order to (over-)simplify our model.

This points to the solution I was earlier advocating of using such strings, 
using hasValue restrictions to map the strings into classes and then using 
the class hierarchy on those restrictions to show the hierarchical 
relationships between the subject vocab terms. To do this well, we probably 
want to specialise the dc:subject property with a subproperty eg:subject, 
specify its range with an owl:DataRange explicitly enumerating the 
controlled vocabulary, and for each term create a class using a hasValue 
restriction.
For further clarity and usability we might want to create two related 
properties, one indicating the (single) intended subject code, and the 
other indicating all implicit subject codes formed from the class hierarchy.
The former would be a subproperty of both the latter and dc:subject; the 
latter would be used to create the hasValue restrictions.
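
The scheme above can be sketched roughly as follows. This is plain Python that imitates what an OWL reasoner would be expected to derive, not OWL itself. The vocabulary codes, the resource names, and the two notional properties (here called intended subject and implied subject) are all hypothetical.

```python
# Hypothetical controlled vocabulary: each code maps to its broader code.
# This stands in for the class hierarchy over the hasValue restrictions.
BROADER = {
    "PhDThesis": "Thesis",
    "Thesis": "Document",
    "Document": None,
}

def implied_codes(code):
    """Return the intended code plus every broader code above it."""
    if code not in BROADER:
        raise ValueError(f"{code!r} is not in the controlled vocabulary")
    codes = []
    while code is not None:
        codes.append(code)
        code = BROADER[code]
    return codes

def members(term, intended_subject):
    """Resources falling in the hasValue class for `term`, i.e. those
    whose implied subject codes include `term`."""
    return {
        res for res, code in intended_subject.items()
        if term in implied_codes(code)
    }

# Each resource carries exactly one intended subject code (a string).
intended_subject = {
    "eg:essayOnTheses": "PhDThesis",
    "eg:archivingGuide": "Document",
}
```

For example, members("Thesis", intended_subject) picks up eg:essayOnTheses even though only the narrower code "PhDThesis" was stated for it, while the subject values themselves remain plain strings from the enumerated vocabulary.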

Hmmm ... quite a lot of work initially, but the end result is that the 
subject indicators are marked up using text strings from an explicit 
controlled vocab; we conform with the defn of dc:subject, even with the 
advertised best practice; we fall within OWL DL with the expectation that 
this will give us better reasoning performance, and we have been clearer 
about what we are trying to say. I think the complexity can be hidden from the 
end users.

MFU> Unfortunately, I did not follow every detail; a short worked example would help. It would help even more if examples used a human-readable and more concise syntax such as the OWL abstract syntax. Only computers and people who read/write parsers need ever see raw OWL syntax.

Jeremy









Bernard Vatant wrote:
> 
> *BV
> 
>>>- Is it worth the trade-off to switch one's ontology (otherwise DL)
>>>to OWL-Full, just to
>>>allow its classes to be used as objects in 'dc:subject' predicates?
> 
> 
> *Jim
> 
>>That's a weird way to ask the question.  You mean, is it worth doing
>>the extra work to break your naturally occurring model just so that
>>you can be in DL?
> 
> 
> The way I put it might seem weird indeed, but it's the way it was set in the real project
> context (real world is weird). We had an OWL-DL ontology, and wanted to keep it so, and
> suddenly after six months or so some user wants to be able to use a class as a subject of
> a document ... which is one case out of one thousand, the 999 others using 'regular'
> subjects. So using a class as subject of a document is not exactly 'naturally occurring'.
> It's a borderline case - not to say a weird one :))
> 
> *Jim
> 
>>I would argue this is indeed a BP issue, but probably for WORLD not
>>for OPEN... we need to explain why and when you would do the extra
>>work (and in every case we have explored it is extra work) to make
>>sure your ontology is in the DL profile of OWL.
> 
> 
> I suggested it might be in PORT scope, because it deals with the terminology vs ontology
> general issue. For me the heart of the question is to know what it means to 'use a
> concept' defined in a terminology (glossary, thesaurus, subject headings, index...) as a
> class (or a property) in an ontology.
> 
> Is the 'PhD Thesis' class the same 'subject' (using TM language here, sorry) or 'resource'
> as the original concept? The more I think about it, the more I have to deal with it, and
> the more I tend to say that they are distinct animals. Jim's PhD Thesis is an instance of
> the class, but not of the concept. One subject of 'Social Functions of PhD Thesis in
> Occidental University during 20th century', is the concept of PhD Thesis, not the class.
> 
> So it's not just an issue of OWL-DL vs OWL-Full, it's also an issue of making distinct or
> not those two 'things'. This is a core issue in porting thesaurus to the SW, related to
> others of the same kind, like if concepts A and B are interpreted as classes, and there is
> a Broader-Narrower relationship between A and B in the Thesaurus, has it to be interpreted
> as a class-subclass relationship in the ontology etc.
> 
> So I think in that case a BP definition would be two-fold
> 
> 1. Is it generally a BP to make terminology concepts distinct from ontology classes (and
> properties)?
> 2. If agnostic about 1, what is the trade-off when choosing to make them distinct or to
> merge them ?
> 
> FWIW, having tried both terms of the alternative in the course of time, my personal view,
> for above quoted reasons, is that they should be kept separate, and it's worth the extra
> work (even before being aware of the DL vs Full issue)
> 
> Is there other concrete experience on that, not only theoretical considerations? Seems
> like there are not so many people exploring the terminology-ontology interoperability. Or
> are there?
> 
> Bernard Vatant
> Senior Consultant
> Knowledge Engineering
> Mondeca - www.mondeca.com
> bernard.vatant@mondeca.com
> 
>
Received on Monday, 29 March 2004 02:35:07 UTC