RE: [OEP] [SE] Logical vs Indexing - multiple Types

Alan

I see we are basically in agreement. I am fine with the distinction between asserted and
inferred types; in my mind they were both under "logical types". I am also fine with the
notion of "secondary reasoning" based on classification schemes. I had exactly the same
kind of remark from my boss a few hours ago :)

As for the vocabulary ... indexing or classification? I've used both, and I'm agnostic on
the word itself. French speakers in the library world don't like to use the French word
"classification" in this context and prefer "indexation" (which does not exist in English,
right?). In any case, we have to communicate correctly with the library community,
whatever its natural language :))

Best

Bernard

-----Original Message-----
From: Alan Rector [mailto:rector@cs.man.ac.uk]
Sent: Tuesday, 8 February 2005 14:48
To: Bernard Vatant
Cc: SWBPD
Subject: Re: [OEP] [SE] Logical vs Indexing - multiple Types


Bernard
This is an area where we have both experience and strong views.
I think you actually need to distinguish three different sorts of types:
-    Asserted types.  We would agree that any primitive should have only one asserted
type. See "Modularisation of domain ontologies written in description logics and OWL".
-    Inferred types.  One of the major reasons for using a classifier is to manage complex
multi-hierarchies of defined types.  These can be used for indexing but may also be used
for other purposes.
-    Indexing classifications/types.  For us, these are things such as the Medical Subject
Headings (MeSH) from the library world, but also things like the International
Classification of Diseases (ICD), the Clinical Procedure Terminology, etc. These are
typically constructed along "broader than"/"narrower than" lines and often have various
idiosyncratic features, e.g. in MeSH the same string is found at the end of numerous
paths, so the path is not an identifier, whereas in ICD the identifier is the path.
For classification types we would advocate indirect mapping rather than direct modelling,
i.e. providing pointers to/from the ontology via annotation properties: a) because the
internal structure of the classification type hierarchy typically follows different
principles that, if imported into the ontology itself, cause confusion at best and
contradictions at worst; and b) because it provides a hook for secondary reasoning.  Using
the inferred types as a framework for indexing classifications works very well.  For this
reason, I am not entirely happy with the phrase "indexing types".  I would prefer
"classification types" but I don't know how that fits in the library world.
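The indirect mapping described above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the approach used in any particular tool: the
class names, the placeholder codes, and the helper functions nearest_codes and classes_for
are all hypothetical, and plain dictionaries stand in for annotation properties. The
ontology keeps its own subsumption hierarchy, and the classification schemes are reachable
only through pointers.

```python
# Ontology subsumption hierarchy: class -> its single asserted parent.
# All class names are hypothetical.
subclass_of = {
    "ex:ViralPneumonia": "ex:Pneumonia",
    "ex:Pneumonia": "ex:LungDisease",
}

# Annotation-style pointers from ontology classes to external schemes.
# The codes are placeholders, not real ICD or MeSH identifiers.
annotations = {
    "ex:Pneumonia": {"ICD": ["ICD-CODE-1"], "MeSH": ["MESH-ID-1"]},
    "ex:LungDisease": {"MeSH": ["MESH-ID-2"]},
}


def nearest_codes(cls, scheme):
    """Walk up the ontology hierarchy until a class carries a code for `scheme`.

    This is one possible form of "secondary reasoning": the classification
    hierarchy itself is never imported, only pointed to.
    """
    current = cls
    while current is not None:
        codes = annotations.get(current, {}).get(scheme)
        if codes:
            return list(codes)
        current = subclass_of.get(current)
    return []


def classes_for(scheme, code):
    """Reverse lookup: ontology classes that point to `code` in `scheme`."""
    return sorted(
        cls for cls, schemes in annotations.items()
        if code in schemes.get(scheme, [])
    )
```

With this layout, nearest_codes("ex:ViralPneumonia", "ICD") falls back to the code attached
to ex:Pneumonia, while the internal structure of the classification never enters the
ontology.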
Regards
Alan

Bernard Vatant wrote:
This is the follow-up to a debate which started last week on the Protégé list [1].
To sum it up, the starting point was a demand from Protégé users to have ontology editors
correctly handle multiple "rdf:type" declarations for the same instance, in other words:
- Allow declaration of multiple types through the GUI, and further editing of such
instances
- Correctly handle multiple rdf:type declarations in imported RDF files
So far, Protégé has been able to import OWL files with multiple rdf:type declarations, but
could neither edit them nor create them through the GUI. Dealing with an instance of
multiple classes in a GUI dynamically constructed from class properties is not obvious
(we have the same issue in Mondeca ITM).
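To make the import-side requirement concrete, here is a minimal sketch of detecting
instances that carry several rdf:type declarations. It assumes the RDF file has already
been parsed into (subject, predicate, object) tuples of strings; the function names and
the sample data are hypothetical, and no real Protégé or RDF library API is used.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def types_by_instance(triples):
    """Group every rdf:type object under its subject, keeping declaration order."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        # Ignore non-type triples and exact duplicate type declarations.
        if predicate == RDF_TYPE and obj not in grouped[subject]:
            grouped[subject].append(obj)
    return dict(grouped)


def multiply_typed(triples):
    """Return only the instances that have more than one declared type."""
    grouped = types_by_instance(triples)
    return {s: types for s, types in grouped.items() if len(types) > 1}


# Hypothetical imported data.
triples = [
    ("ex:doc1", "rdf:type", "ex:Report"),
    ("ex:doc1", "rdf:type", "ex:MedicalDocument"),
    ("ex:doc1", "ex:title", "Annual review"),
    ("ex:doc2", "rdf:type", "ex:Report"),
]
```

An editor could use a result such as multiply_typed(triples) to decide which instances need
a form built from the properties of several classes rather than one.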
My first reaction was that it's certainly *not* a very good idea to push people to create
multiple rdf:type declarations for the same owl:Individual directly through ontology
editing. Beyond the technical difficulties of implementation, using multiple types, at
least in a single-source ontology, often seems to stem from a modeling confusion between
"logical types", which use formal declarations of named classes and various restrictions,
and what I tentatively call "indexing types", which are defined by any property value (and
could be made explicit using "hasValue" restrictions). Logical types are intended for
expressing logical structures and constraints, and for supporting processing, inference,
software configuration, etc., whereas indexing types are intended for display, navigation,
and search. In other words, logical types represent the AI view of typing, whereas
indexing types represent the librarian/documentalist view. Indexing types can be organized
through various "schemes", as is currently discussed in the SKOS framework, and through
hierarchies that do not express logical subsumption, but various flavors of
broader-narrower used by thesauri, so-called taxonomies and the like ...
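A rough sketch of the distinction, under loudly stated assumptions: every individual below
has exactly one logical type, and the "indexing types" are computed from a property value,
in the spirit of a "hasValue" restriction. The data, the ex: names, and the functions
has_value, narrower_or_self and indexed_under are hypothetical; the broader/narrower map
is a plain lookup for navigation and search, not logical subsumption.

```python
# Each individual has ONE asserted logical type plus ordinary property values.
individuals = {
    "ex:doc1": {"type": "ex:Document", "ex:subject": "ex:Cardiology"},
    "ex:doc2": {"type": "ex:Document", "ex:subject": "ex:Medicine"},
    "ex:doc3": {"type": "ex:Document", "ex:subject": "ex:Law"},
}

# Hypothetical concept scheme: concept -> its broader concept.
broader = {"ex:Cardiology": "ex:Medicine"}


def has_value(prop, value):
    """Indexing type by property value: individuals where prop == value."""
    return sorted(
        name for name, props in individuals.items() if props.get(prop) == value
    )


def narrower_or_self(concept):
    """The concept plus everything reachable downwards via broader/narrower."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in broader.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def indexed_under(prop, concept):
    """Navigation/search view: individuals indexed under a concept or a narrower one."""
    concepts = narrower_or_self(concept)
    return sorted(
        name for name, props in individuals.items() if props.get(prop) in concepts
    )
```

Here has_value("ex:subject", "ex:Medicine") selects only the individual indexed exactly
under that concept, while indexed_under also follows the narrower concepts, and no
individual needed a second logical type for either view.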
Both logical and indexing types are needed in most information architectures, but since
there is no clear-cut way to use indexing types in RDF-OWL, nor any clear reference to
them in the specs (because, IMO, the authors of the specifications came more from the AI
world than from the librarian one), there is a trend in modeling practice to handle them
using a lot of unnecessary logical types (the extreme case being the total and thoughtless
refactoring of thesauri into ontologies), leading both to crammed ontologies and to a
frequent need for multiple types. My thesis [2] is that the need for multiple types arises
more for indexing types than for logical ones, and therefore good practice should lead to
very few (if any) cases of multiple logical types (at least in single-source ontologies;
the question of multiple types appearing when merging ontologies is of course a tricky one
which cannot be avoided).
So I figure this group could produce some reflections and maybe recommendations, from both
modeling (OEP) and software engineering (SE) viewpoints, on the following points:
- Logical types vs indexing types: why, when, and how to use either one.
- Ways to practically use indexing types for display, navigation, sorting and query.
- Use and abuse of multiple types
- Best practices in software engineering to deal with indexing types and multiple types
I've started to put those questions together in a short paper. I will publish it as soon
as I can turn them into something consistent and readable, and I volunteer to turn them
into a proper draft note if there is any interest in it from either OEP or SE, or both
(even if I am not formally a member of either of those TFs so far).
Thanks for your interest
Bernard
[1] http://comments.gmane.org/gmane.comp.misc.ontology.protege.owl/8859
[2] Mike has asked on the Protégé list whether I had references to any literature on those
issues to support my thesis, and I'm afraid I've not found proper sources so far. I keep
searching; any pointers welcome (pro or con).
**********************************************************************************
Bernard Vatant
Senior Consultant
Knowledge Engineering
bernard.vatant@mondeca.com
"Making Sense of Content" :  http://www.mondeca.com
"Everything is a Subject" :  http://universimmedia.blogspot.com
**********************************************************************************
--
Alan L Rector
Professor of Medical Informatics
Department of Computer Science
University of Manchester
Manchester M13 9PL, UK
TEL: +44-161-275-6188/6149/7183
FAX: +44-161-275-6236/6204
Room: 2.88a, Kilburn Building
email: rector@cs.man.ac.uk
web: www.cs.man.ac.uk/mig
        www.opengalen.org
        www.clinical-escience.org
        www.co-ode.org

Received on Tuesday, 8 February 2005 15:35:05 UTC