Modelling and inferencing for controlled terms

Hi team,

I want to explain a problem I have encountered while implementing
inferencing in Longwell, and outline a number of possible solutions and
their pros and cons. This issue is relevant to Haystack also. 

The Artstor data has a number of properties which contain hierarchical terms
e.g. 

Architecture : Artist
Architecture : Site

A while ago I modified the XSLT transform that creates the RDF version of
the Artstor data to insert subclass relations to indicate the hierarchy. In
the following examples I've simplified the names to make them easier to
read, and numbered the examples to make them easier to refer to later on:

(1)

artstorcontrolledTerm:Architecture
	rdfs:label		"Architecture" ;
	rdf:type		ArtstorControlledTerm .

(2)

artstorcontrolledTerm:Architecture_artists
	rdfs:subClassOf	artstorcontrolledTerm:Architecture ;
	rdfs:label		"Architecture : Artist" ;
	rdf:type		ArtstorControlledTerm .

These terms are then used in the instance data as follows:

(3)

artstordata:UCSD001
	artstor:subject	artstorcontrolledTerm:Architecture_artists .

So I want the inference engine to infer result (4):

(4)

artstordata:UCSD001
	artstor:subject	artstorcontrolledTerm:Architecture .

However, according to RDFS, I can only make this inference if the property is
rdf:type, e.g. given

a rdf:type b .

b rdfs:subClassOf c .

then I could infer

a rdf:type c .

In (3) the property is artstor:subject rather than rdf:type, so I can't make
that inference in this case.
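
To make the limitation concrete, here is a minimal sketch in Python. It is not
the Longwell code: it assumes triples are held as plain tuples, uses shortened
prefixes (term:, data:) for readability, and implements only the one RDFS
subclass rule quoted above.

```python
# Triples are plain (subject, predicate, object) tuples; names are shortened.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs9(triples):
    """Apply the RDFS subclass rule until no new triples appear:
    (a rdf:type b), (b rdfs:subClassOf c) => (a rdf:type c)."""
    inferred = set(triples)
    while True:
        new = {
            (a, RDF_TYPE, c)
            for (a, p, b) in inferred if p == RDF_TYPE
            for (b2, q, c) in inferred if q == SUBCLASS and b2 == b
        } - inferred
        if not new:
            return inferred
        inferred |= new

data = {
    ("term:Architecture_artists", SUBCLASS, "term:Architecture"),
    ("data:UCSD001", "artstor:subject", "term:Architecture_artists"),
}

closure = rdfs9(data)
# The rule never fires, because nothing in the data uses rdf:type.
print(("data:UCSD001", "artstor:subject", "term:Architecture") in closure)  # False
```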

POSSIBLE SOLUTIONS

1. Create a custom rule (which eventually could be described in some kind of
rules language) that infers result (4).

Advantages: No changes to current data model.

Disadvantages: This requires a custom rule processor. One of the aims of the
semantic web is to avoid such processors, because data that depends on them
can only be processed by them. Eventually these custom rule processors could
be replaced by rules written in a standardised rule language, but we don't
have such a language yet.
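
As an illustration of what such a custom rule might look like, here is a
minimal Python sketch. The function name broaden_subjects and the tuple
representation are assumptions made for the example, not part of Longwell or
of any rules language.

```python
SUBCLASS = "rdfs:subClassOf"
SUBJECT = "artstor:subject"

def broaden_subjects(triples, prop=SUBJECT):
    """Custom rule, NOT part of RDFS, run to a fixed point:
    (x prop t), (t rdfs:subClassOf u) => (x prop u)."""
    inferred = set(triples)
    while True:
        new = {
            (x, prop, u)
            for (x, p, t) in inferred if p == prop
            for (t2, q, u) in inferred if q == SUBCLASS and t2 == t
        } - inferred
        if not new:
            return inferred
        inferred |= new

data = {
    ("term:Architecture_artists", SUBCLASS, "term:Architecture"),
    ("data:UCSD001", SUBJECT, "term:Architecture_artists"),
}

result = broaden_subjects(data)
# Result (4) is now present alongside the original triples.
print(("data:UCSD001", SUBJECT, "term:Architecture") in result)  # True
```

The data stays exactly as in (1) to (3); all of the extra knowledge lives in
the rule, which is why only a processor that knows the rule gets result (4).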


2. Change the data model to include an instance that is of type
artstorcontrolledTerm:Architecture_artists e.g. assuming (1) and (2), 

artstordata:UCSD001
	artstor:subject [ rdf:type artstorcontrolledTerm:Architecture_artists ] .

then we can now infer

artstordata:UCSD001
	artstor:subject [ rdf:type artstorcontrolledTerm:Architecture ] .

Advantages: This will work with standard RDFS inference, so no need for
custom rule processors.

Disadvantages: This makes the data model more complicated. Secondly, it
increases the complexity of the inference task: for example, we now have to
run inference over every bNode of type
artstorcontrolledTerm:Architecture_artists rather than just the single
artstorcontrolledTerm:Architecture_artists resource as in solution (1).
Thirdly, it will require changes to both the Haystack and Longwell clients.
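
A rough Python sketch of this remodelling, under the same assumptions as
before (tuples for triples, shortened prefixes). The remodel function and the
second record data:UCSD002 are hypothetical, added only to show that every use
of a term gets its own bNode.

```python
from itertools import count

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBJECT = "artstor:subject"

def remodel(triples, prop=SUBJECT):
    """Rewrite (s prop term) as (s prop _:bN), (_:bN rdf:type term)."""
    ids = count(1)
    out = set()
    for s, p, o in sorted(triples):  # sorted only to keep bNode labels stable
        if p == prop:
            bnode = f"_:b{next(ids)}"
            out |= {(s, p, bnode), (bnode, RDF_TYPE, o)}
        else:
            out.add((s, p, o))
    return out

def rdfs9(triples):
    """(a rdf:type b), (b rdfs:subClassOf c) => (a rdf:type c)."""
    inferred = set(triples)
    while True:
        new = {
            (a, RDF_TYPE, c)
            for (a, p, b) in inferred if p == RDF_TYPE
            for (b2, q, c) in inferred if q == SUBCLASS and b2 == b
        } - inferred
        if not new:
            return inferred
        inferred |= new

data = {
    ("term:Architecture_artists", SUBCLASS, "term:Architecture"),
    ("data:UCSD001", SUBJECT, "term:Architecture_artists"),
    ("data:UCSD002", SUBJECT, "term:Architecture_artists"),  # hypothetical record
}

closure = rdfs9(remodel(data))
bnodes = {o for (s, p, o) in closure if p == SUBJECT}
hits = sorted(s for (s, p, b) in closure
              if p == SUBJECT and (b, RDF_TYPE, "term:Architecture") in closure)
print(len(bnodes), hits)  # 2 ['data:UCSD001', 'data:UCSD002']
```

Standard RDFS inference now does the work, but the number of typed nodes grows
with the number of uses of a term rather than with the number of terms.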


3. I'm not so clear on this, but I think we could use owl:hasValue. For the
inference to run from the property value to the class, the restriction has to
be a sufficient condition (owl:equivalentClass) rather than just a necessary
one (rdfs:subClassOf), e.g.

artstordata:UCSD001
	artstor:subject	artstorcontrolledTerm:Architecture_artists .

artstorcontrolledTermClass:Architecture_artists
	rdf:type	owl:Class ;
	rdfs:subClassOf	artstorcontrolledTermClass:Architecture ;
	owl:equivalentClass	[
		rdf:type owl:Restriction ;
		owl:onProperty artstor:subject ;
		owl:hasValue artstorcontrolledTerm:Architecture_artists
	] .

which allows us to infer

artstordata:UCSD001
	rdf:type	artstorcontrolledTermClass:Architecture_artists ;
	rdf:type	artstorcontrolledTermClass:Architecture .
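
The inference above can be sketched in Python as follows. This is only an
approximation of what an OWL reasoner would do: it assumes the hasValue
restriction is treated as a sufficient condition for class membership, and the
HAS_VALUE table is a hypothetical stand-in for the restriction triples.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical encoding of the restrictions: class -> (property, required value).
HAS_VALUE = {
    "termClass:Architecture_artists":
        ("artstor:subject", "term:Architecture_artists"),
}

def infer_types(triples, has_value=HAS_VALUE):
    inferred = set(triples)
    # hasValue read as a sufficient condition: (x p v) => (x rdf:type C).
    for cls, (prop, value) in has_value.items():
        inferred |= {(x, RDF_TYPE, cls)
                     for (x, p, v) in triples if p == prop and v == value}
    # Then the standard subclass rule, run to a fixed point.
    while True:
        new = {
            (a, RDF_TYPE, c)
            for (a, p, b) in inferred if p == RDF_TYPE
            for (b2, q, c) in inferred if q == SUBCLASS and b2 == b
        } - inferred
        if not new:
            return inferred
        inferred |= new

data = {
    ("termClass:Architecture_artists", SUBCLASS, "termClass:Architecture"),
    ("data:UCSD001", "artstor:subject", "term:Architecture_artists"),
}

types = sorted(o for (s, p, o) in infer_types(data)
               if s == "data:UCSD001" and p == RDF_TYPE)
print(types)  # ['termClass:Architecture', 'termClass:Architecture_artists']
```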

Advantages: This works with standard OWL inference, so no need for custom
rule processors.

Disadvantages: This adds complexity to the schema / ontology, as every
controlled term is now represented by two URIs: one for the controlled term
used as the property value, the other for the corresponding class. As I see
it, this is related to a standard modelling question in RDF, namely when
should we do this:

a
	b	c ;
	d	e ;
	f	g .

and when should we do this:

a
	rdf:type	b_c ;
	rdf:type 	d_e ;
	rdf:type	f_g .

MY RECOMMENDATION:

As I am having to use a custom inference engine at the moment anyway, I
think solution (1) is the easiest. The proposed RDF standard for thesauri
http://www.w3c.rl.ac.uk/SWAD/rdfthes.html
also seems to be predicated on the existence of custom processors rather
than leveraging OWL and RDFS. At some point, we can migrate our vocabularies
to use this standard, and then programs that use this standard can also use
our (well, the Artstor) vocabularies.

Can anybody else think of any other alternatives here? Do people agree that
solution (1) is the way forward, or does anyone have a strong preference for
one of the other alternatives, and if so, why?

Mark Butler
Research Scientist 
HP Labs Bristol
http://www-uk.hpl.hp.com/people/marbut 

Received on Thursday, 5 February 2004 09:25:08 UTC