- From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
- Date: Fri, 19 Sep 2003 18:01:50 +0100
- To: "'public-esw-thes@w3.org'" <public-esw-thes@w3.org>
- Message-ID: <350DC7048372D31197F200902773DF4C02AA0C4F@exchange11.rl.ac.uk>
This is a document with some suggestions, intended to get the ball rolling for a discussion on how to fit RDF representations of various flavours of thesaurus into the world of ontologies, classification schemes etc.

<<CIFoutline.rtf>>

Suggestions for modular vocabularies for representing thesaurus-like data.

The Problem

A thesaurus contains two quite different kinds of information. First, information about how different terms are used to indicate things in our heads. In other words, information about the labelling of concepts. I will call this terminology information. Second, information about how the things in our heads are related to each other. In other words, a model of some conceptual structure. I will call this conceptual information.

A major criticism of existing thesaurus standards is that these two types of information are muddled, and hence the recommendations are confusing (see NKOS 2003 presentation by Dagobert Soergel).

The Solution

We model these two types of information independently. This greatly improves clarity. It would also allow us to attach terminology information to all different kinds of conceptual structures (indexing schemes, classification schemes, ontologies, topic maps etc.)

In RDF, I suggest we create several modular vocabularies. For example, we could have the following modules:

Terminology Module
Language Module
Conceptual Indexing Module

Terminology Module

A simple set of properties allowing you to attach a preferred label and a set of alternate labels to any resource.

#RDF Terminology Module
############################################################################
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix term: <???> .

term:pref a rdf:Property;
    rdfs:domain rdfs:Resource;
    rdfs:range term:Term.

term:alt a rdf:Property;
    rdfs:domain rdfs:Resource;
    rdfs:range term:Term.

term:Term a rdfs:Class.

term:value a rdf:Property;
    rdfs:domain term:Term;
    rdfs:range rdfs:Literal.
############################################################################

So if your conceptual structure is best modelled as an ontology, for example (lots of explicit isa, instanceof, partof and other specific type relations), then build an OWL ontology and add the terminology layer using the above properties. If you want a traditional style thesaurus, use the Conceptual Indexing Module (see below) and attach the terminology layer.

#Some example use of terminology module
############################################################################
@prefix example: <example>.

example:aResource
    term:pref [ a term:Term; term:value "Cats" ];
    term:alt [ a term:Term; term:value "Felines" ].
############################################################################

Language Module

I also suggest a generic module for expressing that any resource is in a specific language (maybe this already exists ???).

#RDF Language Module
############################################################################
@prefix lang: <???>.

lang:lang a rdf:Property;
    rdfs:domain rdfs:Resource;
    rdfs:range lang:Language.

lang:Language a rdfs:Class.

lang:en a lang:Language.
lang:fr a lang:Language.
#.....etc.
############################################################################

This would give us a standard way to attach multilingual labels to an OWL ontology, for example, or to any conceptual structure which is sufficiently independent of a linguistic context.
#Some example use of terminology and language module
############################################################################
example:aResource
    term:pref [ a term:Term; term:value "Cats"; lang:lang lang:en ];
    term:alt [ a term:Term; term:value "Felines"; lang:lang lang:en ];
    term:pref [ a term:Term; term:value "Chats"; lang:lang lang:fr ].
############################################################################

Conceptual Indexing Module

This is the bit I haven't cracked. Basically, this bit is supposed to allow you to build a conceptual structure of the kind usually described by a thesaurus, that is, identifying concepts and the broader/narrower/related associations between them. But the emphasis is on its intended use. This is very important. By calling it the conceptual indexing module, it is very clear that any concept described by this vocabulary is intended for the purpose of indexing web resources. This solves the problem of the word "thesaurus" being used to mean a number of different things used for quite different purposes (e.g. indexing thesauri, search aid thesauri, automated classification thesauri etc.)

Here we must create a well-defined core, which can be extended by those who wish to use custom concept associations. I suggest the following for the core:

#Conceptual Indexing Module core
#####################################
@prefix cif: <???> .

cif:Concept a rdfs:Class.

cif:id a rdf:Property;
    rdfs:domain cif:Concept;
    rdfs:range rdfs:Literal.

cif:about a rdf:Property;
    rdfs:domain rdfs:Resource;
    rdfs:range cif:Concept.
#####################################

That is, every concept has a unique ID, and to index any web resource against a concept, you declare that the resource is 'about' that concept. Beyond this, things start to get a bit hazy.
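As a minimal sketch of how the core above might be used: the namespace URIs, the concept ex:c001, its ID value and the document URL below are all hypothetical, chosen only for illustration, since the cif: namespace has not been decided.

```turtle
# Hypothetical namespaces; the proposal leaves the cif: namespace undefined.
@prefix cif: <http://example.org/cif#> .
@prefix ex:  <http://example.org/scheme#> .

# A concept carrying its unique identifier.
ex:c001 a cif:Concept ;
    cif:id "c001" .

# Index a (hypothetical) web page against that concept.
<http://example.org/docs/cat-care.html> cif:about ex:c001 .
```

Under this reading, finding everything indexed against a concept amounts to collecting the subjects of all cif:about statements that point at it.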
We may also want to do something like this, although I'm not sure:

#CIF descriptor property
#####################################
cif:descriptor a rdf:Property;
    rdfs:domain cif:Concept;
    rdfs:range term:Term.
#####################################

That is, every concept must be linked to a term which is a noun or noun phrase that uniquely identifies it, such as 'Banks (Financial Institutions)'. Having both a 'cif:descriptor' property and a 'term:pref' (preferred-term) property simultaneously could be very confusing though.

I think the following two properties are probably a good idea:

#CIF foundation properties
#####################################
cif:relation a rdf:Property;
    rdfs:domain cif:Concept;
    rdfs:range cif:Concept.

cif:mapping a rdf:Property;
    rdfs:domain cif:Concept;
    rdfs:range cif:Concept.
#####################################

The 'relation' property is the super-property of all properties linking concepts within the same scheme. Broader/narrower/related type properties should be declared as sub-properties of this property. The point is that, however we or anyone else extends this property, the precise semantics (i.e. meaning) of each extension must be fully defined. We probably want to put some standard extensions in here; this we should definitely discuss! I have a heads up from the new British standards for thesauri which are under development, and it looks like they use a 'broader' relation to subsume the following relations: 'broader-generic' (isa), 'broader-instantive' (instanceof) and 'broader-partitive' (part-of).

Then we might want to do something like this:

#Properties to construct a hierarchy of concepts with no semantic implications
#####################################
cif:parent a rdf:Property;
    rdfs:subPropertyOf cif:relation.

cif:child a rdf:Property;
    rdfs:subPropertyOf cif:relation.

cif:friend a rdf:Property;
    rdfs:subPropertyOf cif:relation.
#####################################

to ensure we can catch all types of data, in the case where semantics are poorly defined or standards are not consistently adhered to.

The 'mapping' property is the super-property of all properties linking concepts from different schemes. Here there is a basis for defining some clearly understood properties. If we take each concept from a scheme as standing for the set of resources which are indexed against it, then we can compare concepts from different schemes using set operations. We could use something like the following vocabulary to do this:

#CIF Properties and classes to define mappings between Concepts from different sources
######################################
cif:sameAs a rdf:Property;
    rdfs:subPropertyOf cif:mapping.

cif:overlapsWith a rdf:Property;
    rdfs:subPropertyOf cif:mapping.

cif:includes a rdf:Property;
    rdfs:subPropertyOf cif:mapping.

cif:includedBy a rdf:Property;
    rdfs:subPropertyOf cif:mapping.

cif:disjointWith a rdf:Property;
    rdfs:subPropertyOf cif:mapping.

cif:complementOf a rdf:Property;
    rdfs:subPropertyOf cif:mapping.

cif:ConceptCombination a rdfs:Class;
    rdfs:subClassOf cif:Concept.

cif:Union a rdfs:Class;
    rdfs:subClassOf cif:ConceptCombination.

cif:Intersection a rdfs:Class;
    rdfs:subClassOf cif:ConceptCombination.

cif:Exclusion a rdfs:Class;
    rdfs:subClassOf cif:ConceptCombination.

cif:ofConcept a rdf:Property;
    rdfs:domain cif:ConceptCombination;
    rdfs:range cif:Concept.

cif:excluding a rdf:Property;
    rdfs:domain cif:Exclusion;
    rdfs:range cif:Concept.
######################################

This goes beyond the "equivalence" relations of the multilingual ISO standard, to clearly define the full set of possible set operations.

#Example data - concept mappings
######################################
@prefix example2: <example2> .
example:Politicians a cif:Concept;
    cif:includes [
        a cif:Union;
        cif:ofConcept example2:MPs;
        cif:ofConcept example2:MEPs;
        cif:ofConcept example2:Councillors
    ];
    cif:includedBy example2:PublicEmployees .
######################################

In the above example I have shown a mapping from a concept in a source scheme to both broader and narrower sets in the target scheme. This practice allows you to guarantee recall both in the case where a user wants, for example, documents about Politicians, and in the case where the user requests documents NOT about politicians (see Doerr paper on concept mapping).

There are also some other things which may be worth considering, mainly inspired by the linguistics people: properties linking a concept to an explanation, a definition, a context, an example, a picture, any other multimedia object etc. In other words, links from a concept to lots of things which can help identify what that concept is referring to (thesaurus scope-notes fit in here too).

This document has not considered how to do things like expressing the grouping of concepts into facets (such as 'objects', 'activities') or node groups (such as 'paintings by period' and 'paintings by artist').

The last thing is managing change and evolution. We probably want to consider some way of expressing that a concept is deprecated and has been replaced by a new one, or has been modified in some way, and other such things.

Alistair Miles
CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom
Email: a.j.miles@rl.ac.uk
Telephone: +44 (0)1235 445440
Attachments
- application/rtf attachment: CIFoutline.rtf
Received on Friday, 19 September 2003 13:03:57 UTC