The EUROVOC Thesaurus Ontology Schema
Technical Description and Usage Conventions
Date: 2009-10-26
Owner:
the Publications Office of the European Union (http://publications.europa.eu/),
© 2009 The Publications Office of the European Union
Originating departments — Copyright
2, rue Mercier
L-2985 Luxembourg
Authors:
Johan De Smedt (http://www.tenforce.com),
Bernard Vatant (http://www.mondeca.com)
1. The EUROVOC Thesaurus
The mission and use of the thesaurus are documented in English and in French.
The thesaurus is maintained by the OPOCE and is available via its portal.
Schema-specific documentation was generated with OWLDoc via Protégé 4. It is available as EUROVOC OWLDoc.
1.1 Namespace and Identification
Namespace: http://eurovoc.europa.eu/schema#.
The ontology and this description are available from http://eurovoc.europa.eu/ontology/.
Reading convention:
the following namespace abbreviations are used in this document:
1.2 Outline of the Modelling Approach
The EUROVOC ontology schema is an extension of:
- SKOS Simple Knowledge Organization System
Reference
W3C Proposed Recommendation 15 June 2009
Including: Appendix B. SKOS eXtension for Labels (SKOS-XL)
The imported schemas are:
For technical documentation on SKOS please see the SKOS reference and the SKOS Primer.
1.2.1 Used SKOS Entities
The SKOS classes and properties listed at the end of this section are used when publishing the EUROVOC thesaurus.
SKOS inference may be applied (e.g. to calculate inverse properties). The specification and semantics of these classes and properties are available in the SKOS reference and the SKOS Primer.
Note: Most of these classes and properties are specialized with EUROVOC specific extensions.
- skos:ConceptScheme
- skos:Concept
- skos:inScheme
- skos:hasTopConcept
- skos:topConceptOf
- skos:broader
- skos:related
- skos:prefLabel
- xl:prefLabel
- skos:altLabel
- xl:altLabel
- skos:hiddenLabel
- xl:Label
- xl:literalForm
- xl:labelRelation
- skos:definition
- skos:example
- skos:historyNote
- skos:scopeNote
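The remark above that SKOS inference may be applied to calculate inverse properties can be illustrated with a minimal sketch. It assumes triples are held as plain (subject, predicate, object) tuples of prefixed names; the `ex:` identifiers in the usage note are hypothetical. The inverse and symmetric declarations used here are those of the SKOS reference (skos:broader/skos:narrower, skos:hasTopConcept/skos:topConceptOf, and the symmetric skos:related).

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Inverse property pairs and symmetric properties declared by SKOS.
INVERSE_OF = {
    "skos:broader": "skos:narrower",
    "skos:narrower": "skos:broader",
    "skos:hasTopConcept": "skos:topConceptOf",
    "skos:topConceptOf": "skos:hasTopConcept",
}
SYMMETRIC = {"skos:related"}


def add_inverse_statements(triples: Set[Triple]) -> Set[Triple]:
    """Return the input triples plus the inverse and symmetric statements they entail."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSE_OF:
            inferred.add((obj, INVERSE_OF[predicate], subject))
        if predicate in SYMMETRIC:
            inferred.add((obj, predicate, subject))
    return inferred
```

For example, passing `{("ex:child", "skos:broader", "ex:parent")}` yields the original statement plus `("ex:parent", "skos:narrower", "ex:child")`. A real deployment would more likely rely on a reasoner or on SPARQL; the sketch only shows the kind of statement that is derived.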
1.2.2 Used Schema Annotations
Some Dublin Core properties are reused and re-declared without importing the complete DCMI RDF schemas.
EUROVOC extensions define artefacts that are specified either
as top-level classes and properties or as sub-classes and sub-properties of the SKOS and SKOS-XL
artefacts.
When possible, EUROVOC-specific
extensions entail SKOS artefacts. These
inference rules are specified using a EUROVOC schema annotation property:
- eu:importRule
Details the business rules applied to the EUROVOC RDF representation to
derive SKOS properties or class instances from EUROVOC-specific properties
and class instances. Note: in the published EUROVOC
RDF data all these rules have already been applied. Hence, the downloadable version is ready for regular
SPARQL services.
The range of some EUROVOC properties is an XML literal value.
For these properties, the OWL schema details the allowed XML structure using the annotation property:
- annotation:xmlLiteralType
When the object value of a property is constrained to be an XML literal, this
annotation property details the XML structure the value must comply with.
2. The EUROVOC Thesaurus Ontology Schema
2.1 Thesaurus Publication
2.1.1 Defined Artefacts
eu:Export
eu:exportDate
eu:exportedThesaurus
eu:exportVersion
eu:intermediateRelease
eu:Language
eu:language
eu:supportedLanguage
eu:Thesaurus
2.1.2 Definitions
A EUROVOC thesaurus [eu:Thesaurus] is a
SKOS concept scheme. Successive publications of the thesaurus are
exported from the back-office maintenance system.
The publication characteristics of the
thesaurus are given by the back-office export [eu:Export] instance.
- The property eu:exportedThesaurus identifies
the thesaurus version defined and provided with the export.
- Depending on the value of the Boolean
property eu:intermediateRelease, the exported thesaurus is:
- [False] either the consolidated official release
version of the thesaurus, in which every concept is given in its
consolidated version (when eu:intermediateRelease is not present, the value defaults to False);
- [True] or the intermediate (pre-release)
version of the thesaurus, which includes formally accepted results that are
not released yet.
- The export is identified by the
properties eu:exportDate and eu:exportVersion.
An eu:Thesaurus instance lists the languages
for which its thesaurus concepts provide a preferred label (via skos:prefLabel
or xl:literalForm).
- The supported languages are
provided by the property eu:supportedLanguage. The value of this
property is an eu:Language instance.
- A skos:prefLabel holds the language-independent
name of the thesaurus: 'EUROVOC'.
The eu:Language class is a convenience
class. Each instance represents a language. By convention, the URI
of an eu:Language instance is the registered public subject indicator (see http://psi.oasis-open.org/iso/639/#).
- Each eu:Language instance has its
language name represented as an RDF-Schema rdfs:label in all of the
available languages.
- The property eu:language provides the ISO
2-character code of the language (conforming to the xsd:language value
space).
2.2 Thesaurus Organisation
The overall structure decomposes the
EUROVOC thesaurus into a set of Domains and a set of Micro-Thesauri. The Domains
form a mathematical partition of the micro-thesauri. A micro-thesaurus is
a concept scheme. All concepts of a micro-thesaurus are concepts of the complete
EUROVOC thesaurus.
From these definitions, a domain is neither
a skos:Collection nor a skos:ConceptScheme.
2.2.1 Defined Artefacts
eu:Domain
eu:domain
eu:hasPolyHierarchy
eu:inDomain
eu:microThesaurus
eu:MicroThesaurus
2.2.2 Definitions
- EUROVOC defines several domains of
interest. The class eu:Domain is the set of these domains.
- The value of the property eu:microThesaurus
identifies an eu:MicroThesaurus belonging to the eu:Domain that is the property
subject. Every micro-thesaurus belongs to exactly 1 domain.
The property eu:domain is the inverse of eu:microThesaurus.
- An eu:MicroThesaurus is modelled as a
SKOS concept scheme. SKOS has no artefact to model a
micro-thesaurus, hence eu:MicroThesaurus is declared as a sub-class
(rdfs:subClassOf) of skos:ConceptScheme. Every EUROVOC concept typically belongs to at
least 2 skos:ConceptScheme instances: the EUROVOC eu:Thesaurus instance and 1
eu:MicroThesaurus instance.
- SKOS has no construct to define a
collection of concept schemes. Hence eu:Domain is modelled as a
EUROVOC specific ontology class.
- The concepts of a EUROVOC micro-thesaurus
are considered to be in the domain that micro-thesaurus belongs
to. The eu:inDomain property represents this containment
relationship. It is similar to the skos:inScheme property but
(in the RDF-Schema sense) unrelated to it.
- The name of an eu:Domain is a
skos:prefLabel. There must be one label per eu:supportedLanguage.
- The domain identifier is represented by
the property dc:identifier. The value is repeated in the domain
name. The format of the domain identifier is 2 digits, including
leading zeros.
- The name of an eu:MicroThesaurus is a
skos:prefLabel. There must be one label per language.
- The micro-thesaurus identifier is
represented by the property dc:identifier and is part of the
micro-thesaurus name. The format of the micro-thesaurus identifier
is 4 digits, including leading zeros. The 2 leading digits of the
micro-thesaurus identifier represent the domain the micro-thesaurus
belongs to.
- The value true for the property eu:hasPolyHierarchy indicates
that its subject (a micro-thesaurus or domain) has a poly-hierarchy.
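The identifier formats described above (2 digits for a domain, 4 digits for a micro-thesaurus, the first 2 of which denote the domain) can be checked with a short sketch. The function names and the sample values in the usage note are hypothetical; only the digit counts and the leading-digits rule come from the definitions above.

```python
import re

# Formats taken from the definitions: digits only, leading zeros included.
DOMAIN_ID = re.compile(r"[0-9]{2}")
MICRO_THESAURUS_ID = re.compile(r"[0-9]{4}")


def is_domain_id(value: str) -> bool:
    """True when value is exactly 2 digits."""
    return DOMAIN_ID.fullmatch(value) is not None


def domain_id_of(micro_thesaurus_id: str) -> str:
    """Return the identifier of the domain a micro-thesaurus belongs to.

    The 2 leading digits of the 4-digit micro-thesaurus identifier
    denote its domain.
    """
    if MICRO_THESAURUS_ID.fullmatch(micro_thesaurus_id) is None:
        raise ValueError(
            f"not a 4-digit micro-thesaurus identifier: {micro_thesaurus_id!r}"
        )
    return micro_thesaurus_id[:2]
```

With a hypothetical micro-thesaurus identifier "0406", `domain_id_of("0406")` returns "04". Because each micro-thesaurus identifier carries exactly one domain prefix, the rule is consistent with every micro-thesaurus belonging to exactly 1 domain.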
2.3 Thesaurus Concepts
2.3.1 Defined Artefacts
eu:Country
eu:isoCountryCode
eu:ThesaurusConcept
2.3.2 Definitions
- The class eu:ThesaurusConcept is the set
of all EUROVOC concepts. It is a sub-class (rdfs:subClassOf) of skos:Concept.
- The class eu:Country is the sub-set of
eu:ThesaurusConcept instances representing a country. A country has
at most one 2-character ISO country code, represented by the property
eu:isoCountryCode. Occasionally the code may not be known yet.
2.4 Thesaurus Notes and References
EUROVOC notes are SKOS notes with a
particular usage convention to facilitate on-line publishing.
2.4.1 Defined Artefacts
eu:language
eu:noteLiteral
eu:reference
eu:relevantURL
dc:source
2.4.2 Definitions
Within EUROVOC the content model of a SKOS
note is strictly modelled as an RDF resource (either a blank node or a non-blank node is
allowed) holding 2 properties:
- The eu:noteLiteral property
value is an XML literal structured as an xhtml:xhtml.body.type
content type. The required content type is specified by the
annotation:xmlLiteralType EUROVOC schema annotation. This
ensures structured, publishing-ready content (an XML literal)
in a format that can be validated.
- The eu:language property provides a workaround
because xml:lang and rdf:parseType="Literal" cannot be
specified on the same property. The value space of the property is
that of xsd:language (in practice limited to the 2-character ISO language codes).
The other properties are:
- The eu:reference property is a sub-property of dcterms:references. It is not used
directly on EUROVOC concepts. Instead it is used within a note literal
to reference a EUROVOC concept.
The property is typically used in a 'definition' or a 'scope note'
about a concept or a label.
To implement such a reference, the property is embedded in notes using
XHTML + RDFa compliant mark-up (using the 'rel' attribute - see usage note
below).
- The property value of eu:relevantURL
holds a website URL that is relevant for the subject (a EUROVOC concept
or term).
- The property dc:source is a Dublin Core
metadata property denoting the source publication from which the described
resource is derived.
Definition and explanation:
- The described resource is a EUROVOC concept or Label.
- The description is done by means of a SKOS definition.
The Dublin Core property dc:source then typically references the resource
providing ground for the creation of the described resource.
Usage: Like the EUROVOC reference property, the dc:source property
need not be applied on a concept or a label. Instead, it may be
embedded in notes using RDFa compliant mark-up (e.g. the 'rel' attribute -
see usage note below).
2.4.3 Usage Note for Encoding eu:noteLiteral with XHTML+RDFa Mark-Up
<skos:definition rdf:parseType="Resource">
  <eu:noteLiteral rdf:parseType="Literal"
                  xmlns="http://www.w3.org/1999/xhtml">
    <h3>Usage of eu:reference as a link in notes (Scope note, History note,
      definition, ...)</h3>
    <ul>
      <li>Notes can have XMLLiteral values with mark-up from the xhtml
        namespace (http://www.w3.org/1999/xhtml).
        The XHTML vocabulary must be compliant with XHTML version='XHTML+RDFa 1.0'.
        The content will NOT include the xhtml:body element, only its content
        (plain text, div, ...)
      </li>
      <li>In a note, an href reference may refer to a EUROVOC Concept or
        a EUROVOC term (i.e. a SKOS-XL Label). Example:
        <a rel="eu:reference" href="http://eurovoc/europe.eu.....">xyz</a>
        or <a rel="eu:reference" href="http://eurovoc.europa.eu/C2448">xyz</a>
      </li>
    </ul>
  </eu:noteLiteral>
  <eu:language rdf:datatype="http://www.w3.org/2001/XMLSchema#language">en</eu:language>
</skos:definition>
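A consumer of such notes may want to list the concepts or terms a note refers to. The following is a minimal sketch, assuming the eu:noteLiteral content is available as a string holding the body content only (as stated in the usage note) and that the XHTML namespace is the default namespace. The function name and the wrapping `div` element are illustrative choices, not part of the schema.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML_NS = "http://www.w3.org/1999/xhtml"


def referenced_uris(note_content: str, rel: str = "eu:reference") -> List[str]:
    """Collect href values of <a> elements carrying the given RDFa rel value.

    note_content is the XHTML body content of an eu:noteLiteral. It is wrapped
    in a div (an assumption of this sketch) so that it parses as a single
    well-formed XML element in the XHTML namespace.
    """
    wrapped = f'<div xmlns="{XHTML_NS}">{note_content}</div>'
    root = ET.fromstring(wrapped)
    uris = []
    for anchor in root.iter(f"{{{XHTML_NS}}}a"):
        # rel is a space-separated list of values in XHTML+RDFa.
        if rel in (anchor.get("rel") or "").split() and anchor.get("href"):
            uris.append(anchor.get("href"))
    return uris
```

The same function can be called with `rel="dc:source"` to collect source references embedded in a note. The prefixes are compared as plain strings here; a full RDFa processor would resolve them against the namespace declarations in scope.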
2.5 Labels and Label Relations
EUROVOC labels are either preferred terms
or non preferred terms. The non preferred terms are either simple (i.e.
consisting of one component) or compound (i.e. consisting of 2 or more
components). Equivalence relationships are called advanced (or complex)
relationships because 2 or more labels are involved.
2.5.1 Defined Artefacts
eu:acronym
eu:CompoundEquivalence
eu:CompoundNonPreferredTerm
eu:compoundNonPreferredTerm
eu:EquivalenceRelationship
eu:fullName
eu:permutedLiteralForm
eu:PreferredTerm
eu:preferredTermComponent
eu:qualifier
eu:shortName
eu:SimpleNonPreferredTerm
eu:ThesaurusTerm
eu:translationOf
eu:UF
eu:USE
eu:ufLabel
eu:ufPlusLabel
eu:useLabel
eu:usePlusLabel
2.5.2 Label Definitions
- The class eu:ThesaurusTerm is the set of
EUROVOC terms or labels. It is a subset of the xl:Label class.
All EUROVOC label relations are expressed between thesaurus terms.
Terms may be preferred terms or non preferred terms. Non preferred
terms may be simple or compound.
- The class eu:PreferredTerm is the set of
EUROVOC preferred terms. Such a term typically is the object of an
xl:prefLabel property of an eu:ThesaurusConcept.
- The class eu:SimpleNonPreferredTerm is the
set of EUROVOC simple non preferred terms. Such a term typically is
the object of an xl:altLabel property of an eu:ThesaurusConcept. A
non preferred term is the equivalent of (one or more) preferred
term/s. The relationship between the non preferred term and a
preferred term must be established by an eu:EquivalenceRelationship.
In thesaurus standards the (simple) non preferred term tags its
equivalent preferred terms using USE, whereas the equivalent preferred terms
of a (simple) non-preferred term tag that non-preferred term using UF.
- The class eu:CompoundNonPreferredTerm is
the set of EUROVOC compound non preferred terms. Such a term
typically is the object of xl:altLabel properties of 2 or more
eu:ThesaurusConcept. A compound non preferred term is composed of
two or more components that each can be represented by a preferred
term. The relationship between the compound term and its components
must be established by an eu:CompoundEquivalence relationship. In
thesaurus standards the compound non preferred term tags its corresponding
components using USE+, whereas the preferred terms that are components of a
compound non-preferred term tag that non-preferred term using UF+.
- The property eu:qualifier is a label
qualifier discriminating an eu:ThesaurusTerm from other homographs in
a given language. The language of the qualifier is implied by
the language of the xl:literalForm of the xl:Label that is the subject
of the eu:qualifier property (its RDFS domain).
- CONVENTION: For EUROVOC, the occasional
qualifier is part of the literal form of the label. It is appended
using the following format rules:
label = base-label + space + '(' + qualifier + ')'.
This convention
allows deriving the base label value when needed.
Example: qualified term (qualifier value)
- Any eu:ThesaurusTerm may be looked-up
using specific permutations of the words in the term's
xl:literalForm. The used permutations of a subject term are provided
using the eu:permutedLiteralForm property. The following rules apply
to create skos:hiddenLabel for an eu:ThesaurusConcept:
- A chain
xl:prefLabel([Concept][Term]) o eu:permutedLiteralForm([Term][literal])
→ skos:hiddenLabel([Concept][literal])
- A chain
xl:altLabel([Concept][Term]) o eu:permutedLiteralForm([Term][literal])
→ skos:hiddenLabel([Concept][literal])
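Two of the rules above can be sketched in a few lines: deriving the base label from a qualified literal form, and applying the property chains that produce skos:hiddenLabel values. The sketch assumes simple in-memory dictionaries rather than an RDF store, and all function and parameter names are hypothetical.

```python
from typing import Dict, Optional, Set


def base_label(literal_form: str, qualifier: Optional[str]) -> str:
    """Strip the qualifier from a literal form built as
    base-label + space + '(' + qualifier + ')'.

    The qualifier is expected to be the eu:qualifier value of the same term.
    """
    if qualifier is None:
        return literal_form
    suffix = f" ({qualifier})"
    if not literal_form.endswith(suffix):
        raise ValueError("literal form does not follow the qualifier convention")
    return literal_form[: -len(suffix)]


def hidden_labels(
    concept_terms: Dict[str, Set[str]],
    permuted_forms: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Apply the chains xl:prefLabel/xl:altLabel o eu:permutedLiteralForm.

    concept_terms maps a concept to the terms it links to via xl:prefLabel
    or xl:altLabel; permuted_forms maps a term to its eu:permutedLiteralForm
    literals. The result maps each concept to its derived hidden labels.
    """
    result: Dict[str, Set[str]] = {}
    for concept, terms in concept_terms.items():
        literals: Set[str] = set()
        for term in terms:
            literals |= permuted_forms.get(term, set())
        if literals:
            result[concept] = literals
    return result
```

Using the example from the convention, `base_label("qualified term (qualifier value)", "qualifier value")` returns "qualified term". Passing the known qualifier, rather than guessing it from trailing parentheses, avoids misreading labels that end in a parenthesised expression for other reasons.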
2.5.3 Simple Label Relation Definitions
- The eu:acronym property is a
specialisation of the xl:labelRelation providing the acronym of the
subject label.
- The eu:fullName property is a
specialisation of the xl:labelRelation providing the full name of the subject
label.
- The eu:shortName property is a
specialisation of the xl:labelRelation providing the short name of
the subject label.
- The eu:translationOf property is a
specialisation of the xl:labelRelation providing the translation of
the subject label.
2.5.4 Advanced Label Relation Definitions
Dedicated EUROVOC classes are used to
distinguish USE/UF and USE+/UF+ relationships.
- The class eu:CompoundEquivalence is the
set of associations establishing the relationship between one compound non
preferred term (property eu:compoundNonPreferredTerm) and its component
terms (properties eu:preferredTermComponent). The component terms
must be preferred terms (eu:PreferredTerm). When a preferred term
(of the relationship) is the SKOS-XL preferred label of an
eu:ThesaurusConcept, then the compound non preferred term
(eu:CompoundNonPreferredTerm) must be a SKOS-XL alternate label of that
concept.
In thesaurus standards this relationship is tagged using
USE+/UF+. The EUROVOC schema also represents this relationship using the
inferred properties: eu:usePlusLabel and eu:ufPlusLabel. These
properties are sub-properties of xl:labelRelation.
- The class eu:EquivalenceRelationship is
the set of associations establishing the relationship between a preferred
term (property eu:USE with value an eu:PreferredTerm) and a simple non
preferred term (property eu:UF with value an
eu:SimpleNonPreferredTerm). When the preferred term in this relationship
is the xl:prefLabel of an eu:ThesaurusConcept, then the simple non
preferred term must be an xl:altLabel of the concept denoted by that
preferred term.
In thesaurus standards this relationship is tagged
using USE/UF. The EUROVOC schema also represents this relationship using the
inferred properties: eu:useLabel and eu:ufLabel. These
properties are sub-properties of xl:labelRelation.
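The derivation of the inferred label relations from the two relationship classes can be sketched as follows. The direction of each inferred property is an assumption of this sketch, based on the USE/UF tagging convention described in section 2.5.2 (the non preferred term points to the preferred term with USE or USE+, and the preferred term points back with UF or UF+); the schema itself should be consulted for the authoritative domains and ranges. Function names and the tuple representation are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def simple_relations(preferred: str, non_preferred: str) -> Set[Triple]:
    """Inferred triples for one eu:EquivalenceRelationship.

    preferred is the eu:USE value, non_preferred the eu:UF value.
    Assumed direction: non preferred --eu:useLabel--> preferred,
    preferred --eu:ufLabel--> non preferred.
    """
    return {
        (non_preferred, "eu:useLabel", preferred),
        (preferred, "eu:ufLabel", non_preferred),
    }


def compound_relations(compound_term: str, components: List[str]) -> Set[Triple]:
    """Inferred triples for one eu:CompoundEquivalence.

    compound_term is the eu:compoundNonPreferredTerm value and components
    are the eu:preferredTermComponent values (2 or more preferred terms).
    """
    if len(components) < 2:
        raise ValueError("a compound equivalence needs 2 or more components")
    triples: Set[Triple] = set()
    for component in components:
        triples.add((compound_term, "eu:usePlusLabel", component))
        triples.add((component, "eu:ufPlusLabel", compound_term))
    return triples
```

Since the published EUROVOC RDF data already has the import rules applied, these triples would normally be present in the download; the sketch only makes explicit what the inference adds.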
2.6 Versioning Properties
2.6.1 Defined Artefacts
eu:termReleasedWithVersion
eu:useInstead
2.6.2 Definitions
- The value of the property
eu:termReleasedWithVersion provides the number of the EUROVOC release that
introduced the property subject (a EUROVOC term). The fixed string
'n/a' indicates the historic release number is not available.
- The property eu:useInstead applies to
EUROVOC concepts that are obsolete as of a date mentioned in the scope
note of the now obsolete concept. For indexing or
classification that applies after the mentioned date, use the concept
denoted by eu:useInstead (instead of the obsolete concept).
2.7 Maintenance Properties for Resources in an Intermediate Release
2.7.1 Defined Artefacts
eu:approvedForRelease
eu:status
eu:toBeTranslated
2.7.2 Definitions
- The value of eu:approvedForRelease provides
the expected EUROVOC release number for the official
release of a new or modified EUROVOC concept (or domain or
micro-thesaurus). The property will only be present in intermediate
maintenance releases of EUROVOC.
- The eu:status property value is the
position or state of the EUROVOC concept or label (term) with respect to
the EUROVOC back-office maintenance workflow.
Three states are relevant for a EUROVOC publication:
- 'valid': for released terms and concepts
- 'to be translated': A term requiring further translation. Look at
eu:toBeTranslated to find the outstanding translations
- 'in maintenance': a concept or term in maintenance. Look at
eu:approvedForRelease to know for which version the approval holds.
If the property is not present, 'valid' must be assumed.
- The eu:toBeTranslated property
is typically set only on EUROVOC concepts or terms that are announced in a
maintenance release. The value must identify
the language for which translation is required:
- for a term this must be the language of the term
- for a micro-thesaurus or domain this may be a comma-separated list of
languages
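The defaulting rule for eu:status and the follow-up property named for each state can be captured in a small sketch. It assumes the properties of a resource are available as a mapping from prefixed property name to string value; the function names are hypothetical, while the state names and the properties to inspect are those listed above.

```python
from typing import Mapping, Optional

# For each publication-relevant state, the property to look at next (if any).
FOLLOW_UP_PROPERTY = {
    "valid": None,
    "to be translated": "eu:toBeTranslated",
    "in maintenance": "eu:approvedForRelease",
}


def effective_status(properties: Mapping[str, str]) -> str:
    """Return the eu:status value, assuming 'valid' when the property is absent."""
    return properties.get("eu:status", "valid")


def follow_up_value(properties: Mapping[str, str]) -> Optional[str]:
    """Return the value of the property associated with the resource's state.

    Returns None for 'valid' resources, for states not listed above, and when
    the follow-up property is not set on the resource.
    """
    follow_up = FOLLOW_UP_PROPERTY.get(effective_status(properties))
    if follow_up is None:
        return None
    return properties.get(follow_up)
```

A resource without eu:status is therefore treated as 'valid', and no further lookup is performed for it.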
2.8 General Auditing Properties
2.8.1 Defined Artefacts
dc:identifier
dcterms:created
dcterms:modified
2.8.2 Definitions
The metadata properties defined by Dublin Core
may be available on any instance to provide:
- The creation date of the resource
- The modification date of the resource
- A persistent identifier of the object as
generated in the EUROVOC context. When specified, the intended use of the
identifier is explained in the schema.