
SKOS Implementation: eurovoc thesaurus

From: Emilio Rubiera <emilio.rubiera@fundacionctic.org>
Date: Tue, 19 May 2009 13:41:13 +0200
To: public-swd-wg@w3.org
Cc: Diego Berrueta <diego.berrueta@fundacionctic.org>, Luis Polo Paredes <Luis.polo@fundacionctic.org>, Jose Maria Alvarez Rodriguez <JoseM.Alvarez@fundacionctic.org>
Message-Id: <1242733273.6515.223.camel@spitxa-pc>
Dear SWD-WG members,

We apologize for the delay in submitting this implementation of the new
SKOS Candidate Recommendation; legal restrictions on the original source
prevented us from submitting it in time.


SKOS VOCABULARY IMPLEMENTATION: Eurovoc Thesaurus in SKOS

The Eurovoc thesaurus is a multilingual, polythematic thesaurus focusing
on the law and legislation of the European Union (EU). It is accessible
in 21 official languages of the EU. Within the EU, the Eurovoc thesaurus
is used by the Library of the European Parliament, the Publications
Office and other information institutions of the EU. Moreover, it is
used in the libraries and documentation centers of national parliaments
(e.g., the Spanish Senate) as well as by other governmental and private
organizations in EU member (and non-member) countries.
[http://europa.eu/eurovoc/]

Contact: Emilio Rubiera, CTIC Foundation, Spain.

Constructs used: 
      * skos:Concept
      * skos:broader
      * skos:narrower
      * skos:related
      * skos:prefLabel
      * skos:inScheme
      * skos:hiddenLabel
      * skos:scopeNote
      * skos:hasTopConcept
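To illustrate how these constructs fit together, here is a minimal
Python sketch (standard library only) that serializes one concept as
N-Triples. The example.org IRIs, labels and helper names are
hypothetical and do not come from the actual Eurovoc dataset; the sketch
also assumes each hierarchical link is stated in both directions, since
both skos:broader and skos:narrower are listed.

```python
# Minimal sketch: one SKOS concept as N-Triples lines.
# All example.org IRIs and labels are hypothetical illustrations.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/eurovoc/"  # hypothetical namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text, lang):
    """Language-tagged literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"@' + lang


def concept_triples(concept, scheme, broader=(), related=(),
                    pref_labels=(), hidden_labels=(), scope_notes=()):
    """Return N-Triples lines for one skos:Concept.

    Label and note arguments are iterables of (text, language) pairs."""
    s = iri(concept)
    out = [
        f"{s} {iri(RDF_TYPE)} {iri(SKOS + 'Concept')} .",
        f"{s} {iri(SKOS + 'inScheme')} {iri(scheme)} .",
    ]
    for b in broader:
        # Assumption: the link is stated both as broader and as narrower.
        out.append(f"{s} {iri(SKOS + 'broader')} {iri(b)} .")
        out.append(f"{iri(b)} {iri(SKOS + 'narrower')} {s} .")
    for r in related:
        out.append(f"{s} {iri(SKOS + 'related')} {iri(r)} .")
    for prop, pairs in (("prefLabel", pref_labels),
                        ("hiddenLabel", hidden_labels),
                        ("scopeNote", scope_notes)):
        for text, lang in pairs:
            out.append(f"{s} {iri(SKOS + prop)} {literal(text, lang)} .")
    return out


def top_concept_triple(scheme, concept):
    """Link a scheme to one of its top concepts."""
    return f"{iri(scheme)} {iri(SKOS + 'hasTopConcept')} {iri(concept)} ."


triples = concept_triples(
    EX + "c/100", EX + "mt/1",
    broader=[EX + "c/99"],
    pref_labels=[("parliament", "en"), ("parlamento", "es")],
    hidden_labels=[("parliaments", "en")],
)
triples.append(top_concept_triple(EX + "mt/1", EX + "c/99"))
print("\n".join(triples))
```

One preferred label per language is used here, in line with the SKOS
constraint on skos:prefLabel; hidden labels may repeat per language.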

EUROVOC is originally distributed as a set of XML files (a DTD is also
available).

Succinct statistics: the version used (4.0) contains:
      * 6,645 descriptors (6,645 concepts)
      * 6,669 hierarchical relationships
      * 3,636 associative relationships
      * 21 domains
      * 127 microthesauri
      * 23 languages: Bulgarian, Spanish, Croatian, Czech, Danish,
        German, Estonian, Greek, English, French, Italian, Latvian,
        Lithuanian, Hungarian, Dutch, Polish, Portuguese, Romanian,
        Slovak, Slovene, Finnish and Swedish.
      * over 150,000 preferred labels (23 languages)
      * over 96,000 hidden labels (23 languages)
      * over 15,000 scope notes
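Figures like these could be reproduced from an RDF dump with a simple
tally. The sketch below assumes an N-Triples export (not mentioned in
this message) and a naive line-based parser that only handles IRI
subjects; it counts a hierarchical relationship once even when it is
stated as both skos:broader and skos:narrower, and treats skos:related
pairs as unordered. Whether the figures above were counted the same way
is not stated, and the sample IRIs are hypothetical.

```python
import re
from collections import Counter

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Naive N-Triples line matcher: "<subject> <predicate> object ."
# Lines with blank-node subjects (or anything else) are skipped.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$')


def skos_stats(lines):
    """Tally concepts, links and labels from N-Triples lines."""
    per_predicate = Counter()
    concepts, broader_pairs, related_pairs = set(), set(), set()
    for line in lines:
        m = TRIPLE.match(line)
        if not m:
            continue
        s, p, o = m.groups()
        per_predicate[p] += 1
        if p == RDF_TYPE and o == f"<{SKOS}Concept>":
            concepts.add(s)
        elif p == SKOS + "broader":
            broader_pairs.add((s, o.strip("<>")))
        elif p == SKOS + "narrower":
            # "A narrower B" is the same link as "B broader A".
            broader_pairs.add((o.strip("<>"), s))
        elif p == SKOS + "related":
            # skos:related is symmetric, so count unordered pairs.
            related_pairs.add(frozenset((s, o.strip("<>"))))
    return {
        "concepts": len(concepts),
        "hierarchical": len(broader_pairs),
        "associative": len(related_pairs),
        "prefLabels": per_predicate[SKOS + "prefLabel"],
        "hiddenLabels": per_predicate[SKOS + "hiddenLabel"],
        "scopeNotes": per_predicate[SKOS + "scopeNote"],
        "triples": sum(per_predicate.values()),
    }


# Hypothetical sample data.
C1, C2, C3 = ("<http://example.org/c/1>", "<http://example.org/c/2>",
              "<http://example.org/c/3>")
sample = [
    f"{C1} <{RDF_TYPE}> <{SKOS}Concept> .",
    f"{C2} <{RDF_TYPE}> <{SKOS}Concept> .",
    f"{C3} <{RDF_TYPE}> <{SKOS}Concept> .",
    f"{C2} <{SKOS}broader> {C1} .",
    f"{C1} <{SKOS}narrower> {C2} .",  # same link, inverse direction
    f"{C3} <{SKOS}related> {C1} .",
    f"{C1} <{SKOS}related> {C3} .",   # same pair, other direction
    f'{C1} <{SKOS}prefLabel> "parliament"@en .',
]
print(skos_stats(sample))
```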

The RDF version consists of more than 550,000 triples. The dataset can be
queried through a SPARQL endpoint at:

  http://idi.fundacionctic.org/classifications_endpoint/eurovoc
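The endpoint can be queried with any SPARQL Protocol client. A minimal
sketch using only the Python standard library follows; the query (ten
English preferred labels) is just an illustration, and whether the 2009
endpoint still responds, and which result formats it supports, are
assumptions.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://idi.fundacionctic.org/classifications_endpoint/eurovoc"

# Illustrative query; uses only SPARQL 1.0 features.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint, query, timeout=30):
    """Send the request and return (concept IRI, label) pairs.

    Assumes the endpoint honours the JSON results Accept header."""
    with urllib.request.urlopen(build_request(endpoint, query),
                                timeout=timeout) as resp:
        data = json.load(resp)
    return [(b["concept"]["value"], b["label"]["value"])
            for b in data["results"]["bindings"]]
```

Calling run_query(ENDPOINT, QUERY) would send the request; build_request
is kept separate so the request can be inspected without network access.
The query avoids aggregates such as COUNT, which were not yet part of
standard SPARQL in 2009.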

Implementation details:

Eurovoc is a thesaurus of thesauri. Like some other thesauri, it has an
internal structure that cannot be captured with SKOS primitives alone:
the thesaurus comprises several thematic fields (domains) and
microthesauri. We used the property skos:inScheme to relate each concept
to its direct microthesaurus; consequently, to relate each
microthesaurus to the main Eurovoc thesaurus we had to define a new
property, "hasScheme".
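A minimal sketch of this two-level layout, with triples held as Python
tuples. The namespace, the IRI chosen for hasScheme and its direction
(thesaurus hasScheme microthesaurus, by analogy with
skos:hasTopConcept) are assumptions for illustration; the actual
namespace and direction are not given here. The helper follows
skos:inScheme and then hasScheme to find every scheme a concept belongs
to, which is the extra step a consumer needs because skos:inScheme only
reaches the microthesaurus.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/eurovoc/"   # hypothetical namespace
HAS_SCHEME = EX + "hasScheme"        # hypothetical IRI for "hasScheme"

EUROVOC = EX + "thesaurus"           # the main thesaurus (illustrative)
MICRO = EX + "mt/1"                  # one microthesaurus (illustrative)

# (subject, predicate, object) triples; all IRIs are illustrative.
triples = {
    (EUROVOC, HAS_SCHEME, MICRO),              # thesaurus -> microthesaurus
    (EX + "c/100", SKOS + "inScheme", MICRO),  # concept -> microthesaurus
}


def schemes_of(concept, triples):
    """Return the concept's direct schemes plus every scheme that
    (transitively) contains one of them via hasScheme."""
    direct = {o for s, p, o in triples
              if s == concept and p == SKOS + "inScheme"}
    result, frontier = set(direct), list(direct)
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            # Walk upwards: a scheme that hasScheme the current one.
            if p == HAS_SCHEME and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result


print(sorted(schemes_of(EX + "c/100", triples)))
```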

More information about this implementation can be found in the following
reference:

        Luis Polo, Jose Maria Alvarez and Emilio Rubiera Azcona.
        Promoting Government Controlled Vocabularies to the Semantic
        Web: EUROVOC Thesaurus and CPV Product Classification Scheme. In
        Proceedings of the Semantic Interoperability in the European
        Digital Library workshop (SIEDL2008), co-located with the 5th
        European Semantic Web Conference (ESWC2008), Tenerife, Spain,
        June 2, 2008. Available at:
        http://image.ntua.gr/swamm2006/SIEDLproceedings.pdf
        
Best regards,

-- 
Emilio Rubiera Azcona
R&D Department - CTIC Foundation
E-mail: emilio.rubiera@fundacionctic.org
Phone: +34 984 29 12 12
Parque Científico Tecnológico Gijón-Asturias-Spain
www.fundacionctic.org
Received on Tuesday, 19 May 2009 12:32:37 UTC