
RE: Quick Guide to Publishing a Thesaurus on the Semantic Web

From: Stella Dextre Clarke <sdclarke@lukehouse.demon.co.uk>
Date: Wed, 6 Apr 2005 18:38:50 +0100
To: "'Mark van Assem'" <mark@cs.vu.nl>, "'Miles, AJ (Alistair)'" <A.J.Miles@rl.ac.uk>, <public-esw-thes@w3.org>
Message-ID: <000001c53acf$804194b0$7f8cdec2@xeno>

Just a quick note of support for Mark's comments. The thesauri I work
with, and those provided by my clients, are usually (perhaps always)
held in a database, but the database is not usually relational.

And I agree you have to be careful about conversion. You can't assume
editors will have used even the standard tags in exactly the way ISO
2788 envisages. 

Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, Oxon, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298

-----Original Message-----
From: public-esw-thes-request@w3.org
[mailto:public-esw-thes-request@w3.org] On Behalf Of Mark van Assem
Sent: 06 April 2005 10:21
To: Miles, AJ (Alistair); public-esw-thes@w3.org
Subject: Re: Quick Guide to Publishing a Thesaurus on the Semantic Web

Hi Alistair,

> How about the following after the 'Expressing a Thesaurus in RDF'
> section (feel free to hack or add suggestions):
> ---
> Section: Generating an RDF Representation of a Thesaurus

How do you feel about "Converting a Thesaurus to an RDF Representation"?

Personally I'm fond of the word "converting"; to me, "generating"
suggests that the conversion is straightforward. In any case we have to
choose one word and stick with it in the text. I don't know whether it
makes a difference to anyone; if it doesn't, then "generating" is fine.

> Most thesauri are stored and managed via a relational database.  The
> best method for generating an RDF representation of a thesaurus from the
> contents of a relational database will depend on both the technologies
> deployed and the database schema, and is beyond the scope of this

Are you sure about this (most thesauri in rel. db)? Maybe we can try to 
generalize over formats to avoid any discussion, e.g.

"Most thesauri are stored in a relational database, XML file, or a text 
format as described in [ISO standards]. The best method for generating 
an RDF representation of a thesaurus from its original format will 
depend on both the technologies deployed and the schema. A description 
of recommended practices for conversion is beyond the scope of this 
document. "

> If an XML representation of the thesaurus is already available, then
> an RDF/XML representation using SKOS Core may be generated via an XSLT
> transformation.  The design of this transformation will depend on the
> original XML format, and care must be taken to ensure sensible output.

It would be great if we could show a small example of the difficulties
that can be encountered, i.e. an example of what can go wrong. I can't
think of one related to UKAT (which would be best, because it relates to
what the reader has already read), but maybe this is something:

"For example, if the thesaurus contains terms in a language other than 
its main language (e.g. French synonyms of English terms), the correct 
language tags should be attached to these terms."
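
To make the language-tag point concrete, here is a minimal sketch in Python using only the standard library. The concept URI, the sample terms and the function name concept_to_rdfxml are hypothetical, chosen for illustration; a real conversion would read the language of each term from the source data.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def concept_to_rdfxml(uri, labels):
    """Serialize one concept; labels is a list of (skos_property, text, lang)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
    for prop, text, lang in labels:
        label = ET.SubElement(concept, f"{{{SKOS}}}{prop}")
        label.set(XML_LANG, lang)  # language tag taken from the source record
        label.text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical source record: an English term with a French synonym.
rdf_xml = concept_to_rdfxml(
    "http://example.org/thesaurus/concept/cats",
    [("prefLabel", "Cats", "en"), ("altLabel", "Chats", "fr")],
)
print(rdf_xml)
```

If the converter applied the main language to every label, the French synonym would silently be published as an English one, which is the kind of error the example text warns about.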

or maybe:

"For example, some thesauri include term descriptions under the heading 
"scope note" which are actually definitions. These should be translated 
to skos:definition instead of skos:scopeNote. Careful consideration of 
both the intended usage of the thesaurus and the SKOS Core Vocabulary 
is required to ensure a consistent conversion."

(Actually this last one may be confusing, because I'm not sure whether 
this is also the case for UKAT, whose scope notes the Guide converts to 
skos:scopeNote rather than skos:definition.)
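
One way to picture the scope note decision is as a per-thesaurus mapping that a person fills in after inspecting the data. The sketch below assumes the source uses the "SN" tag for scope notes; the names DEFAULT_NOTE_MAP and note_property are hypothetical, and the override shown is an illustration, not a claim about any particular thesaurus.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Default reading of the "SN" (scope note) tag.
DEFAULT_NOTE_MAP = {"SN": "scopeNote"}


def note_property(tag, overrides=None):
    """Return the SKOS property URI for a source note tag.

    overrides holds per-thesaurus decisions made after inspecting the data,
    e.g. {"SN": "definition"} when the "scope notes" are really definitions.
    """
    mapping = {**DEFAULT_NOTE_MAP, **(overrides or {})}
    return SKOS + mapping[tag]


print(note_property("SN"))
print(note_property("SN", overrides={"SN": "definition"}))
```

Keeping the decision in an explicit table rather than hard-coding it in the transformation makes it visible to whoever reviews the conversion.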

Or maybe an example reminding people not to forget to add a 
ConceptScheme definition, which is not in the original source.
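
As a rough illustration of that last point, the following Python sketch adds a skos:ConceptScheme node that has no counterpart in the source and links every concept to it with skos:inScheme. The scheme and concept URIs and the function name build_scheme_rdfxml are made up for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def build_scheme_rdfxml(concept_uris, scheme_uri):
    """Emit a ConceptScheme plus one Concept per URI, each linked via inScheme."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # The scheme itself is new information supplied by the converter.
    ET.SubElement(root, f"{{{SKOS}}}ConceptScheme", {f"{{{RDF}}}about": scheme_uri})
    for uri in concept_uris:
        concept = ET.SubElement(root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
        ET.SubElement(concept, f"{{{SKOS}}}inScheme", {f"{{{RDF}}}resource": scheme_uri})
    return ET.tostring(root, encoding="unicode")


# Hypothetical URIs for illustration only.
scheme_xml = build_scheme_rdfxml(
    ["http://example.org/thesaurus/concept/1", "http://example.org/thesaurus/concept/2"],
    "http://example.org/thesaurus",
)
print(scheme_xml)
```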

> How about if I add a skos:inScheme arc to the graph instead?

Without accompanying text, this might leave the reader wondering what 
inScheme is about. But I'm all for keeping graph and RDF/XML as consistent 
as possible.

Hope this is useful,

  Mark F.J. van Assem - Vrije Universiteit Amsterdam
        mark@cs.vu.nl - http://www.cs.vu.nl/~mark
Received on Wednesday, 6 April 2005 17:38:59 UTC
