RE: Quick Guide to Publishing a Thesaurus on the Semantic Web

Hi Mark, all,

How does this look as the proposed additional section for the quick guide?

---
Creating and Maintaining an RDF Representation of a Thesaurus

Most thesauri are stored in a relational database, in one or more XML files, or in one or more text files.  It is usually possible to create an RDF representation of a thesaurus from its existing format via an automated procedure (e.g. a text-parsing program or an XSLT transformation).  When using an automated procedure, care must be taken to ensure that the generated output is sensible and conforms to the recommended usage of the SKOS Core Vocabulary.

For example, if an XML format contains elements named 'scopenote', it should not be assumed automatically that their textual content maps to the value of a skos:scopeNote property.  Perhaps these 'scopenote' elements actually contain definitions, in which case the skos:definition property should be used; or perhaps they have been used very loosely and contain all kinds of notes, in which case the more general skos:publicNote property would be more appropriate.  A full discussion of conversion techniques and best practice is beyond the scope of this document.
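
As a rough illustration of the 'scopenote' caveat above, a conversion script can make the choice of note property an explicit, reviewable setting rather than deriving it from the source element name.  The sketch below uses only the Python standard library; the sample input format, the NOTE_PROPERTY setting and the convert_notes helper are all hypothetical and are not taken from any real thesaurus format or tool.

```python
import xml.etree.ElementTree as ET

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical source format, invented for this sketch.
SAMPLE = """<thesaurus>
  <term id="t1"><scopenote>A domesticated feline.</scopenote></term>
</thesaurus>"""

# The note properties discussed in the text above.
ALLOWED = {"scopeNote", "definition", "publicNote"}

# Assumption: someone has inspected the data and decided that these
# 'scopenote' elements actually hold definitions.
NOTE_PROPERTY = "definition"


def convert_notes(xml_text, note_property):
    """Return (term id, property URI, note text) tuples, one per note."""
    if note_property not in ALLOWED:
        raise ValueError("unexpected note property: " + note_property)
    root = ET.fromstring(xml_text)
    triples = []
    for term in root.iter("term"):
        for note in term.iter("scopenote"):
            text = (note.text or "").strip()
            if text:  # skip empty notes
                triples.append((term.get("id"), SKOS_NS + note_property, text))
    return triples


triples = convert_notes(SAMPLE, NOTE_PROPERTY)
for triple in triples:
    print(triple)
```

Keeping the mapping in one named setting means a reviewer can see, and change, the decision without reading the parsing code.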

Also note that maintaining an RDF representation of a thesaurus requires clear policies for versioning and change management.  For example, users need to know whether the meaning associated with a URI is stable and, if it is not, how and when it may change.  Management best practice is currently being discussed in more depth by the Vocabulary Management Task Force of the Semantic Web Best Practices and Deployment Working Group.
---

???

Cheers,

Al.


---
Alistair Miles
Research Associate
CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom
Email:        a.j.miles@rl.ac.uk
Tel: +44 (0)1235 445440



> -----Original Message-----
> From: Mark van Assem [mailto:mark@cs.vu.nl]
> Sent: 06 April 2005 10:21
> To: Miles, AJ (Alistair); public-esw-thes@w3.org
> Subject: Re: Quick Guide to Publishing a Thesaurus on the Semantic Web
> 
> 
> 
> Hi Alistair,
> 
> 
> > How about the following after the 'Expressing a Thesaurus in RDF'
> > section (feel free to hack or add suggestions):
> > 
> > ---
> > Section: Generating an RDF Representation of a Thesaurus
> 
> How do you feel about "Converting a Thesaurus to an RDF Representation"?
> Personally I'm fond of the word converting, "generating" to me has a
> feel that it's straightforward to do conversion. In any case we have to
> choose one word and stick with it in the text. I don't know if it makes
> a difference to anyone, if it doesn't then generating is fine.
> 
> > Most thesauri are stored and managed via a relational database.  The
> > best method for generating an RDF representation of a thesaurus from
> > the contents of a relational database will depend on both the
> > technologies deployed and the database schema, and is beyond the scope
> > of this document.
> 
> Are you sure about this (most thesauri in rel. db)? Maybe we can try to
> generalize over formats to avoid any discussion, e.g.
> 
> "Most thesauri are stored in a relational database, XML file, or a text
> format as described in [ISO standards]. The best method for generating
> an RDF representation of a thesaurus from its original format will
> depend on both the technologies deployed and the schema. A description
> of recommended practices for conversion is beyond the scope of this
> document."
> 
> 
> > If an XML representation of the thesaurus is already available, then
> > an RDF/XML representation using SKOS Core may be generated via an XSLT
> > transformation.  The design of this transformation will depend on the
> > original XML format, and care must be taken to ensure sensible output.
> 
> It would be great if we could show a small example of what difficulties
> can be encountered. So an example of what can go wrong. I can't think of
> one related to UKAT (which would be the best, because it relates to what
> the reader has already read), but maybe this is something:
> 
> "For example, if the thesaurus contains terms in another language than
> its main language (e.g. French synonyms of English terms), the correct
> language tags should be attached to these terms."
> 
> or maybe:
> 
> "For example, some thesauri include term descriptions under the heading
> "scope note" which are actually definitions. These should be translated
> to skos:definition instead of skos:scopeNote. Careful consideration of
> both the intended usage of the thesaurus and the SKOS Core Vocabulary
> is required to ensure a consistent conversion."
> 
> (actually this last one may be confusing, because I'm not sure whether
> this is also the case for UKAT, which in the Guide is converted to
> skos:scopeNote instead of skos:definition)?
> 
> Or maybe an example about how people shouldn't forget to add a
> ConceptScheme definition, which is not in the original source.
> 
> 
> > How about if I add a skos:inScheme arc to the graph instead?
> 
> This might leave the reader wondering what this inScheme is about
> (without text). But I'm all for keeping graph and RDF/XML as consistent
> as possible.
> 
> Hope this is useful,
> Mark.
> 
> -- 
>   Mark F.J. van Assem - Vrije Universiteit Amsterdam
>         mark@cs.vu.nl - http://www.cs.vu.nl/~mark
> 
> 

Received on Tuesday, 3 May 2005 15:25:12 UTC