
Standard format for exchange of thesaurus data?

From: Stella Dextre Clarke <sdclarke@lukehouse.demon.co.uk>
Date: Mon, 9 Jul 2007 14:50:52 +0100
To: <public-esw-thes@w3.org>
Message-ID: <002e01c7c230$2aa40700$0300000a@DELL>

This message is for list members who are interested in a format designed
primarily for the exchange of whole thesauri rather than for live
interrogation of an online thesaurus. We need it to complete Part 5 of
BS 8723 - a standard that has been mentioned regularly on this list. The
format is to use XML but, unlike SKOS, it does not use RDF. (We hope,
however, that mapping to and from SKOS will be straightforward.)

Work on drafting the standard is now well advanced, but we have some
difficult choices and would welcome feedback from anyone who is willing
to evaluate the model and schemas developed so far. (See
http://porism.tdmweb.co.uk/BS8723/). A few words of explanation before
you go there...

Although BS 8723 Part 3 covers many different types of vocabulary, our
advisory group for Part 5 warned that it would be difficult to develop a
format adequate for all of them. The decision was made to focus on the
needs for monolingual and multilingual thesauri (which are described in
Parts 2 and 4 respectively). Thus it should enable the exchange of
thesauri with any or all of the features described in Part 2, plus the
features of Part 4 that are relevant to multilingual thesauri, but leave
aside classification schemes, subject headings, ontologies etc, and data
conveying mappings between these vocabularies. This is already quite a
demanding objective, because Part 2 includes provisions for some
sophisticated thesauri, with options for special features that may not
be needed in simpler vocabularies.

The first step was therefore to develop a data model for BS 8723-2,
incorporating also some of the provisions of BS 8723-4. From that model
an XML schema was derived, capable of serving as an exchange format, and
I shall refer to this as our Original Schema.

The schema may look quite complex to a newcomer. To overcome this
problem (and we are not sure whether it really is a problem) two
alternative approaches have been explored. One was to develop a
simplified model and schema (which we call the Core Version) that is
fully compatible with the Full Version, so that users could choose which
to apply without risking misinterpretation. The disadvantage of the Core
Version is that it cannot be used to convey all the features and
elements described in BS 8723-2. Some confusion may also be caused by
allowing two versions of the same Original Schema. More details of the
Core and Full versions of
the Original Schema may be found at http://porism.tdmweb.co.uk/BS8723/ ,
together with the Model, and explanations of the assumptions made in
deriving the schemas.

The other approach we have explored is to develop a completely different
schema, based on Zthes. I have not attached it here, in case the
attachment causes trouble for distribution via the list, but if you
would like to evaluate the Alternative Schema, please ask and I will
send it to you. I can also send you a reference to Zthes, which is an
application profile of Z39.50.

An important part of evaluation is to test whether the schema can be
used to convey sample data including all the wanted features. (And
whether a thesaurus can be correctly reassembled from the XML file!) As
you will see on the website, testing of the Full/Core versions of the
Original Schema is well advanced, although not yet complete. A series of
test files has been successfully encoded and then decoded correctly
using an XSL transformation. The Alternative Schema has not yet been
tested with these files.
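
As an illustration only, a round-trip check of this kind can be sketched
in a few lines of Python. This is not the XSL-based test described
above, and the element and attribute names (Thesaurus, Concept,
PreferredTerm, Broader, id, ref) are hypothetical placeholders, not
taken from the BS 8723 schemas or from Zthes. The sample data is
likewise invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample thesaurus: concept id -> preferred term and broader concept ids.
SAMPLE = {
    "c1": {"term": "Vehicles", "broader": []},
    "c2": {"term": "Cars", "broader": ["c1"]},
}


def encode(thesaurus):
    """Serialise the thesaurus to an XML string (element names are illustrative only)."""
    root = ET.Element("Thesaurus")
    for concept_id, data in sorted(thesaurus.items()):
        concept = ET.SubElement(root, "Concept", id=concept_id)
        ET.SubElement(concept, "PreferredTerm").text = data["term"]
        for broader_id in data["broader"]:
            ET.SubElement(concept, "Broader", ref=broader_id)
    return ET.tostring(root, encoding="unicode")


def decode(xml_text):
    """Rebuild the in-memory thesaurus from the XML produced by encode()."""
    root = ET.fromstring(xml_text)
    thesaurus = {}
    for concept in root.findall("Concept"):
        thesaurus[concept.get("id")] = {
            "term": concept.findtext("PreferredTerm"),
            "broader": [b.get("ref") for b in concept.findall("Broader")],
        }
    return thesaurus


def round_trips(thesaurus):
    """True if encoding and then decoding reproduces the original data exactly."""
    return decode(encode(thesaurus)) == thesaurus
```

Calling round_trips(SAMPLE) encodes the sample, decodes the result, and
compares it with the input. A real test would need sample data that
exercises every feature the schema is meant to convey.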

The questions that now confront us include:
A) Do we now have a satisfactory format for data exchange?
B) Should we choose the Original Schema or the Alternative Schema? (We
are determined to encourage interoperability by recommending just one
format, not a variety.)
C) If we choose the Original Schema, should we offer both the Core and
Full versions, or would that just be a source of confusion, so that we
should present only the Full version? (Currently, our committee favours
presenting only the Full version.)

We would greatly welcome opinions and reactions. I would ask you not to
circulate the link more widely without the explanations I have included.
The documents and web pages are all draft working documents, and
amendments are sometimes made without any notification. We do not want
to mislead or confuse people.  

Please let us know what you think!

Convenor, BSI committee IDT/2/2/1

Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, Oxon, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298

Received on Monday, 9 July 2007 13:51:02 UTC
