Re: URI policy for thesaurus concepts

Alistair > One thing a GET request for the thesaurus URI should definitely
> return is a description of that thesaurus (i.e. name, version, creators,
> description of scope and content etc.) although again whether that should
> be machine or human readable is open.

Obviously *both* must be possible. We had megabytes of discussion on this in
the Topic Map community and elsewhere.
The answer is very simple - there are humans, and there are machines
(software agents).
A service may decide to serve only one of them, or it may decide to serve
both.
It may even decide to serve several machine protocols or several
human-readable layouts.
The consequence is that a single URL is not enough.
We need (protocol -> URL) pairs such as

HTML -> http://human.blah.org/thesaurus.html
WSDL -> http://services.blah.org/thesaurus.wsdl
DCMI -> http://dcmi.blah.org/thesaurus.xml
etc., etc.

These must be explicitly *pairs* (protocol -> URL), as the domain name itself
must not carry any significant meaning (see the URI RFC).
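
To make the pairing concrete, here is a minimal sketch in Python; the
blah.org URLs are just the placeholder examples above, and the data
structure and function name are my own illustration, not a proposed
mechanism:

    # Hypothetical sketch: explicit (protocol -> URL) pairs for one thesaurus.
    # The URLs are the placeholder examples above; the domain names themselves
    # carry no meaning, only the declared pairing does.
    REPRESENTATIONS = {
        "HTML": "http://human.blah.org/thesaurus.html",
        "WSDL": "http://services.blah.org/thesaurus.wsdl",
        "DCMI": "http://dcmi.blah.org/thesaurus.xml",
    }

    def url_for(protocol: str) -> str:
        """Return the URL serving the requested representation, if one is offered."""
        if protocol not in REPRESENTATIONS:
            raise KeyError(f"No {protocol} representation is offered")
        return REPRESENTATIONS[protocol]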

Alistair > The other question is, should the request for the thesaurus URI
> also return the entire content of the thesaurus?  Personally I think no,
> but again I'm not sure about that.

It never should by default!! A well-established thesaurus easily contains
hundreds of thousands of concepts or more! The requester (be it human or
machine) must be able to identify the thesaurus source without downloading
the whole thing.
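
As an illustration of what a default GET on the thesaurus URI might return,
here is a hedged sketch; the field names simply follow the description items
Alistair lists (name, version, creators, scope) and are not a proposed
schema:

    import json

    def describe_thesaurus() -> str:
        """Return a small description of the thesaurus, never its full concept list."""
        description = {
            "name": "Example Thesaurus",
            "version": "1.0",
            "creators": ["Example Maintainer"],
            "scope": "One-paragraph statement of scope and content",
            # Where the heavy representations live, for the rare
            # cases that really need them:
            "representations": {
                "HTML": "http://human.blah.org/thesaurus.html",
                "WSDL": "http://services.blah.org/thesaurus.wsdl",
                "DCMI": "http://dcmi.blah.org/thesaurus.xml",
            },
        }
        return json.dumps(description, indent=2)

    if __name__ == "__main__":
        print(describe_thesaurus())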

In my personal vision, the "whole thing" will *never* be downloaded at once:
we want to avoid redundancy, and what the hell are we doing here?
We are establishing means to *link to specific* concepts and make clear where
they come from.

Downloading, and thereby duplicating, a thesaurus is OK in some situations,
but it should be regarded as a very special use case.
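
As a sketch of what linking to a specific concept, rather than copying its
thesaurus, could look like, assuming only that a concept has its own URI and
that the thesaurus URI identifies the source (the URI layout here is my own
invention):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ConceptLink:
        """A reference to one concept plus the URI identifying its source thesaurus."""
        concept_uri: str
        thesaurus_uri: str

    # Linking to a single concept; nothing here requires fetching the
    # thesaurus itself, only resolving these URIs when needed.
    link = ConceptLink(
        concept_uri="http://blah.org/thesaurus/concepts/12345",
        thesaurus_uri="http://blah.org/thesaurus",
    )
    print(f"{link.concept_uri} comes from {link.thesaurus_uri}")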

Thomas

Received on Friday, 30 April 2004 16:01:42 UTC