Re: Quick Guide to Publishing a Thesaurus on the Semantic Web from Mark van Assem on 2005-02-14 (public-esw-thes@w3.org from February 2005)

From: Mark van Assem <mark@cs.vu.nl>
Date: Mon, 14 Feb 2005 14:19:15 +0100
To: "Miles, AJ \(Alistair\)" <A.J.Miles@rl.ac.uk>, public-esw-thes@w3.org
Message-ID: <4210A553.90105@cs.vu.nl>

Hi Alistair,

I have some general comments and some more detailed suggestions.

In general I'm still wondering about the intended audience/goals of the
Quick Guide [1]. What I roughly understood is that it's something along
the lines of:

(a) the audience consists of (among others) thesaurus owners who know the
basics of RDF and (1) are interested in whether converting to RDF brings
benefits and (2) want simple examples of how things should work

(b) the outcome should be that the thesaurus owners have an intuitive
feel for (1) what an RDF version would look like and (2) where to get
more info on how to do the conversion in more detail.

If that's the case, I think this version satisfies a2 and b1, but some
more material is needed for b2. For example, it would probably be
helpful for them to know what technology one can use to convert an XML
version of a thesaurus to an RDF version (e.g. XSLT). Is a separate
section on "Conversion" something to consider? Or a reference to another
(to be created) document?
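To make the conversion point concrete, here is a minimal sketch of what such a conversion does. It uses Python's standard xml.etree.ElementTree rather than XSLT (a swapped-in technique, not a recommendation over XSLT). The source XML layout (<term>, <name>, <uf>, <bt>), the base URI http://example.org/thesaurus/ and the labels are all hypothetical and illustrative, not taken from UKAT; only the RDF and SKOS Core namespace URIs are real.

```python
import xml.etree.ElementTree as ET

# Hypothetical source format: one <term> per preferred term, with <uf>
# (used-for, i.e. non-preferred) and <bt> (broader term id) children.
SOURCE = """<thesaurus>
  <term id="t1"><name>economic policy</name></term>
  <term id="t2"><name>fiscal policy</name><uf>budgetary policy</uf><bt>t1</bt></term>
</thesaurus>"""

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/thesaurus/"  # hypothetical URI base

def convert(xml_text):
    """Turn the hypothetical term records into SKOS RDF/XML."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("skos", SKOS)
    src = ET.fromstring(xml_text)
    root = ET.Element(f"{{{RDF}}}RDF")
    for term in src.findall("term"):
        # One skos:Concept per preferred term, identified by a URI.
        concept = ET.SubElement(
            root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": BASE + term.get("id")}
        )
        ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = term.findtext("name")
        for uf in term.findall("uf"):
            ET.SubElement(concept, f"{{{SKOS}}}altLabel").text = uf.text
        for bt in term.findall("bt"):
            # Broader terms become links to other concepts, not labels.
            ET.SubElement(
                concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": BASE + bt.text}
            )
    return ET.tostring(root, encoding="unicode")

print(convert(SOURCE))
```

An XSLT stylesheet would express the same template-per-element mapping declaratively; the choice mostly depends on which tools the thesaurus owner already has.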

The first part of the section "Expressing a Thesaurus in RDF" is the
crucial bit in which readers will have to experience the "aha!" effect.
The extract from UKAT, the graph that follows it and its XML serialisation
are definitely the right way to do that trick. There are
some points that might obstruct it (see below), but I'm not really in a
position to judge whether they are important, as I'm not in the
target audience :-)

1) the relation between "terms" and "skos:concepts".
2) the relation between a "tree-form" (UKAT extract) and a "graph-form"
of a thesaurus
3) the relation between the graph and its serialization

Concerning (1), I think this can be made explicit by explaining that ISO
thesauri like UKAT are term-centric, whereas SKOS is concept-centric. From
the reader's perspective this probably boils down to the following: each
preferred term is mapped to skos:prefLabel, its non-preferred terms to
skos:altLabel, and for each preferred term a separate skos:Concept is
introduced with a unique identifier (URI).
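A minimal sketch of that term-to-concept mapping, assuming a hypothetical in-memory record format (preferred term mapped to its UF and BT lists) and a hypothetical base URI http://example.org/concept/. It mints one URI per preferred term, turns the preferred term into skos:prefLabel and the non-preferred terms into skos:altLabel, and prints the result as N-Triples. The labels are illustrative, not UKAT data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    """A plain RDF literal; anything else in object position is a URI."""
    value: str

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/concept/"  # hypothetical URI base

# Hypothetical term-centric input: preferred term -> its UF (non-preferred)
# terms and BT (broader) preferred terms.
TERMS = {
    "economic policy": {"uf": [], "bt": []},
    "fiscal policy": {"uf": ["budgetary policy"], "bt": ["economic policy"]},
}

def to_concepts(terms):
    """Map term-centric records to concept-centric SKOS triples."""
    # One skos:Concept (with its own URI) per preferred term; non-preferred
    # terms become labels and get no URI of their own.
    uris = {pref: f"{BASE}c{i}" for i, pref in enumerate(sorted(terms), start=1)}
    triples = []
    for pref, rec in terms.items():
        concept = uris[pref]
        triples.append((concept, RDF_TYPE, SKOS + "Concept"))
        triples.append((concept, SKOS + "prefLabel", Lit(pref)))
        triples.extend((concept, SKOS + "altLabel", Lit(alt)) for alt in rec["uf"])
        triples.extend((concept, SKOS + "broader", uris[bt]) for bt in rec["bt"])
    return triples

def to_ntriples(triples):
    """Serialise triples as N-Triples, escaping backslashes and quotes in literals."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Lit):
            escaped = o.value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(to_concepts(TERMS)))
```

Note that "budgetary policy" appears only as a label of the "fiscal policy" concept; it is the concept, not the term, that carries the identity.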

Concerning (2), I don't know if this is really an issue, as we should
presuppose that the audience knows the basics of RDF.

Concerning (3), it's important to note that the serialization shown is
not an _exact_ serialization of the graph above it. The RDF/XML defines
one concept and points at e.g. 'economic policy', while the graph also
shows the concept 'economic policy' and its skos:prefLabel. One more
thing is that the RDF/XML has a skos:inScheme property, which is not
present in the graph. Maybe it could be left out to keep things simple?

Some detailed comments:

The abstract mentions "how to express the content and structure of a
thesaurus". Maybe we can add e.g. "and thesaurus-like resources" to make
clear that SKOS is useful for more than thesauri? Or is "thesaurus-like"
a tricky formulation?

The abstract (and a later section) also mentions "RDF allows your data
to be linked to and/or merged with other RDF data ..." Will it be clear
enough to people from other communities what is meant by linking and
merging? Maybe the point here is that RDF makes it easier to use
different sources in conjunction over the web.

A similar point concerning "serialisation of the graph" in the 
Introduction. I liked the way you explained this in the Guide ("An RDF 
graph can be serialised (i.e. encoded as a series of characters) ... "). 
Maybe include that bit here?

In section "Expressing Thesaurus Metadata in RDF" I think it's very good
that the text emphasises URIs. Maybe also put the term "unique
identifier" in there to get bells ringing with those who are unfamiliar
with URIs?

About the explanation of skos:hasConcept in the next section, it may
also ring more bells if "facet" is mentioned in conjunction with "field".

In the section "Publishing RDF Data" a sentence might be included on why
it's useful to put the RDF thesaurus on an RDF server, e.g. something
like "This allows anyone to query the thesaurus over the web using an
RDF query language." (perhaps with some references to good material to
help people get started with this?)
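To illustrate what "querying the thesaurus with an RDF query language" amounts to, here is a minimal in-memory sketch of triple-pattern matching, the core idea behind such languages. It is not SPARQL, RDQL or any RDF server's API; the graph contents and URIs under http://example.org/concept/ are hypothetical, and the sketch assumes no real URI or label starts with "?".

```python
# A tiny in-memory graph of (subject, predicate, object) tuples.
# URIs and labels are illustrative only.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/concept/"  # hypothetical URI base
GRAPH = {
    (EX + "c1", SKOS + "prefLabel", "economic policy"),
    (EX + "c2", SKOS + "prefLabel", "fiscal policy"),
    (EX + "c2", SKOS + "altLabel", "budgetary policy"),
    (EX + "c2", SKOS + "broader", EX + "c1"),
}

def match(graph, pattern):
    """Return variable bindings for one triple pattern; '?x' terms are variables."""
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results

def query(graph, patterns):
    """Join several triple patterns by substituting earlier bindings."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for sol in solutions:
            bound = tuple(sol.get(t, t) for t in pattern)
            for b in match(graph, bound):
                extended.append({**sol, **b})
        solutions = extended
    return solutions

# Which concepts are narrower than 'economic policy', and what are they called?
NARROWER = [
    ("?broader", SKOS + "prefLabel", "economic policy"),
    ("?c", SKOS + "broader", "?broader"),
    ("?c", SKOS + "prefLabel", "?label"),
]
for row in query(GRAPH, NARROWER):
    print(row["?label"])
```

With this data the query prints "fiscal policy". A real RDF server would evaluate the same kind of pattern over the published thesaurus, remotely and with indexes.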

With regards,
Mark.

---

[1] http://www.w3.org/2004/03/thes-tf/primer/2005-02-08.html

Miles, AJ (Alistair) wrote:

> Hi Mark, Tom,
> 
> I put a new draft of the document 'Quick Guide to Publishing a Thesaurus on the Semantic Web' up:
> 
> http://www.w3.org/2004/03/thes-tf/primer/2005-02-08.html
> 
> How does that look?  Any chance you could look at it today?
> 
> Thanks,
> 
> Al.
> 
> ---
> Alistair Miles
> Research Associate
> CCLRC - Rutherford Appleton Laboratory
> Building R1 Room 1.60
> Fermi Avenue
> Chilton
> Didcot
> Oxfordshire OX11 0QX
> United Kingdom
> Email:        a.j.miles@rl.ac.uk
> Tel: +44 (0)1235 445440
> 

-- 
  Mark F.J. van Assem - Vrije Universiteit Amsterdam
        mark@cs.vu.nl - http://www.cs.vu.nl/~mark
Received on Monday, 14 February 2005 13:19:18 UTC