RE: Quick Guide to Publishing a Thesaurus on the Semantic Web

Hi Mark, all,

I've put a new editor's draft of the 'Quick Guide ...' at:

http://www.w3.org/2004/03/thes-tf/primer/2005-03-30

... incorporating your suggestions below.  How does this look?  

Some notes on what I changed ...

> I have some general comments and some more detailed suggestions.
> 
> In general I'm still wondering about the intended audience/goals of the
> Quick Guide [1]. What I roughly understood is that it's something along
> the lines of:
> 
> (a) the audience consists of (among others) thesaurus owners who know
> the basics of RDF and (1) are interested in whether converting to RDF
> gives benefits and (2) want simple examples of how things should work
> 
> (b) the outcome should be that the thesaurus owners have an intuitive
> feeling of (1) what an RDF version would look like and (2) where to get
> more info on how to do the conversion in more detail.
> 
> If that's the case, I think this version satisfies a2 and b1, but some
> more material is needed for b2. For example, it would probably be
> helpful for them to know what technology one can use to convert an XML
> version of a thesaurus to an RDF version (e.g. XSLT). Is a separate
> section on "Conversion" something to consider? Or a reference to another
> (to be created) document?

Conversion is a tricky subject because, as the earlier discussion showed, it isn't simply a matter of saving in SKOS format: a commitment must be made to the good use of URIs etc. Also, the specifics of generating and maintaining a SKOS/RDF representation of a thesaurus vary wildly depending on the technologies in place in the organisation, which makes a general note difficult to write. So I've left this for the moment ... maybe we can discuss whether to cover conversion in another note?

> 
> The first part of the section "Expressing a Thesaurus in RDF" is the
> crucial bit in which readers will have to experience the "aha!" effect.
> The extract from the UKAT, the following graph and its XML serialisation
> are definitely the way to do that trick. There are some points that might
> obstruct it (see below), but I'm not really in a position to evaluate
> whether they are important or not, as I'm not in the target audience :-)
> 
> 1) the relation between "terms" and "skos:concepts"
> 2) the relation between a "tree-form" (UKAT extract) and a "graph-form"
>    of a thesaurus
> 3) the relation between the graph and its serialization
> 3) the relation between the graph and its serialization
> 
> Concerning (1) I think this can be explicated by explaining that ISO
> thesauri like UKAT are term-centric, but SKOS is concept-centric. What
> this probably boils down to from the reader's perspective is that a
> term's preferred term is mapped to skos:prefLabel, its non-preferred
> terms to skos:altLabel, and that for each pref-term a separate
> skos:Concept is introduced with a unique identifier (URI).

Added the text:

'Note that, in expressing the content of a thesaurus such as the UKAT in RDF using SKOS Core, each descriptor (preferred term) becomes a preferred label for a concept, and each non-descriptor (non-preferred term) becomes an alternative label for a concept.'
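To make the term-to-concept mapping concrete, here is a minimal sketch that turns term-centric records into SKOS Core statements, written out as N-Triples (one line per triple). The record layout, the BASE namespace and the labels are hypothetical and only loosely modelled on the UKAT extract; they are not real UKAT data or identifiers.

```python
# Sketch: map term-centric thesaurus records onto concept-centric SKOS Core
# triples. Record layout and BASE namespace are hypothetical assumptions.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/ukat/"  # hypothetical, not the real UKAT URIs

# One record per descriptor, listing its non-descriptors ("use for" terms)
# and the ids of its broader descriptors. Illustrative data only.
RECORDS = [
    {"id": "economic-policy", "descriptor": "Economic policy",
     "use_for": [], "broader": []},
    {"id": "economic-cooperation", "descriptor": "Economic cooperation",
     "use_for": ["Economic co-operation"], "broader": ["economic-policy"]},
]

def lit(text):
    """N-Triples literal with an English language tag; escapes \\ and "."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"@en'

def record_to_triples(record):
    # Each descriptor gets its own concept, identified by a URI.
    concept = f"<{BASE}{record['id']}>"
    yield f"{concept} <{RDF_TYPE}> <{SKOS}Concept> ."
    # Descriptor (preferred term) -> skos:prefLabel of the concept.
    yield f"{concept} <{SKOS}prefLabel> {lit(record['descriptor'])} ."
    # Each non-descriptor -> skos:altLabel of the same concept.
    for term in record["use_for"]:
        yield f"{concept} <{SKOS}altLabel> {lit(term)} ."
    # BT relationships become skos:broader links between concepts.
    for parent in record["broader"]:
        yield f"{concept} <{SKOS}broader> <{BASE}{parent}> ."

for rec in RECORDS:
    for triple in record_to_triples(rec):
        print(triple)
```

Note that the non-preferred term never becomes a node of its own; it is just another label hanging off the concept.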

> 
> Concerning (2), I don't know if this is really an issue, as we should
> presuppose that the audience knows the basics of RDF.

Left that.

> 
> Concerning (3), it's important to note that the serialization shown is
> not an _exact_ serialization of the graph above it. The RDF/XML defines
> one concept and points at e.g. 'economic policy', while the graph also
> shows the concept 'economic policy' and its skos:prefLabel. Another
> thing is that the RDF/XML has a skos:inScheme property, which is not
> present in the graph. Maybe this can be excluded to keep things simple?

I wanted to leave the skos:inScheme statement in, because I think it's quite important.  But I changed the text above the RDF/XML box to:

'An RDF/XML serialisation of the RDF description of the 'Economic cooperation' concept from the UKAT is below:'
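As an illustration of how a single concept's description, including the skos:inScheme statement, maps onto RDF/XML, here is a minimal sketch using Python's standard xml.etree.ElementTree. The concept, broader-concept and scheme URIs, and the alternative label, are hypothetical placeholders rather than the actual UKAT identifiers or content.

```python
# Sketch: serialise one SKOS concept description as RDF/XML.
# All URIs below are hypothetical placeholders, not real UKAT URIs.
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)

SCHEME = "http://example.org/ukat"
CONCEPT = "http://example.org/ukat/economic-cooperation"
BROADER = "http://example.org/ukat/economic-policy"

def concept_to_rdfxml(uri, pref_label, alt_labels, broader, scheme):
    root = ET.Element(f"{{{RDF}}}RDF")
    # Typed node element: <skos:Concept rdf:about="...">
    concept = ET.SubElement(root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
    ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = pref_label
    for alt in alt_labels:
        ET.SubElement(concept, f"{{{SKOS}}}altLabel").text = alt
    # Only the URI of the broader concept appears here; its own labels
    # would live in that concept's description, not in this one.
    for b in broader:
        ET.SubElement(concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": b})
    # skos:inScheme ties the concept to the concept scheme (the thesaurus).
    ET.SubElement(concept, f"{{{SKOS}}}inScheme", {f"{{{RDF}}}resource": scheme})
    return ET.tostring(root, encoding="unicode")

print(concept_to_rdfxml(CONCEPT, "Economic cooperation",
                        ["Economic co-operation"], [BROADER], SCHEME))
```

This sketch also shows why the serialisation "points at" the broader concept without describing it: only the URI of the other node is written out.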

> 
> Some detailed comments:
> 
> The abstract mentions "how to express the content and structure of a
> thesaurus". Maybe we can add e.g. "and thesaurus-like resources" to
> indicate that SKOS is useful for more than thesauri? Or is
> "thesaurus-like" a tricky formulation?

Added to intro:

'SKOS Core is designed to be used with not only thesauri, but also other types of 'concept scheme', such as classification schemes, subject heading systems, controlled vocabularies, glossaries, taxonomies etc.'

> 
> The abstract (and a later section) also mentions "RDF allows your data
> to be linked to and/or merged with other RDF data ..." Will it be clear
> enough for people from other communities what is meant by linking and
> merging? Maybe the point here is that RDF makes it easier to use
> different sources in conjunction over the web.

Added to intro:

'... In practice, this means that data sources can be distributed across the web in a decentralised way, but still be meaningfully composed and integrated by applications, often in novel and unanticipated ways.'
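A minimal sketch of what "merging" means in practice: if an RDF graph is treated as a set of (subject, predicate, object) triples, then two independently published graphs that contain no blank nodes merge by plain set union, and a shared URI is what connects their statements. The concept and document URIs, and the use of Dublin Core dc:subject by a hypothetical library catalogue, are assumptions for illustration only; literals are also simplified to plain strings here.

```python
# Sketch: merging two independently published RDF graphs.
# URIs are hypothetical; literals are simplified to plain Python strings.
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"
CONCEPT = "http://example.org/ukat/economic-cooperation"  # hypothetical
REPORT = "http://example.org/library/report-42"           # hypothetical

# Source A: published by the thesaurus owner.
thesaurus = {
    (CONCEPT, SKOS + "prefLabel", "Economic cooperation"),
    (CONCEPT, SKOS + "altLabel", "Economic co-operation"),
}
# Source B: published elsewhere, e.g. by a library indexing its holdings.
catalogue = {
    (REPORT, DC + "subject", CONCEPT),
}

# With no blank nodes involved, merging is set union; the shared concept
# URI links statements that came from different sources.
merged = thesaurus | catalogue

def documents_indexed_with(graph, label):
    """Documents whose dc:subject is a concept carrying the given pref/alt label."""
    concepts = {s for (s, p, o) in graph
                if p in (SKOS + "prefLabel", SKOS + "altLabel") and o == label}
    return sorted(s for (s, p, o) in graph if p == DC + "subject" and o in concepts)

print(documents_indexed_with(merged, "Economic co-operation"))
# → ['http://example.org/library/report-42']
```

Neither source alone can answer the query; the merged data can, even though the two publishers never coordinated beyond reusing the concept URI.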

> 
> A similar point concerning "serialisation of the graph" in the
> Introduction. I liked the way you explained this in the Guide ("An RDF
> graph can be serialised (i.e. encoded as a series of characters) ...").
> Maybe include that bit here?

Done.

> 
> In section "Expressing Thesaurus Metadata in RDF" I think 
> it's very good
> that the text emphasises URIs. Maybe also put the term "unique
> identifier" in there to get bells ringing with those who are 
> unfamiliar
> with URIs?

Added:

'URIs are globally unique identifiers that may be used to refer to a resource unambiguously from any context. Anything can be a 'resource', not just web documents; therefore URIs can be used as identifiers for anything.'

> 
> About the explanation of skos:hasConcept in the next section, it may
> also ring more bells if "facet" is mentioned in conjunction with
> "field".

Left that, because 'facet' is difficult due to overloaded usage.

> 
> In the section "Publishing RDF Data" a sentence might be included on why
> it's useful to put the RDF thesaurus in an RDF server, e.g. something
> like "This allows anyone to query the thesaurus over the web using an
> RDF query language." (and some more refs to good material for people to
> get started with this?)

Added:

'Publishing via an RDF server allows anyone to query the thesaurus over the web using an RDF query language such as SPARQL [SPARQL].'
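For a feel of what querying over the web looks like, here is a minimal sketch that builds an HTTP GET request for a SPARQL endpoint, with the query text carried in the `query` parameter as in the SPARQL protocol. The endpoint URL is hypothetical, and the query (finding the concept and preferred label for a non-preferred term) assumes the thesaurus data uses the SKOS Core namespace. The sketch only constructs the URL; it does not contact a server.

```python
# Sketch: build a SPARQL protocol GET request URL for a thesaurus lookup.
# ENDPOINT is a hypothetical placeholder, not a real service.
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"

# Find the concept (and its preferred label) for a non-preferred term.
QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE {
  ?concept skos:altLabel "Economic co-operation"@en ;
           skos:prefLabel ?label .
}"""

def sparql_get_url(endpoint, query):
    """Return a GET URL carrying the query text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_get_url(ENDPOINT, QUERY)
print(url)
# Against a real endpoint, the URL could be fetched with urllib.request.urlopen(url).
```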

Cheers,

Al.

Received on Wednesday, 30 March 2005 17:55:07 UTC