RE: Compound concepts in a thesaurus structure from Houghton,Andrew on 2004-05-12 (public-esw-thes@w3.org from May 2004)

From: Houghton,Andrew <houghtoa@oclc.org>
Date: Wed, 12 May 2004 13:03:08 -0400
To: public-esw-thes@w3.org
Message-ID: <B56ABE145BEB0C40A265238FCAA420DF026F5388@oa2-server.oa.oclc.org>
> From: Aida Slavic [mailto:aida@acorweb.net] 
> Sent: Wednesday, May 12, 2004 11:39 AM
> Subject: RE: Compound concepts in a thesaurus structure
> 
> What I mean is that synthesized concepts may be stored in the 
> native database in two ways
> 
> 1) complex KOS concepts  as a simple text strings e.g. 
> classification
> 
> 2)  complex KOS concepts  as  structured/encoded headings 
> (each part of compound concept is separated by meaningful prefix: e.g.
> let's say that  
> TAG1 is the prefix for the  main heading and   TAG2 is a subheading
> meaning Place)
>
> If native database holds this kind of data any stripping of 
> TAG prefixes will also mean the stripping a functionality 
> attached to it.  So  SKOS may as well  accommodate the whole 
> lot: prefix+concept. 
> within skos:Concept

I'm not advocating (2) because it would force anyone processing
SKOS to recognizing the tagging of other XML grammars.  Certainly,
I could with (1):

  <skos:perfLabel>Cats--France</skos:perfLabel>

or (2)

  <skos:prefLabel rdf:parseType="Resource">
    <marc:subfield code="a">Cats</marc:subfield>
    <marc:subfield code="z">France</marc:subfield>
  </skos:prefLabel>

While (2) carries the MARC21-A semantics which might be useful
for backend systems, e.g. databases, the systems will need to
know about MARC21-A and what a subfield-a and subfield-z mean.
In addition, if they are presenting the label they will need
to know the MARC21 rules for presentation, e.g. how it's
printed in those Red Books distributed by LC.

Loosing the sub-tagging, in some circumstances, may result in
lost functionality.  No argument here.  But if that's the case,
why not skip SKOS and use the MARC-XML records directly?  There
are many XML database products on the market that can index and
handle any XML grammar.  So you can choose which XML grammar 
makes sense in your context.  Maybe MARC-XML makes sense to 
someone and SKOS make sense to another.

I see SKOS as a dumb-down approach, just like Dublin Core is.
Most of the time you don't need the heavy weight and all the
rules associated with MARC21 or UniMarc to describe vocabularies.
You want a relatively simpler and understandable format that 
you can use to communicate these vocabularies.  For vocabularies 
that aren't in electronic form SKOS may be a better choice for
getting them into electronic form.  At the same time we want to
be able to take all those existing vocabularies and put them in
a good dumb-downed form so people can experience quality
controlled vocabularies for the Semantic Web.  Yes, we may loose
some information in the conversion, but so long as it doesn't
affect the quality, it's acceptable.

I also mentioned in a prior message that creating metadata isn't
easy nor cheap.  I think the latest statistic I saw was that it
takes $70.US to do original cataloging on a book.  Considering
that, that book might have cost the library $10.US (paperback
novel) that's an expensive acquisition.  There are many
vocabulary maintainers that will not put their vocabularies on
the Web because there is no way to recoup the significant 
investment they make on a yearly basis to keep their vocabularies
current.  However, with SKOS being a dumb-downed approach, it's 
possible to convince them to at least put out skeletal concept 
records that will not cannibalize their existing revenue.

Maybe technical standards should be accompanied with a business
plan, who is going to buy it, why will they buy it, when will
the market be ready to buy it, what effect will it have on the
market, etc.  The bottom line for a vocabulary maintainer,
what's in it for me?  It's got to either open new markets or
generate additional revenue on existing products or services.
Just because it's good for the Semantic Web or Libraries really
doesn't pay your creditors...


Andy.

Andrew Houghton, OCLC Online Computer Library Center, Inc.
http://www.oclc.org/about/
http://www.oclc.org/research/staff/houghton.htm
Received on Wednesday, 12 May 2004 13:03:33 UTC