Re: online course on Metadata from Sigfrid Lundberg, Lub NetLab on 2000-09-28 (www-rdf-interest@w3.org from September 2000)

From: Sigfrid Lundberg, Lub NetLab <siglun@gungner.lub.lu.se>
Date: Thu, 28 Sep 2000 11:03:07 +0200 (MET DST)
To: Johan Hjelm <johan.hjelm@era-t.ericsson.se>
cc: Margit.Hofer@eun.org, www-rdf-interest@w3.org
Message-ID: <Pine.LNX.3.96.1000928095830.29628A-100000@gungner>

On Thu, 28 Sep 2000, Johan Hjelm wrote:

> Just noted an error in the course (and I have only skimmed a part of it).
> 
> In the section "Benefits" under "Where is the Metadata?" it says
> "Metadata on Internet resources stored together with the resource are
> placed at the top of a document. They can be found inside the Header
> section of an HTML page and they are normally hidden to the user.
> Metadata stored separately from the resource are located in databases,
> in indexes or in catalogues."  This is not correct, since inline
> metadata can be placed anywhere in the document. 
> 
> And of course, one wonders why there is no mention of RDF, and why the
> metadata editor does not generate RDF.

That editor was delivered by us about two years ago, when there was
no accepted way of expressing Dublin Core metadata in RDF. And
there is still no generally accepted way of expressing qualified DC
in RDF. This is most unfortunate, but is due to the fact that the
DCMI has not until recently been able to produce a specification of
acceptable "interdisciplinary" qualifiers.

It might be of interest to the RDF community to know that there is, in
fact, at least some hesitation within the different metadata
communities, and among information retrieval people, to implement RDF
(for purposes other than as a transfer format). The reason is most
likely that while RDF and RDFS provide an excellent infrastructure
for the definition of semantics, the main policy of an RDF processor
is promiscuous preservation of semantic granularity and diversity,
with the data stored as triples.

First, this data model has drawbacks for searching in comparison with
more traditional XML (XMLschema/XMLdtd based) or SGML syntaxes, or even
GRS-1 and MARC based record syntaxes (in Z39.50). These are ubiquitous
tools for us information retrieval people.  Second, when collecting
such data in a heterogeneous wide area network (harvesting for
building metadata aware search engines), one can foresee that the
semantic diversity promoted by RDF may cause problems as regards the
processing of those triples.
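
As a rough sketch of the first point (the triples, the example.org
URIs and the function names are all invented for illustration, and
not taken from any RDF toolkit): before a fielded search can run on
triples, they have to be regrouped into per-resource records, a step
that the record syntaxes hand us for free.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; all values are made up.
TRIPLES = [
    ("http://example.org/doc1", "dc:title", "Metadata for beginners"),
    ("http://example.org/doc1", "dc:creator", "A. Author"),
    ("http://example.org/doc2", "dc:title", "Harvesting the web"),
    ("http://example.org/doc2", "dc:creator", "B. Writer"),
]


def triples_to_records(triples):
    """Regroup flat triples into one record (field -> values) per subject."""
    records = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        records[subject][predicate].append(obj)
    return {subject: dict(fields) for subject, fields in records.items()}


def search_records(records, field, term):
    """Case-insensitive substring search within a single named field."""
    term = term.lower()
    return sorted(
        subject
        for subject, fields in records.items()
        if any(term in value.lower() for value in fields.get(field, []))
    )


RECORDS = triples_to_records(TRIPLES)
HITS = search_records(RECORDS, "dc:title", "metadata")
```

With only one vocabulary in play this is trivial; the trouble starts
when every harvested source brings its own predicates.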

The processing of triple output requires a thesaurus of semantics,
such that inferencing software can aggregate the data into manageable
chunks of semantic categories, and map those onto a finite set of
search fields which can be understood by end users. There are projects
out there addressing these issues, such as HARMONY [1]. Interestingly,
some of the Harmony people seem more inclined towards an XMLschema
solution. We are also researching this area ourselves.
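
To make the idea concrete, here is a minimal sketch of such a mapping.
It assumes a hand-made thesaurus; the "ex:" predicates, the field
names and the example.org URIs are hypothetical, and a real system
would need inferencing over schemas rather than a flat lookup table.

```python
# Hypothetical thesaurus: many harvested predicates -> few search fields.
THESAURUS = {
    "dc:creator": "author",
    "ex:illustrator": "author",
    "dc:title": "title",
    "ex:alternativeTitle": "title",
}

# Hypothetical harvested triples with mixed vocabularies.
HARVESTED = [
    ("http://example.org/doc1", "dc:creator", "A. Author"),
    ("http://example.org/doc1", "ex:illustrator", "C. Painter"),
    ("http://example.org/doc1", "ex:alternativeTitle", "Metadata 101"),
    ("http://example.org/doc2", "ex:shelfMark", "QA76"),
]


def aggregate(triples, thesaurus, fallback="any"):
    """Build search field -> subject -> values, using the thesaurus.

    Predicates missing from the thesaurus end up in the fallback field,
    so their values remain searchable but lose their specific meaning.
    """
    index = {}
    for subject, predicate, obj in triples:
        field = thesaurus.get(predicate, fallback)
        index.setdefault(field, {}).setdefault(subject, []).append(obj)
    return index


INDEX = aggregate(HARVESTED, THESAURUS)
```

Here the illustrator is found by an "author" search, while the shelf
mark, which the thesaurus does not know, drops into the catch-all
field. Every predicate the thesaurus lacks is a cost of this kind.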

However, I have never seen a single message on this list (and I've
been lurking on it almost since its start) on the costs of semantic
diversity and how to tackle it.

Yours

Sigge

[1] http://www.ilrt.bris.ac.uk/discovery/harmony/

________________
Sigfrid Lundberg, Ph.D.,	.	siglun@munin.lub.lu.se
Lund University Library,		http://www.lub.lu.se/~siglun/
Netlab, PO Box 3, S-221 00 Lund		phone +46 (0)46 222 36 83

Received on Thursday, 28 September 2000 04:55:56 UTC