metadata processing from Butler, Mark on 2003-06-18 (www-rdf-dspace@w3.org from June 2003)

From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
Date: Wed, 18 Jun 2003 16:42:03 +0100
To: www-rdf-dspace@w3.org
Message-ID: <5E13A1874524D411A876006008CD059F066A1D81@0-mail-1.hpl.hp.com>

Hi Team,

I found a description of metadata processing here
http://www.techquila.com/mdf.html

"The driving concept behind MDF is that the processing of metadata involves
a number of different stages. Depending on the source and eventual usage of
the metadata any one or all four of the following stages may be required:

Discovery: the act of trawling some resource set for metadata resources
(which may or may not be combined with the content the metadata describes). 
Extraction: the retrieval of metadata from some set of resources. 
Cleaning: the processing of metadata from its retrieved format into a format
which is consistent with the final application. This may include lexical
processing, reformatting of data and/or the combining of multiple diverse
metadata vocabularies into a single consistent vocabulary. 
Aggregation: the storing of the cleaned metadata together with other
similarly processed metadata.

Within each of these stages, there are any number of different approaches
which could be taken. For example, discovery could be by web-crawling, by
executing searches or by recursing through file system directories.
Extraction may require processing specific to the format of the resource
retrieved. Cleaning could involve simple lexical processing (such as forcing
all strings to a single case or splitting a string on particular boundaries)
or complex extraction processing (such as named entity recognition on text).
Finally the aggregation step might write RDF; a topic map in the XTM
interchange syntax; a topic map in ISO 13250; or might be used to update a
database or other datastore.

MDF attempts to improve the reusability of the different processing
functions for each of these stages by defining a framework in which the
functions may be designed and implemented separately and then linked
together in any combination to provide the desired processing."
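To make the four-stage model concrete, here is a minimal Python sketch of stages written separately and then linked into one chain. It is an illustration under my own assumptions, not MDF's actual API. The `*.meta` naming convention, the `key: value` text format, `VOCAB_MAP` and every function name are hypothetical. Aggregation only collects RDF-style (subject, predicate, object) tuples in a list; it does not write real RDF or XTM.

```python
# Illustrative sketch only; not MDF's API. Names and formats are hypothetical.
from functools import reduce
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Fields = Dict[str, str]          # raw field name -> raw value
Terms = Dict[str, List[str]]     # unified term -> cleaned values
Triple = Tuple[str, str, str]    # (subject, predicate, object)

# Hypothetical mapping that merges two vocabularies into one.
VOCAB_MAP = {"dc.title": "title", "title": "title",
             "dc.creator": "creator", "author": "creator"}


def discover(resources: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Discovery: keep only resources that look like metadata (here, *.meta)."""
    for path, text in resources:
        if path.endswith(".meta"):
            yield path, text


def extract(resources: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, Fields]]:
    """Extraction: format-specific parsing, here simple 'key: value' lines."""
    for path, text in resources:
        fields: Fields = {}
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        yield path, fields


def clean(records: Iterable[Tuple[str, Fields]]) -> Iterator[Tuple[str, Terms]]:
    """Cleaning: map onto one vocabulary, split on ';' and force lower case."""
    for path, fields in records:
        terms: Terms = {}
        for key, value in fields.items():
            term = VOCAB_MAP.get(key.lower())
            if term is None:
                continue  # drop fields outside the target vocabulary
            parts = [p.strip().lower() for p in value.split(";") if p.strip()]
            terms.setdefault(term, []).extend(parts)
        yield path, terms


def aggregate(records: Iterable[Tuple[str, Terms]]) -> List[Triple]:
    """Aggregation: store all records together as subject/predicate/object tuples."""
    store: List[Triple] = []
    for path, terms in records:
        for term in sorted(terms):
            store.extend((path, term, value) for value in terms[term])
    return store


def pipeline(*stages: Callable) -> Callable:
    """Link independently written stages, feeding each one's output to the next."""
    return lambda items: reduce(lambda acc, stage: stage(acc), stages, items)


metadata_pipeline = pipeline(discover, extract, clean, aggregate)

if __name__ == "__main__":
    sample = {
        "a.meta": "dc.title: Metadata Processing\ndc.creator: Butler; Smith",
        "b.meta": "Title: DSpace\nAuthor: Andy",
        "notes.txt": "not a metadata resource",
    }
    for triple in metadata_pipeline(sample.items()):
        print(triple)
```

Each stage only consumes and produces an iterable, so, for example, a discovery function that recursed through real directories with `os.walk` could replace `discover` without changing the other three stages. That is the kind of reuse the MDF description is aiming at.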


I thought this made a lot of sense, so I wondered whether it is worth
describing this discovery, extraction, cleaning and aggregation model in
section 3 of the "relevant technologies" document.

As Andy notes, my description of "processing models" really only
concentrates on the discovery stage of metadata processing.

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Wednesday, 18 June 2003 11:42:33 UTC