- From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
- Date: Wed, 18 Jun 2003 16:42:03 +0100
- To: www-rdf-dspace@w3.org
Hi Team,

I found a description of metadata processing here
http://www.techquila.com/mdf.html

"The driving concept behind MDF is that the processing of metadata involves a number of different stages. Depending on the source and eventual usage of the metadata any one or all four of the following stages may be required:

Discovery: the act of trawling some resource set for metadata resources (which may or may not be combined with the content the metadata describes).

Extraction: the retrieval of metadata from some set of resources.

Cleaning: the processing of metadata from its retrieved format into a format which is consistent with the final application. This may include lexical processing, reformatting of data and/or the combining of multiple diverse metadata vocabularies into a single consistent vocabulary.

Aggregation: the storing of the cleaned metadata together with other similarly processed metadata.

Within each of these stages, there are any number of different approaches which could be taken. For example, discovery could be by web-crawling, by executing searches or by recursing through file system directories. Extraction may require processing specific to the format of the resource retrieved. Cleaning could involve simple lexical processing (such as forcing all strings to a single case or splitting a string on particular boundaries) or complex extraction processing (such as named entity recognition on text). Finally the aggregation step might write RDF; a topic map in the XTM interchange syntax; a topic map in ISO 13250; or might be used to update a database or other datastore.

MDF attempts to improve the reusability of the different processing functions for each of these stages by defining a framework in which the functions may be designed and implemented separately and then linked together in any combination to provide the desired processing."
I thought this made a lot of sense, so I wondered whether it is worth describing this discovery, extraction, cleaning and aggregation model in section 3 of the "relevant technologies document". As Andy notes, my description of "processing models" really only concentrates on the discovery stage of metadata processing.

Dr Mark H. Butler
Research Scientist
HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/
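
For illustration, here is a minimal Python sketch of the discovery, extraction, cleaning and aggregation model quoted above. It is not MDF code: the function names, the "key: value" resource format and the sample data are all hypothetical, chosen only to show how four separately written stages might be linked together.

```python
from typing import Dict, List

Record = Dict[str, str]


def discover(resources: Dict[str, str]) -> List[str]:
    """Discovery: list the resources that may carry metadata.

    Assumption: resources live in an in-memory dict rather than on the
    web or a file system.
    """
    return sorted(resources)


def extract(resources: Dict[str, str], names: List[str]) -> List[Record]:
    """Extraction: pull 'key: value' pairs out of each discovered resource."""
    records = []
    for name in names:
        record = {"source": name}
        for line in resources[name].splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that are not 'key: value'
                record[key.strip()] = value.strip()
        records.append(record)
    return records


def clean(records: List[Record]) -> List[Record]:
    """Cleaning: simple lexical processing (lower-case keys, collapse spaces)."""
    return [{k.lower(): " ".join(v.split()) for k, v in r.items()} for r in records]


def aggregate(records: List[Record], store: Dict[str, Record]) -> Dict[str, Record]:
    """Aggregation: store cleaned records alongside earlier ones, keyed by source."""
    for record in records:
        store[record["source"]] = record
    return store


# Hypothetical sample data, invented for this sketch.
resources = {
    "a.txt": "Title:  Metadata   Frameworks\nCreator: Example Author",
    "b.txt": "TITLE: Topic Maps",
}

# Link the four stages together in order.
store = aggregate(clean(extract(resources, discover(resources))), {})
```

Because each stage only agrees on the shape of its input and output, any one of them could be swapped (for example, a different cleaning function) without touching the others, which is the reusability argument the quoted text makes.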
Received on Wednesday, 18 June 2003 11:42:33 UTC