- From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
- Date: Wed, 18 Jun 2003 17:25:01 +0100
- To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>, "'www-rdf-dspace@w3.org'" <www-rdf-dspace@w3.org>
Mark,

That's a useful split. Cleaning is a common need in data warehousing, so it seems likely to be a requirement for at least some use cases. It would fit with the proposed splitting of issues into Content/Metadata/Vocabulary.

My reading of "3.9 Processing Models" is that it was about vocabulary/schema level issues, as is, mainly, "3.3 Information Lifecycle", although the lifecycles have a certain commonality. We could have a metadata lifecycle as well as content and vocabulary lifecycles in each of the respective sections. Whether this is the right lifecycle split I will leave to people better informed about the requirements for this domain, but I observe that section "3 Metadata" of your original text comes close to the stages below, with the additional idea that metadata may be further modified (described under 'augmentation') and is not restricted to the original creation/addition stage.

Architecturally, we have a processing model for metadata coming into the system and one for metadata services that change and manage metadata already in the system; it would be good if these were not disjoint.

Andy

-----Original Message-----
From: Butler, Mark [mailto:Mark_Butler@hplb.hpl.hp.com]
Sent: 18 June 2003 16:42
To: www-rdf-dspace@w3.org
Subject: metadata processing

Hi Team,

I found a description of metadata processing here: http://www.techquila.com/mdf.html

"The driving concept behind MDF is that the processing of metadata involves a number of different stages. Depending on the source and eventual usage of the metadata, any one or all four of the following stages may be required:

- Discovery: the act of trawling some resource set for metadata resources (which may or may not be combined with the content the metadata describes).
- Extraction: the retrieval of metadata from some set of resources.
- Cleaning: the processing of metadata from its retrieved format into a format which is consistent with the final application. This may include lexical processing, reformatting of data and/or the combining of multiple diverse metadata vocabularies into a single consistent vocabulary.
- Aggregation: the storing of the cleaned metadata together with other similarly processed metadata.

Within each of these stages, there are any number of different approaches which could be taken. For example, discovery could be by web-crawling, by executing searches or by recursing through file system directories. Extraction may require processing specific to the format of the resource retrieved. Cleaning could involve simple lexical processing (such as forcing all strings to a single case or splitting a string on particular boundaries) or complex extraction processing (such as named entity recognition on text). Finally, the aggregation step might write RDF; a topic map in the XTM interchange syntax; a topic map in ISO 13250; or might be used to update a database or other datastore. MDF attempts to improve the reusability of the different processing functions for each of these stages by defining a framework in which the functions may be designed and implemented separately and then linked together in any combination to provide the desired processing."

I thought this made a lot of sense, so I wondered if it is worth describing this discovery, extraction, cleaning and aggregation model in section 3 of the "relevant technologies document". As Andy notes, my description of "processing models" really only concentrates on the discovery stage of metadata processing.
Dr Mark H. Butler, Research Scientist, HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/
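[Editor's note: as a minimal sketch of the staged pipeline idea the MDF description above outlines, the Java below shows four independent stage implementations linked through a single interface. The interface, class and field names are illustrative assumptions, not MDF's actual API.]

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a staged metadata pipeline in the spirit of the
// MDF description above; names are illustrative, not MDF's API.
public class MetadataPipelineSketch {

    /** One stage of processing: takes metadata records in, returns records out. */
    interface Stage {
        List<Map<String, String>> process(List<Map<String, String>> records);
    }

    /** Discovery: here simply seeds the pipeline with resource identifiers. */
    static class Discovery implements Stage {
        public List<Map<String, String>> process(List<Map<String, String>> records) {
            List<Map<String, String>> found = new ArrayList<>();
            for (String id : List.of("doc-1", "doc-2")) {   // stand-in for crawling or searching
                Map<String, String> r = new LinkedHashMap<>();
                r.put("id", id);
                found.add(r);
            }
            return found;
        }
    }

    /** Extraction: pulls raw, format-specific metadata out of each discovered resource. */
    static class Extraction implements Stage {
        public List<Map<String, String>> process(List<Map<String, String>> records) {
            for (Map<String, String> r : records) {
                r.put("Title", "  Example Title for " + r.get("id") + "  ");
            }
            return records;
        }
    }

    /** Cleaning: lexical normalisation into the vocabulary the final application expects. */
    static class Cleaning implements Stage {
        public List<Map<String, String>> process(List<Map<String, String>> records) {
            for (Map<String, String> r : records) {
                String title = r.remove("Title");
                if (title != null) {
                    r.put("dc:title", title.trim().toLowerCase()); // e.g. force case, trim, rename
                }
            }
            return records;
        }
    }

    /** Aggregation: store the cleaned records together (here, just print them). */
    static class Aggregation implements Stage {
        public List<Map<String, String>> process(List<Map<String, String>> records) {
            records.forEach(System.out::println);  // in practice: write RDF, XTM, a database, ...
            return records;
        }
    }

    public static void main(String[] args) {
        // Stages are independent and can be linked in any combination.
        List<Stage> pipeline = List.of(new Discovery(), new Extraction(), new Cleaning(), new Aggregation());
        List<Map<String, String>> records = new ArrayList<>();
        for (Stage stage : pipeline) {
            records = stage.process(records);
        }
    }
}
```

[Linking separately implemented stages through one common interface is what would let the same cleaning or aggregation function be reused regardless of how discovery or extraction was done, which is the reusability point the MDF text makes.]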
Received on Wednesday, 18 June 2003 12:25:22 UTC