Re: Relationship between content and metadata from Farrukh Najmi on 2003-06-26 (www-rdf-dspace@w3.org from June 2003)

From: Farrukh Najmi <farrukh.najmi@sun.com>
Date: Thu, 26 Jun 2003 09:06:19 -0400
To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
CC: www-rdf-dspace@w3.org
Message-ID: <3EFAEFCB.7090606@sun.com>
Butler, Mark wrote:

>This paper
>
>http://www.dlib.org/dlib/april02/weibel/04weibel.html
>
>Has an interesting taxonomy on the relationship between metadata and
>content:
>
>"C. Association Models
>There are various ways to associate metadata with resources:
>
>Embedded metadata resides within the markup of the resource. This implies
>that the metadata is created at the time that the resource is created, often
>by the author. Experts differ concerning whether author-created metadata is
>best or whether it is better to have trained practitioners evaluate and
>describe resources. As a practical matter, resource description expertise is
>a scarce and costly commodity, and thus any investment by authors in the
>description of their intellectual products is likely to be of value.
>Embedded metadata can also be harvested, and the presumptive increase in
>visibility that might result is an incentive for creators to assign
>metadata. Early studies of the efficacy of such metadata are only recently
>becoming available [GRE-01].
>
It is often subjective to determine what is metadata when it is actually
embedded in the content. Some may say that for XML content any
attributes are metadata while elements are content. But then some
schemas may use attributes for what would be considered content.

An approach that sidesteps the hair-splitting over what is content and
what is metadata embedded in the content is to use pluggable content
cataloging. Content cataloging works similarly to database indexing. When
content of a specific type is submitted, one or more pluggable
content-specific cataloging services are invoked, which automatically
generate metadata from the content in a content-specific manner. Such
services could also use knowledge external to the content, such as
context knowledge.

The result is that metadata can be generated in a standard form, in an
automated manner, directly from the content.

ebXML Registry version 2.5 [1] supports such pluggable content cataloging.
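
To make the idea concrete, here is a minimal sketch of pluggable content
cataloging in Python. It is only an illustration of the pattern described
above, not the ebXML Registry API: the names CatalogingRegistry, register,
submit and xml_cataloger, and the "submitter" context key, are all
hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional
import xml.etree.ElementTree as ET

Metadata = Dict[str, str]
# A cataloger takes the raw content plus external context and returns metadata.
Cataloger = Callable[[bytes, Metadata], Metadata]


@dataclass
class CatalogingRegistry:
    """Hypothetical registry that maps a content type to its catalogers."""
    catalogers: Dict[str, List[Cataloger]] = field(default_factory=dict)

    def register(self, content_type: str, cataloger: Cataloger) -> None:
        # Plug in one more cataloging service for this content type.
        self.catalogers.setdefault(content_type, []).append(cataloger)

    def submit(self, content_type: str, content: bytes,
               context: Optional[Metadata] = None) -> Metadata:
        # On submission, invoke every cataloger registered for the type
        # and merge their results into one metadata record.
        context = context or {}
        metadata: Metadata = {}
        for cataloger in self.catalogers.get(content_type, []):
            metadata.update(cataloger(content, context))
        return metadata


def xml_cataloger(content: bytes, context: Metadata) -> Metadata:
    """Example content-specific cataloger for XML (assumed document shape)."""
    root = ET.fromstring(content)
    metadata = {"root_element": root.tag}
    title = root.findtext("title")  # direct child <title>, if present
    if title is not None:
        metadata["title"] = title
    # Knowledge external to the content, taken from the submission context.
    if "submitter" in context:
        metadata["submitter"] = context["submitter"]
    return metadata


registry = CatalogingRegistry()
registry.register("text/xml", xml_cataloger)
result = registry.submit(
    "text/xml",
    b"<report><title>Q2</title></report>",
    {"submitter": "alice"},
)
```

The point of the design is that the decision about what counts as metadata
lives in the cataloger for each content type, so the content itself never
has to mark it up.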

>
>Associated metadata is maintained in files tightly coupled to the resources
>they describe. Such metadata may or may not be harvestable. The advantage of
>associated metadata derives from the relative ease of managing the metadata
>without altering the content of the resource itself, but this benefit is
>purchased at the cost of simplicity, necessitating the co-management of
>resource files and metadata files.
>
The ability to manage metadata without altering the content is
essential. However, I do not see the co-management cost mentioned
above. [1] supports a loosely coupled, distributed model which
allows metadata to be associated with content without having to
co-manage the two. They could be in different registries and managed by
different organizations. This requires strong authentication and
authorization features (also supported by [1]).

>perhaps we want to introduce the distinction between embedded, associated
>and third-party metadata in the document?
>
This is a useful taxonomy indeed, though I am still unsure about the
embedded metadata category, as in practice it seems to be a fuzzy
distinction.


[1] ebXML Registry Version 2.5

http://www.oasis-open.org/committees/regrep/documents/2.5/

-- 
Farrukh
Received on Thursday, 26 June 2003 09:07:31 UTC