Re: Meta Data Handling from Judith Slein on 1997-03-19 (w3c-dist-auth@w3.org from January to March 1997)

From: Judith Slein <slein@wrc.xerox.com>
Date: Wed, 19 Mar 1997 11:28:30 PST
To: Jim Whitehead <ejw@ics.uci.edu>
Cc: w3c-dist-auth@w3.org, ejw@ics.uci.edu, "Ron Daniel Jr." <rdaniel@acl.lanl.gov>
Message-Id: <2.2.32.19970319192830.014c3918@pop-server.wrc.xerox.com>

Yaron and Jim:

Metadata as headers would solve a lot of the metadata problems that have
been worrying me about the earlier proposal based on links.  If metadata are
headers, problems of managing metadata go away:  if you move a resource, its
metadata will move with it.  If you copy a resource, its metadata will get
copied with it.  If you delete a resource, its metadata will get deleted.
There are no longer issues about referential integrity -- since two
resources can't share the same metadata, you don't have to worry about
unintentionally deleting metadata that is still in use.  It used to be the
case that you couldn't even tell by looking at a resource whether it was
metadata or not -- using headers, that problem is gone.
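
As a rough illustration of the point above, here is a minimal Python sketch.
It assumes a toy in-memory store; the Resource class, the copy, move, and
delete helpers, and the "Author" header value are all hypothetical names for
illustration, not part of any WEBDAV proposal. When metadata lives in the
resource's own headers, each operation carries it along with no extra
bookkeeping.

```python
import copy as _copy
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Toy resource: a body plus metadata held as headers (assumed model)."""
    body: str
    headers: dict = field(default_factory=dict)


def copy(store, src, dst):
    # Deep copy, so the new resource gets its own headers, never shared ones.
    store[dst] = _copy.deepcopy(store[src])


def move(store, src, dst):
    # The headers are part of the stored object, so they move with it.
    store[dst] = store.pop(src)


def delete(store, url):
    # Deleting the resource deletes its metadata; nothing is left dangling.
    del store[url]


store = {"/a": Resource("text", {"Author": "example-author"})}
copy(store, "/a", "/b")
move(store, "/a", "/c")
store["/b"].headers["Author"] = "someone-else"  # does not affect /c
delete(store, "/b")
```

Because no two resources share a headers dictionary, changing or deleting
one resource can never disturb the metadata of another.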

Of course, by using resources linked to the objects they describe, you had
some great benefits, too:  the possibility of sharing metadata, the
possibility of metadata itself having metadata, the possibility for a
resource to have multiple sets of metadata, perhaps authored by different
people and perhaps residing on different servers, etc.

If we move toward a model that allows metadata to be stored either in
headers or in resources linked to the object they describe, we need to be
clear about the costs.  For any metadata that is implemented as resources,
the original problems of management still exist. 

Jim:

Great analysis of the problem space!

A nit that I think is a source of great confusion:  Links are not really
metadata. The link together with its destination resource is metadata.  So
although the link is *on* a resource, insofar as we implement metadata using
links, the metadata is *partly* on the resource, and partly a separate resource.

Just managing the links is not managing metadata.  Managing the destination
resource is part of the problem.  Where a resource has links to many
metadata resources, the problem becomes onerous.  The fact that the
destination resource does not have a link back to the resource (or
resources) it describes makes preserving referential integrity extremely
difficult.
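
To make the referential-integrity cost concrete, here is a rough Python
sketch under toy assumptions (the links table, its URLs, and the
safe_to_delete function are hypothetical). Because the metadata resource
carries no link back, the only way to learn whether it is still in use is
to scan every resource's links.

```python
# Hypothetical link table: described resource -> URLs of its metadata resources.
links = {
    "/report.html": ["/meta/dc-record", "/meta/history"],
    "/summary.html": ["/meta/dc-record"],
}


def safe_to_delete(meta_url, deleted_resource, links):
    """True if no resource other than deleted_resource links to meta_url.

    With no back-links on the metadata resource, this has to scan every
    resource's link list, which is the onerous part.
    """
    return not any(
        meta_url in targets
        for source, targets in links.items()
        if source != deleted_resource
    )
```

In this sketch, deleting /report.html may take /meta/history with it, but
/meta/dc-record must stay because /summary.html still uses it. The scan
also covers only the links this one table knows about; links held on other
servers would be invisible to it.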

Searching links is not searching metadata.  Searching links just tells you
what metadata is available.  Searching metadata requires you to follow the
links to the destination resources and search those resources for a matching
value.
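
The difference can be sketched in a few lines of Python. Everything here is
an assumption for illustration: the two dictionaries, the "dublin-core" link
type, the field names, and both function names are hypothetical.

```python
# Hypothetical data: typed links on a resource, and the metadata resources
# those links point to.
links = {"/report.html": {"dublin-core": "/meta/dc-record"}}
metadata_resources = {
    "/meta/dc-record": {"Creator": "Example Author", "Date": "1997-03-19"},
}


def search_links(resource, link_type):
    """Answers only 'is metadata of this type available?' by returning its URL."""
    return links.get(resource, {}).get(link_type)


def search_metadata(resource, link_type, field, value):
    """Follows the link and inspects the destination resource for a match."""
    target = search_links(resource, link_type)
    if target is None:
        return False
    return metadata_resources.get(target, {}).get(field) == value
```

search_links never looks at a metadata value; only search_metadata does,
and it needs the extra step of fetching the destination resource.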

--Judy

At 04:33 PM 3/18/97 PST, Jim Whitehead wrote:
>Ron Daniel wrote:
>>I agree with Yaron to a limited extent. Handling *all* metadata as
>>resources is inappropriate. However, it is my considered technical
>>opinion that handling *all* metadata as headers is just as inappropriate
>>as handling *all* metadata as separate resources. Some descriptions,
>>such as Content-length, Last-modified, Content-type, ... are best
>>carried as headers. Other descriptions, such as detailed revision
>>histories, provenance tracking, and bibliographic descriptions, are
>>best carried as separate resources.
>>
>>WEB-DAV needs a metadata architecture that accommodates both.
>
>I agree.
>
>After reviewing many Web metadata proposals, including PICS, PICS-NG,
>Dublin Core, Warwick Framework, W3C position papers on a resources and
>relationships metadata model, W3C position papers on adding metadata to
>HTML, Murray Maloney's REL/REV draft, and Web Collections, in addition to
>skimming the proceedings of the first IEEE Metadata conference (yes, Ora,
>we've done our homework :-), my view is there are two main varieties of
>metadata:
>
>"Small" chunk metadata:
>
>These include metadata items such as:
>  - HTTP headers
>  - short attribute-value pairs
>  - typed links (e.g. HTTP links)
>
>While developing a stringent definition of "small" is most likely
>impossible, since the definition is arbitrary, and seems to be based on
>unstated assumptions about retrieval performance (e.g., retrieval of small
>chunk metadata should be "trivially" or "unnoticeably" fast),  much
>metadata has a small chunk flavor to it.
>
>Characteristics of small chunk metadata include: fast retrieval speeds, no
>need for content negotiation, no requirements on ordering, no need for
>"trust" information (e.g., digital signature, author information, hash of
>contents, date of creation), and relatively simple value information.
>
>"Large" chunk metadata:
>
>These include metadata items such as instances of:
>  - PICS, PICS-NG collections
>  - Warwick collections
>  - MARC records
>  - Dublin Core records
>  - discipline-specific metadata records
>  - Web pages
>
>Like the smallness of small chunk metadata, the largeness of large chunk
>metadata is difficult to define (a strong indicator that small
>and large are poor terms).
>
>Characteristics of large chunk metadata include: requirements on the
>ordering of fields, encoded trust information, pointers to metadata schema
>descriptions, complex data models, and multiple levels of containment.
>Large chunk metadata often contains several instances of small chunk
>metadata.  Typically large chunk metadata is larger than small chunk
>metadata, although it is easy to develop classes of both for which this
>assertion does not hold.  As a result, there is an assumption that large
>chunk metadata takes longer to transmit than small chunk metadata.
>
>Mapping of metadata to Web data model
>
>The mapping of metadata to the various data containers (resources, headers)
>in the Web data model varies depending on whether the metadata is stored
>on, in, or as a resource.
>
>1. On resource. In this case, the metadata is stored with the resource, but
>is not part of the resource itself.  Examples: HTTP links, HTTP headers,
>PICS labels (using the PICS-Label header).  This is typically used for
>small chunk metadata.  On resource metadata is typically retrievable in 1
>request (a HEAD or GET).
>
>2. Within resource.  The metadata is embedded within the resource, and is a
>defined part of the description of the document type.  Examples: HTML
>REL/REV, HTML META tag, various HTML metadata proposals, MS Word .DOC
>documents, Web Collections (?).  Within resource metadata is retrievable in
>1 request (GET).  Within resource metadata has the advantage of being
>independent of access protocol, and portable (when the resource moves, it
>does too).  Within resource metadata tends to be small chunk.
>
>3. Is (whole) resource.  The metadata is itself an entire resource.  When
>the metadata is an entire resource, there usually exists a relationship
>(link) between the described and metadata (describing) resources.  This
>bears a resemblance to entity-relationship or semantic data modeling
>database models.  Examples: Web Collections, Warwick containers, Web pages.
>Typically large-chunk metadata ends up as whole resource metadata.
>Typically retrieval of whole resource metadata requires 2 requests (one to
>get the links, one to get the metadata).
>
>Relation to WEBDAV.
>
>Using this model of the mapping of metadata to the Web data model, the
>various WEBDAV proposals can be characterized.  In
><draft-jensen-webdav-ext-00> (the proposal discussed at the Irvine
>meeting), all metadata was whole resource metadata, with the exception of
>the links used to hold the relationship between the described resource and
>the metadata resource. In Yaron's recent proposal, the pendulum swings to
>the other end, emphasizing a model where metadata is predominantly on
>resource metadata.  While some might argue that his proposal makes all
>metadata on resource metadata, this is undoubtedly too strict an
>interpretation.  Yaron will undoubtedly argue that whole resource metadata
>is still supported since links can still be defined and followed.
>
>However, now that we have investigated the predominantly whole resource,
>and predominantly on resource solutions of mapping metadata to the Web data
>model, I do feel we can agree with:
>
>>1) Neither headers nor separate resources meet all the requirements on
>>   metadata in WEB-DAV, so we will need a combined solution.
>
>Roadmap to the future:
>
>Ron Daniel writes:
>>If we can agree on an architecture that isn't all one way or the other,
>>then we can advance to more meaningful arguments, like just what new headers
>>(if any) we need to define, what packages (if any) we initially want to
>>support, how (or whether) their elements can be used in both contexts, and
>>how we can do queries on them.
>
>Since there are two main containers for data in the HTTP data model,
>headers and resources, and since we have seen arguments in favor of storing
>metadata in both of these places, it makes sense to develop mechanisms in
>WEBDAV for storing metadata in both places.  Alternatively, we might
>consider extending the places we can store metadata, and hence extend the
>HTTP data model.
>
>Sticking with just the HTTP data model, this would take the form of:
>
>In headers:
>
> - a means of adding a piece of metadata
> - a means of modifying a piece of metadata
> - a means of deleting a piece of metadata
> - a means of retrieving a piece of metadata
> - a means of querying for metadata
>
>These capabilities largely map to Yaron's recent proposal.
>
>As resources:
>
> - a means of creating a link to another resource
> - a means of creating, modifying, deleting, and retrieving a whole resource
>
>These capabilities map to the existing HTTP/1.1 specification.
>
>This also agrees with Ron Daniel:
>>Assuming we agree that some metadata is handled by headers and some by
>>separate resources, we can now discuss ways of editing it. For metadata
>>held as resources, the GET, DELETE, and PUT (or POST) methods should
>>suffice. For the smaller info carried as headers we may need methods such
>>as you describe. I think the essential functionality of METAGET is already
>>handled by the HEAD method. Something like METAPOST and METADELETE seems
>>necessary.
>
>Links.
>
>Ron Daniel writes:
>>I think that redefining existing parts of the HTTP spec, such as LINKs,
>>is beyond the bounds of this WG. Furthermore, annotations can be
>>handled under the existing constraints on LINKs.
>
>I disagree in the case of LINKs, since they are underspecified in the
>current HTTP specification, are defined in an Appendix, and do not meet the
>needs of annotation services (since the source is implicitly the resource
>on which the link is defined).  I think we should accept as a goal to be as
>compatible as possible with HTTP/1.1 links, but meeting our requirements
>should come first.
>
>However, we do have an open issue.  Are links simply metadata, or are they
>a special class of metadata?  If there is a generic way to define and
>delete HTTP headers (e.g. METAPOST and METADELETE), and a link is simply an
>HTTP header called "Link", then do there need to be LINK and UNLINK methods
>at all?  If links have semantics which differentiate them from other
>headers (e.g., LINKSEARCH) then it makes sense to have LINK and UNLINK
>methods, and even separate retrieval functions (such as in
><draft-jensen-webdav-ext-00>).  If they are simply headers, then LINK and
>UNLINK have no place (and HTTP should be renamed the Object-Oriented
>Transfer Protocol :-)
>
>- Jim
>
>
>
>
Name:			Judith A. Slein
E-Mail:			slein@wrc.xerox.com
Internal Phone:  	8*222-5169
External Phone:		(716) 422-5169
Fax:			(716) 265-7133
MailStop:		128-29E
Received on Wednesday, 19 March 1997 14:30:48 UTC