- From: Jim Whitehead <ejw@ics.uci.edu>
- Date: Tue, 18 Mar 1997 16:33:11 -0800
- To: w3c-dist-auth@w3.org
- Cc: ejw@ics.uci.edu, "Ron Daniel Jr." <rdaniel@acl.lanl.gov>
Ron Daniel wrote: >I agree with Yaron to a limited extent. Handling *all* metadata as >resources is inappropriate. However, it is my considered technical >opinion that handling *all* metadata as headers is just as inappropriate >as handling *all* metadata as separate resources. Some descriptions, >such as Content-length, Last-modified, Content-type, ... are best >carried as headers. Other descriptions, such as detailed revision >histories, provenance tracking, and bibliographic descriptions, are >best carried as separate resources. > >WEB-DAV needs a metadata architecture that accommodates both. I agree. After reviewing many Web metadata proposals, including PICS, PICS-NG, Dublin Core, Warwick Framework, W3C position papers on a resources and relationships metadata model, W3C position papers on adding metadata to HTML, Murray Malone's REL/REV draft, and Web Collections, in addition to skimming the proceedings of the first IEEE Metadata conference (yes, Ora, we've done our homework :-), my view is there are two main varieties of metadata: "Small" chunk metadata: These include metadata items such as: - HTTP headers - short attribute-value pairs - typed links (e.g. HTTP links) While developing a stringent definition of "small" is most likely impossible, since the definition is arbitrary, and seems to be based on unstated assumptions about retrieval performance (e.g., retrieval of small chunk metadata should be "trivially" or "unnoticeably" fast), much metadata has a small chunk flavor to it. Characteristics of small chunk metadata include: fast retrieval speeds, no need for content negotiation, no requirements on ordering, no need for "trust" information (e.g., digital signature, author information, hash of contents, date of creation), and relatively simple value information. "Large" chunk metadata: These include metadata items such as instances of: - PICS, PICS-NG collections - Warwick collections - MARC records - Dublin Core records - discipline-specific metadata records - Web pages Like the smallness of small chunk metadata, the largeness of large chunk metadata is similarly difficult to define (a strong indicator that small and large are poor terms). Characteristics of large chunk metadata include: requirements on the ordering of fields, encoded trust information, pointers to metadata schema descriptions, complex data models, and multiple levels of containment. Large chunk metadata often contains several instances of small chunk metadata. Typically large chunk metadata is larger than small chunk metadata, although it is easy to develop classes of both for which this assertion does not hold. As a result, there is an assumption that large chunk metadata takes longer to transmit than small chunk metadata. Mapping of metadata to Web data model The mapping of metadata to the various data containers (resources, headers) in the Web data model varies depending on whether the metadata is stored on, in, or as a resource. 1. On resource. In this case, the metadata is stored with the resource, but is not part of the resource itself. Examples: HTTP links, HTTP headers, PICS labels (using the PICS-Label header). This is typically used for small chunk metadata. On resource metadata is typically retrievable in 1 request (a HEAD or GET). 2. Within resource. The metadata is embedded within the resource, and is a defined part of the description of the document type. Examples: HTML REL/REV, HTML META tag, various HTML metadata proposals, MS Word .DOC documents, Web Collections (?). Within resource metadata is retrievable in 1 request (GET). Within resource metadata has the advantage of being independent of access protocol, and portable (when the resource moves, it does too). Within resource metadata tends to be small chunk. 3. Is (whole) resource. The metadata is itself an entire resource. When the metadata is an entire resource, there usually exists a relationship (link) between the described and metadata (describing) resources. This bears a resemblance to entity-relationship or semantic data modeling database models. Examples: Web Collections, Warwick containers, Web pages. Typically large-chunk metadata ends up as whole resource metadata. Typically retrieval of whole resource metadata requires 2 requests (one to get the links, one to get the metadata). Relation to WEBDAV. Using this model of the mapping of metadata to the Web data model, the various WEBDAV proposals can be characterized. In <draft-jensen-webdav-ext-00> (the proposal discussed at the Irvine meeting), all metadata was whole resource metadata, with the exception of the links used to hold the relationship between the described resource and the metadata resource. In Yaron's recent proposal, the pendulum swings to the other end, emphasizing a model where metadata is predominantly on resource metadata. While some might argue that his proposal makes all metadata on resource metadata, this is undoubtedly too strict an interpretation. Yaron will undoubtedly argue that whole resource metadata is still supported since links can still be defined and followed. However, now that we have investigated the predominantly whole resource, and predominantly on resource solutions of mapping metadata to the Web data model, I do feel we can agree with: >1) Neither headers or separate resources meet all the requirements on > metadata in WEB-DAV, so we will need a combined solution. Roadmap to the future: Ron Daniel writes: >If we can agree on an architecture that isn't all one way or the other, >then we can advance to more meaningful arguments, like just what new headers >(if any) we need to define, what packages (if any) we initially want to >support, how (or whether) their elements can be used in both contexts, and >how we can do queries on them. Since there are two main containers for data in the HTTP data model, headers and resources, and since we have seen arguments in favor of storing metadata in both of these places, it makes sense to develop a mechanisms in WEBDAV for storing metadata in both places. Alternatively, we might consider extending the places we can store metadata, and hence extend the HTTP data model. Sticking with just the HTTP data model, this would take the form of: In headers: - a means of adding a piece of metadata - a means of modifying a piece of metadata - a means of deleting a piece of metadata - a means of retrieving a piece of metadata - a means of querying for metadata These capabilities largely map to Yaron's recent proposal. As resources: - a means of creating a link to another resource - a means of creating, modifying, deleting, and retrieving a whole resource These capabilities map to the existing HTTP/1.1 specification. This also agrees with Ron Daniel: >Assuming we agree that some metadata is handled by headers and some by >separate resources, we can now discuss ways of editing it. For metadata >held as resources, the GET, DELETE, and PUT (or POST) methods should >suffice. For the smaller info carried as headers we may need methods such >as you describe. I think the essential functionality of METAGET is already >handled by the HEAD method. Something like METAPOST and METADELETE seems >necessary. Links. Ron Daniel writes: >I think that redefining existing parts of the HTTP spec, such as LINKs, >is beyond the bounds of this WG. Furthermore, annotations can be >handled under the existing constraints on LINKs. I disagree in the case of LINKs, since they are underspecified in the current HTTP specification, are defined in an Appendix, and do not meet the needs of annotation services (since the source is implicitly the resource on which the link is defined). I think we should accept as a goal to be as compatibile as possible with HTTP/1.1 links, but meeting our requirements should come first. However, we do have an open issue. Are links simply metadata, or are they a special class of metadata? If there is a generic way to define and delete HTTP headers (e.g. METAPOST and METADELETE), and a link is simply an HTTP header called "Link", then do there need to be LINK and UNLINK methods at all? If links have semantics which differentiate them from other headers (e.g., LINKSEARCH) then it makes sense to have LINK and UNLINK headers, and even separate retrieval functions (such as in <draft-jensen-webdav-ext-00>). If they are simply headers, then LINK and UNLINK have no place (and HTTP should be renamed the Object-Oriented Transfer Protocol :-) - Jim
Received on Wednesday, 19 March 1997 04:57:09 UTC