Re: Meta Data Handling

Ron Daniel wrote:
>I agree with Yaron to a limited extent. Handling *all* metadata as
>resources is inappropriate. However, it is my considered technical
>opinion that handling *all* metadata as headers is just as inappropriate
>as handling *all* metadata as separate resources. Some descriptions,
>such as Content-length, Last-modified, Content-type, ... are best
>carried as headers. Other descriptions, such as detailed revision
>histories, provenance tracking, and bibliographic descriptions, are
>best carried as separate resources.
>
>WEB-DAV needs a metadata architecture that accommodates both.

I agree.

After reviewing many Web metadata proposals, including PICS, PICS-NG,
Dublin Core, Warwick Framework, W3C position papers on a resources and
relationships metadata model, W3C position papers on adding metadata to
HTML, Murray Malone's REL/REV draft, and Web Collections, in addition to
skimming the proceedings of the first IEEE Metadata conference (yes, Ora,
we've done our homework :-), my view is there are two main varieties of
metadata:

"Small" chunk metadata:

These include metadata items such as:
  - HTTP headers
  - short attribute-value pairs
  - typed links (e.g. HTTP links)

While developing a stringent definition of "small" is most likely
impossible, since the definition is arbitrary, and seems to be based on
unstated assumptions about retrieval performance (e.g., retrieval of small
chunk metadata should be "trivially" or "unnoticeably" fast),  much
metadata has a small chunk flavor to it.

Characteristics of small chunk metadata include: fast retrieval speeds, no
need for content negotiation, no requirements on ordering, no need for
"trust" information (e.g., digital signature, author information, hash of
contents, date of creation), and relatively simple value information.

"Large" chunk metadata:

These include metadata items such as instances of:
  - PICS, PICS-NG collections
  - Warwick collections
  - MARC records
  - Dublin Core records
  - discipline-specific metadata records
  - Web pages

Like the smallness of small chunk metadata, the largeness of large chunk
metadata is similarly difficult to define (a strong indicator that small
and large are poor terms).

Characteristics of large chunk metadata include: requirements on the
ordering of fields, encoded trust information, pointers to metadata schema
descriptions, complex data models, and multiple levels of containment.
Large chunk metadata often contains several instances of small chunk
metadata.  Typically large chunk metadata is larger than small chunk
metadata, although it is easy to develop classes of both for which this
assertion does not hold.  As a result, there is an assumption that large
chunk metadata takes longer to transmit than small chunk metadata.

Mapping of metadata to Web data model

The mapping of metadata to the various data containers (resources, headers)
in the Web data model varies depending on whether the metadata is stored
on, in, or as a resource.

1. On resource. In this case, the metadata is stored with the resource, but
is not part of the resource itself.  Examples: HTTP links, HTTP headers,
PICS labels (using the PICS-Label header).  This is typically used for
small chunk metadata.  On resource metadata is typically retrievable in 1
request (a HEAD or GET).

2. Within resource.  The metadata is embedded within the resource, and is a
defined part of the description of the document type.  Examples: HTML
REL/REV, HTML META tag, various HTML metadata proposals, MS Word .DOC
documents, Web Collections (?).  Within resource metadata is retrievable in
1 request (GET).  Within resource metadata has the advantage of being
independent of access protocol, and portable (when the resource moves, it
does too).  Within resource metadata tends to be small chunk.

3. Is (whole) resource.  The metadata is itself an entire resource.  When
the metadata is an entire resource, there usually exists a relationship
(link) between the described and metadata (describing) resources.  This
bears a resemblance to entity-relationship or semantic data modeling
database models.  Examples: Web Collections, Warwick containers, Web pages.
Typically large-chunk metadata ends up as whole resource metadata.
Typically retrieval of whole resource metadata requires 2 requests (one to
get the links, one to get the metadata).

Relation to WEBDAV.

Using this model of the mapping of metadata to the Web data model, the
various WEBDAV proposals can be characterized.  In
<draft-jensen-webdav-ext-00> (the proposal discussed at the Irvine
meeting), all metadata was whole resource metadata, with the exception of
the links used to hold the relationship between the described resource and
the metadata resource. In Yaron's recent proposal, the pendulum swings to
the other end, emphasizing a model where metadata is predominantly on
resource metadata.  While some might argue that his proposal makes all
metadata on resource metadata, this is undoubtedly too strict an
interpretation.  Yaron will undoubtedly argue that whole resource metadata
is still supported since links can still be defined and followed.

However, now that we have investigated the predominantly whole resource,
and predominantly on resource solutions of mapping metadata to the Web data
model, I do feel we can agree with:

>1) Neither headers or separate resources meet all the requirements on
>   metadata in WEB-DAV, so we will need a combined solution.

Roadmap to the future:

Ron Daniel writes:
>If we can agree on an architecture that isn't all one way or the other,
>then we can advance to more meaningful arguments, like just what new headers
>(if any) we need to define, what packages (if any) we initially want to
>support, how (or whether) their elements can be used in both contexts, and
>how we can do queries on them.

Since there are two main containers for data in the HTTP data model,
headers and resources, and since we have seen arguments in favor of storing
metadata in both of these places, it makes sense to develop a mechanisms in
WEBDAV for storing metadata in both places.  Alternatively, we might
consider extending the places we can store metadata, and hence extend the
HTTP data model.

Sticking with just the HTTP data model, this would take the form of:

In headers:

 - a means of adding a piece of metadata
 - a means of modifying a piece of metadata
 - a means of deleting a piece of metadata
 - a means of retrieving a piece of metadata
 - a means of querying for metadata

These capabilities largely map to Yaron's recent proposal.

As resources:

 - a means of creating a link to another resource
 - a means of creating, modifying, deleting, and retrieving a whole resource

These capabilities map to the existing HTTP/1.1 specification.

This also agrees with Ron Daniel:
>Assuming we agree that some metadata is handled by headers and some by
>separate resources, we can now discuss ways of editing it. For metadata
>held as resources, the GET, DELETE, and PUT (or POST) methods should
>suffice. For the smaller info carried as headers we may need methods such
>as you describe. I think the essential functionality of METAGET is already
>handled by the HEAD method. Something like METAPOST and METADELETE seems
>necessary.

Links.

Ron Daniel writes:
>I think that redefining existing parts of the HTTP spec, such as LINKs,
>is beyond the bounds of this WG. Furthermore, annotations can be
>handled under the existing constraints on LINKs.

I disagree in the case of LINKs, since they are underspecified in the
current HTTP specification, are defined in an Appendix, and do not meet the
needs of annotation services (since the source is implicitly the resource
on which the link is defined).  I think we should accept as a goal to be as
compatibile as possible with HTTP/1.1 links, but meeting our requirements
should come first.

However, we do have an open issue.  Are links simply metadata, or are they
a special class of metadata?  If there is a generic way to define and
delete HTTP headers (e.g. METAPOST and METADELETE), and a link is simply an
HTTP header called "Link", then do there need to be LINK and UNLINK methods
at all?  If links have semantics which differentiate them from other
headers (e.g., LINKSEARCH) then it makes sense to have LINK and UNLINK
headers, and even separate retrieval functions (such as in
<draft-jensen-webdav-ext-00>).  If they are simply headers, then LINK and
UNLINK have no place (and HTTP should be renamed the Object-Oriented
Transfer Protocol :-)

- Jim

Received on Wednesday, 19 March 1997 04:57:09 UTC