HTML Metadata for other Web objects

We need a consensus on how to specify metadata for non-HTML Web objects, 
IMO.

The "search engine" META tags are in widespread use, and most people 
know how to use them, viz.
<head>
<meta name="description" content="object description here"> 
<meta name="keywords" content="various keywords here">
</head>

The Dublin Core set (http://purl.org/metadata/dublin_core_elements) 
defines a set of standard media-independent elements which may
be applied to HTML documents, and the simple elements "Author" and 
"Copyright" are also in common use.

These all apply to the document that they are embedded in.

There is a need to index non-HTML objects which have no
well-defined metadata facility, and also non-text 
objects. These include audio and video files, PostScript or PDF 
documents, executable binaries, and data sets.

It might be possible to include metadata in HTTP headers, but people seem 
reluctant to bulk up the headers this way, and much of the linking ability 
is lost.

http://www.ics.uci.edu/pub/ietf/html/draft-ietf-html-relrev-00.txt
suggests a META relationship for the LINK tag, as in 
<LINK REL=META HREF="metadata.html"> for a forward relationship and
<LINK REV=META HREF="blossom.gif"> for a reverse relationship.

The Dublin Core element IDENTIFIER may also be used, perhaps in the form
<meta name="dc.identifier" scheme=URL 
 content="http://some.org/blossom.gif">, though it may be used for other 
purposes, such as ISBN, ISSN, UPC etc., and has no concept of relative URLs.


I propose:

  That the reverse META relationship in an HTML header be used to indicate
  that metainformation in the current document
  applies to the referenced object, not the current document itself.
 

  That the forward META relationship in HTTP (RFC 2068-19.6.2.4) be 
  optionally used to indicate the existence of metainformation 
  pertaining to a non-text object.

  That the reverse META relationship in HTTP be optionally used 
  in an identical fashion to the reverse META relationship in HTML.

Example - the metadata:

  HTTP/1.0 200 OK
  Content-type: text/html
  Link: <http://www.some.org/blossom.gif>; rev="meta"

  <html><head><title>Apple Blossoms</title>
  <LINK REV=META HREF="blossom.gif">
  <meta name="dc.title" content="Apple Blossoms">
  <meta name="dc.description" content="Apple Blossoms in Springtime">
  <meta name="dc.creator" content="Ann Photog">
  <meta name="dc.identifier" scheme="url" 
    content="http://www.some.org/blossom.gif">
  <meta name="copyright" content="Ann Photog, 1997">
  </head><body>
  (<a href="blossom.gif">here's the photo</a>)
  </body></html>

and the object:

  HTTP/1.0 200 OK
  Content-type: image/gif
  Link: <http://www.some.org/blossom.html>; rel="meta"

  GIF89a etc. etc.
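
As a rough illustration of how an agent might read the Link headers shown
in the two responses above, here is a minimal Python sketch. The function
names parse_link_header and meta_links are hypothetical, and the parsing
is deliberately simplified (it assumes parameter values contain no commas
or semicolons); it is not a complete RFC 2068 Link parser.

```python
import re

def parse_link_header(value):
    """Split a Link header value into (url, params) pairs.

    Simplified sketch: assumes parameter values contain no ',' or ';'.
    """
    links = []
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)', value):
        url, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(';'):
            if '=' in part:
                key, val = part.split('=', 1)
                params[key.strip().lower()] = val.strip().strip('"')
        links.append((url, params))
    return links

def meta_links(value):
    """Collect URLs whose rel or rev relationship includes "meta"."""
    found = {"rel": [], "rev": []}
    for url, params in parse_link_header(value):
        for direction in ("rel", "rev"):
            # A relationship value may list several space-separated types.
            if "meta" in params.get(direction, "").lower().split():
                found[direction].append(url)
    return found

# Header from the image response: points forward to its metadata page.
object_header = '<http://www.some.org/blossom.html>; rel="meta"'
# Header from the HTML response: points back at the object it describes.
metadata_header = '<http://www.some.org/blossom.gif>; rev="meta"'

print(meta_links(object_header))
print(meta_links(metadata_header))
```

A robot fetching blossom.gif could use the "rel" list to find the page
holding its metadata, and the "rev" list on blossom.html to learn which
object that page describes.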


This would require that discovery agents such as Web robots and spiders 
associate the metainformation with the correct object, listing e.g.

  <a href="http://www.some.org/blossom.gif">Apple Blossoms</a>
  &quot;Apple Blossoms in Springtime&quot;; Creator: Ann Photog

rather than

  <a href="http://www.some.org/blossom.html">Apple Blossoms</a>

and optionally follow the HTTP Link header to discover metainformation
when traversing the original object (blossom.gif).
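
The association step described above might be sketched in Python roughly
as follows. This assumes the robot already has the page URL and HTML body
in hand; the names MetaCollector and index_entry are made up for
illustration, and the code only shows one possible way to attach the
collected META elements to the object named by <LINK REV=META>.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class MetaCollector(HTMLParser):
    """Collect META name/content pairs and any LINK REV=META target."""

    def __init__(self):
        super().__init__()
        self.rev_meta_href = None
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rev") or "").lower() == "meta":
            self.rev_meta_href = attrs.get("href")
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

def index_entry(page_url, html):
    """Return (url_to_index, metadata) for a fetched HTML page."""
    collector = MetaCollector()
    collector.feed(html)
    target = page_url
    if collector.rev_meta_href:
        # The metadata describes the linked object, not this page;
        # resolve a relative HREF against the page's own URL.
        target = urljoin(page_url, collector.rev_meta_href)
    return target, collector.meta

page = """<html><head><title>Apple Blossoms</title>
<LINK REV=META HREF="blossom.gif">
<meta name="dc.title" content="Apple Blossoms">
<meta name="dc.description" content="Apple Blossoms in Springtime">
<meta name="dc.creator" content="Ann Photog">
</head><body></body></html>"""

url, meta = index_entry("http://www.some.org/blossom.html", page)
print(url, meta["dc.title"], meta["dc.creator"])
```

With the example page, the entry is filed under blossom.gif rather than
blossom.html, which is the behaviour the proposal asks of discovery agents.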

Perhaps an empty HREF field 
  <LINK REV=META HREF=""> could be used to signify that the metadata 
pertains to an offline object, such as a book or statue (which may 
have a valid non-URL DC.IDENTIFIER element) ...



Andrew Daviel 
TRIUMF & Vancouver Webpages

Received on Wednesday, 14 May 1997 15:50:27 UTC