Re: Indexing non-HTML objects from Benjamin Franz on 1997-05-02 (ietf-http-wg@w3.org from April to June 1997)

From: Benjamin Franz <snowhare@netimages.com>
Date: Thu, 1 May 1997 18:31:05 -0700 (PDT)
To: Robots List <robots@mail.mccmedia.com>
Cc: http-wg@cuckoo.hpl.hp.com
Message-Id: <Pine.LNX.3.95.970501181845.21948C-100000@ns.viet.net>
(forwarded without much snippage so the HTTP-WG can see the original
proposal - my comments are at the bottom -- Benjamin F.)

On Thu, 1 May 1997, Andrew Daviel, <advax@triumf.ca> wrote:

> 
> There is a need to index non-HTML objects which have no
> well-defined metadata facility, and also non-text 
> objects. These  include audio and video files, PostScript or PDF 
> documents, executable binaries, and data sets.
> 
> 
> http://www.ics.uci.edu/pub/ietf/html/draft-ietf-html-relrev-00.txt
> suggests a META relationship for the LINK tag, as in 
> 
> <LINK REL=META HREF="metadata.html"> for a forward relationship and
> <LINK REV=META HREF="blossom.gif"> for a reverse relationship.
> 
> 
> Proposal:
> 
>   That the reverse META relationship in an HTML header be used to indicate
>   that metainformation in the current document
>   applies to the referenced object, not the current document itself.
>  
> 
>   That the forward META relationship in HTTP (RFC 2068-19.6.2.4) be 
>   optionally used to indicate the existence of metainformation 
>   pertaining to a non-text object.
> 
>   That the reverse META relationship in HTTP be optionally used 
>   in an identical fashion to the reverse META relationship in HTML.
> 
> Example - the metadata:
> 
>   Content-type: text/html
>   Link: <http://www.some.org/blossom.gif>; rev="meta"
> 
>   <html><head><title>Apple Blossoms</title>
>   <LINK REV=META HREF="blossom.gif">
>   <meta name="description" content="Apple Blossoms in Springtime">
>   </head><body>
>   (<a href="blossom.gif">here's the photo</a>)
> 
> and the object:
> 
>   Content-type: image/gif
>   Link: <http://www.some.org/blossom.html>; rel="meta"
> 
>  
> 
> This would require that discovery agents such as Web robots and spiders 
> associate the metainformation with the correct object, listing e.g.
> 
>   <a href="http://www.some.org/blossom.gif">Apple Blossoms</a>
>   &quot;Apple Blossoms in Springtime&quot;; 
> 
> rather than
> 
>   <a href="http://www.some.org/blossom.html">Apple Blossoms</a>
> 
> and optionally follow the HTTP Link header to discover metainformation
> when traversing the original object (blossom.gif).
> 
> 
> The Dublin Core element IDENTIFIER may also be used, viz.
> <meta name="dc.identifier" scheme=URL 
>  content="http://some.org/blossom.gif">
> (see http://purl.org/metadata/dublin_core_elements)
> 
> Perhaps an empty HREF field 
>   <LINK REV=META HREF=""> could be used to signify that the metadata 
> pertains to an offline object, such as a book or statue  (which may 
> have a valid non-URL DC.IDENTIFIER element) ...
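
To make the indexing behaviour that the quoted proposal asks of robots concrete, here is a minimal Python sketch. It is an illustration only, not part of the proposal: the helper names (parse_link_header, index_target, metadata_location) are hypothetical, and the parser handles only the simple Link header form shown in the examples above, not the full header grammar.

```python
import re
from urllib.parse import urljoin

# Matches one <URL> followed by its ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Return a list of (target, params) pairs from a simple Link header value."""
    links = []
    for match in _LINK_RE.finditer(value):
        params = {}
        for p in _PARAM_RE.finditer(match.group(2)):
            raw = p.group(2) if p.group(2) is not None else p.group(3)
            params[p.group(1).lower()] = raw.lower()
        links.append((match.group(1), params))
    return links


def index_target(doc_url, link_header):
    """URL under which a robot would file this document's metadata.

    With rev="meta" the metadata describes the referenced object,
    so the robot lists that object rather than the document itself.
    """
    for target, params in parse_link_header(link_header or ""):
        if "meta" in params.get("rev", "").split():
            return urljoin(doc_url, target)
    return doc_url


def metadata_location(obj_url, link_header):
    """URL of the metadata document advertised by an object via rel="meta"."""
    for target, params in parse_link_header(link_header or ""):
        if "meta" in params.get("rel", "").split():
            return urljoin(obj_url, target)
    return None
```

With the headers from the example, index_target() for blossom.html yields the URL of blossom.gif, and metadata_location() for blossom.gif yields the URL of blossom.html.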


http://www.ics.uci.edu/pub/ietf/html/draft-ietf-html-relrev-00.txt expired
almost a year ago. It is no longer an active proposal unless a new draft
has been issued somewhere and I didn't notice. 

Regardless - one problem is that your proposal only allows *ONE* object to
be associated with the meta data in a document and so prevents the
document from containing meta information for *itself* (or for multiple
included objects). So now you need *another* document with links to the
meta document so you can have a document with non-HTML objects with
associated meta data.  Additionally, there is the 'multiple/hostile
meta-document' problem - how do you resolve multiple meta definitions for
a single object and/or prevent someone *else* from assigning undesired
meta information to one of *your* objects? 

You could resolve the single referenced object problem with a different
choice of how to encode the meta info, but the multiple/hostile metadata
problem is a nasty problem that is not easy to solve in general with
forward references to meta data from a second document. 

It would probably be better to try for a new HTTP header instead. Then a
server could send something like:

Metainfo-Location: URL/URI

in the HTTP headers for an object. This would prevent the multiple/hostile
meta problems and could be handled by existing servers such as Apache and
CERN without any code changes, since they can already attach arbitrary
headers to transmitted objects. It would also avoid loading up the HTTP
headers with potentially large amounts of meta information.
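
A rough sketch of how a robot might use such a header, in Python. Metainfo-Location is only the header name suggested above, not an existing standard, and the function names and the acceptance rule are assumptions made for illustration: the robot trusts a metadata document only when the object's own response names it, which is what keeps third parties from attaching metadata to someone else's object.

```python
from urllib.parse import urljoin


def metainfo_url(object_url, headers):
    """Return the absolute metadata URL named in the object's headers, or None.

    `headers` is a plain dict of response headers; names are compared
    case-insensitively, as HTTP header names are.
    """
    for name, value in headers.items():
        if name.lower() == "metainfo-location":
            return urljoin(object_url, value.strip())
    return None


def accept_metadata(object_url, headers, claimed_by):
    """Accept a metadata document only if the object itself points at it.

    `claimed_by` is the URL of a document that claims to describe the
    object. A claim made by any other document is ignored.
    """
    return metainfo_url(object_url, headers) == claimed_by
```

For example, if the response for http://www.some.org/blossom.gif carries "Metainfo-Location: blossom.html", then accept_metadata() is true for http://www.some.org/blossom.html and false for a metadata document hosted anywhere else.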

-- 
Benjamin Franz
Received on Thursday, 1 May 1997 18:32:21 UTC