Re: Indexing non-HTML objects from Andrew Daviel on 1997-05-07 (ietf-http-wg@w3.org from April to June 1997)

From: Andrew Daviel <andrew@andrew.triumf.ca>
Date: Wed, 7 May 1997 13:53:58 -0700 (PDT)
To: "David W. Morris" <dwm@xpasc.com>
Cc: http-wg@cuckoo.hpl.hp.com
Message-Id: <Pine.LNX.3.91.970507134143.612C-100000@andrew.triumf.ca>

On Tue, 6 May 1997, David W. Morris wrote:

> The problem with meta data is that it is intrinsically limiting, whereas the
> approach I've been advocating of a complete surrogate document correlated
> with the otherwise unindexable resource provides a much richer descriptive
> space.

OK. How do you correlate it with the unindexable resource?
Using HTTP or HTML Link doesn't require a particular metadata structure
so would work just fine with what you suggest. Using an HTML Anchor HREF
would merely indicate that there is another document there, so you'd get 
the URL of the surrogate document in a search, not the real document. Not 
a big problem for an academic paper, but for active content you're going 
to get:

 refresh 3;url=http://active-page
 blah blah blah

 Please <a href="etc">Click to get our way-cool active page</a> or wait 3 seconds
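
For comparison, the Link-based correlation mentioned above might look roughly like this on the indexer side. This is only a sketch: the "describedby" relation name, the example URLs, and the function names are my assumptions for illustration, and the parser handles only simple header values (no commas or semicolons inside quoted strings).

```python
import re

# One link-value: <URL> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)'
)
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Return a list of (url, params) pairs from a simple Link header value."""
    links = []
    for match in _LINK_RE.finditer(value):
        url, raw_params = match.group(1), match.group(2)
        params = {}
        for p in _PARAM_RE.finditer(raw_params):
            quoted, bare = p.group(2), p.group(3)
            params[p.group(1).lower()] = quoted if quoted is not None else bare
        links.append((url, params))
    return links


def surrogate_for(value, rel="describedby"):
    """Return the URL of the first link carrying the given relation, or None.

    The relation name is an assumption; any agreed value would do.
    """
    for url, params in parse_link_header(value):
        if rel in params.get("rel", "").lower().split():
            return url
    return None
```

The point is that a robot fetching the unindexable object would index the text found at the surrogate URL but report the URL of the real object in search results, which is what the plain Anchor HREF approach cannot do.
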

Or, of course, you make things content-negotiable on user-agent and look
for robots (or Netscape/MSIE) and point them to the right place. I'm
in favour of cacheable content, though.
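
A rough sketch of that user-agent approach, for what it's worth. The robot tokens, the function name, and the URLs are hypothetical, and real robots do not reliably identify themselves this way.

```python
# Hypothetical substrings that might mark a robot's User-Agent string.
ROBOT_TOKENS = ("bot", "crawler", "spider")


def select_variant(user_agent, active_url, surrogate_url):
    """Pick a redirect target based on the User-Agent request header.

    Returns response headers as a dict. Vary tells caches that the
    response depends on User-Agent.
    """
    ua = (user_agent or "").lower()
    is_robot = any(token in ua for token in ROBOT_TOKENS)
    location = surrogate_url if is_robot else active_url
    return {"Location": location, "Vary": "User-Agent"}
```

The Vary header is the catch: a cache then has to keep a separate copy for each distinct User-Agent string it sees, so the response is far less cacheable than a single document would be.
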

> In my experience, most content authors do a poor job of coding a good
> representation of their content with a few words. Software which considers
> a full text description seems to be more generally effective.

Maybe. I often don't bother to look if the title and description (from 
metadata) don't appear relevant. I'll just try the next 10 matches.

Andrew Daviel

Received on Wednesday, 7 May 1997 14:00:02 UTC