Re: Indexing non-HTML objects from Andrew Daviel on 1997-05-03 (ietf-http-wg@w3.org from April to June 1997)

From: Andrew Daviel <andrew@andrew.triumf.ca>
Date: Fri, 2 May 1997 18:15:19 -0700 (PDT)
To: http-wg@cuckoo.hpl.hp.com
Cc: Robots List <robots@mail.mccmedia.com>
Message-Id: <Pine.LNX.3.91.970502174703.24647F-100000@andrew.triumf.ca>

On Fri, 2 May 1997, David W. Morris wrote:

> Yes, but as has been already pointed out, the LINK is a subpart of
> the HTML and thus doesn't provide for describing arbitrary www content,
> in the case of this thread, for purposes of representing the arbitrary
> www content in a suitable fashion for indexing.

The idea is that the HTML document includes the metadata (as META tags,
PICS label headers, or just plain HTML). The LINK references the resource
(PDF file, MPEG, or perhaps a gopher or telnet port). It can
point to arbitrary content. An indexing agent uses the metadata to index
the resource, so when I go to a search engine and click on "68040
datasheet" I get the PDF document, not a document pointing to the PDF
document. Link in HTTP is, as I understand it, the same conceptual
mechanism, and so could be used to provide a forward relationship from the
resource back to the metadata or, where a metadata standard defines a
plain text file (as FGDC does, I think), to provide the reverse
relationship from metadata to resource. In most cases HTML <LINK> would be
used, as it is easier for authors to use, at least on existing older servers.
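
As a rough illustration of the indexing step described above, here is a
minimal sketch of how an agent might read a metadata page and file its
META fields under the URL of the linked resource instead of the page's own
URL. The REL value "resource", the sample page, the example.com URL, and
the names MetadataPageParser and index_entries are all assumptions made up
for this sketch; nothing here is a defined relationship type or an
existing robot's behaviour.

```python
from html.parser import HTMLParser


class MetadataPageParser(HTMLParser):
    """Collects META name/content pairs and LINK rel/href pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower()
            self.links.append((rel, attrs["href"]))


def index_entries(html_text, rel="resource"):
    """Return one index record per linked resource.

    "resource" is a hypothetical REL value chosen for this sketch.
    """
    parser = MetadataPageParser()
    parser.feed(html_text)
    targets = [href for r, href in parser.links if r == rel]
    # The record is keyed on the resource URL, so a search hit leads
    # straight to the PDF and not to the page that describes it.
    return [{"url": href, **parser.meta} for href in targets]


# Hypothetical metadata page describing a PDF held elsewhere.
SAMPLE = """<HTML><HEAD>
<TITLE>68040 datasheet (metadata)</TITLE>
<META NAME="description" CONTENT="68040 datasheet">
<META NAME="keywords" CONTENT="68040, microprocessor, datasheet">
<LINK REL="resource" HREF="http://www.example.com/docs/68040.pdf">
</HEAD><BODY></BODY></HTML>"""

entries = index_entries(SAMPLE)
for entry in entries:
    print(entry["url"], "-", entry["description"])
```

The same record could in principle be built from an HTTP Link header on
the resource itself; the sketch covers only the HTML <LINK> case.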

(with a new header, I guess <META HTTP-EQUIV could be used ...)

> Transparent Content Negotiation would provide the ideal infrastructure via
> which the URL/URN/URI identified resource listed in the proposed metainfo
> header could have the appropriate variant delivered based on the 
> specific needs of a particular indexing service. Then some content
> could have multiple descriptive documents for indexing purposes if the
> publisher so chose.

Mm, sounds exciting if people will use it. I suspect not.
Any idea how many people are using content negotiation at this point?
I've been waiting for HTTP/1.1 to address the caching issues (which it
has), but I don't really have much negotiable content anyway and haven't
updated yet.

I would anticipate that people would submit the metadata HTML, or an
index of metadata, so that robots would initially discover the metadata
rather than the resource. Different HTML metadata would be identified by
schema, while plaintext metadata may be structured in some other way
internally to allow identification.

Andrew Daviel

Received on Friday, 2 May 1997 18:19:22 UTC