Re: Page v. document (search and agent issue)

Sunil Mishra (smishra@cc.gatech.edu)
Fri, 22 Nov 1996 15:39:52 -0500


Message-Id: <199611222039.PAA02795@cleon.cc.gatech.edu>
To: Nick Arnett <narnett@verity.com>
cc: www-html@w3.org
Subject: Re: Page v. document (search and agent issue) 
In-reply-to: Your message of "Fri, 22 Nov 1996 08:15:43 PST."
             <2.2.32.19961122161543.009b3858@207.135.74.230> 
Date: Fri, 22 Nov 1996 15:39:52 -0500
From: Sunil Mishra <smishra@cc.gatech.edu>

No, HTML is not geared towards a hierarchical document definition, which
is essentially what you seem to be looking for. The closest you might be
able to get is to wrap each article in its own <div>. Unfortunately, the
ID attribute has disappeared from HTML 3.2, and it is exactly what you
would want for identifying a specific subpart of the HTML. The agent would
of course also have to be modified to react to changes within specific
<div>s rather than to a change anywhere within the document. A poor
alternative to ID would be to put an <a name=...> around the headline at
the top of each <div>.
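
To make the agent side concrete, here is a rough sketch of per-article
change detection. It assumes every article begins with an <a name=...>
anchor and runs until the next one. The names SectionSplitter,
section_digests and changed_sections are made up for illustration, and it
uses only the Python standard library. Treat it as one possible approach,
not a description of how any existing agent works.

```python
import hashlib
from html.parser import HTMLParser


class SectionSplitter(HTMLParser):
    """Collect the text that follows each <a name="..."> anchor."""

    def __init__(self):
        super().__init__()
        self.sections = {}   # anchor name -> list of text chunks
        self.current = None  # anchor name of the section being read

    def handle_starttag(self, tag, attrs):
        # A named anchor starts a new section; other tags are ignored.
        name = dict(attrs).get("name")
        if tag == "a" and name:
            self.current = name
            self.sections[name] = []

    def handle_data(self, data):
        # Text before the first named anchor belongs to no section.
        if self.current is not None:
            self.sections[self.current].append(data)


def section_digests(html):
    """Map each anchor name to a hash of its whitespace-normalized text."""
    splitter = SectionSplitter()
    splitter.feed(html)
    splitter.close()
    digests = {}
    for name, chunks in splitter.sections.items():
        text = " ".join("".join(chunks).split())
        digests[name] = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digests


def changed_sections(old_html, new_html):
    """Return the anchor names whose text is new or differs."""
    old = section_digests(old_html)
    new = section_digests(new_html)
    return sorted(name for name in new if old.get(name) != new[name])
```

An agent could then notify a user only when the anchor name of an article
matching that user's query shows up in the result of changed_sections,
instead of whenever the page as a whole changes.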

HTML 3.2 does specify a class attribute. I would generally consider it a
very bad hack to use class to mark off different stories. But then you
would not be the first to hack up HTML.

Sunil

> Some of our customers have observed a problem that calls for a solution in
> HTML.  There may be something in the proposals for this, but I can't quite
> see what applies.  The problem comes up when there are multiple documents on
> a single HTML page.  Although I don't see this sort of thing often, it's
> apparently quite common on some news-related pages.  A page might have 10
> different news articles on it.  The problem is that when one of those
> articles changes, an agent watching that page will see that it has changed
> and notify the user, even though it wasn't a change to an article that the
> user was interested in (the agent may be matching on an article that hadn't
> changed). The result is that users are being notified repeatedly that there
> is new information that matches their interests, incorrectly.
> 
> Thus, there's a need to be able to define search and retrieval units that
> are subsets of the HTML page, which I would think is a job for HTML, even
> though it could be done with some sort of external markup.  Is there
> anything happening along those lines?  Named anchors are related, since they
> can give you a pointer to an article contained in a page, but they haven't
> been intended to mark the start and end of a search and retrieval unit.
> 
> There's also a need for defining multi-part documents (the inverse problem
> -- a document that is made up of a number of HTML pages).  The distributed
> search and retrieval workshop last spring came up with a proposal for that,
> using LINK tags and such.
> 
> Nick Arnett
> 
> ---------------------------------------
> Evangelist
> Product Manager, Advanced Technology
> Verity Inc.
> 408-542-2164; home office 408-369-1233
> http://www.verity.com
>