Page v. document (search and agent issue)

Nick Arnett (
Fri, 22 Nov 1996 08:15:43 -0800

Some of our customers have observed a problem that calls for a solution in
HTML.  There may be something in the proposals for this, but I can't quite
see what applies.  The problem comes up when there are multiple documents on
a single HTML page.  Although I don't see this sort of thing often, it's
apparently quite common on some news-related pages.  A page might have 10
different news articles on it.  The problem is that when one of those
articles changes, an agent watching that page will see that the page has
changed and notify the user, even though the change was not to an article
the user was interested in (the article the agent is matching may not have
changed at all). The result is that users are repeatedly and incorrectly
notified that there is new information matching their interests.
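
To make the failure concrete, here is a minimal Python sketch contrasting whole-page change detection with per-unit detection. It assumes the page has already been split into units with stable ids (which is what the markup asked for below would provide), and it uses a single keyword test as a stand-in for real query matching. The names `fingerprint`, `page_changed`, and `changed_matching_units` are illustrative only.

```python
import hashlib


def fingerprint(text):
    """SHA-256 of the text with runs of whitespace collapsed."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def page_changed(old_page, new_page):
    """Whole-page check: an edit to ANY article counts as a change."""
    return fingerprint(old_page) != fingerprint(new_page)


def changed_matching_units(old_units, new_units, keyword):
    """Return ids of units that match `keyword` and are new or changed.

    old_units / new_units: dict of unit id -> unit text (assumption: the page
    has already been split into units with stable ids).
    keyword: a lowercase word; containment is a stand-in for real matching.
    """
    hits = []
    for unit_id, text in new_units.items():
        if keyword not in text.lower():
            continue  # the user is not interested in this unit
        old_text = old_units.get(unit_id)
        if old_text is None or fingerprint(old_text) != fingerprint(text):
            hits.append(unit_id)
    return hits


# Example: only the weather story (a2) is edited between two fetches.
old = {"a1": "Markets rally on rate cut", "a2": "Storm hits coast"}
new = {"a1": "Markets rally on rate cut",
       "a2": "Storm hits coast; thousands without power"}
```

Here `page_changed(" ".join(old.values()), " ".join(new.values()))` is true, so a page-level agent would alert a user watching for market news, while `changed_matching_units(old, new, "markets")` returns an empty list and only a "storm" query reports `a2`.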

Thus, there's a need to be able to define search and retrieval units that
are subsets of the HTML page, which I would think is a job for HTML, even
though it could be done with some sort of external markup.  Is there
anything happening along those lines?  Named anchors are related, since they
can give you a pointer to an article contained in a page, but they mark a
single point and were never intended to mark the start and end of a search
and retrieval unit.
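
Named anchors can be pressed into service as unit boundaries, but only as start markers. The sketch below uses Python's standard `html.parser` (the class and function names are hypothetical) to start a new unit at each `<A NAME=...>` and let it run until the next anchor. It illustrates the weakness of that approach: with no end marker, the page footer is silently folded into the last article.

```python
from html.parser import HTMLParser


class AnchorSplitter(HTMLParser):
    """Split page text into units, starting a new unit at each <A NAME=...>."""

    def __init__(self):
        super().__init__()
        self.units = {}      # anchor name -> list of text fragments
        self.current = None  # text before the first named anchor is dropped

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names.
        if tag == "a":
            name = dict(attrs).get("name")
            if name:
                self.current = name
                self.units.setdefault(name, [])

    def handle_data(self, data):
        if self.current is not None and data.strip():
            self.units[self.current].append(data.strip())


def split_by_anchors(html):
    """Return {anchor name: unit text}; each unit ends where the next begins."""
    parser = AnchorSplitter()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.units.items()}


# Hypothetical sample page: two articles followed by a site footer.
SAMPLE_PAGE = """<H1>Today's News</H1>
<A NAME="a1"></A><H2>Markets</H2><P>Markets rally on rate cut.</P>
<A NAME="a2"></A><H2>Weather</H2><P>Storm hits coast.</P>
<HR><P>Copyright 1996 Example News</P>"""
```

Running `split_by_anchors(SAMPLE_PAGE)` yields two units, `a1` and `a2`, but the copyright line lands inside `a2`, so an edit to the footer would look like a change to the weather story. Explicit start-and-end markup would avoid that.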

There's also a need for defining multi-part documents (the inverse problem:
a document that is made up of a number of HTML pages).  The distributed
search and retrieval workshop last spring came up with a proposal for that,
using LINK tags and such.
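
The workshop's actual LINK syntax isn't spelled out here, so as an assumption this sketch uses the generic `REL="next"` link type to chain the parts of a multi-part document in order. Pages come from an in-memory dict rather than the network, relative HREFs are resolved with `urllib.parse.urljoin`, and `NextLinkFinder` and `collect_parts` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the HREF of the first <LINK REL="next" ...> in a page."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # REL may hold several space-separated link types.
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "next" in rels and self.next_href is None:
            self.next_href = attrs.get("href")


def collect_parts(pages, start):
    """Follow REL="next" links from `start`, returning part URLs in order.

    pages: dict mapping URL -> HTML text (assumption: stands in for fetching).
    Stops at a page with no next link, a missing page, or a cycle.
    """
    order, url = [], start
    while url in pages and url not in order:
        order.append(url)
        finder = NextLinkFinder()
        finder.feed(pages[url])
        finder.close()
        # Resolve relative HREFs against the current part's URL.
        url = urljoin(url, finder.next_href) if finder.next_href else None
    return order
```

A search engine could then index the parts returned by `collect_parts` as one retrieval unit; stopping on a repeated URL keeps a malformed chain from looping forever.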

Nick Arnett

Product Manager, Advanced Technology
Verity Inc.
408-542-2164; home office 408-369-1233