Re: Page v. document (search and agent issue)

Sunil Mishra (smishra@cc.gatech.edu)
Fri, 22 Nov 1996 15:51:23 -0500


Message-Id: <199611222051.PAA03035@cleon.cc.gatech.edu>
To: Sunil Mishra <smishra@cc.gatech.edu>
cc: Nick Arnett <narnett@verity.com>, www-html@w3.org
Subject: Re: Page v. document (search and agent issue) 
In-reply-to: Your message of "Fri, 22 Nov 1996 15:39:52 EST."
             <199611222039.PAA02795@cleon.cc.gatech.edu> 
Date: Fri, 22 Nov 1996 15:51:23 -0500
From: Sunil Mishra <smishra@cc.gatech.edu>

> Message-Id: <199611222039.PAA02795@cleon.cc.gatech.edu>
> To: Nick Arnett <narnett@verity.com>
> cc: www-html@w3.org
> Subject: Re: Page v. document (search and agent issue) 
> In-reply-to: Your message of "Fri, 22 Nov 1996 08:15:43 PST."
>              <2.2.32.19961122161543.009b3858@207.135.74.230> 
> Date: Fri, 22 Nov 1996 15:39:52 -0500
> From: Sunil Mishra <smishra@cc.gatech.edu>
> 
> No, HTML is not geared towards a hierarchical document definition, which
> is essentially what you seem to be looking for. The closest you might be
> able to get is to specify each article within its own
> <div>. Unfortunately, the ID attribute has disappeared from HTML 3.2, which
> is exactly what you would be looking for if you wanted to specify a
> specific subpart of the HTML. The agent would of course also have to be
> modified to react to changes within specific <div>s rather than a change
> anywhere within the document. A poor alternative to id would be to
> <a name...> the headline at the top of the <div>.
> 
> HTML 3.2 does specify a class attribute. I would generally consider it a
> very bad hack to use class to specify different stories. But then you would
> not be the first to hack up HTML.

Erratum: No class attribute in HTML 3.2 :-(

CLASS and ID should be present in the next version of HTML though.
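
For what it's worth, here is a rough sketch of how an agent might react to
changes within individual <div>s rather than anywhere in the page. It
assumes each article sits in its own <div> carrying an ID attribute (which,
per the above, is not in HTML 3.2). The names DivSectionParser,
section_digests and changed_sections are made up for illustration and do
not come from any existing agent; it is written in Python using only the
standard library.

```python
import hashlib
from html.parser import HTMLParser


class DivSectionParser(HTMLParser):
    """Collect the text inside each outermost <div id="..."> element."""

    def __init__(self):
        super().__init__()
        self.sections = {}    # div id -> list of text chunks
        self._current = None  # id of the div currently being collected
        self._depth = 0       # <div> nesting depth inside that section

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is None:
            div_id = dict(attrs).get("id")
            if div_id:  # only divs with an id start a section
                self._current = div_id
                self._depth = 1
                self.sections[div_id] = []
        else:
            self._depth += 1  # nested div inside the current section

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.sections[self._current].append(data)


def section_digests(html):
    """Map each section id to a hash of its whitespace-normalised text."""
    parser = DivSectionParser()
    parser.feed(html)
    parser.close()
    digests = {}
    for div_id, chunks in parser.sections.items():
        text = " ".join("".join(chunks).split())
        digests[div_id] = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digests


def changed_sections(old_html, new_html):
    """Return the sorted ids of sections added, removed, or modified."""
    old, new = section_digests(old_html), section_digests(new_html)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

The agent would then notify a user only when the id of an article matching
their interests shows up in the result of changed_sections(), instead of
whenever the page as a whole differs. Only text content is hashed here, so
markup-only changes inside a <div> are ignored; whether that is desirable
depends on the agent.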

> Sunil
> 
> > Some of our customers have observed a problem that calls for a solution in
> > HTML.  There may be something in the proposals for this, but I can't quite
> > see what applies.  The problem comes up when there are multiple documents on
> > a single HTML page.  Although I don't see this sort of thing often, it's
> > apparently quite common on some news-related pages.  A page might have 10
> > different news articles on it.  The problem is that when one of those
> > articles changes, an agent watching that page will see that it has changed
> > and notify the user, even though it wasn't a change to an article that the
> > user was interested in (the agent may be matching on an article that hadn't
> > changed). The result is that users are being notified repeatedly that there
> > is new information that matches their interests, incorrectly.
> > 
> > Thus, there's a need to be able to define search and retrieval units that
> > are subsets of the HTML page, which I would think is a job for HTML, even
> > though it could be done with some sort of external markup.  Is there
> > anything happening along those lines?  Named anchors are related, since they
> > can give you a pointer to an article contained in a page, but they haven't
> > been intended to mark the start and end of a search and retrieval unit.
> > 
> > There's also a need for defining multi-part documents (the inverse problem
> > -- a document that is made up of a number of HTML pages).  The distributed
> > search and retrieval workshop last spring came up with a proposal for that,
> > using LINK tags and such.
> > 
> > Nick Arnett
> > 
> > ---------------------------------------
> > Evangelist
> > Product Manager, Advanced Technology
> > Verity Inc.
> > 408-542-2164; home office 408-369-1233
> > http://www.verity.com
> > 
>