- From: Ted Hardie <hardie@merlot.arc.nasa.gov>
- Date: Tue, 20 Jun 1995 16:04:34 -0700 (PDT)
- To: m.koster@nexor.co.uk (Martijn Koster)
- Cc: brian@organic.com, peterd@bunyip.com, rating@junction.net, www-talk@www10.w3.org, uri@bunyip.com
Martijn, discussing the difference between robot exclusion and labeling, notes:

> The labeling we are discussing is quite different. There are many
> client software authors, with a long time to market, and a desire not
> to distribute hacks (with a few exceptions :-), as old software is used
> for ages. There are many client visits to many servers, so the
> /audience.txt retrievals would be considerably more noticeable. When
> it comes to labeling content to the granularity proposed by KidCode,
> we are no longer talking about a few areas or a few URLs per server,
> and may quickly run into scaling problems.
>
> So I would advise against proposing an /audience.txt as an interim
> solution.
>
> My suggestion of using a KidCode HTTP header didn't provoke much
> response, while I think it has some advantages: the user gets the
> choice, it scales, it can be added to existing code easily, it
> doesn't require a third-party infrastructure, and it will be quite
> easy to establish as a standard since it is a simple extension to
> HTTP. It can also easily coexist with community schemes.
>
> I'd appreciate some feedback: is the lack of support for protocols
> other than HTTP perceived to be a big problem? Will UR[CA]
> infrastructure take as much time to deploy as adding a header to
> existing code? Is the rush for an interim solution justified? Is an
> HTTP header a good idea?
>
> __________
> Internet: m.koster@nexor.co.uk
> X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
> WWW: http://web.nexor.co.uk/mak/mak.html

I certainly agree that the labeling we are discussing is quite different from setting up a robot exclusion standard, and that the hits against an /audience.txt would be extensive. Several things could be done at the browser level to minimize the impact (checking headers for changes before retrieving the body of /audience.txt, for example). No matter what is done to speed things up, though, there is no doubt that this extra step would slow browsers, since they would need to do a retrieval and parse the text before getting the actual data. Presumably, those using browsers to screen based on audience.txt would be willing to put up with the extra time.

I'm not sure what actual impact audience.txt retrievals would have on server performance or network load. The robots.txt files I've seen are fairly small; no doubt an audience.txt would be larger, but probably still smaller than a single button GIF. If, like /robots.txt, there is only one audience.txt per site (and a browser downloads it only once per session, caching the result), I don't see a real problem with network traffic or server load.

This does, however, raise the scaling problem: for a large site with many different information collections, maintaining an accurate /audience.txt may be difficult (Martijn will no doubt remember my comments to that effect about robots.txt in the context of harvesting NASA; luckily, he is too much the gentleman to use them against me). Given that difficulty, the problems a file-based solution presents for those running database-driven servers, and the problems Martijn points out with time to market for browser authors, it may be best to avoid the audience.txt as an interim solution. I do feel it would work better than a URL-based system, but I agree that it is inferior to the longer-term solutions. The only thing I really see to recommend it is that it is (relatively) quick to implement.
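As a minimal sketch of the once-per-session caching and conditional retrieval described above: the file name /audience.txt comes from this thread, but its format was never specified, and the AudienceCache class, the use of If-Modified-Since, and the Python code are illustrative assumptions only, not anything defined by the KidCode proposal.

    # Hypothetical sketch: fetch a per-site /audience.txt at most once per
    # session and revalidate with a conditional request, much as robots.txt
    # is typically handled. File name and header choice are assumptions.
    import urllib.request
    import urllib.error

    class AudienceCache:
        def __init__(self):
            self._cache = {}   # site -> (last_modified, body)

        def fetch(self, site):
            """Return the /audience.txt body for a site, revalidating when cached."""
            last_modified, body = self._cache.get(site, (None, None))
            request = urllib.request.Request(site.rstrip("/") + "/audience.txt")
            if last_modified:
                # Conditional retrieval: only transfer the body if it changed.
                request.add_header("If-Modified-Since", last_modified)
            try:
                with urllib.request.urlopen(request, timeout=10) as response:
                    body = response.read().decode("utf-8", errors="replace")
                    last_modified = response.headers.get("Last-Modified")
            except urllib.error.HTTPError as err:
                if err.code != 304:      # 304 Not Modified: keep the cached copy
                    return None          # 404 etc.: treat as "no audience labels"
            except urllib.error.URLError:
                return None
            self._cache[site] = (last_modified, body)
            return body

    # Usage: one cache per browsing session, consulted before rendering pages.
    # cache = AudienceCache()
    # labels = cache.fetch("http://example.org")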
As for the idea of an HTTP header: I proposed a Restrictions: header to the HTTP working group some time ago, to deal with situations where a browser needed to know what restrictions on access were placed on specific materials (this was in the context of indexing collections, and was meant to let a browser/gatherer determine whether a specific item was available to all users before indexing it). I got a lot of feedback at the time, much of it negative. To summarize some of that feedback:

1) Some felt that a variation of the Accept: header would be better, so that browsers put forward what they were willing to see, rather than servers describing what there was to see and leaving it to the browser to discard data that had already passed across and was not okay. (To avoid that, the browser would have to ask for the header first, then request the document, or the parts of it, that were okay.)

2) Some felt that descriptions of the document belonged in the keywords field, and that access restrictions were essentially descriptive. Disagreement with this centered on the point that browsers without access are given access information by the server, so using keywords would mean using two different methods to deliver the same information.

3) Many, many of those who replied saw the proposal as inviting censorship and discouraged me from suggesting a Restrictions: header on those grounds.

Near the end of the discussion, a suggestion was made to adapt the Pragma mechanism (currently used only by browsers when they wish to insist that the original document be retrieved, even when in-line proxies have copies available) for this purpose. Given the press of other work, I have not followed up on that suggestion at all.

Regards,
Ted Hardie
NASA NAIC
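For illustration of the two header-based shapes discussed in the message above, here is a minimal sketch. Neither a KidCode: nor a Restrictions: header was ever standardized; the Accept-Audience and Audience header names, the label values, and the fetch_if_acceptable helper are hypothetical, not part of any proposal in this thread.

    # Hypothetical sketch of the two approaches: the client announcing what it
    # is willing to see (an Accept: variant), and the server labeling what it
    # serves (a Restrictions:/KidCode:-style header). All names are assumptions.
    import http.client

    ACCEPTABLE_LABELS = {"general", "unrated"}   # policy set by the user, not the server

    def fetch_if_acceptable(host, path):
        """HEAD first, then GET only if the server's label matches the client's policy."""
        conn = http.client.HTTPConnection(host, timeout=10)
        # Client-side declaration, in the spirit of the Accept: variant (hypothetical).
        headers = {"Accept-Audience": ", ".join(sorted(ACCEPTABLE_LABELS))}
        conn.request("HEAD", path, headers=headers)
        head = conn.getresponse()
        head.read()
        # Server-side label, in the spirit of a Restrictions: header (hypothetical).
        label = head.getheader("Audience", "unrated").strip().lower()
        if label not in ACCEPTABLE_LABELS:
            conn.close()
            return None                  # do not transfer or display the body
        conn.request("GET", path, headers=headers)   # reuses the connection if kept alive
        body = conn.getresponse().read()
        conn.close()
        return body

The HEAD-then-GET pattern mirrors point 1) above: the client states its policy up front, and no restricted body crosses the wire before the label has been checked.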
Received on Tuesday, 20 June 1995 19:01:41 UTC