- From: Ted Hardie <hardie@merlot.arc.nasa.gov>
- Date: Tue, 20 Jun 1995 16:04:34 -0700 (PDT)
- To: m.koster@nexor.co.uk (Martijn Koster)
- Cc: brian@organic.com, peterd@bunyip.com, rating@junction.net, www-talk@www10.w3.org, uri@bunyip.com
Martijn, discussing the difference between robot exclusion and labeling, notes:

> The labeling we are discussing is quite different. There are many
> client software authors, with a long time to market, and a desire not
> to distribute hacks (with a few exceptions :-), as old software is used
> for ages. There are many client visits to many servers, so the
> /audience.txt retrievals would be considerably more noticeable. When
> it comes to labeling content to the granularity proposed by KidCode,
> we are no longer talking about a few areas or a few URLs per server,
> and may quickly run into scaling problems.
>
> So I would advise against proposing an /audience.txt as an interim
> solution.
>
> My suggestion of using a KidCode HTTP header didn't provoke much
> response, while I think it has some advantages: the user gets the
> choice, it scales, it can be added to existing code easily, it
> doesn't require a third-party infrastructure, and it will be quite
> easy to establish as a standard since it is a simple extension to
> HTTP. It can also easily coexist with community schemes.
>
> I'd appreciate some feedback: is the lack of support for protocols
> other than HTTP perceived to be a big problem? Will UR[CA]
> infrastructure take as much time to deploy as adding a header to
> existing code? Is the rush for an interim solution justified? Is an
> HTTP header a good idea?
>
> __________
> Internet: m.koster@nexor.co.uk
> X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
> WWW: http://web.nexor.co.uk/mak/mak.html

I certainly agree that the labeling we are discussing is quite different from setting up a robot exclusion standard, and that the hits against an /audience.txt would be extensive. Several things could be done at the browser level to minimize the impact (checking headers for changes before retrieving the body of /audience.txt, for example). No matter what is done to speed things up, though, there is no doubt that this extra step would slow browsers, since they would need to do a retrieval and parse the text before getting the actual data. Presumably, those using browsers to screen based on audience.txt would be willing to put up with the extra time.

I'm not sure what actual impact audience.txt retrievals would have on server performance or network load. The robots.txt files I've seen are fairly small; no doubt an audience.txt would be larger, but probably still smaller than a single button GIF. If, like /robots.txt, there is only one audience.txt per site (and a browser downloads it only once per session, caching the result), I don't see a real problem with network traffic or server load.

This does, however, raise the scaling problem: for a large site with many different information collections, maintaining an accurate /audience.txt may be difficult (Martijn will no doubt remember my comments to that effect about robots.txt in the context of harvesting NASA; luckily, he is too much the gentleman to use them against me). Given that difficulty, the problems a file-based solution presents for those running database-driven servers, and the problems Martijn points out with time to market for browser authors, it may be best to avoid the audience.txt as an interim solution. I do feel it would work better than a URL-based system, but I agree that it is inferior to the longer-term solutions. The only thing I really see to recommend it is that it is (relatively) quick to implement.
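As a minimal sketch of the once-per-session caching and conditional retrieval described above: the file name /audience.txt comes from this thread, but its format was never specified, and the AudienceCache class, the use of If-Modified-Since, and the Python code are illustrative assumptions only, not anything defined by the KidCode proposal.

    # Hypothetical sketch: fetch a per-site /audience.txt at most once per
    # session and revalidate with a conditional request, much as robots.txt
    # is typically handled. File name and header choice are assumptions.
    import urllib.request
    import urllib.error

    class AudienceCache:
        def __init__(self):
            self._cache = {}   # site -> (last_modified, body)

        def fetch(self, site):
            """Return the /audience.txt body for a site, revalidating when cached."""
            last_modified, body = self._cache.get(site, (None, None))
            request = urllib.request.Request(site.rstrip("/") + "/audience.txt")
            if last_modified:
                # Conditional retrieval: only transfer the body if it changed.
                request.add_header("If-Modified-Since", last_modified)
            try:
                with urllib.request.urlopen(request, timeout=10) as response:
                    body = response.read().decode("utf-8", errors="replace")
                    last_modified = response.headers.get("Last-Modified")
            except urllib.error.HTTPError as err:
                if err.code != 304:      # 304 Not Modified: keep the cached copy
                    return None          # 404 etc.: treat as "no audience labels"
            except urllib.error.URLError:
                return None
            self._cache[site] = (last_modified, body)
            return body

    # Usage: one cache per browsing session, consulted before rendering pages.
    # cache = AudienceCache()
    # labels = cache.fetch("http://example.org")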
As for the idea of an HTTP header: I proposed a Restrictions: header to the HTTP working group some time ago, to deal with situations where a browser needed to know what restrictions on access were placed on specific materials (this was in the context of indexing collections, and was meant to let a browser/gatherer determine whether a specific item was available to all users before indexing it). I got a lot of feedback at the time, much of it negative. To summarize some of that feedback:

1) Some felt that a variation of the Accept: header would be better, so that browsers put forward what they were willing to see, rather than servers describing what there was to see and leaving it to the browser to discard data that had already passed across and was not okay. (To avoid that, the browser would have to ask for the header first, then request the document, or the parts of it, that were okay.)

2) Some felt that descriptions of the document belonged in the keywords field, and that access restrictions were essentially descriptive. Disagreement with this centered on the point that browsers without access are given access information by the server, so using keywords would mean using two different methods to deliver the same information.

3) Many, many of those who replied saw the proposal as inviting censorship and discouraged me from suggesting a Restrictions: header on those grounds.

Near the end of the discussion, a suggestion was made to adapt the Pragma mechanism (currently used only by browsers when they wish to insist that the original document be retrieved, even when in-line proxies have copies available) for this purpose. Given the press of other work, I have not followed up on that suggestion at all.

Regards,
Ted Hardie
NASA NAIC
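For illustration of the two header-based shapes discussed in the message above, here is a minimal sketch. Neither a KidCode: nor a Restrictions: header was ever standardized; the Accept-Audience and Audience header names, the label values, and the fetch_if_acceptable helper are hypothetical, not part of any proposal in this thread.

    # Hypothetical sketch of the two approaches: the client announcing what it
    # is willing to see (an Accept: variant), and the server labeling what it
    # serves (a Restrictions:/KidCode:-style header). All names are assumptions.
    import http.client

    ACCEPTABLE_LABELS = {"general", "unrated"}   # policy set by the user, not the server

    def fetch_if_acceptable(host, path):
        """HEAD first, then GET only if the server's label matches the client's policy."""
        conn = http.client.HTTPConnection(host, timeout=10)
        # Client-side declaration, in the spirit of the Accept: variant (hypothetical).
        headers = {"Accept-Audience": ", ".join(sorted(ACCEPTABLE_LABELS))}
        conn.request("HEAD", path, headers=headers)
        head = conn.getresponse()
        head.read()
        # Server-side label, in the spirit of a Restrictions: header (hypothetical).
        label = head.getheader("Audience", "unrated").strip().lower()
        if label not in ACCEPTABLE_LABELS:
            conn.close()
            return None                  # do not transfer or display the body
        conn.request("GET", path, headers=headers)   # reuses the connection if kept alive
        body = conn.getresponse().read()
        conn.close()
        return body

The HEAD-then-GET pattern mirrors point 1) above: the client states its policy up front, and no restricted body crosses the wire before the label has been checked.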
Received on Tuesday, 20 June 1995 19:01:41 UTC