- From: Joshua Allen <joshuaa@microsoft.com>
- Date: Tue, 23 Apr 2002 23:20:55 -0700
- To: "Mark Nottingham" <mnot@mnot.net>
- Cc: <www-tag@w3.org>
> Interesting. My experience is completely different, and I wouldn't refer
> to that as an arcane bug at all.

Oh, it was arcane all right. It involved a combination of pages with Cache-Control: private set and JPEGs in those pages having no cache hints, with the images' (lack of) cache hints being erroneously applied to the pages.

> > And any crawlers I have used are deliberately designed to ignore URIs
> > with querystrings.
>
> See Paul's reference re: Google. I'd seen the same behaviour, but didn't
> have an example so handy. (Thanks, Paul!)

I wasn't claiming that crawlers *can't* crawl querystrings, but any crawlers I have used require you to deliberately turn this on, or to specify in a filter which querystrings are "safe". In fact, I run a crawler internally at Microsoft which crawls pages with querystrings. But I deliberately configured it to do so, and only for pages that I know to be "safe".

I could show you search results that index URLs with querystrings, but that certainly doesn't mean that I consider *all* URLs with querystrings to be "safe" to GET. There is no way to guarantee that all URLs will be free of GET side-effects, and it would be misleading to tell people that such a guarantee exists.

(I also would be shocked to hear that Google's implementation is blindly crawling querystrings with no heuristics to determine safeness. Without input from the Google guys, I guess we just have to speculate.)
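
To illustrate the opt-in filtering described above, here is a minimal sketch in Python. It is not the configuration of any particular crawler: the function name `should_crawl`, the `allow_querystrings` switch, and the `SAFE_QUERY_PREFIXES` allowlist are all hypothetical names chosen for this example. The assumption is that the operator, not the crawler, decides which querystring URLs are "safe" to GET.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: path prefixes whose querystring URLs the operator
# has reviewed and believes to be free of GET side-effects.
SAFE_QUERY_PREFIXES = ("/search", "/docs/view")


def should_crawl(url, allow_querystrings=False, safe_prefixes=SAFE_QUERY_PREFIXES):
    """Return True if the crawler may issue a GET for this URL."""
    parts = urlsplit(url)

    # Only fetch ordinary web URLs.
    if parts.scheme not in ("http", "https"):
        return False

    # URLs without a querystring are crawled by default.
    if not parts.query:
        return True

    # Querystring crawling is off unless deliberately turned on.
    if not allow_querystrings:
        return False

    # Even when turned on, only vetted paths are considered "safe".
    return parts.path.startswith(tuple(safe_prefixes))
```

With the defaults, `should_crawl("http://example.com/search?q=x")` is rejected. It is accepted only when `allow_querystrings=True` is passed and the path matches the allowlist. A filter like this records what the operator has vetted; it does not guarantee that a GET on those URLs has no side-effects.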
Received on Wednesday, 24 April 2002 02:22:02 UTC