RE: whenToUseGet-7 counter-proposal

> Interesting. My experience is completely different, and I wouldn't refer
> to that as an arcane bug at all.

Oh, it was arcane all right.  It involved a combination of pages sent with
"Cache-Control: private" and JPEGs in those pages that had no cache
hints, and the images' (lack of) cache hints being erroneously applied
to the pages.

> > And any crawlers I have used are deliberately designed to ignore URIs
> > with querystrings.
> 
> See Paul's reference re: Google. I'd seen the same behaviour, but didn't
> have an example so handy. (Thanks, Paul!)

I wasn't claiming that crawlers *can't* crawl querystrings, but any
crawlers I have used require you to deliberately turn this on or specify
in a filter which querystrings are "safe".  I run a crawler internally
at Microsoft which crawls pages with querystrings, in fact.  But I
deliberately configured it to do so, and only with pages that I know to
be "safe".  I could show you search results that index URLs with
querystrings, but that certainly doesn't mean that I consider *all* URLs
with querystrings to be "safe" to GET.  
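To make the opt-in behaviour concrete, here is a minimal sketch of the kind of filter I mean, assuming a crawler that asks a single function whether to fetch each URL.  The function name `should_crawl` and the prefixes in `SAFE_QUERY_PREFIXES` are hypothetical, not taken from any particular crawler.  Note that the allow-list only records the operator's judgment about which URLs are free of side-effects; it guarantees nothing.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: path prefixes whose querystring URLs the
# operator has checked by hand and believes are safe to GET.
SAFE_QUERY_PREFIXES = ("/search", "/catalog/view")


def should_crawl(url: str, safe_prefixes: tuple = SAFE_QUERY_PREFIXES) -> bool:
    """Decide whether the crawler should fetch `url`.

    URLs without a querystring are crawled by default.  URLs with a
    querystring are skipped unless their path starts with one of the
    operator-approved prefixes (str.startswith accepts a tuple).
    """
    parts = urlsplit(url)
    if not parts.query:
        return True
    return parts.path.startswith(safe_prefixes)


if __name__ == "__main__":
    for u in ("http://example.com/about",
              "http://example.com/search?q=get",
              "http://example.com/cart/add?item=42"):
        print(u, should_crawl(u))
```

With these prefixes, "/about" and "/search?q=get" would be fetched, while "/cart/add?item=42" would be skipped because nobody has vouched for it.  A plain prefix match is deliberately crude (it would also admit "/searchadmin?..."), which is one more reason such lists have to be maintained by hand.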

There is no way to guarantee that all URLs will be free of GET
side-effects, and it would be misleading to tell people that such a
guarantee exists.

(I would also be shocked to hear that Google's implementation blindly
crawls querystrings with no heuristics to determine safety.  Without
input from the Google guys, I guess we just have to speculate.)

Received on Wednesday, 24 April 2002 02:22:02 UTC