- From: <jon@hackcraft.net>
- Date: Wed, 12 Nov 2003 10:45:37 +0000
- To: "w3c-wai-ig@w3.org" <w3c-wai-ig@w3.org>
> > I suspect it doesn't show up that often because not that many people
>
> I suspect the reasons include:
>
> - most ? references are still implied through forms;
> - many sites have their CGI areas blocked in robots.txt;
> - most sites using non-trivial ? references in href attributes have broken
>   URLs because they don't escape & properly or use the alternative of ;
>   suggested in an HTML specification appendix;
> - there may be a tendency to use meta to inhibit caching on such pages.
>
> I consider this particular, Cold Fusion, technique as an abuse of URLs
> which, by confusing the mechanics of creating the HTML with the naming
> of the resource, causes misoperation of things like proxies (Squid's
> default rules are to send ? URLs direct to the origin, rather than to
> an upstream cache, as it expects not to get a cachable resource back).

I consider that proxy behaviour, if not an abuse of HTTP, then a failure to implement it as well as is possible. Admittedly it's probably a failure based on adapting to practical experience with sites which in turn fail to implement HTTP as well as is possible.

A URI containing a query string is of equal status to any other URI, though it may be weaker in terms of human-readable qualities. A GET to such a URI is just as capable of returning a cachable resource, and developers should strive to assist this caching (setting Last-Modified, reacting appropriately to If-Modified-Since); a rough sketch of such a handler follows at the end of this message.

I've used query-strings on numerous occasions (sometimes, in the case of searches, this was the most sensible way to go; sometimes it was due to the relative difficulty of generating data-driven sites any other way with certain tools). While I do generally avoid query-strings, I have not found their use to get in the way of caching. In particular, there are cases where the data-driven nature of a resource, combined with knowledge of the mechanics producing that data, offers a reliable way of determining expiry dates, with a tremendous gain to caching efficiency.

As for search engines, the reason that URIs with query strings are less likely to get indexed is that search engine people don't want their spiders to spend eternity indexing a site that is produced on the fly and for which there may be an infinite number of URIs to be found in the generated pages.

As an example of such a page, I once wrote a joke version of RSS as a satire of the version numbers used by rival versions: each page would claim to have been obsoleted by another version, which it linked to, with version 10234.0 pointing to version 10235.0 and so on. This would go on until version 2147483647.0, after which it would trigger an overflow error I couldn't be bothered checking for on what was, after all, a joke. If Google had started indexing that page it would still be there :) (a toy reconstruction of the idea follows below). Of course this can also happen with URIs that don't contain query strings, but such cases are rarer.

Google will generally list a page if it is linked to from a page which doesn't contain a query string in the URI (I'd guess that includes HTTP redirects from such a page, but I'm not sure); having few parameters helps as well.

--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
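A minimal sketch of the conditional-GET handling described above, assuming a Python http.server handler and a hypothetical `/article?id=42` resource backed by data that is only regenerated nightly (the handler, data source, and schedule are illustrative, not from the original post):

```python
# Sketch: a query-string URI served as a cacheable resource.
# Sets Last-Modified and Expires, and answers conditional GETs with 304.
from email.utils import formatdate, parsedate_to_datetime
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import datetime

# Hypothetical backing data: id -> (last modified, body)
ARTICLES = {
    "42": (datetime.datetime(2003, 11, 10, 9, 0, tzinfo=datetime.timezone.utc),
           b"<html><body>Article 42</body></html>"),
}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        article = ARTICLES.get(query.get("id", [""])[0])
        if article is None:
            self.send_error(404)
            return
        modified, body = article

        # Honour If-Modified-Since: a query-string URI is cacheable like any other.
        ims = self.headers.get("If-Modified-Since")
        if ims:
            try:
                if parsedate_to_datetime(ims) >= modified:
                    self.send_response(304)
                    self.end_headers()
                    return
            except (TypeError, ValueError):
                pass  # unparsable date: fall through and send the full response

        # Because the data is (in this sketch) only regenerated nightly, we can
        # also give caches an explicit expiry time.
        tomorrow = datetime.datetime.now(datetime.timezone.utc).replace(
            hour=0, minute=0, second=0, microsecond=0) + datetime.timedelta(days=1)

        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Last-Modified", formatdate(modified.timestamp(), usegmt=True))
        self.send_header("Expires", formatdate(tomorrow.timestamp(), usegmt=True))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), Handler).serve_forever()  # e.g. GET /article?id=42
```

With headers like these, a proxy could in principle cache and revalidate the response despite the ? in the URI.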
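And a toy reconstruction, in Python rather than whatever the joke page was actually written in, of the RSS version chain: every version claims to be obsoleted by the next one, so a spider following the links never runs out of URIs until the overflow at 2147483647.

```python
# Toy spider trap: each "version" links to the next, without end.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

MAX_INT32 = 2147483647  # where the original joke would finally have overflowed

class JokeRSS(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        try:
            version = int(query.get("version", ["10234"])[0])
        except ValueError:
            self.send_error(400)
            return
        if version > MAX_INT32:
            self.send_error(500, "Integer overflow (as in the original joke)")
            return
        body = ('<html><body><p>RSS %d.0 is obsolete; see '
                '<a href="/rss?version=%d">RSS %d.0</a>.</p></body></html>'
                % (version, version + 1, version + 1)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8001), JokeRSS).serve_forever()
```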
Received on Wednesday, 12 November 2003 05:45:38 UTC