- From: Jacob Palme <jpalme@dsv.su.se>
- Date: Sat, 3 Mar 2001 22:29:36 +0100
- To: discuss@apps.ietf.org
At 08.56 -0800 01-03-03, Mark Nottingham wrote: >There are ways to assure cache freshness. See > http://www.mnot.net/cache_docs/ >If that isn't good enough, vary the reference's query string, most >search engines will understand. After experimentation, we found the best success with adding ";12345678, where 12345678 is a number which changes every time the document is modified, at the end of those URLs which refer to pages which are frequently updated. You can look at for example http://salut.nu/forum/uno/2/ to see how our pages look like. If you click on a link in those pages, ";12345678" with a new number is added at the end of most of the pages whenever the page is changed, in order to avoid users getting old, cached versions of the document they ask for. But the document you referenced might teach us a method of avoiding stale pages to user without adding ;12345678 at the end of our URLs. It seems like a very comprehensive tutorial on how to get your documents to be handled correctly by caches. >Also, you might try using robots.txt to shape which documents will be >fetched. At present we are also using robots.txt to suppress search engines. Some of our content might however be suitable for search engines, and then we must ensure that we do not send the ";12345" URL-s to them, because this would cause them to store multiple references to almost the same document (something which already is a big problem for search engine implementors), and which all would in reality refer users to the same document. I can guess that a standard method of recognizing when a HTTP client is a search engine might be controversial, since spammers would of course be very happy to be able to deliver different content to search engines than to ordinary users in order to pull users to their sites. But they probably already have learned to do that using various heuristics, as you suggest. -- Jacob Palme <jpalme@dsv.su.se> (Stockholm University and KTH) for more info see URL: http://www.dsv.su.se/jpalme/
Received on Saturday, 3 March 2001 16:40:22 UTC