- From: Dan Connolly <connolly@w3.org>
- Date: Wed, 08 Jun 2005 11:17:21 -0500
- To: www-tag@w3.org
So a few days ago, this crossed my desktop from umpteen sources...

  "Google Sitemaps is an experiment in web crawling. Using Sitemaps
  to inform and direct our crawlers, we hope to expand our coverage
  of the web and improve the time to inclusion in our index."
  -- https://www.google.com/webmasters/sitemaps/docs/en/about.html

It's clearly relevant to issue siteData-36
  http://www.w3.org/2001/tag/issues.html?type=1#siteData-36

It seems very ironic to me; W3C held a workshop a while ago...

  Distributed Indexing/Searching Workshop
  May 28-29, 1996 in Cambridge, Massachusetts
  http://www.w3.org/Search/9605-Indexing-Workshop/

Going into that workshop, my sense was that we needed a simple format
for sites to summarize their contents so that search engines wouldn't
have to crawl the whole thing to figure out what's there. There was a
whole session on this idea...

  "The third breakout/writeup session focused on mechanisms to allow
  information servers to notify indexers when content changes."
  -- http://www.w3.org/Search/9605-Indexing-Workshop/ExecSummary.html

What I learned at the workshop was: search engines don't care what you
think is interesting about your site; they have their own idea about
what's interesting, mostly based on links from other parts of the web.
They don't crawl your whole site just because it's there; they focus
on pages that have lots of incoming links and such.

So seeing Google make use of a sitemap format nine years later kinda
blows my mind.

Note that RSS once stood for Rich Site Summary... interesting...
according to Robin Cover, it still does:
  http://www.oasis-open.org/cover/rss.html

I wonder if Google considered something RDF-based like RSS and decided
against it, or if they just didn't think about it.

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
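P.S. For the curious, here is a minimal sketch (in Python) of the kind
of sitemap file the Google documentation describes. The 0.84 schema
namespace and the urlset/url/loc/lastmod element names are my reading
of their docs, and the example site and dates are made up:

    import xml.etree.ElementTree as ET

    # Namespace from the Google Sitemaps 0.84 schema (as I read the docs).
    NS = "http://www.google.com/schemas/sitemap/0.84"

    def build_sitemap(pages):
        """pages: iterable of (url, lastmod_iso8601) pairs."""
        urlset = ET.Element("urlset", xmlns=NS)
        for url, lastmod in pages:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url          # page address
            ET.SubElement(entry, "lastmod").text = lastmod  # last change date
        return ET.tostring(urlset, encoding="unicode")

    # Hypothetical site with two pages:
    print(build_sitemap([
        ("http://example.org/", "2005-06-08"),
        ("http://example.org/docs/about.html", "2005-06-01"),
    ]))

Note how little this asks of the server: the site just enumerates its
URLs and says when each last changed, which is roughly the "notify
indexers when content changes" idea from the 1996 breakout session.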
Received on Wednesday, 8 June 2005 16:17:30 UTC