google sitemaps and some history of sitemaps [siteData-36]

From: Dan Connolly <connolly@w3.org>
Date: Wed, 08 Jun 2005 11:17:21 -0500
To: www-tag@w3.org
Message-Id: <1118247441.12287.209.camel@localhost>

So a few days ago, this crossed my desktop from umpteen sources...

"Google Sitemaps is an experiment in web crawling. Using Sitemaps to
inform and direct our crawlers, we hope to expand our coverage of the
web and improve the time to inclusion in our index."
 -- https://www.google.com/webmasters/sitemaps/docs/en/about.html

It's clearly relevant to issue siteData-36.

It seems very ironic, to me; W3C held a workshop a while ago...

 Distributed Indexing/Searching Workshop
 May 28-29, 1996 in Cambridge, Massachusetts

Going into that workshop, my sense was that we needed
a simple format for sites to summarize their contents so
that search engines wouldn't have to crawl the whole thing
to figure out what's there. There was a whole session
on this idea...

"The third breakout/writeup session focused on mechanisms to allow
information servers to notify indexers when content changes."
 -- http://www.w3.org/Search/9605-Indexing-Workshop/ExecSummary.html

What I learned at the workshop was: search engines don't care
what you think is interesting about your site; they have their
own idea about what's interesting, mostly based on links from
other parts of the web. They don't crawl your whole site
just because it's there; they focus on pages that have lots of
incoming links and such.

So now to see Google making use of a sitemap format 10 years
later kinda blows my mind.
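
For illustration only, here is a minimal sketch of the kind of site
summary discussed above: a list of page URLs, each with a
last-modified date, serialized as XML. It is not taken from the Google
documentation quoted earlier. The namespace is an assumption (the
sitemaps.org 0.9 schema, which postdates this message), and
build_sitemap and the example.org pages are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace, which postdates this message.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a str) for (url, last_modified) pairs."""
    # Serialize with a default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, used only to exercise the function.
EXAMPLE_PAGES = [
    ("http://example.org/", "2005-06-08"),
    ("http://example.org/news.html", "2005-06-01"),
]

if __name__ == "__main__":
    print(build_sitemap(EXAMPLE_PAGES))
```

A crawler reading such a file could compare each lastmod value with
the date of its previous visit and skip pages that have not changed,
which is roughly the saving the workshop session was after.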

Note that RSS once stood for Rich Site Summary... interesting...
according to Robin Cover, it still does:

I wonder if Google considered something RDF-based like RSS
and decided against it, or if they just didn't think about it.

Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Wednesday, 8 June 2005 16:17:30 UTC
