- From: Mark Nottingham <mnot@mnot.net>
- Date: Wed, 7 May 2003 21:36:20 -0700
- To: ietf-http-wg@w3.org
- Cc: wrec@cs.utk.edu
I'm slowly building a collection of traces of access to a relatively new kind of Web content: RSS feeds. If you're not familiar with it, RSS is a format that represents a list of items, each with its own title, description, link, and other metadata. Clients (often called "aggregators") periodically poll to build a view of the channel over time, adding new items in the representation to a local store. In this manner, one can keep abreast of news headlines and other chronologically ordered lists. The format is becoming more popular, both because of increasing support (sites like MSDN, the New York Times and CNN all have RSS feeds) and because of the arrival of "weblogs" (which also use RSS).

There are a number of interesting questions about RSS that come to mind, including:

- What is the polling interval?
- How common is validation?
- What is the rate of change of the feeds?
- What is the size of the feeds?
- What times of day does polling happen?
- How self-similar is RSS traffic (is it "lumpy" around the top of the hour, for example)?

I suspect that RSS, because it is polled, is not at all typical Web traffic, and therefore places unusual requirements on Web servers and intermediaries. I also suspect that it may eventually require us to rethink distribution; invalidation and other approaches may become much more desirable than polling.

Rather than keep all of the fun for myself (I have a day job), I've placed the traces on the Web for the greater enjoyment of the caching and traffic characterization community ;) They are at:

http://www.mnot.net/rss/traces/

(This will redirect to another site; please bookmark the URI above in case it changes.)

So far, I have one trace; it has been anonymized (combined log format, with the client IP, ident, userinfo, URI, referer and user-agent fields hashed or half-hashed). If there is another format that's more suitable, please tell me.
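As a rough illustration of the kind of analysis the trace supports, per-client polling intervals can be pulled straight out of combined-log entries. This is only a sketch: the field layout is the standard combined log format, but the hashed host/agent tokens and feed URIs in the example are invented, not taken from the trace.

```python
import re
from collections import defaultdict
from datetime import datetime

# Combined Log Format:
#   host ident authuser [date] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Parse one combined-log line into a dict, or return None on mismatch."""
    m = LOG_RE.match(line)
    if not m:
        return None
    entry = m.groupdict()
    # Apache %t timestamp, e.g. 07/May/2003:21:36:20 -0700
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

def polling_intervals(entries):
    """Seconds between successive requests for the same (client, URI) pair."""
    by_client = defaultdict(list)
    for e in entries:
        parts = e["request"].split(" ")
        uri = parts[1] if len(parts) > 1 else e["request"]
        by_client[(e["host"], uri)].append(e["time"])
    intervals = []
    for times in by_client.values():
        times.sort()
        intervals.extend((b - a).total_seconds() for a, b in zip(times, times[1:]))
    return intervals
```

A histogram of those intervals would answer the polling-interval and top-of-the-hour "lumpiness" questions in one pass.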
This trace contains about 500,000 entries and represents a week's worth of access to an RSS "scraping" service; i.e., a Web site that processes other Web sites to produce a number of feeds. As such, it contains accesses to multiple feeds.

Please tell me if this is interesting/useful, and send along any results you come up with. I'm working on getting more traces; stay tuned. (Are any of the repositories, e.g., the W3C WCA or the Internet Traffic Archive, still active? Neither has seen anything new in quite some time...)

Regards,
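Since the trace mixes many feeds, results are probably most useful broken down per feed. For example, "how common is validation?" can be approximated from status codes alone, counting 304 (Not Modified) responses against all responses for each URI. A minimal sketch, with invented sample URIs rather than actual trace data:

```python
import re
from collections import Counter

# Pull the request URI and status code out of a combined-log line.
STATUS_RE = re.compile(r'"(?:GET|HEAD) (?P<uri>\S+) [^"]*" (?P<status>\d{3}) ')

def validation_share(lines):
    """Fraction of responses per URI that were 304 Not Modified."""
    total, not_modified = Counter(), Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if not m:
            continue
        uri = m.group("uri")
        total[uri] += 1
        if m.group("status") == "304":
            not_modified[uri] += 1
    return {uri: not_modified[uri] / total[uri] for uri in total}
```

A low 304 share would suggest aggregators that don't send conditional requests, i.e. they re-fetch the full feed on every poll.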
Received on Thursday, 8 May 2003 00:36:31 UTC