- From: Bob Wyman <bob@wyman.us>
- Date: Thu, 18 Nov 2004 12:39:03 -0500
- To: "'Mike Champion'" <mc@xegesis.org>, <public-xml-binary@w3.org>
- Cc: <swilliams@hpti.com>
Stephen D. Williams wrote:
> One thing that is missing from a lot of these analyses is what
> could be saved by being able to do deltas. In a situation where
> there is any kind of repetition such as protocol messages
> (in XMPP), records of some kind in a stream or file, or a
> request/response, the ability to send only what's different
> efficiently may use less CPU and be more efficient than even
> schema-based solutions.

Deltas can have a dramatic impact on the efficiency of exchanging XML data. We have solid empirical evidence[1] of this in the domain of the XML-encoded RSS and Atom feeds that are used in blogging.

Recently, I've been arguing hard that RSS aggregators and servers should implement the "feed"[2] instance manipulation method for RFC3229[3] (Delta encoding in HTTP). We've already managed to get quite a number of clients and servers to support RFC3229+feed[4], and the result has been a massive reduction in the bandwidth we need to serve RSS and Atom files at PubSub.com. FeedDemon, BlogLines, NewzCrawler, PubSub, Mark Pilgrim's Universal Feed Parser, etc. all support RFC3229+feed today, and more implementations will come soon, especially since it is almost trivial to implement.

While the most dramatic savings are in network bandwidth, there is also a fairly dramatic, though system-specific, gain from a reduced need to detect "already seen" data in RSS/Atom clients (i.e. CPU savings and a reduction in latency). At PubSub.com, we've seen our average bytes per request drop to 25% of the previous number for requests that use the "A-IM: feed" header. Sites whose feeds update less frequently than ours would see much higher savings. As documented on my blog (see links below), the savings have been large enough that we were able to defer planned increases in the bandwidth we purchase from our ISP.
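To make the exchange concrete, here is a minimal server-side sketch in Python of RFC3229+feed as described above: the client sends the ETag of the last version it saw in If-None-Match together with "A-IM: feed", and the server answers with 226 IM Used, an "IM: feed" header, and only the entries added since that version. The ETag scheme (a per-feed sequence number), the Entry type and the respond() function are hypothetical illustrations, not PubSub's implementation; a real server would also need to handle multiple ETags, case-insensitive header names, and the remaining RFC 3229 requirements.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    entry_id: str
    seq: int   # hypothetical monotonically increasing update counter
    xml: str   # serialized <item>/<entry> element


def etag_for(seq: int) -> str:
    # Hypothetical ETag scheme: the sequence number of the newest entry.
    return f'"v{seq}"'


def parse_etag(value: str) -> Optional[int]:
    # Recover the sequence number from our own ETag format, else None.
    value = value.strip()
    if value.startswith('"v') and value.endswith('"'):
        try:
            return int(value[2:-1])
        except ValueError:
            return None
    return None


def respond(entries: List[Entry],
            headers: Mapping[str, str]) -> Tuple[int, Dict[str, str], List[Entry]]:
    current = max((e.seq for e in entries), default=0)
    out_headers = {"ETag": etag_for(current)}
    known = parse_etag(headers.get("If-None-Match", ""))

    # Client already has the current version: nothing to send.
    if known == current:
        return 304, out_headers, []

    # Does the client accept the "feed" instance manipulation?
    accepted = [t.split(";")[0].strip().lower()
                for t in headers.get("A-IM", "").split(",")]
    if "feed" in accepted and known is not None and known < current:
        # Send only entries newer than the version the client already holds.
        delta = [e for e in entries if e.seq > known]
        out_headers["IM"] = "feed"
        return 226, out_headers, delta

    # Otherwise fall back to an ordinary full response.
    return 200, out_headers, list(entries)


if __name__ == "__main__":
    feed = [Entry("a", 1, "<entry>a</entry>"),
            Entry("b", 2, "<entry>b</entry>"),
            Entry("c", 3, "<entry>c</entry>")]
    status, hdrs, body = respond(feed, {"A-IM": "feed", "If-None-Match": '"v1"'})
    print(status, hdrs, [e.entry_id for e in body])
    # prints: 226 {'ETag': '"v3"', 'IM': 'feed'} ['b', 'c']
```

A client that does not send "A-IM: feed" still gets a normal 200 with the full feed, which is why the extension can be deployed incrementally on either side.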
Nonetheless, even when using GZIP, RFC3229+feed, etc., there is still a need to squeeze the XML even more and to make it even more efficient to process for exceptionally high-throughput sites like PubSub.com. This is why we still claim that it makes sense for us to convert all XML to ASN.1 PER for internal processing[5] and why we seek to get binary feeds from high-volume publishers.

In summary, there should be no question that delta encoding can yield very nice, easily achievable gains in compression. We've already proved it in the domain of blogging. Hopefully, our example will one day convince the developers of leading-edge HTTP browsers and servers like Firefox and Apache to implement RFC3229 for HTML, XHTML, etc.

Plug: If you are implementing an RSS or Atom client or server, PLEASE implement RFC3229+feed. You won't regret it.

bob wyman

[1] http://bobwyman.pubsub.com/main/2004/10/massive_bandwid.html
[2] http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html
[3] http://www.ietf.org/rfc/rfc3229.txt
[4] http://bobwyman.pubsub.com/main/2004/09/implementations.html
[5] http://bobwyman.pubsub.com/main/2004/02/xml_asn1_and_th.html
Received on Thursday, 18 November 2004 17:39:20 UTC