RFC3229+feed: XML Deltas have dramatic effect! (was: RE: question: Increasing factor for XML vs Binary)

Stephen D. Williams wrote:
> One thing that is missing from a lot of these analyses is what
> could be saved by being able to do deltas.  In a situation where
> there is any kind of repetition such as protocol messages 
> (in XMPP), records of some kind in a stream or file, or a 
> request/response, the ability to send only what's different
> efficiently may use less CPU and be more efficient than even
> schema-based solutions.

	Deltas can have a dramatic impact on the efficiency of exchanging
XML data. We have solid empirical evidence[1] of this in the domain of the
XML-encoded RSS and Atom feeds used in blogging.
	Recently, I've been arguing hard that RSS aggregators and servers
should implement the "feed"[2] instance manipulation method for RFC3229[3]
(Delta encoding in HTTP). With "feed", a client that sends "A-IM: feed"
along with the ETag of its last copy receives only the entries added since
then. We've already persuaded quite a number of clients and servers to
support RFC3229+feed[4], and the result has been a massive reduction in the
bandwidth we need to serve RSS and Atom files at PubSub.com. FeedDemon,
BlogLines, NewzCrawler, PubSub, Mark Pilgrim's Universal Feed Parser, and
others all support RFC3229+feed today, and more implementations will come
soon, especially since it is almost trivial to implement. While the most
dramatic savings are in network bandwidth, there is also a substantial,
though system-specific, gain from a reduced need to detect "already seen"
data in RSS/Atom clients (i.e. CPU savings and reduced latency).
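	A minimal sketch of the server side of RFC3229+feed, in Python. It
assumes a hypothetical in-memory feed in which each entry records the ETag
of the feed version that introduced it; the names Entry, parse_im_list and
respond are illustrative and are not taken from any of the implementations
listed above. A client that sends "A-IM: feed" with the ETag of its last
copy in If-None-Match gets a "226 IM Used" response carrying only the newer
entries; an unknown ETag falls back to an ordinary full "200" response.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical storage model: each entry remembers the ETag of the
    # feed version in which it first appeared.
    version_etag: str
    xml: str


def parse_im_list(header):
    """Split an A-IM header such as 'feed, gzip' into lowercase IM names."""
    return [item.split(";")[0].strip().lower()
            for item in header.split(",") if item.strip()]


def respond(entries, request_headers):
    """Return (status, response_headers, entries_to_send) for a feed GET.

    entries is ordered oldest-first. request_headers is assumed to be a
    plain dict with canonical header names, and If-None-Match is assumed
    to carry a single ETag (a simplification of real HTTP).
    """
    current_etag = entries[-1].version_etag if entries else '"empty"'
    headers = {"ETag": current_etag}
    client_etag = request_headers.get("If-None-Match")

    if client_etag == current_etag:
        return 304, headers, []  # client already has the latest version

    wants_feed = "feed" in parse_im_list(request_headers.get("A-IM", ""))
    known = [e.version_etag for e in entries]
    if wants_feed and client_etag in known:
        newer = entries[known.index(client_etag) + 1:]
        headers["IM"] = "feed"  # tells the client this body is a delta
        return 226, headers, newer  # 226 IM Used

    # No delta requested, or the client's version is unknown: send it all.
    return 200, headers, list(entries)
```

	The fallback to a full 200 response is what makes the method cheap to
deploy: a server that has forgotten an old ETag, or a client that never
asks for deltas, still gets a correct feed.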
	At PubSub.com, we've seen our average bytes per request drop to 25%
of its previous level for requests that use the "A-IM: feed" header. Sites
whose feeds update less frequently than ours should see even larger savings,
since each poll would return fewer new entries relative to the full feed.
As documented on my blog (see links below), the savings have been large
enough that we were able to defer planned increases in the bandwidth that we
purchase from our ISP.
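	On the client side, the change is equally small. The following is a
minimal Python sketch, assuming the client caches entries oldest-first and
remembers the ETag of its last fetch; build_feed_request and merge_entries
are hypothetical helper names. It shows why a 226 response removes the need
for "already seen" detection: every entry in a delta is new by construction.

```python
import urllib.request


def build_feed_request(url, last_etag=None):
    """Build a GET that asks for only the entries added since last_etag."""
    req = urllib.request.Request(url)
    req.add_header("A-IM", "feed")
    if last_etag is not None:
        req.add_header("If-None-Match", last_etag)
    return req


def merge_entries(cached, status, received):
    """Combine a response body with the entries the client already holds.

    cached and received are lists of entries, oldest-first.
    """
    if status == 304:
        return cached  # nothing changed since last_etag
    if status == 226:
        # Delta response: every received entry is new, so no
        # "already seen" check is needed before appending.
        return cached + received
    # 200: a full feed replaces the cache (a real client would still
    # need duplicate detection here).
    return list(received)
```

	urllib stores header names via str.capitalize(), so the headers above
read back as "A-im" and "If-none-match". Also, urllib.request.urlopen treats
226 as an ordinary 2xx success but raises HTTPError for 304, so a real client
built on it would catch that case.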
	Nonetheless, even when using GZIP, RFC3229+feed, etc., there is still
a need to compress XML further and to make it cheaper to process for
exceptionally high-throughput sites like PubSub.com. This is why we still
maintain that it makes sense for us to convert all XML to ASN.1 PER for
internal processing[5], and why we seek to get binary feeds from high-volume
publishers.
	In summary, there should be no question that delta encoding can
yield substantial, easily achieved compression gains. We've proved it
already in the domain of blogging. Hopefully, our example will one day
convince the developers of leading-edge web browsers and servers like
Firefox and Apache to implement RFC3229 for HTML, XHTML, etc.

	Plug: If you are implementing an RSS or Atom client or server,
PLEASE implement RFC3229+feed. You won't regret it.

		bob wyman


[1] http://bobwyman.pubsub.com/main/2004/10/massive_bandwid.html
[2] http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html
[3] http://www.ietf.org/rfc/rfc3229.txt
[4] http://bobwyman.pubsub.com/main/2004/09/implementations.html
[5] http://bobwyman.pubsub.com/main/2004/02/xml_asn1_and_th.html

Received on Thursday, 18 November 2004 17:39:20 UTC