Re: RFC3229+feed: XML Deltas have dramatic effect!

From: Stephen D. Williams <sdw@lig.net>
Date: Tue, 23 Nov 2004 13:44:38 -0500
Message-ID: <41A38516.20908@lig.net>
To: bob@wyman.us
Cc: 'Mike Champion' <mc@xegesis.org>, public-xml-binary@w3.org, swilliams@hpti.com
This is great information and experience.  RFC3229, which seems to rely 
mainly on the free-for-RFC3229-only VCDIFF algorithm, is quite 
interesting and instructive in how to handle the convoluted issues of 
HTTP 1.0/1.1 and HTTP caching.

The idea of a delta as a difference from a parent is fairly clear and 
representable by a standard mechanism, but as the literature around 
VCDIFF and other algorithms points out, the methods used to arrive at 
that delta will vary widely and will evolve while still creating 
compatible output.

Deltas as I proposed them in relation to XML Binary Characterization are 
representable in a clean way by an appropriate format.  One key idea I 
am proposing is that although the delta could be produced by an 
algorithmic differencing step like VCDIFF, it is usually better created 
by having a 'copy on write' layer above a directly modifiable parent.  
This means that creating a delta requires no analysis or extra memory, 
only the bookkeeping needed for the 'virtual' data layer under the 
semantic level of the format.
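
To make the 'copy on write' idea concrete, here is a minimal sketch in
Python.  It is only an illustration under simplifying assumptions: the
parent is a flat key/value mapping rather than an XML tree, and the names
CowLayer and apply_delta are hypothetical, not part of any proposed format.
Writes are recorded in the overlay as they happen, so the delta is simply
the overlay's bookkeeping and no differencing pass is run.

```python
class CowLayer:
    """Copy-on-write overlay above a parent mapping (illustrative only)."""

    _DELETED = object()  # marker for keys removed in this layer

    def __init__(self, parent):
        self._parent = parent   # never modified by this layer
        self._changes = {}      # key -> new value, or _DELETED

    def __getitem__(self, key):
        # Reads consult the overlay first and fall through to the parent.
        if key in self._changes:
            value = self._changes[key]
            if value is self._DELETED:
                raise KeyError(key)
            return value
        return self._parent[key]

    def __setitem__(self, key, value):
        # A write touches only the overlay.
        self._changes[key] = value

    def __delitem__(self, key):
        self[key]  # raises KeyError if the key is not visible
        self._changes[key] = self._DELETED

    def delta(self):
        """Return the recorded changes; no comparison with the parent."""
        changed = {k: v for k, v in self._changes.items()
                   if v is not self._DELETED}
        removed = sorted(k for k, v in self._changes.items()
                         if v is self._DELETED)
        return {"set": changed, "removed": removed}


def apply_delta(parent, delta):
    """Rebuild the child document from a parent plus a delta."""
    result = dict(parent)
    result.update(delta["set"])
    for key in delta["removed"]:
        result.pop(key, None)
    return result


parent = {"from": "a@example.org", "to": "b@example.org", "body": "hello"}
msg = CowLayer(parent)
msg["body"] = "goodbye"
del msg["to"]
child = apply_delta(parent, msg.delta())
```

The receiver needs only the parent and the output of delta() to
reconstruct the child, which is the property that matters for repeated
protocol messages or records.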


Bob Wyman wrote:

>Stephen D. Williams wrote:
>>One thing that is missing from a lot of these analyses is what
>>could be saved by being able to do deltas.  In a situation where
>>there is any kind of repetition such as protocol messages 
>>(in XMPP), records of some kind in a stream or file, or a 
>>request/response, the ability to send only what's different
>>efficiently may use less CPU and be more efficient than even
>>schema-based solutions.
>	Deltas can have a dramatic impact on the efficiency of exchanging
>XML data. We have solid empirical evidence[1] of this in the domain of the
>XML encoded RSS and Atom feeds that are used in blogging.
>	Recently, I've been arguing hard that RSS aggregators and servers
>should implement the "feed"[2] instance manipulation method for RFC3229[3]
>(Delta encoding in HTTP). We've already managed to get quite a number of
>clients and servers to support RFC3229+feed[4] and the result has been a
>massive reduction in the bandwidth we need to serve RSS and Atom files at
>PubSub.com. FeedDemon, BlogLines, NewzCrawler, PubSub, Mark Pilgrim's
>Universal Feed Parser, etc. all support RFC3229+feed today and more
>implementations will come soon -- especially since it is almost trivial to
>implement. While the most dramatic savings is in network bandwidth, there is
>also a fairly dramatic but system specific gain from a reduced need to
>detect "already seen" data in RSS/Atom clients (i.e. a CPU savings and a
>reduction in latency).
>	At PubSub.com, we've seen our average bytes per request drop to 25%
>of the previous number for requests that use the "A-IM: feed" header. Sites
>whose feeds update less frequently than ours would see much higher savings.
>As documented on my blog (see links below), the result has been large enough
>so that we were able to defer planned increases in the bandwidth that we
>purchase from our ISP.
>	Nonetheless, even if using GZIP, RFC3229+feed, etc., there is still
>a need to squeeze the XML even more and to make it even more efficient to
>process for exceptionally high throughput sites like PubSub.com. This is why
>we still claim that it makes sense for us to convert all XML to ASN.1 PER
>for internal processing[5] and why we seek to get binary feeds from high
>volume publishers. 
>	In summary, there should be no question that delta encoding can
>result in very nice, easily achievable, gains in compression. We've proved
>it already in the domain of blogging. Hopefully, our example will one day
>convince the developers of leading edge HTTP browsers and servers like
>Firefox and Apache to implement RFC3229 for HTML, XHTML, etc.
>	Plug: If you are implementing an RSS or Atom client or server,
>PLEASE implement RFC3229+feed. You won't regret it.
>		bob wyman
>[1] http://bobwyman.pubsub.com/main/2004/10/massive_bandwid.html
>[2] http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html
>[3] http://www.ietf.org/rfc/rfc3229.txt
>[4] http://bobwyman.pubsub.com/main/2004/09/implementations.html
>[5] http://bobwyman.pubsub.com/main/2004/02/xml_asn1_and_th.html
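
For readers who have not seen the exchange Bob describes, the following
is a rough sketch of the server-side decision, written in Python.  The
A-IM request header, the IM response header and the 226 (IM Used) status
come from RFC3229; everything else is an assumption made for illustration:
the function names are hypothetical, entries are plain strings ordered
oldest first, and the ETag is simply the number of entries published so
far, which a real server would not do.

```python
def current_etag(entries):
    # Assumption for illustration: the ETag encodes the entry count.
    return '"%d"' % len(entries)


def serve_feed(entries, if_none_match=None, a_im=None):
    """Return (status, headers, entries_to_send) for one feed request."""
    etag = current_etag(entries)

    # Nothing new since the client's copy: ordinary conditional GET.
    if if_none_match == etag:
        return 304, {"ETag": etag}, []

    # Does the client accept the "feed" instance manipulation?
    tokens = []
    if a_im is not None:
        tokens = [t.split(";")[0].strip() for t in a_im.split(",")]

    if "feed" in tokens and if_none_match is not None:
        try:
            seen = int(if_none_match.strip('"'))
        except ValueError:
            seen = None
        if seen is not None and 0 <= seen < len(entries):
            # Send only the entries the client has not yet seen.
            return 226, {"ETag": etag, "IM": "feed"}, entries[seen:]

    # Fallback: full feed for clients without delta support.
    return 200, {"ETag": etag}, list(entries)


entries = ["entry-1", "entry-2", "entry-3"]
status, headers, body = serve_feed(entries, '"1"', "feed")
```

A client that does not send "A-IM: feed" still gets a normal 200
response with the whole feed, so support can be added on the server
without breaking existing clients.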

swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw
Received on Tuesday, 23 November 2004 18:43:14 UTC