Re: Compression analysis of perfect atom-based compressor

On 06/04/2013, at 3:55 AM, Martin Nilsson <nilsson@opera.com> wrote:

> On Fri, 05 Apr 2013 01:55:53 +0200, Roberto Peon <grmocg@gmail.com> wrote:
> 
>> 
>>   - We need a better survey of headers from everywhere :)
> 
> I just captured 251'644 requests on a mobile site and counted the number of occurrences of every header, and it is quite the zoo of browser-specific, device-specific and network-specific information added to these requests. A significant amount of request size is spent on WAP profile headers like this, which obviously would benefit greatly from compression.
> 
> "X-WAP-Profile-Diff: 1; <?xml version=\"1.0\"?><rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" xmlns:prf=\"http://www.wapforum.org/UAPROF/ccppschema-19991014#\"><!-- browser vendor site: Default description of properties --><rdf:Description><prf:CcppAccept><rdf:Bag><rdf:li>application/vnd.wap.wmlscriptc</rdf:li><rdf:li>text/vnd.wap.wml</rdf:li><rdf:li>application/vnd.wap.xhtml+xml</rdf:li><rdf:li>application/xhtml+xml</rdf:li><rdf:li>text/xml</rdf:li><rdf:li>text/html</rdf:li><rdf:li>text/css</rdf:li><rdf:li>multipart/mixed</rdf:li><rdf:li>*/*</rdf:li></rdf:Bag></prf:CcppAccept></rdf:Description></rdf:RDF>,2; <?xml version=\"1.0\"?><rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" xmlns:prf=\"http://www.wapforum.org/UAPROF/ccppschema-19991014#\"><!-- browser vendor site: Default description of properties --><rdf:Description><prf:CcppAccept-Charset><rdf:Bag><rdf:li>*</rdf:li></rdf:Bag></prf:CcppAccept-Charset></rdf:Description></rdf:RDF>,3; <?xml version=\"1.0\"?><rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" xmlns:prf=\"http://www.wapforum.org/UAPROF/ccppschema-19991014#\"><!-- browser vendor site: Default description of properties --><rdf:Description><prf:CcppAccept-Language><rdf:Seq><rdf:li>en</rdf:li></rdf:Seq></prf:CcppAccept-Language></rdf:Description></rdf:RDF>"

Wow. Hardly know where to start with that one...


> The problem is how to create an exportable version of this kind of information. As you can see in the list, there is a ton of private information. The top list of headers, cut off at a count of 100:


...

I think it depends on what we want to do with the data. E.g., if it's suitable for feeding into the compression-test suite, you could do so and report the results back to us, or you could come to some agreement with the implementers about sharing the data under terms your lawyers are comfortable with (blech, but still..).

If it's just getting characteristics of the traces to inform our discussions, you could just continue to post your observations, and take requests for other summaries of the data.
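
For what it's worth, a summary of that kind can be produced without exporting any header values at all. Here is a minimal sketch in Python, assuming the capture can be read as a list of requests, each a list of (name, value) pairs; the function name count_header_names and the min_count parameter are made up for illustration and are not part of any existing tool. It counts every occurrence of a header name, folds case (HTTP header names are case-insensitive), and drops names seen fewer than min_count times:

```python
from collections import Counter


def count_header_names(requests, min_count=100):
    """Return (name, count) pairs for header names seen at least min_count times.

    `requests` is assumed to be an iterable of requests, each an iterable of
    (name, value) pairs. Values are never stored, so the result contains
    no private data beyond the header names themselves.
    """
    counts = Counter()
    for headers in requests:
        for name, _value in headers:
            # Header names are case-insensitive, so normalise before counting.
            counts[name.strip().lower()] += 1
    # most_common() sorts by descending count; apply the cut-off afterwards.
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Tiny made-up sample, only to show the calling convention.
    sample = [
        [("User-Agent", "example/1.0"), ("Accept", "*/*")],
        [("user-agent", "example/2.0")],
    ]
    for name, n in count_header_names(sample, min_count=1):
        print(name, n)
```

Note that the header names themselves can still be identifying (e.g. operator-specific X- headers), so the cut-off doubles as a crude way of hiding the rare ones.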

Cheers,

--
Mark Nottingham   http://www.mnot.net/

Received on Saturday, 6 April 2013 07:19:23 UTC