- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 12 Jan 2011 06:06:23 +0100
- To: www-archive@w3.org
Hi,

Not that anyone would seriously consider this, but I was curious how my earlier suggestion to simply pre-scan masked frames and pick another key if some bad pattern occurs (and eventually fragment the frame, or do whatever [1]) would fare in practice. So I made a script that takes some resource, generates a 32-bit random number, repeats it as needed to cover the resource, XORs the two, and tries another key if the result matches the pattern.

I used several data sets: one had about 2 million lines of IRC chat (each line checked individually); another was all the resources a WebKit-derived browser would request visiting the Alexa top 100 web sites (using my http://cutycapt.sf.net/ tool and a proxy); all files in my browser cache smaller than 1 MB; and all 400 000 files smaller than 1 MB I have on this system. That seemed like a fair set.

The pattern I used was `/\s[a-z\/\-_.:\s]{4,}[\s:\/]/i`. I did not put much thought into it; it obviously needs to avoid covering whole ranges of 1-/2-/3-grams, but it covers obvious things like "\nhost:" or, if you like, "\rhost:", " HTTP/", "\thttp:", and so on. One could add, say, "GET\n" as an alternative so hitting that does not rely on the bytes after it; that would not change anything.

The results for all sets are nearly identical: for about 98% of inputs the first key avoided the pattern, for another 1.x% the second key did, and there is some noise where up to 28 keys had to be tried in the worst case [2].

I will spare the weird people who subscribe to www-archive an analysis of the benefits and objections, as they are rather obvious really. The way things are going, one might expect the group to agree to do something expensive now, wait a couple of years until the miracle proxies are found and fixed (they'd pile up strange method names in log files, consume unreasonable amounts of memory all of a sudden, may corrupt legitimate WebSocket traffic despite allowing the handshake through, and so on), and then do a cheaper WebSocket2 protocol later.
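The key-picking loop described above can be sketched roughly like this (a minimal Python sketch; the function name and retry limit are my own, and the XOR with a repeating 4-byte key is the masking scheme from the hybi draft):

```python
import os
import re

# The ad-hoc "bad pattern" from above: whitespace, then four or more bytes
# drawn from an HTTP-header-ish alphabet, then whitespace/colon/slash.
BAD = re.compile(rb"\s[a-z/\-_.:\s]{4,}[\s:/]", re.IGNORECASE)

def pick_mask_key(payload: bytes, max_tries: int = 1000):
    """Pre-scan sketch: draw random 32-bit keys until the masked payload
    no longer matches BAD. Returns (key, number_of_tries)."""
    for tries in range(1, max_tries + 1):
        key = os.urandom(4)  # candidate 32-bit masking key
        masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
        if not BAD.search(masked):
            return key, tries
    raise RuntimeError("no acceptable key within %d tries" % max_tries)
```

Per the numbers above, the loop should return on the first key about 98% of the time, so the extra cost over unconditional masking is one scan of the payload in the common case.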
Except that, of course, having more than one version of things is an "anti-pattern".

[1] Obviously, fragmenting messages into multiple frames in rare cases is a bad idea, but the protocol draft already allows clients and intermediaries to do it, so quite likely servers that rely on getting one frame per message all the time will break sooner or later. It might in fact be better if browsers fragmented frames at random positions all the time regardless, so people experience failure quickly.

[2] With a couple of exceptions, all coming from the same source. A while back I used the CRM114 Discriminator to classify new articles in the German Wikipedia (will they be deleted in a couple of days?), and the database files it generated, all > 700 KB, needed up to 1001 keys.

Well, back now, watching hybians chase their Unicorns with cryptography,

-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Wednesday, 12 January 2011 05:07:02 UTC