
Websocket frame filtering for the paranoid

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Wed, 12 Jan 2011 06:06:23 +0100
To: www-archive@w3.org
Message-ID: <ha8qi65s3mdignql65n9rk5hvfpqkcbrtn@hive.bjoern.hoehrmann.de>

  Not that anyone would seriously consider this, but I was curious how
my earlier suggestion to simply pre-scan masked frames and pick another
key if some bad pattern occurs (and eventually fragment the frame or do
whatever [1]) would fare in practice. So I made a script that takes some
resource, generates a 32 bit random number, repeats it as needed to
cover the resource, xors the two, and tries another key if the result
matches the pattern.
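
The loop described above might look roughly like the following. This is
only a sketch in Python, not the script that was actually used; the names
`mask` and `pick_key`, the `max_tries` limit and the injectable `rand`
source are assumptions made for illustration, and the pattern is the one
quoted in this message, applied to raw bytes.

```python
import os
import re

# The pattern quoted in this message, applied to bytes, case-insensitive.
BAD = re.compile(rb"\s[a-z\/\-_.:\s]{4,}[\s:\/]", re.IGNORECASE)

def mask(payload: bytes, key: bytes) -> bytes:
    """XOR the payload with the 4-byte key, repeated to cover it."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))

def pick_key(payload: bytes, max_tries: int = 1000, rand=os.urandom):
    """Try random 32 bit keys until the masked payload avoids BAD.

    Returns (key, tries), or (None, max_tries) if no key was found;
    max_tries is a hypothetical cut-off, not something from the draft.
    """
    for tries in range(1, max_tries + 1):
        key = rand(4)
        if not BAD.search(mask(payload, key)):
            return key, tries
    return None, max_tries
```

A caller would send `mask(payload, key)` with the returned key, and fall
back to fragmenting (or whatever) when `pick_key` gives up.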

I used several data sets for it: about 2 million lines of IRC chat (each
line individually); all the resources a WebKit-derived browser would
request (using my http://cutycapt.sf.net/ tool and a proxy) when visiting
the Alexa 100 web sites; all files in my browser cache smaller than 1 MB;
and all the 400 000 files smaller than 1 MB I have on this system. That
seemed like a fair set.

The pattern I used was `/\s[a-z\/\-_.:\s]{4,}[\s:\/]/i`; I did not put
much thought into it. It obviously needs to avoid covering whole ranges
of 1-/2-/3-grams, but this covers obvious things like "\nhost:" (or, if
you like, "\rhost:"), " HTTP/", "\thttp:" and so on. One could add, say,
"GET\n" as an alternative so that hitting it does not rely on the bytes
after it; that would not change anything.
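
To make the behaviour of the pattern concrete, here is a small Python
check against the strings mentioned above. The helper name `is_bad` and
the two extra samples at the end are assumptions for illustration; the
pattern itself is the one quoted in this message, compiled for bytes.

```python
import re

# Same pattern as quoted above, compiled for bytes, case-insensitive.
PATTERN = re.compile(rb"\s[a-z\/\-_.:\s]{4,}[\s:\/]", re.IGNORECASE)

def is_bad(data: bytes) -> bool:
    """True if the data contains something the pattern flags."""
    return PATTERN.search(data) is not None

samples = [
    b"\nhost:", b"\rhost:", b" HTTP/", b"\thttp:",  # named in the text
    b" GET\n",        # needs a further byte after it before it matches
    b"hello world",   # no terminating whitespace, ":" or "/"
]
for s in samples:
    print(repr(s), is_bad(s))
```

The first four samples are flagged; " GET\n" on its own is not, since
the pattern still wants a terminating byte after it, which is the case
the "GET\n" alternative would cover.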

The results for all sets are nearly identical: for about 98% the first
key avoided the pattern, for another 1.x% the second key did, and there
is some noise where up to 28 keys had to be tried in the worst case [2].

I will spare the weird people who subscribe to www-archive an analysis
of the benefits and objections, as they are rather obvious really.

The way things are going one might expect the group to agree to do
something expensive now, wait a couple of years until the miracle proxies
are found and fixed (they'd pile up strange method names in log files,
consume unreasonable amounts of memory all of a sudden, may corrupt
legitimate Websocket traffic despite allowing the handshake through, and
so on), and then do a cheaper Websocket2 protocol later. Except that of
course having more than one version of things is an "anti-pattern".

[1] Obviously fragmenting messages into multiple frames in rare cases is
    a bad idea, but the protocol draft already allows clients and
    intermediaries to do it, so quite likely servers that rely on getting
    one frame per message all the time will break sooner or later. It
    might in fact be better if browsers fragmented frames at random
    positions all the time regardless, so people experience failure
    quickly.

[2] With a couple of exceptions, all coming from the same source. A while
    back I used the CRM114 Discriminator to classify new articles in the
    German Wikipedia (will they be deleted in a couple of days?), and the
    database files it generated, all > 700 KB, needed up to 1001 keys.

Well, back now, watching hybians chase their Unicorns with cryptography,
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Wednesday, 12 January 2011 05:07:02 UTC
