Re: Mobile headers

Hi Martin,

Thanks for that.

The file isn't valid JSON; while some issues are easy to fix (e.g., wrapping the whole thing in an array), there appear to be a number of other issues; e.g., at line 918:

{
  "accept" : "application/vnd.wap.xhtml+xml, application/xhtml+xml, text/html, $
  "accept-encoding" : {
    "deflate, gzip, x-gzip, identity, *;q=0"
    "deflate, gzip"
,
  "connection" : "Keep-Alive",
  "content-length" : "19",
  "content-type" : "text/plain",
  "host" : "m.opera.com",

Notice the missing comma after the first accept-encoding value, the "{" where a "[" should open the list, and what appears to be a missing closing bracket.

This appears to happen a lot (I guess I could fix it up with a regex, but ew).
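For anyone wanting to find these spots without scrolling through 154MB, here is a minimal Python sketch that reports where the standard json module first gives up. The function name first_json_error is my own invention, and it only finds the first error, so it would need to be re-run after each fix:

```python
import json


def first_json_error(text):
    """Return (line, column, message) for the first syntax error in text,
    or None if text parses as JSON. Hypothetical helper, not part of the
    capture tooling."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions reported by the decoder
        return err.lineno, err.colno, err.msg
    return None
```

Run against a single request object (or the whole file wrapped in an array), this points at the first place where the structure goes wrong.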

I *think* what you mean to do is:

{
  "accept" : "application/vnd.wap.xhtml+xml, application/xhtml+xml, text/html, $
  "accept-encoding" : [
    "deflate, gzip, x-gzip, identity, *;q=0",
    "deflate, gzip"
  ],
  "connection" : "Keep-Alive",
  "content-length" : "19",
  "content-type" : "text/plain",
  "host" : "m.opera.com",

to denote when there are multiple instances of a single header field?
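If that is the intent, a consumer has to cope with a header value being either a string or an array of strings. A minimal Python sketch of how that might be handled, assuming the corrected structure above (the helper name header_values is hypothetical):

```python
import json


def header_values(request, name):
    """Return every value of one header as a list, in original order.

    Assumes `request` is one parsed request object whose lower-case header
    names map to either a string or a list of strings.
    """
    value = request.get(name.lower())
    if value is None:
        return []          # header absent from this request
    if isinstance(value, str):
        return [value]     # single occurrence
    return list(value)     # multiple occurrences, order preserved


# Usage with a made-up fragment shaped like the corrected example
request = json.loads(
    '{"accept-encoding": ["deflate, gzip, x-gzip, identity, *;q=0",'
    ' "deflate, gzip"], "host": "m.opera.com"}'
)
encodings = header_values(request, "Accept-Encoding")
```

Normalising to a list up front means the rest of the analysis never has to care how many times a field appeared.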

Cheers,


On 11/10/2013, at 9:42 AM, Martin Nilsson <nilsson@opera.com> wrote:

> 
> So I've finally scrubbed the mobile HTTP header captures that I mentioned earlier. The capture was made on one of our Opera Mini download servers in March. The request headers were then parsed and the most broken requests removed. After that, all requests coming from internal systems, monitoring and Opera Mini servers were removed. I'm aware that there are some cases where the headers are not parsed correctly when the linebreaks keep changing between different header lines. Finally, the headers are scrubbed: everything resembling IP numbers and longer sequences of digits is replaced with random, similar-looking data. Also, the header values of y-msisdn, referer, authorization and proxy-authorization are replaced with random data. All request payload data is stripped as well. The resulting data is written into a JSON structure with one object per request, where lower-case header names map to a string, or to an array of strings in the case of multiple headers. Order is preserved in the array, but not amongst the different headers (they are serialized sorted). The file contains 203586 requests and is 154MB uncompressed, 20MB compressed.
> 
> http://people.opera.com/nilsson/headers.json.gz
> 
> Please report any issues or concerns.
> 
> /Martin Nilsson
> 
> -- 
> Using Opera's revolutionary email client: http://www.opera.com/mail/
> 

--
Mark Nottingham   http://www.mnot.net/

Received on Wednesday, 16 October 2013 19:55:24 UTC