
Re: URI length statistics "in the wild"?

From: Erik van der Poel <erikv@google.com>
Date: Thu, 8 Apr 2010 07:41:12 -0700
Message-ID: <x2zc07a32651004080741gb0ed176bv1c8b31d1f85fec17@mail.gmail.com>
To: Dan Brickley <danbri@danbri.org>
Cc: uri@w3.org
The typical length of a URL as found in HTML on the Web is around 64
bytes. I don't know the average or the median because I bucketed the
stats by powers of 2 (i.e. ..., up to 32, up to 64, up to 128, etc.).
The peak among these buckets was 64.
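
For anyone wanting to reproduce this kind of measurement, here is a
minimal sketch of power-of-2 bucketing. It is not the code used for the
stats above; the function names are made up for illustration, and
counting length in UTF-8 bytes is an assumption.

```python
from collections import Counter


def bucket_upper_bound(length: int) -> int:
    """Return the smallest power of 2 that is >= length (minimum 1)."""
    if length <= 1:
        return 1
    return 1 << (length - 1).bit_length()


def length_histogram(urls):
    """Count URLs per power-of-2 bucket, keyed by the bucket's upper bound.

    Length is measured in UTF-8 bytes (an assumption for this sketch).
    """
    counts = Counter()
    for url in urls:
        counts[bucket_upper_bound(len(url.encode("utf-8")))] += 1
    return dict(sorted(counts.items()))
```

With this scheme a 64-byte URL lands in the "up to 64" bucket and a
65-byte URL in the "up to 128" bucket, which is why only the peak
bucket, not the mean or median, can be read off the result.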

There is a sharp drop at 2048. This makes sense because MSIE's limit
in HTTP requests is 2048. Firefox and Chrome do not appear to have
limits. (I gave up trying when I reached 32k.)

MSIE's limit for URLs in HTML is 4096 characters. This is not the same
as 4096 bytes, because MSIE uses UTF-16 internally. I used IDNA to find
this limit.
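
To illustrate the character/byte distinction, here is a small sketch
that measures one URL three ways. The helper name url_sizes is
hypothetical, and the sample URL is just an example with a non-ASCII
host name; it does not reproduce the MSIE test itself.

```python
def url_sizes(url: str) -> dict:
    """Measure a URL in code points, UTF-8 bytes, and UTF-16 code units."""
    return {
        "code_points": len(url),
        "utf8_bytes": len(url.encode("utf-8")),
        # Each UTF-16 code unit is 2 bytes; "-le" avoids counting a BOM.
        "utf16_units": len(url.encode("utf-16-le")) // 2,
    }


sizes = url_sizes("http://例え.jp/")
```

For this sample the URL has 13 code points and 13 UTF-16 code units but
17 UTF-8 bytes, so a limit stated in characters and one stated in bytes
diverge as soon as non-ASCII text appears.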

Good luck with your barcode/audio efforts,

Erik

On Thu, Apr 8, 2010 at 3:06 AM, Dan Brickley <danbri@danbri.org> wrote:
> Hi folks
>
> Some topics seem peculiarly ill-suited for Web searches - hence this
> mail. I am looking for data on typical lengths of URIs, in particular
> as they're used in the public Web. Breakdown by scheme would be nice,
> but anything would be a start.
>
> Context for this enquiry is an investigation into the use of
> mechanisms like QR Codes and also audio encodings (eg.
> http://github.com/diva/digital-voices/ ) as a way of passing URIs
> around, eg. to a smartphone from a media centre. I'd like to know
> what's out there, what's feasible to encode using these techniques,
> and what the official limits are. In
> http://tools.ietf.org/html/rfc3986 I don't see much about URI length
> except in the reg-name portion.
>
> So - what are the official limits? What are the practical limits (eg.
> imposed by common implementations)? Can we say that 99.9% of URIs in
> the public Web are shorter than ...X chars?
>
> Ideally barcode and audio encodings wouldn't impose arbitrary limits;
> however it would be good to document what folk can expect to
> encounter, if only for sensible testing of error correction, reader
> accuracy etc.
>
> Thanks for any pointers,
>
> Dan
>
>
Received on Thursday, 8 April 2010 14:41:43 UTC
