URI length statistics "in the wild"?

Hi folks

Some topics seem peculiarly ill-suited for Web searches - hence this
mail. I am looking for data on typical lengths of URIs, in particular
as they're used in the public Web. Breakdown by scheme would be nice,
but anything would be a start.

Context for this enquiry is an investigation into the use of
mechanisms like QR Codes and audio encodings (e.g.
http://github.com/diva/digital-voices/ ) as a way of passing URIs
around, e.g. to a smartphone from a media centre. I'd like to know
what's out there, what's feasible to encode using these techniques,
and what the official limits are. In
http://tools.ietf.org/html/rfc3986 I don't see much about URI length
except in the reg-name portion.
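
To make "feasible to encode" concrete for the QR case, here is a rough
sketch of a fit check. It assumes the largest QR symbol (version 40) in
byte mode, with the usual published capacities per error-correction
level. The function name and the example URIs are made up for
illustration, and real readers may well struggle long before these
ceilings are reached.

```python
# Byte-mode capacity of a version 40 QR Code, per error-correction level.
# Assumption: the largest symbol size is acceptable; smaller versions
# hold far less.
QR_V40_BYTE_CAPACITY = {"L": 2953, "M": 2331, "Q": 1663, "H": 1273}


def fits_in_qr(uri, ec_level="M"):
    """Return True if the UTF-8 bytes of `uri` fit in a version 40 symbol.

    `fits_in_qr` is a hypothetical helper, not part of any QR library.
    """
    payload = uri.encode("utf-8")
    return len(payload) <= QR_V40_BYTE_CAPACITY[ec_level]


if __name__ == "__main__":
    short_uri = "http://example.org/" + "a" * 100   # hypothetical URI
    long_uri = "http://example.org/" + "a" * 3000   # hypothetical URI
    print(fits_in_qr(short_uri))       # short URI, default level M
    print(fits_in_qr(long_uri, "L"))   # long URI, most generous level
```

This only checks the theoretical ceiling; it says nothing about how
reliably a phone camera can read a symbol that dense.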

So - what are the official limits? What are the practical limits (e.g.
those imposed by common implementations)? Can we say that 99.9% of URIs
in the public Web are shorter than X chars?
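
For anyone with a crawl or log file to hand, the kind of figure I'm
after could be computed roughly as below. This is only a sketch: it
assumes the URIs are already available as a list of strings, counts
characters rather than bytes, and uses a simple nearest-rank
percentile. The helper names and sample URIs are invented for
illustration.

```python
import math
from collections import defaultdict
from urllib.parse import urlsplit


def length_percentile(lengths, pct):
    """Nearest-rank percentile: the smallest observed length such that
    at least pct percent of the observations are no longer than it."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def lengths_by_scheme(uris):
    """Group URI lengths (in characters) by lower-cased scheme."""
    groups = defaultdict(list)
    for uri in uris:
        scheme = urlsplit(uri).scheme.lower() or "(none)"
        groups[scheme].append(len(uri))
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample; a real study would read millions of URIs.
    sample = [
        "http://example.org/",
        "http://example.org/a/rather/longer/path?with=query",
        "mailto:someone@example.org",
    ]
    for scheme, lengths in sorted(lengths_by_scheme(sample).items()):
        print(scheme, len(lengths), length_percentile(lengths, 99.9))
```

With a large enough sample, the per-scheme value at 99.9 would be the
"X" in the question above.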

Ideally barcode and audio encodings wouldn't impose arbitrary limits;
however, it would be good to document what folk can expect to
encounter, if only for sensible testing of error correction, reader
accuracy etc.

Thanks for any pointers,

Dan

Received on Thursday, 8 April 2010 10:13:09 UTC