Re: [ProgressEvents] How to deal with compressed transfer encodings from Bjoern Hoehrmann on 2010-11-24 (public-webapps@w3.org from October to December 2010)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Wed, 24 Nov 2010 03:26:16 +0100
To: Jonas Sicking <jonas@sicking.cc>
Cc: Webapps WG <public-webapps@w3.org>
Message-ID: <2csoe6tc4tbd7llp67tsnhit2me5r1j8sc@hive.bjoern.hoehrmann.de>

* Jonas Sicking wrote:
>How should ProgressEvents deal with compressed transfer encodings? The
>problem is that the Content-Length header (if I understand things
>correctly) contains the encoded number of bytes, so we don't have
>access to the total number of bytes which will be exposed to the user
>until it's all downloaded. I can see several solutions:

Well, you have some information, you encode that using a media type,
then you possibly encode that using a content encoding, and then you
possibly encode that using a transfer encoding. HTTP uses transfer
encodings for both message framing ("chunked") and transformations;
they are a property of the transfer, while content encodings are part
of the content.

I would suggest asking this question in terms of what .loaded should
be when the download has finished. Should that be how much data has
been received after the header, or how much data has been received
except for framing information, or what the content developer thinks
the size is, or how many bytes you will ultimately feed to, say, the
HTML parser?

That would be respectively the length of the message body, the length
of the message body after removing the chunked transfer encoding, the
length of the entity body, and the length of the entity body after
removing content encodings. Note that you can apply compression
both as a content encoding and as a transfer encoding, although the
latter is only supported by good HTTP implementations, like Opera's,
but hey,
https://bugzilla.mozilla.org/show_bug.cgi?id=68517 isn't ten years old
yet.
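
To make the four candidate lengths above concrete, here is a minimal
Python sketch. It is only an illustration under stated assumptions: the
sample content, the 16-byte chunk size, and the helper names chunked()
and dechunk() are all made up for this example, and gzip is applied
twice (once as a content encoding, once as a transfer encoding) purely
so that the four numbers can differ.

```python
import gzip


def chunked(data: bytes, size: int = 16) -> bytes:
    """Frame data with the HTTP/1.1 "chunked" transfer encoding (hypothetical helper)."""
    out = bytearray()
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk, no trailers
    return bytes(out)


def dechunk(body: bytes) -> bytes:
    """Remove chunked framing again (hypothetical helper, no trailer support)."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        n = int(body[pos:eol].split(b";")[0], 16)  # ignore chunk extensions
        if n == 0:
            break
        start = eol + 2
        out += body[start:start + n]
        pos = start + n + 2  # skip chunk data and its trailing CRLF
    return bytes(out)


# Assumed sample content: what would ultimately be fed to the parser.
content = ("<p>h\u00f6hrmann</p>" * 50).encode("utf-8")

entity_body = gzip.compress(content)         # Content-Encoding: gzip
transfer_coded = gzip.compress(entity_body)  # Transfer-Encoding: gzip, ...
message_body = chunked(transfer_coded)       # ... chunked

lengths = {
    "message body": len(message_body),
    "after removing chunked framing": len(dechunk(message_body)),
    "entity body": len(gzip.decompress(dechunk(message_body))),
    "after removing content encodings": len(gzip.decompress(entity_body)),
}

for name, value in lengths.items():
    print(f"{name}: {value}")
```

Which of these four values .loaded should report once the download has
finished is exactly the question; the sketch only shows that they are
different numbers.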

I note that the draft actually defines this already, and I am pretty
sure we discussed this already back in the day.

>B seems spec-wise the simplest, but at least gecko doesn't expose the
>compressed number of bytes downloaded, not sure about other HTTP
>libraries. It also has the downside that .loaded doesn't match
>.responseText.length

Well, to get to the length of the content in terms of UTF-16 code
units you have to remove transfer encodings, content encodings, and
transcode from whatever character encoding the content is in to said
UTF-16 code units. That is yet another layer, and not a useful one in
most cases here.
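
As a small illustration of that extra layer, the following Python
sketch compares the byte length of a decoded body with its length in
UTF-16 code units, which is what a DOM string length counts. The sample
text and the helper name utf16_code_units() are assumptions made for
this example, and the body is assumed to be UTF-8.

```python
def utf16_code_units(text: str) -> int:
    """Count UTF-16 code units in text (hypothetical helper)."""
    return len(text.encode("utf-16-le")) // 2


# Assumed sample: one two-byte character and one character outside the BMP.
body = "Dageb\u00fcll \U0001D11E".encode("utf-8")
text = body.decode("utf-8")

print(len(body))               # bytes after removing all encodings: 14
print(utf16_code_units(text))  # UTF-16 code units: 11
print(len(text))               # Unicode code points: 10
```

The byte count and the code unit count agree only for pure ASCII
content, so a byte-based .loaded cannot be expected to match
.responseText.length in general.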
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Wednesday, 24 November 2010 02:33:32 UTC