File API: "Determining encoding" from Glenn Maynard on 2011-11-04 (public-webapps@w3.org from October to December 2011)

From: Glenn Maynard <glenn@zewt.org>
Date: Fri, 4 Nov 2011 13:17:58 -0400
To: public-webapps@w3.org
Message-ID: <CABirCh-pSAb3r4F9hOw_xb9fi=7vgztdhcjSLTPH_5EDisnYVA@mail.gmail.com>

Questions and thoughts while reading
http://dev.w3.org/2006/webapi/FileAPI/#enctype:

What does "cannot determine the encoding" mean in step 1?  Does that mean
"if the UA doesn't support the encoding"?  If not, is this spec actually
requiring that every registered encoding be supported?

It's odd that decoding and reencoding a valid Unicode blob doesn't
round-trip, since any BOM is removed during decoding.  Leaving BOMs in
would cause its own problems, though...
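
A small illustration of the round-trip point, offered only as a sketch: it
uses Python's "utf-8-sig" codec as a stand-in for a decoder that strips a
leading BOM.  That codec choice is an assumption for illustration, not the
File API's own decoder.

```python
# Bytes of a UTF-8 blob that starts with a BOM (EF BB BF).
original = b"\xef\xbb\xbfhello"

# "utf-8-sig" drops a leading BOM while decoding (stand-in for the
# BOM-stripping decode discussed above).
text = original.decode("utf-8-sig")

# Re-encoding as plain UTF-8 does not put the BOM back, so the bytes
# differ from the input.
round_tripped = text.encode("utf-8")

print(round_tripped == original)
```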

It would be clearer if steps 1 and 2 used the same terminology for an
invalid character set.  For step 1, the encoding parameter is declared
"invalid" in prose beforehand, and the step then checks that validity.
In step 2, there's no "invalid" intermediary; the charset is simply
checked directly.

The flow of the steps is odd.  Step 1 says "decode the Blob and terminate
this set of steps", but that prevents step 6 from being executed.  Step 2
says "[otherwise] go to the next step", but the non-"otherwise" branch
never says to do anything different (there's no "terminate this set of
steps" beforehand).  I think this should look more like:

> When reading blob objects using the readAsText() read method, the
> following encoding determination steps MUST be followed:
>
> 1. Let charset be null.
> 2. If the encoding parameter is specified, and is the name or alias of a
> character set used on the Internet [IANACHARSET], let charset be the
> encoding parameter.
> 3. If charset is null, and the blob's type attribute is present, and its
> Charset Parameter [RFC2046] is the name or alias of a character set used
> on the Internet, let charset be its Charset Parameter.
> 4. If charset is null, then for each of the rows in the following table,
> starting with the first one and going down, if the first bytes of blob
> match the bytes given in the first column, then let charset be the
> encoding given in the cell in the second column of that row.  [table]
> 5. If charset is null, let charset be UTF-8.
> 6. Return the result of decoding ...
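
For concreteness, steps 1-5 above could be sketched roughly as follows.
This is only a sketch under stated assumptions: determine_charset,
is_known_charset and charset_param are hypothetical helper names, Python's
codec registry stands in for the [IANACHARSET] check, and BOM_TABLE holds
the standard Unicode BOMs as a placeholder for the spec's table, which
isn't reproduced here.  Step 6 (the decoding itself) is left out.

```python
import codecs

# Assumption: standard Unicode BOMs, standing in for the spec's table.
BOM_TABLE = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def is_known_charset(label):
    """Approximate "name or alias of a character set" via Python's codecs."""
    if not label:
        return False
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def charset_param(mime_type):
    """Very simplified extraction of the charset parameter from a MIME type."""
    for part in mime_type.split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"') or None
    return None


def determine_charset(data, blob_type="", encoding=None):
    # Step 1: let charset be null.
    charset = None
    # Step 2: a recognized encoding parameter wins.
    if encoding is not None and is_known_charset(encoding):
        charset = encoding
    # Step 3: otherwise, a recognized charset parameter on the blob's type.
    if charset is None and blob_type:
        param = charset_param(blob_type)
        if param is not None and is_known_charset(param):
            charset = param
    # Step 4: otherwise, the first matching row of the BOM table.
    if charset is None:
        for bom, name in BOM_TABLE:
            if data.startswith(bom):
                charset = name
                break
    # Step 5: otherwise, default to UTF-8.
    if charset is None:
        charset = "UTF-8"
    return charset
```

Written this way, every path falls through to a single return at the end,
so the final decoding step is always reached, and an unrecognized encoding
parameter simply falls back to the blob's type, then the BOM, then UTF-8.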

-- 
Glenn Maynard

Received on Friday, 4 November 2011 17:18:30 UTC