Re: [File API]: "Determining encoding" from Arun Ranganathan on 2012-01-11 (public-webapps@w3.org from January to March 2012)

From: Arun Ranganathan <aranganathan@mozilla.com>
Date: Wed, 11 Jan 2012 13:58:04 -0800 (PST)
To: glenn@zewt.org
Cc: "Web Applications Working Group WG (public-webapps@w3.org)" <public-webapps@w3.org>
Message-ID: <5b91f2eb-6791-4454-a36d-8de6922a26f5@zimbra1.shared.sjc1.mozilla.com>

Glenn,

Sorry about letting this one go unanswered -- I was out of the office when you sent it.


>> Questions and thoughts while reading
>> http://dev.w3.org/2006/webapi/FileAPI/#enctype:

>> is this spec actually
>> requiring that every registered encoding be supported?

What's required is that UAs support as many of the encodings in [IANACHARSET] as possible -- I think that's fair.  I've rewritten the algorithm so that an encoding the UA doesn't support is treated as UTF-8.

Upon reflection, it might be prudent to decide on a minimum subset of encodings that must be supported, but I'm also comfortable leaving this to implementations and not saying anything about it.  What do you think?

>> It would be clearer if steps 1 and 2 used the same terminology for an
>> invalid character set.  

<snip />

I really liked your version -- much clearer than the original text -- and so I've rewritten the editor's draft to reflect the change.  Many thanks :)

http://dev.w3.org/2006/webapi/FileAPI/#encoding-determination

-- A*

> When reading blob objects using the readAsText() read method, the
> following encoding determination steps MUST be followed:
>
> 1. Let charset be null.
> 2. If the encoding parameter is specified, and is the name or alias of a
> character set used on the Internet [IANACHARSET], let charset be encoding
> parameter.
> 3. If charset is null, and the blob's type attribute is present, and its
> Charset Parameter [RFC2046] is the name or alias of a character set used on
> the Internet, let charset be its Charset Parameter.
> 4. If charset is null, then for each of the rows in the following table,
> starting with the first one and going down, if the first bytes of blob
> match the bytes given in the first column, then let charset be the encoding
> given in the cell in the second column of that row.  [table]
> 5. If charset is null, let charset be UTF-8.
> 6. Return the result of decoding ...

[IANACHARSET] http://www.iana.org/assignments/character-sets
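
For anyone who wants to poke at the quoted steps, here is a rough Python sketch of steps 1 through 5. It is only an illustration under some assumptions, not spec text: the names determine_charset, known_charset, charset_parameter and ASSUMED_BOM_TABLE are made up for this example, Python's codec registry stands in for [IANACHARSET] (the two are not the same list), and the byte table is elided above, so the rows here are just the usual Unicode byte-order marks. Step 6 is truncated in the quote, so the sketch stops at choosing the charset and does not decode anything.

```python
import codecs
from email.message import Message
from typing import Optional, Sequence, Tuple

# ASSUMPTION: the table for step 4 is elided in the quoted text. These rows
# are the common Unicode byte-order marks, used here only for illustration.
ASSUMED_BOM_TABLE: Sequence[Tuple[bytes, str]] = (
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xef\xbb\xbf", "utf-8"),
)


def known_charset(label: Optional[str]) -> Optional[str]:
    """Return Python's codec name for label, or None if it is not recognised.

    ASSUMPTION: Python's codec registry is a stand-in for the IANA registry.
    """
    if not label:
        return None
    try:
        return codecs.lookup(label.strip()).name
    except LookupError:
        return None


def charset_parameter(blob_type: str) -> Optional[str]:
    """Extract the charset parameter from a MIME type string, if any."""
    msg = Message()
    msg["Content-Type"] = blob_type
    return msg.get_content_charset()


def determine_charset(data: bytes,
                      encoding: Optional[str] = None,
                      blob_type: str = "") -> str:
    """Pick a charset following steps 1-5 of the quoted algorithm."""
    # Steps 1 and 2: start from null, then use the encoding parameter
    # if it names a known charset.
    charset = known_charset(encoding)
    # Step 3: otherwise try the charset parameter of the blob's type.
    if charset is None and blob_type:
        charset = known_charset(charset_parameter(blob_type))
    # Step 4: otherwise compare the first bytes against the table, top down.
    if charset is None:
        for signature, name in ASSUMED_BOM_TABLE:
            if data.startswith(signature):
                charset = name
                break
    # Step 5: otherwise fall back to UTF-8.
    if charset is None:
        charset = "utf-8"
    return charset


if __name__ == "__main__":
    # An unrecognised encoding parameter falls through to the blob's type.
    print(determine_charset(b"abc", encoding="no-such-charset",
                            blob_type="text/plain; charset=ISO-8859-1"))
```

Note that an unrecognised encoding parameter or Charset Parameter simply leaves charset as null, so the later steps still get a chance; that is also how an unsupported encoding ends up as UTF-8 when nothing else matches.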

-- 
Glenn Maynard

Received on Wednesday, 11 January 2012 21:58:42 UTC