
Re: [File API]: "Determining encoding"

From: Glenn Maynard <glenn@zewt.org>
Date: Wed, 11 Jan 2012 20:43:31 -0500
Message-ID: <CABirCh9c8R3XuOh327ymdU1YnNL76OY0tKVX5GGZJoN1aEJ5BA@mail.gmail.com>
To: Arun Ranganathan <aranganathan@mozilla.com>
Cc: "Web Applications Working Group WG (public-webapps@w3.org)" <public-webapps@w3.org>

You may want to coordinate with Anne regarding charset support requirements
and his in-progress encodings spec.

On Jan 11, 2012 1:58 PM, "Arun Ranganathan" <aranganathan@mozilla.com>
wrote:

> Glenn,
>
> Sorry about letting this one get by unanswered -- I was OOTO at the time
> you sent it.
>
>
> >> Questions and thoughts while reading
> >> http://dev.w3.org/2006/webapi/FileAPI/#enctype:
>
> >> is this spec actually
> >> requiring that every registered encoding be supported?
>
> What's required is that UAs support as many of the encodings in
> [IANACHARSET] as possible -- I think that's fair.  I've rewritten the
> algorithm to allow for what's not supported to be treated as UTF-8.
>
> Upon reflection, it might be prudent to decide on a minimum subset of
> supported encodings, but I'm also comfortable leaving this to
> implementations and not saying anything about it.  What do you think?
>
> >> It would be clearer if steps 1 and 2 used the same terminology for an
> >> invalid character set.
>
> <snip />
>
> I really liked your version -- much clearer than the original text -- and
> so I've rewritten the editor's draft to reflect the change.  Many thanks :)
>
> http://dev.w3.org/2006/webapi/FileAPI/#encoding-determination
>
> -- A*
>
> > When reading blob objects using the readAsText() read method, the
> > following encoding determination steps MUST be followed:
> >
> > 1. Let charset be null.
> > 2. If the encoding parameter is specified, and is the name or alias of a
> > character set used on the Internet [IANACHARSET], let charset be encoding
> > parameter.
> > 3. If charset is null, and the blob's type attribute is present, and its
> > Charset Parameter [RFC2046] is the name or alias of a character set used on
> > the Internet, let charset be its Charset Parameter.
> > 4. If charset is null, then for each of the rows in the following table,
> > starting with the first one and going down, if the first bytes of blob
> > match the bytes given in the first column, then let charset be the encoding
> > given in the cell in the second column of that row.  [table]
> > 5. If charset is null, let charset be UTF-8.
> > 6. Return the result of decoding ...
>
> [IANACHARSET] http://www.iana.org/assignments/character-sets
>
> --
> Glenn Maynard
>
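
For illustration, steps 1 through 5 of the quoted algorithm could be sketched
roughly as below. This is not text from the draft, and several pieces are
assumptions: KNOWN_CHARSETS is a tiny hypothetical stand-in for the
[IANACHARSET] registry, BOM_TABLE uses the common Unicode byte order marks in
place of the elided [table], and the names determineEncoding, isKnownCharset
and charsetParameter are invented here. Step 6 (the actual decoding) is left
out; the function only returns the chosen charset label.

```typescript
// Hypothetical stand-in for the [IANACHARSET] name/alias lookup.
const KNOWN_CHARSETS: ReadonlySet<string> = new Set([
  "utf-8", "utf-16le", "utf-16be", "iso-8859-1", "windows-1252", "shift_jis",
]);

// Assumed contents for the elided [table] in step 4: common byte order marks.
const BOM_TABLE: ReadonlyArray<[number[], string]> = [
  [[0xfe, 0xff], "utf-16be"],
  [[0xff, 0xfe], "utf-16le"],
  [[0xef, 0xbb, 0xbf], "utf-8"],
];

// True if the label names a charset this sketch "supports" (case-insensitive).
function isKnownCharset(label: string | null | undefined): label is string {
  return typeof label === "string" && KNOWN_CHARSETS.has(label.trim().toLowerCase());
}

// Pull the charset parameter out of a media type such as
// "text/plain; charset=ISO-8859-1"; returns null if there is none.
function charsetParameter(type: string): string | null {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(type);
  return match && match[1] ? match[1] : null;
}

function determineEncoding(bytes: Uint8Array, blobType: string, encoding?: string): string {
  // Step 1: start with no charset.
  let charset: string | null = null;

  // Step 2: an explicit, recognized encoding parameter wins.
  if (isKnownCharset(encoding)) {
    charset = encoding.trim().toLowerCase();
  }

  // Step 3: otherwise try the charset parameter of the blob's type.
  if (charset === null) {
    const param = charsetParameter(blobType);
    if (isKnownCharset(param)) {
      charset = param.trim().toLowerCase();
    }
  }

  // Step 4: otherwise sniff the leading bytes, first matching row wins.
  if (charset === null) {
    for (const [prefix, name] of BOM_TABLE) {
      if (prefix.every((b, i) => bytes[i] === b)) {
        charset = name;
        break;
      }
    }
  }

  // Step 5: fall back to UTF-8, which also covers unrecognized labels.
  return charset === null ? "utf-8" : charset;
}
```

Under these assumptions, an unrecognized label passed as the encoding
parameter simply falls through steps 3 and 4 and ends up as UTF-8, which
matches the "treat what's not supported as UTF-8" behaviour described above.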
Received on Thursday, 12 January 2012 02:10:28 GMT
