Re: [whatwg] Encoding: API from Anne van Kesteren on 2012-10-11 (public-whatwg-archive@w3.org from October 2012)

From: Anne van Kesteren <annevk@annevk.nl>
Date: Thu, 11 Oct 2012 06:09:09 +0200
To: Joshua Bell <jsbell@chromium.org>
Cc: WHATWG <whatwg@whatwg.org>
Message-ID: <CADnb78hnqC5eW6PHmpQJKYjhGJgkC95RvaynWfKFGEXN=KWX6w@mail.gmail.com>

On Wed, Oct 10, 2012 at 7:28 PM, Joshua Bell <jsbell@chromium.org> wrote:
> On Wed, Oct 10, 2012 at 6:42 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
>> I also still think it's kinda yucky that this API has this gigantic
>> hack around what the rest of the platform does with respect to the
>> byte order mark. It seems really weird to not expose the same
>> encode/decode that HTML/XML/CSS/etc. use.
>
> IMHO the API needs to support two use cases: (1) code that wants to follow the
> behavior of the web platform with respect to legacy content (i.e. the
> desire to self-host), and (2) code that wants to parse files that are not
> traditionally "web" data, i.e. fragments of binary files, which don't have
> legacy behavior and where BOM taking priority would be surprising to
> developers. For #2, following the behavior of APIs like ICU with respect to
> BOMs is more sensible. I believe #2 is higher priority as long as it does
> not preclude #1, and #1 can be achieved by code that inspects the stream
> before handing it off to the decoder.
>
> Practically speaking, this would mean refactoring the combined spec so that
> the current BOM handling is defined for parsing web content outside of the
> API rather than requiring the API to hack around it.

You would still get the hack because the API requires special
treatment for "utf-16". Given that per Unicode "utf-16le" and
"utf-16be" outlaw the BOM, maybe a good solution would be a flag to
disable BOM handling as seen by the decode algorithm? So the decoder
gets a disableBOM flag that defaults to false? That would only require
a special case for BOM handling on top of what there is today, which
seems a fair bit cleaner.
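
To make the flag idea concrete, here is a rough sketch of how the
encoding-selection step could look. It assumes the decode algorithm lets a
leading BOM override the caller's label unless the flag is set. The names
sniffBOM, resolveEncoding and Resolved are made up for illustration;
only disableBOM comes from the suggestion above.

```typescript
// Hypothetical sketch: these names and shapes are illustrative only and are
// not taken from any spec text.
type Resolved = { encoding: string; offset: number };

// Recognize a UTF-8 or UTF-16 byte order mark at the start of the input.
// Returns the encoding it implies and how many bytes the BOM occupies.
function sniffBOM(bytes: Uint8Array): Resolved | null {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: "utf-8", offset: 3 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return { encoding: "utf-16be", offset: 2 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return { encoding: "utf-16le", offset: 2 };
  }
  return null;
}

// Pick the encoding to decode with and the byte offset to start from.
// With disableBOM left at false, a BOM wins over the label and is skipped.
// With disableBOM set to true, the label is used as given and no bytes are
// skipped, so a leading BOM is handed to the decoder as ordinary input.
function resolveEncoding(bytes: Uint8Array, label: string, disableBOM = false): Resolved {
  if (!disableBOM) {
    const bom = sniffBOM(bytes);
    if (bom !== null) return bom;
  }
  return { encoding: label.toLowerCase(), offset: 0 };
}
```

Code that wants the web platform's legacy behavior would leave the flag
alone; code parsing fragments of binary files would pass true and get
exactly the encoding it asked for.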


> I received feedback recently that the API is perhaps too terse right now
> when dealing with streaming content, and a more explicit decode(),
> decodeStream(), resetStream() might be more intelligible. Thoughts?

Either way works for me.
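
For comparison, a rough sketch of what the more explicit shape might look
like, using a toy UTF-16LE decoder so the streaming state is visible. The
three method names come from the proposal quoted above; the class name and
the exact semantics (in particular, resetStream() flushing a dangling byte
as U+FFFD) are my own assumptions, not anything that has been specified.

```typescript
// Hypothetical sketch: a toy UTF-16LE decoder showing the explicit
// decode() / decodeStream() / resetStream() split. It does not validate
// surrogate pairs; code units are passed through unchanged.
class Utf16leStreamDecoder {
  // Low byte of a code unit whose high byte has not arrived yet.
  private pending: number | null = null;

  // Decode one chunk, carrying an odd trailing byte over to the next call.
  decodeStream(chunk: Uint8Array): string {
    let out = "";
    chunk.forEach((byte) => {
      if (this.pending === null) {
        this.pending = byte;
      } else {
        out += String.fromCharCode(this.pending | (byte << 8));
        this.pending = null;
      }
    });
    return out;
  }

  // End the stream: a dangling byte becomes U+FFFD (assumed behavior),
  // and the decoder is ready for a fresh stream afterwards.
  resetStream(): string {
    const tail = this.pending === null ? "" : "\uFFFD";
    this.pending = null;
    return tail;
  }

  // One-shot decode: no state is carried in or out.
  decode(bytes: Uint8Array): string {
    this.pending = null;
    return this.decodeStream(bytes) + this.resetStream();
  }
}
```

The terse alternative folds all three into a single decode() call with a
streaming option, which is shorter to write but hides where the carried
state lives.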


-- 
http://annevankesteren.nl/

Received on Thursday, 11 October 2012 04:09:39 UTC