RE: Encoding API exceptions from Shawn Steele on 2014-11-09 (www-international@w3.org from October to December 2014)

From: Shawn Steele <Shawn.Steele@microsoft.com>
Date: Sun, 9 Nov 2014 19:56:14 +0000
To: Anne van Kesteren <annevk@annevk.nl>, "www-international@w3.org" <www-international@w3.org>
CC: Joshua Bell <jsbell@google.com>, Masatoshi Kimura <VYV03354@nifty.ne.jp>, Boris Zbarsky <bzbarsky@mit.edu>, Domenic Denicola <d@domenic.me>, "Allen Wirfs-Brock" <allenwb@mozilla.com>
Message-ID: <09436a1d66354fc0b36ae952abe33f29@CY1PR0301MB0731.namprd03.prod.outlook.com>

Hmm, I haven't looked at this before, however I'm not too keen on this paragraph of the API spec:

"The other (legacy) encodings have been defined to some extent in the past. However, user agents have not always implemented them in the same way, have not always used the same labels, and often differ in dealing with undefined and former proprietary areas of encodings. This specification addresses those gaps so that new user agents do not have to reverse engineer encoding implementations and existing user agents can converge."

I would prefer much stronger language discouraging the use of legacy encodings.

If "existing user agents can converge" means changing the behavior of the mappings, then applications will break.  There have been differences in the encoding of characters that have existed for decades.  Unfortunately, that leads to compatibility problems; however, in places where things do work today, changing the behavior would cause them to stop working tomorrow.  A better solution is to encode the data in Unicode, which avoids some of the problems with respect to the differences in encoding.
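
To make the kind of divergence described above concrete, here is a minimal Python sketch. It is an illustration only: it assumes CPython's built-in "cp1252" and "latin-1" codecs as stand-ins for two implementations that disagree about the same bytes, and the helper name can_decode is hypothetical.

```python
# Sketch: the same byte means different things under different legacy tables,
# while a UTF-8 round trip is unambiguous.

def can_decode(data: bytes, codec: str) -> bool:
    """Hypothetical helper: True if `data` decodes under `codec` without error."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

# Byte 0x80: EURO SIGN in Python's cp1252, a C1 control character in latin-1.
as_cp1252 = b"\x80".decode("cp1252")
as_latin1 = b"\x80".decode("latin-1")

# Byte 0x81 sits in an "undefined area": Python's cp1252 codec rejects it.
# Other implementations of windows-1252 are known to map it to U+0081 instead,
# which is exactly the sort of difference that changing a mapping would expose.
cp1252_accepts_0x81 = can_decode(b"\x81", "cp1252")
latin1_accepts_0x81 = can_decode(b"\x81", "latin-1")

# Data stored as UTF-8 does not depend on which legacy table the reader picks.
text = "price: 5 \u20ac"
round_tripped = text.encode("utf-8").decode("utf-8")

print(repr(as_cp1252), repr(as_latin1))
print(cp1252_accepts_0x81, latin1_accepts_0x81)
print(round_tripped == text)
```

Data written by an application that relies on one of these behaviors would read differently, or fail to read, if the mapping were later changed underneath it.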

I'm also disturbed by the redefinition of the charsets.  I'd much prefer pointers to the existing standards.  E.g., the IANA registry for 1252 points to http://www.iana.org/assignments/charset-reg/windows-1252, which provides a link to non-best-fit and best-fit versions.  (Granted, some of the charsets are less well defined.)

I don't mind having those tables in a normalized format to work with a specific API, but it should probably be clear that it's a normalized copy.  Currently the text could be read as being "the" standard definition of the encodings; however, a better description might be that it's a normalized collection of encodings, and the files really should point to the standard source definitions.

-Shawn

-----Original Message-----
From: annevankesteren@gmail.com [mailto:annevankesteren@gmail.com] On Behalf Of Anne van Kesteren
Sent: Sunday, November 9, 2014 1:45 AM
To: www-international@w3.org
Cc: Joshua Bell; Masatoshi Kimura; Boris Zbarsky; Domenic Denicola; Allen Wirfs-Brock
Subject: Encoding API exceptions

Both Chromium and Gecko implement the API. There's interest in this API from the wider JavaScript community. Unfortunately for them we use a DOMException which so far is the only dependency on "DOM things".

We also use a TypeError based on IDL enumerations, but reportedly IDL should attempt switching to RangeError there as that is what JavaScript itself uses in such situations. (We cannot use IDL enumerations in the Encoding API because we also trim whitespace and use ASCII case-insensitive matching.)
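
The label handling mentioned in the parenthetical above can be sketched roughly as follows. This is a minimal Python illustration, not the specification's algorithm text: the LABELS table is a tiny hand-picked subset, and the names ascii_lower and get_encoding are hypothetical. It shows why an exact-match enumeration is not enough: the label is trimmed and lowercased over ASCII only before lookup.

```python
import string

# ASCII whitespace as used when trimming labels: tab, LF, FF, CR, space.
ASCII_WHITESPACE = "\t\n\x0c\r "

# Lowercase A-Z only; unlike str.lower(), this leaves non-ASCII characters
# (for example U+212A KELVIN SIGN) untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Illustrative subset of a label -> encoding name table (an assumption for
# this sketch; a real table is much larger).
LABELS = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "unicode-1-1-utf-8": "UTF-8",
    "windows-1252": "windows-1252",
    "latin1": "windows-1252",
    "iso-8859-1": "windows-1252",
    "ascii": "windows-1252",
}

def ascii_lower(s: str) -> str:
    """ASCII-only lowercasing."""
    return s.translate(_ASCII_LOWER)

def get_encoding(label: str):
    """Return the encoding name for `label`, or None if the label is unknown."""
    key = ascii_lower(label.strip(ASCII_WHITESPACE))
    return LABELS.get(key)

print(get_encoding("  UTF-8\n"))
print(get_encoding("Latin1"))
print(get_encoding("bogus"))
```

An unknown label (the None case here) is the situation where the API currently throws a TypeError.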

I was wondering if there would be interest in changing both Chromium and Gecko to throw TypeError where we now throw EncodingError, and throw RangeError where we now throw TypeError. My hope is that this would make the API more attractive as JavaScript's Encoding API (to the extent it isn't that already).

The Encoding API: https://encoding.spec.whatwg.org/#api



--
https://annevankesteren.nl/

Received on Sunday, 9 November 2014 19:56:44 UTC