RE: [Encoding] false statement [I18N-ACTION-328][I18N-ISSUE-374]

Hello Larry,

The Internationalization Working Group has recorded this comment as I18N-ISSUE-374 against the Encoding spec during the recent Last Call.

In yesterday’s WG teleconference (which included Anne as a guest), we discussed this issue at length [1]. The WG has tasked me with responding.

The problem here is that the Encoding specification is intended to provide a complete and interoperable description of character encoding conversion for use specifically in Web browsers and other Web-related formats, protocols, and their attendant implementations. This specification is incompatible with certain aspects of the IANA charset registry and the encoding conversion rules associated with it.

It does not, however, as you point out, “obsolete” or replace the IANA registry for all those formats, protocols, and implementations that otherwise depend (in the past, currently, or in the future) on the IANA registry. Attempting to obsolete said registry would require greater coordination and would need to handle the legacy considerations you mention, which goes beyond the scope of what the current iteration of this document aims to provide.

The simplest response would be to remove the offending paragraph. However, the WG feels that ignoring IANA charsets is not realistic and that it is better to directly address the problem. Anne is amenable to textual changes to address this issue.

In the teleconference, we discussed the following proposed text to replace the current text:

--
The IANA Character Sets registry also tracks character encodings and their specifications (if any), as well as assigning labels to registered character encodings. This specification is the recommended normative reference for specifications that define Web protocols and document formats and their implementations.
--

Note that the above is taken from a longer proposal [2], which was part of a thread examining different textual options. We did not discuss the additional text yesterday in the call, but that proposal went on to say:

--
Implementers should be aware that many protocols and formats (such as, for example, email) are not consistent with all of the requirements of this specification. Additional care in selecting encoding labels and performing encoding conversions needs to be applied when exchanging data between Web contexts conformant with this specification and these external protocols and formats.
--

Would changing to the proposed wording satisfy your comment? If not, can you propose text that would clarify the relationship between the two and how to choose between them? I don’t think it is necessary to use pejorative language (“obsolete”, “supersede”, etc.) about either specification to do this.

Thanks,

Addison (for I18N)

[1] http://www.w3.org/2014/08/07-i18n-minutes.html

[2] https://lists.w3.org/Archives/Member/member-i18n-core/2014Jul/0027.html


From: Larry Masinter [mailto:masinter@adobe.com]
Sent: Tuesday, July 01, 2014 11:00 AM
To: Joshua Bell; John Cowan
Cc: Anne van Kesteren; Mark Davis ☕️; Asmus Freytag; www-international@w3.org
Subject: RE: [Encoding] false statement

If you scope the override to the web, you need to address workflows of interoperability between web and non-web (web-based email and instant messaging clients, for example), where the non-web application really uses the IANA-registered values. I don’t think that’s a world we want to aim for. One Web, One Internet.



I think it’s better to supplant the IANA charset registry by providing something better – better for all.

I don’t think it’s really a feature to turn off the ability to register new charsets completely, even if registration is rare and of limited applicability. (See separate message.)

The information in this specification should be merged into the IANA charset registry and presented in a form that is at least as useful as this spec, and also at least as useful as the current registry. (A low bar on both counts; we could ask for more.)

Once that integration has been completed (i.e., the IANA charset registry notes all info as conveyed here), then this specification itself will be redundant.

There is some work to be done to modify the IANA charset registry, including getting IETF consensus to make these enhancements, and perhaps even needing changes to IANA’s charter. Substantial work, but something that could be contracted for. Perhaps by W3C, perhaps as part of the transfer of IANA to ICANN.

It may be that, in the process, we will want to revisit the W3C/WHATWG prioritization that makes browser-to-legacy-nonconforming-content interoperability a higher priority than browser-to-nonweb-application interoperability.



Larry
--
http://larry.masinter.net


From: Joshua Bell [mailto:jsbell@google.com]
Sent: Tuesday, July 01, 2014 10:06 AM
To: John Cowan
Cc: Anne van Kesteren; Mark Davis ☕️; Asmus Freytag; Larry Masinter; www-international@w3.org
Subject: Re: [Encoding] false statement

On Mon, Jun 30, 2014 at 12:02 PM, John Cowan <cowan@mercury.ccil.org> wrote:
> Anne van Kesteren scripsit:
>
> > # Historically many encodings had their names and labels (and sometimes
> > # references to specifications) defined in the IANA Character Sets
> > # registry.  This specification supplants that registry.
>
> You are unsurprisingly[*] continuing to miss the point.  The issue is not
> whether you say "supplants" or "makes obsolete", which are effectively
> synonymous, but that you clarify the scope of the claim.  Wider concerns
> exist than the behavior of a few Web browsers, and it is inappropriate,
> to say the least, to use absolute language more fitted to the laws of
> physics when describing what they do or should do.

Along the lines of the clarification Henri makes in https://www.w3.org/Bugs/Public/show_bug.cgi?id=23646#c36 it seems that the spec should be explicit that it describes the use of text encodings for the Web platform. The HTML spec itself uses the phrase "This specification defines a big part of the Web platform..." in the introduction.

How about:

>>>
While encodings have been defined by many diverse standards, implementations of the Web platform (i.e. Web browsers) have not always implemented them in the same way, have not always used the same labels, and often differ in dealing with undefined and former proprietary areas of encodings. This specification attempts to fill those gaps so that new Web platform implementations do not have to reverse engineer encoding implementations of the market leaders and existing implementations can converge.

In particular, this specification defines the encodings, their algorithms to go from bytes to code points and back, and their canonical names and identifying labels for the Web platform. This specification also defines an API to expose part of the encoding algorithms to JavaScript for the Web platform.

Historically encodings and their specifications (if any) were kept track of by the IANA Character Sets registry. This specification supplants the use of that registry for the Web platform.
<<<

That repeats "Web platform" what seems like an excessive number of times, but I believe it's important. I have a (poorly maintained) polyfill for the JS API and get frequent requests from non-Web platform users (i.e. the Node.js community) to make changes that are not aligned with the spec, so I have had to clarify the purpose and scope of the polyfill.

> [*] I say it isn't surprising based on a _mot_ of Upton Sinclair's:
> "It is difficult to get a man to understand something, when his salary
> [or his status] depends upon his not understanding it!"

What, the Internet isn't synonymous with the World Wide Web? What madness is this? :)

> --
> John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
> Let's face it: software is crap. Feature-laden and bloated, written under
> tremendous time-pressure, often by incapable coders, using dangerous
> languages and inadequate tools, trying to connect to heaps of broken or
> obsolete protocols, implemented equally insufficiently, running on
> unpredictable hardware -- we are all more than used to brokenness.
>                    --Felix Winkelmann

Received on Friday, 8 August 2014 19:03:57 UTC