- From: John C Klensin <john+w3c@jck.com>
- Date: Sun, 31 Aug 2014 14:38:31 -0400
- To: Anne van Kesteren <annevk@annevk.nl>
- cc: Larry Masinter <masinter@adobe.com>, Richard Ishida <ishida@w3.org>, "Phillips, Addison" <addison@lab126.com>, www-international@w3.org
--On Sunday, 31 August, 2014 20:04 +0200 Anne van Kesteren <annevk@annevk.nl> wrote:

> On Thu, Aug 28, 2014 at 8:08 PM, John C Klensin
> <john+w3c@jck.com> wrote:
>> Where we seem to be today is that there are a lot of charset
>> labels in the IANA Charset Registry. Some of them are
>> irrelevant to web browsers (and, depending on how one defines
>> it, to the web generally). Others are used in web browsers
>> but with exactly the same definitions as appear in the IANA
>> Registry. And a few are used --widely so-- in web browsers
>> but with different definitions. At the same time, there are
>> other applications (and probably some legacy web ones) that
>> use the labels in the last category but strictly follow the
>> IANA Registry definitions.
>
> Actually, quite a lot have different definitions when you get
> down to the details, because the specifications for the
> encodings IANA points to are often not implemented in the
> prescribed manner (or lack essential details, such as handling
> of errors).

To the extent that is true, we get back to a variation on Larry's point about making changes where they belong: updating the IANA registry by specifying the essential details that have been omitted, encouraging updates to the relevant entries with those details, and then marking everything left over as "dangerous" or "suspect" until it is updated would seem to me entirely reasonable. I think the IETF community would require a persuasive case about those deviant implementations, with evidence that they really result from definitional gaps or misunderstandings, but that case should be easy to make if it is true. Things like using a Windows code page in place of ASCII (the "us-ascii" charset registration) don't fit those criteria -- that is simply a decision to deviate, whether for good reasons or not.

It also leads to a question I consider legitimate: why should we expect conformance to the Encoding spec to be significantly better than conformance to the IANA Charset Registry definitions? It is not clear to me that either is less ambiguous than the other.

>> The one solace here and the one I hope all involved can agree
>> on (or have already) is that, with the exception of writing
>> systems whose scripts have not yet been encoded in Unicode,
>> everyone ought to be moving away from historical encodings
>> and toward UTF-8 as soon as possible. That is the real
>> solution to the problem of different definitions and the
>> issues they can cause: just move forward to Standard UTF-8 to
>> get away from them and consider the present mess as added
>> incentive.
>
> Writing systems that cannot be done in Unicode cannot be done
> on the web. There's no infrastructure in place for such
> systems. (Apart from PUA font hacks.)

I agree. I believe that folks who need writing systems not yet supported by Unicode should sort that out with Unicode. Patience may be hard, but the future interoperability and compatibility problems of working around Unicode are likely to be much worse. I made the observation because there have been a number of comments on this list and elsewhere that PUA font hacks, squatting on unassigned code points, and use of private-use code points, all identified as "UTF-8", are common practice.

best,
    john
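
To make the "us-ascii" point concrete: under the WHATWG Encoding Standard's label table, the label "us-ascii" resolves to windows-1252, and a browser's TextDecoder reports exactly that, so bytes above 0x7F decode to Windows-1252 characters rather than being treated as errors. A minimal sketch, assuming a runtime that implements the Encoding Standard (TypeScript; any modern browser or Node.js 18+):

    // The Encoding Standard maps the label "us-ascii" to windows-1252.
    const decoder = new TextDecoder("us-ascii");
    console.log(decoder.encoding);
    // -> "windows-1252"

    // Bytes outside 7-bit ASCII decode as Windows-1252 characters,
    // not as errors: 0x93/0x94 become U+201C/U+201D (curly quotes).
    console.log(decoder.decode(new Uint8Array([0x93, 0x94])));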
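
On the last paragraph: a document that uses private-use code points can still be byte-for-byte valid UTF-8, which is why labelling it "UTF-8" is technically accurate and why the practice is hard to detect mechanically. A rough sketch of such a check, assuming only the Unicode Private Use Area ranges (the function name is illustrative, not from any spec):

    // True if the text uses any Private Use Area code point.
    function containsPUA(text: string): boolean {
      for (const ch of text) {
        const cp = ch.codePointAt(0)!;
        if (
          (cp >= 0xe000 && cp <= 0xf8ff) ||       // BMP Private Use Area
          (cp >= 0xf0000 && cp <= 0xffffd) ||     // Supplementary PUA-A
          (cp >= 0x100000 && cp <= 0x10fffd)      // Supplementary PUA-B
        ) {
          return true;
        }
      }
      return false;
    }

    console.log(containsPUA("\u{E000}"));     // true
    console.log(containsPUA("plain ASCII"));  // false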
Received on Sunday, 31 August 2014 18:38:59 UTC