- From: John C Klensin <john+w3c@jck.com>
- Date: Thu, 28 Aug 2014 18:59:04 -0400
- To: Larry Masinter <masinter@adobe.com>, Richard Ishida <ishida@w3.org>, "Phillips, Addison" <addison@lab126.com>
- cc: www-international@w3.org
--On Thursday, August 28, 2014 20:20 +0000 Larry Masinter <masinter@adobe.com> wrote:

>> I predict (as I'm sure you would) that any attempt in the
>> IETF to either deprecate the Registry or incompatibly
>> revise/update particular definitions would meet with a great
>> deal of resistance, based in part on existing use in
>> applications that are not web browsers.
>
> I'm sure there would be some resistance, but there's
> resistance to everything. Which applications don't want to be
> compatible with the web? I think it's worth a try, to do the
> right thing.

Assuming your compatibility question is not just rhetorical: any well-established application whose use, in practice, depends on the IANA Charset Registry definitions, including registry entries that the Encoding spec essentially bans entirely. Taking the application used to process and transmit these messages as an example, it is also noteworthy that the number of web browsers, or even web servers, in use is fairly small. By contrast, the number of SMTP clients and servers, including independently developed submission clients built into embedded devices, is huge, and the number of mail user agents is even larger. So an instruction from the IETF (or W3C or some other entity) to those email systems to abandon the IANA Registry's definitions in favor of some other norm would, pragmatically, be likely to make things worse rather than better, creating a number of variations on the theme I think Andrew Cunningham is concerned about, i.e., even more systems that use a given charset label but interpret it in different ways.

>> I would expect much the same response if we somehow told the
>> browser community that the IANA definitions were around long
>> before their current generation of work and products, are
>> well-established on the Internet, and that they should mend
>> their ways even if it caused some existing pages to stop
>> working.
>
> This document is part of the mending.
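[As a concrete illustration of one label meaning two different things: the WHATWG Encoding spec maps the label "iso-8859-1" to windows-1252, whereas the IANA registration keeps the ISO definition, in which bytes 0x80-0x9F are C1 controls. A minimal Python sketch, with illustrative byte values:]

```python
data = b"price: \x80100"

# Per the IANA/ISO definition of iso-8859-1, byte 0x80 is the
# C1 control character U+0080.
iana_view = data.decode("iso-8859-1")

# The Encoding spec treats the label "iso-8859-1" as
# windows-1252, in which 0x80 is the euro sign U+20AC.
whatwg_view = data.decode("windows-1252")

# Same bytes, same label, two different texts.
assert iana_view != whatwg_view
```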
It is interesting, and perhaps illustrative of the issues here, that you read my remark that way. From my perspective as someone who was involved with email definitions long before there was a web, and who first got tangled up with the problems of transmitting CCS and encoding information out of band back when Kermit was the state of the art in textual data interchange with multiple character sets, when I said "mend their ways", I meant "stop this nonsense and, if you use a label that appears in the IANA Charset Registry, use it to describe _exactly_ what is defined there and, if you don't like that, define your own label and put it in the Registry". To a considerable extent, that means I see the Encoding document as institutionalizing the problem. I dislike that, but all of the alternatives seem to be worse at the moment.

>> I don't like the solution of saying what amounts to "if you
>> are a web browser using HTML5, you should, for compatibility
>> with others, use these definitions and not the IANA ones".
>> But, given that neither community is likely to agree to
>> change its ways, it may be the least bad alternative.
>
> I'm not sure the communities are separate. There's one
> Internet and text flows readily between web and non-web.
> Sure there are people who subscribe to one list or another.

Sure. But that, and scale measured in numbers of deployed independent implementations and the difficulties associated with changing them, would seem to argue strongly for at least mostly changing the web browsers to conform to what is in the IANA registry (possibly there are Registry entries that might need tuning too -- the IETF Charset procedures don't allow that at present but, as you point out, they could, at least in principle, be changed) rather than trying to retune the Internet to match what a handful of browser vendors are doing.

>> .... Might "more historical information and discussion of
>> use by non-web applications" be useful in that regard?
>> I tend to agree with you that it would, but I gather there
>> is some resistance to making it part of the encoding
>> document.
>
> Sometimes you have to do more work than you want to, in
> order to make things right. But I'm not sure it's really all
> that much. Maybe all that's needed is a pointer from the
> IANA registry to this document and vice versa, telling
> readers to be aware of the other, and encouraging new
> applications to use utf-8.

As I said to Andrew Cunningham, note that, when you say "use utf-8", you are almost certainly talking about using UTF-8 encoding with Standard Unicode code point assignments (or following what the IANA Registry presumably says, as you prefer). Given that, and speaking personally rather than predicting IETF reactions, I would see no problem at all with annotating the IANA Registry entries for a few Charsets with comments that an alternate interpretation has been seen in the wild, that those using that Charset should consequently use caution, and, ideally, describing what the deviations are. That wouldn't do much for the pseudo-Unicode posing as UTF-8 situation that Andrew describes, but it would probably work reasonably well for, e.g., the "sometimes 'us-ascii' is really Windows-1252" problem. If you and others thought it worthwhile to see if we can figure out an appropriate IETF mechanism to create those annotations, I'd be happy to collaborate. Notes about reality, however unfortunate that reality is, should always be welcome. It would, however, probably not be worth the effort if all the current Encoding spec has to say on the subject is equivalent to "don't pay any attention to whatever the IANA Charset Registry says" (or worse).

john
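[The "us-ascii is really Windows-1252" mismatch mentioned above is easy to observe: content labeled us-ascii that contains Windows-1252 "smart quotes" is not valid under the IANA definition of the charset, but decodes cleanly as windows-1252. A minimal Python sketch; the sample bytes are illustrative:]

```python
# Bytes from a hypothetical message labeled charset="us-ascii"
# that actually contain Windows-1252 curly quotes (0x93/0x94).
data = b"he said \x93hello\x94"

try:
    # The IANA definition of us-ascii is strictly 7-bit, so
    # these bytes are simply invalid under that label.
    text = data.decode("us-ascii")
except UnicodeDecodeError:
    # What the sender really meant: 0x93/0x94 are the
    # left/right double quotation marks in windows-1252.
    text = data.decode("windows-1252")
```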
Received on Thursday, 28 August 2014 22:59:32 UTC