- From: John C Klensin <john+w3c@jck.com>
- Date: Thu, 28 Aug 2014 14:08:43 -0400
- To: Larry Masinter <masinter@adobe.com>, Richard Ishida <ishida@w3.org>, "Phillips, Addison" <addison@lab126.com>
- cc: www-international@w3.org
Hi Larry,

I detest the current situation, especially because of the exact point you make in your initial paragraph: needing to know the exact context and time in which a label is used to know what it actually means serves no one well.

The problem, as I understand it, is that some folks in the web browser community found it in their interest to apply their own definitions and extensions to IANA-registered labels. IMO, that was an absolutely terrible idea from a global interoperability standpoint (and several others), but complaining about or lamenting it now will accomplish very little. I'm just guessing, but I'd assume that once page authors started relying on the variant interpretations, they ran to other vendors and said "browser X is doing this, why aren't you supporting it too" and because customer base is often more important than Standards, the others at least mostly went along. That turns "standards violation, bad idea, and bad practice" into "established existing practice". It isn't the only sequence of that sort to have moved through the browser community. I am concerned about the implications of almost all of them, but my concern, or yours, bears the usual relationship to the price of a cup of coffee.

Where we seem to be today is that there are a lot of charset labels in the IANA Charset Registry. Some of them are irrelevant to web browsers (and, depending on how one defines it, to the web generally). Others are used in web browsers but with exactly the same definitions as appear in the IANA Registry. And a few are used --widely so-- in web browsers but with different definitions. At the same time, there are other applications (and probably some legacy web ones) that use the labels in the last category but strictly follow the IANA Registry definitions. That is a problem. I think it is a pretty offensive one. But objecting to it will get us nowhere.

I predict (as I'm sure you would) that any attempt in the IETF to either deprecate the Registry or incompatibly revise/update particular definitions would meet with a great deal of resistance, based in part on existing use in applications that are not web browsers. I would expect much the same response if we somehow told the browser community that the IANA definitions were around long before their current generation of work and products, are well-established on the Internet, and that they should mend their ways even if it caused some existing pages to stop working.

I don't like the solution of saying what amounts to "if you are a web browser using HTML5, you should, for compatibility with others, use these definitions and not the IANA ones". But, given that neither community is likely to agree to change its ways, it may be the least bad alternative. If it is, there is still a question of how the above should be best stated to avoid sounding like a "pox on your house; no, a pox on yours" style of debate. Might "more historical information and discussion of use by non-web applications" be useful in that regard? I tend to agree with you that it would, but I gather there is some resistance to making it part of the encoding document.

The one solace here and the one I hope all involved can agree on (or have already) is that, with the exception of writing systems whose scripts have not yet been encoded in Unicode, everyone ought to be moving away from historical encodings and toward UTF-8 as soon as possible. That is the real solution to the problem of different definitions and the issues they can cause: just move forward to Standard UTF-8 to get away from them and consider the present mess as added incentive.

I wish there were a better solution, but I don't have one. If you do, please suggest it.

There are, of course, lessons about the risks and disadvantages in this that we should all remember for other areas and future cases.

All just my opinion, of course.

john

--On Thursday, August 28, 2014 15:27 +0000 Larry Masinter <masinter@adobe.com> wrote:

> It isn't to anyone's benefit that there are two conflicting
> sources of info about character encodings.
>
> I think if the IANA Character Sets registry is obsolete, the
> right thing is to write an Internet Draft saying it's
> obsolete, and pointing people to this document instead.
>
> If you get objections from folks in the IETF, then address
> those objections; for example, by including more historical
> information and discussion of use by non-web applications.
>
> So no, I don't find the resolution satisfactory. I'm willing
> to help push through such a document in the IETF but would
> like some help.
>
> Larry
> --
> http://larry.masinter.net
>

Received on Thursday, 28 August 2014 18:09:14 UTC