- From: John C Klensin <john+w3c@jck.com>
- Date: Sun, 31 Aug 2014 19:31:02 -0400
- To: Andrew Cunningham <lang.support@gmail.com>
- cc: Anne van Kesteren <annevk@annevk.nl>, Addison Phillips <addison@lab126.com>, Richard Ishida <ishida@w3.org>, www-international@w3.org, Larry Masinter <masinter@adobe.com>
Andrew (and, by the way, John Cowan), I certainly did not intend to be either brutal or elitist. I'm trying to separate what seem to me to be multiple problems in the hope that it will help us move forward.

Over the history of the Internet (and a few other technologies with which I've had to work), institutionalizing incompatibility has rarely turned out to be a good idea. Sometimes it happens and we have to work around it, sometimes those workarounds are successful, but even that rarely changes the "bad idea" part.

If I had a script that wasn't supported by Unicode, I'd be unlikely to write a proposal to get it coded and then sit around for years waiting for them to do it. However, I would write the proposal and, when I created an interim system, I'd try to make sure there was a migration plan and, ideally, that my interim system didn't conflict with anyone else's. I think we have some historically-established ways of doing that which we know how to handle. I'd hate to see us go back to ISO 2022 and expand that registry, but I can also imagine its being an interesting (and non-conflicting) solution while waiting for Unicode and, if ISO/IEC JTC1/SC2 isn't willing to maintain and update that registry, I can imagine several entities who could take over. If the Unicode Consortium understands and is convinced that this has become a serious problem, perhaps they could start conditionally reserving some blocks for as-yet-uncoded scripts so at least there could be unambiguous migration paths, perhaps via a new subspecies of compatibility mappings or providing surrogate-like escapes to other code points that would parallel the 2022 system. There may also be better ideas, but I wish you (and others) would propose them rather than --it seems to me-- merely complaining in louder and louder voices (or, in this case, name-calling).

Any of those approaches (at least the ones I can think of) would be very ugly, but far preferable to disguising a lot of one-off font tricks or pseudo-Unicode, with potentially overlapping code points, as Standard UTF-8 and hoping that the end systems can sort out what is going on without any in-stream clues. That just leads to a very fragmented environment in which people cannot communicate... or worse.

If the official Unicode Consortium position were really "people should just wait to use their languages until we get around to assigning code points and we reserve the right to take as many years as we like", and the official WHATWG (much less W3C) position were really "if your language and script don't have officially assigned Unicode code points, you don't get to be on the web", then it is probably time for the broader community to do something about those groups. Fortunately, I haven't heard anyone who can reasonably claim to speak for any of those bodies say anything like that. If you have, references would be welcome.

More or less the same situation applies to the Encoding spec. It still seems to me that it should be targeting UTF-8 and Standard Unicode, with other things viewed as transitional. That doesn't solve your "pseudo-Unicode" problem, but, AFAICT, it doesn't make it any worse either.

As I have tried to say before, I (at least) would be interested in what you do propose but, so far, you just seem to be complaining about things that won't work in the contexts you are concerned about. Certainly the web browsers and other software that are now supporting font or [other?] pseudo-Unicode tricks aren't going to stop doing so because Anne, WHATWG, or W3C say those tricks are bad -- everyone knows they are bad already, even (or especially) those who think they are necessary, and those who think they are necessary are (correctly, IMO) unlikely to change their minds until after someone offers real alternatives.

Finally (at least for today), there is a choice in principle between saying "the browser vendors and page authors who are using IANA Registry Charset labels but doing something else are causing interoperability problems with the rest of the Internet and the rest of the world and should be designing ways to get out of that hole" and saying "many of the browser vendors are doing this and, while it differs from what the IANA Registry is usually believed to specify, it is the standard because they are doing it and therefore everyone else should get in line". The first may be impractical (and probably is unless higher powers intervene). The second (or variants on it) would be a whole lot more attractive if the community could feel some assurance that we wouldn't have to look forward to another round of the same thing in the future, e.g., an "Encoding 2018" spec that said "don't pay any attention to the labels and definitions established in Encoding 2014 because the browser vendors went off in another direction". One possible implication of your comments is that the risk of that situation is pretty high; if it is, then we really ought to be discussing a better solution to it than either making proclamations that will be ignored or engaging in fervent prayer that the light coming toward us really isn't a train after all.

best,
   john

--On Monday, 01 September, 2014 08:22 +1000 Andrew Cunningham <lang.support@gmail.com> wrote:

> Anne and John
>
> Your comments read as brutal and elitist.
>
> Do you have any idea of how long it takes to prepare and
> shepherd through a Unicode proposal? How much work and
> resources it can take?
>
> The communities that need Unicode support don't necessarily
> have the resources or expertise to prepare the proposals.
>
> Recently a proposal went to UTC to disunify some characters
> in the Myanmar block. The proposal was rightly rejected.
>
> I had a chat to one of the authors of that proposal. What was
> interesting was the reason for preparing the proposal in the
> first place.
>
> Essentially the problem was web browsers were perceived to
> have problems with displaying content in the languages in
> question.
>
> Essentially they were trying to get changes in Unicode
> because of deficiencies in web browsers.
>
> Most cases I know of for use of what you refer to as hacks
> did not occur specifically because of lack of support for a
> language in Unicode. It came as a specific consequence of
> lack of support in web browsers.
>
> Let's be honest here. It is easier to get Unicode to add
> support than it is to get web browsers to add support.
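A minimal sketch of the "in-stream clues" point made above, assuming ISO-2022-JP purely as a familiar example of a 2022-style encoding (the message does not name a specific one): an ISO 2022 stream announces its own character set switches with escape sequences, whereas pseudo-Unicode served as UTF-8 carries no such signal.

```python
# Sketch (illustrative only): ISO 2022 style encodings embed escape
# sequences in the byte stream itself, so a receiver can tell what it is
# decoding without out-of-band hints.
text = "日本語"
encoded = text.encode("iso-2022-jp")
print(encoded)
# Output is of the form b'\x1b$B...\x1b(B':
#   ESC $ B  switches to JIS X 0208,
#   ESC ( B  switches back to ASCII.
```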
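The gap between what the IANA Registry labels are "usually believed to specify" and what browsers actually do can be made concrete with a small sketch (an editorial illustration, not an example from the thread): the registry defines iso-8859-1 as Latin-1, while the WHATWG Encoding spec maps the label "iso-8859-1" to windows-1252 because that is what browsers already did.

```python
# Sketch of the label divergence: byte 0x93 is a C1 control character in
# ISO-8859-1 as registered with IANA, but a left curly quote in
# windows-1252, which is what the Encoding spec maps the label to.
data = b"\x93quoted\x94"

print(data.decode("iso-8859-1"))    # strict registry reading: C1 controls U+0093/U+0094
print(data.decode("windows-1252"))  # browser behaviour per the Encoding spec: "quoted"
```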
Received on Sunday, 31 August 2014 23:31:30 UTC