- From: John C Klensin <klensin@jck.com>
- Date: Thu, 16 Jan 2014 12:24:57 -0500
- To: "PUBLIC-IRI@W3.ORG" <public-iri@w3.org>, uri@w3.org
- cc: IDNA update work <idna-update@alvestrand.no>, "www-tag.w3.org" <www-tag@w3.org>
Hi.

With the understanding that I'm not really saying anything that Mark, Andrew, and a few others haven't said but that a different perspective may be worthwhile...

(1) If only because there are other protocols and actors in this drama than web browsers, this continuing discussion leads us in the direction of having four "standards":

   (i) IDNA2008, plus or minus application-instance-specific or platform-specific use of RFC 5895.

   (ii) IDNA2003

   (iii) IDNA2008 + the mapping (as distinct from compatibility) part of UTS46

   (iv) IDNA2003 + unspecified adaptations for Unicode versions later than 3.2 + UTS46

Given that there are non-web i18n applications -- notably the now-deploying email specs and the work on various security-related and other specs in PRECIS -- simply having four "standards" is not going to be popular with users, who will certainly be astonished when what they see as "the same thing" behaves differently in different contexts. IMO, the only thing that has saved us from an explosion about that so far is that the significantly different behaviors among the above are mostly edge cases.

The important difference between case (iv) and the others is that, as others have pointed out, case (iv) is not one case and no one knows what it actually means. Yet, as I understand it, that is precisely what Anne is proposing to specify. In terms of a standard, that comes pretty close to "Unicode 3.2 is standardized and we hope that no properties of it will change; for characters included in later versions of Unicode, do what you like". I can't think of anything kind to say about that.

As to the first three, I remain concerned that there are a few characters that are PVALID (or CONTEXTJ) under IDNA2008 that UTS46 essentially prohibits using in any separate and distinct ways.
There is no doubt in my mind that the maximally conservative path is precisely that prohibition, preferably enforced by registry rules that prevent separate registration of both the IDNA2008-permitted character and whatever it would be mapped to under IDNA2008 or UTS46. But those who decide to go with that plan need to recognize two things, for better or worse:

(i) There are hundreds of thousands, if not millions, of separately-administered and controlled registries in the DNS. If the criterion for getting rid of mappings that preempt the use of the relevant IDNA2008-permitted characters becomes "all DNS registries prohibit independent registration of both them and the characters that formerly mapped to them" (or even "proof that most registries prohibit..."), then anyone who believes that point is different from "never" is deluding themselves. Worse, each succeeding year in which web page authors believe that they can and should depend on the mappings being present makes discontinuing those mappings (ever) in browsers less possible.

(ii) Some people feel very strongly about the independent availability of those characters and, regardless of what "we" might believe, do not see confusion or conflicts within the context of their languages (or, e.g., "their" new gTLDs). We also know that disagreements about how a particular language is represented in Unicode have led, in a few places, to very serious discussions of legislative or judicial action against the Unicode Consortium or banning the use of Unicode in those areas. Fortunately for those of us who favor open international standards, those efforts have never gone anywhere. But, especially where there are conflicting standards, I see a real possibility of some government taking the position that a browser that de facto prohibits characters that they think necessary and that are allowed by one of the standards is anti-competitive and/or insulting to the national culture.
If the country or region involved were in any way economically or culturally significant, I'd assume that browser vendors -- especially those whose existence depends on either market share in relevant areas or on the perception that they are "good guys" that leads to contributions -- would rapidly discover a need either to be compatible with the standard that supported the relevant national characters or to go to the considerable expense and aggravation of creating a one-off implementation that would accommodate the national demands.

---------------

FWIW, I continue to believe that the right way forward is one that is largely consistent with all of the present approaches in the long run. It would be something like:

(1) Advise web page authors and tool-builders that hrefs, things that map into them (e.g., IRIs), or equivalent that depend on mappings are just a bad idea, have been a bad idea since IDNA2003 was introduced, and that uses of them should be revised out of existence as quickly as possible. In other words, unambiguously deprecate the practice without necessarily stopping uses of it from working.

(2) Advise browser implementers to support a pair of "no mapping" switches, one for user input and the other for hrefs and equivalent. Ideally, those switches should have values of "yes, map", "no, don't map", and "warn in cases where mapping is about to be applied and then do it". By default, the "user input" one should start at "yes" and the "href" one should start at "warn", with the expectation of possibly migrating to "no" over time, but it should be possible for users and those specifying system configurations or national localizations to set them differently.

That combination allows everyone to move forward and lets browsers be agile relative to evolving usage and demands.
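
As a rough illustration of the two switches described in (2), here is a minimal Python sketch. It is only one possible reading of the proposal, not anything taken from an actual browser: the names MapPolicy, MappingSwitches, and apply_policy are hypothetical, and str.lower merely stands in for a real mapping step such as RFC 5895 or the UTS46 mapping table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class MapPolicy(Enum):
    """The three switch values suggested in the message (names are hypothetical)."""
    MAP = "yes, map"
    NO_MAP = "no, don't map"
    WARN = "warn, then map"


@dataclass
class MappingSwitches:
    """One switch per input source, with the defaults proposed in the message."""
    user_input: MapPolicy = MapPolicy.MAP
    href: MapPolicy = MapPolicy.WARN


def apply_policy(
    label: str,
    policy: MapPolicy,
    mapper: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Return the label to look up plus any warnings to surface.

    `mapper` is an assumption: a placeholder for whatever mapping the
    implementation would otherwise apply before IDNA processing.
    """
    mapped = mapper(label)
    if mapped == label or policy is MapPolicy.NO_MAP:
        # Either nothing would change, or mapping is switched off:
        # pass the label through untouched.
        return label, []
    if policy is MapPolicy.WARN:
        return mapped, [f"mapping applied: {label!r} -> {mapped!r}"]
    return mapped, []


if __name__ == "__main__":
    defaults = MappingSwitches()
    # A localization or system configuration could override either switch.
    strict = MappingSwitches(href=MapPolicy.NO_MAP)

    print(apply_policy("Example", defaults.user_input, str.lower))
    print(apply_policy("Example", defaults.href, str.lower))
    print(apply_policy("Example", strict.href, str.lower))
```

Keeping the two switches as independent fields is what would let the "href" default move toward "no" over time without touching what users type into the address bar.
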
For example, if a government did impose a requirement wrt independent use of characters in a particular language, that could be handled as a localization matter rather than a browser revision, regardless of what one thought of the merits of their position. People working with sufficiently old HTML files could set switches appropriately so that those pages would continue to work in their environments. And it would allow us to start moving away from the "four competing standards" situation because it really does provide the migration path that we don't have now (and that has led to various versions of what some of us describe as "IDNA2003, more or less, forever").

best,
   john
Received on Thursday, 16 January 2014 17:25:24 UTC