- From: John C Klensin <klensin@jck.com>
- Date: Thu, 16 Jan 2014 14:04:21 -0500
- To: John Cowan <cowan@mercury.ccil.org>
- cc: Gervase Markham <gerv@mozilla.org>, "PUBLIC-IRI@W3.ORG" <public-iri@w3.org>, uri@w3.org, IDNA update work <idna-update@alvestrand.no>, "www-tag.w3.org" <www-tag@w3.org>
--On Thursday, January 16, 2014 12:55 -0500 John Cowan <cowan@mercury.ccil.org> wrote:

> John C Klensin scripsit:
>
>> The distinction between mapping for something typed or
>> otherwise specified directly by the user and a mapping
>> requirement for domains or URLs/URIs stored in documents,
>> search or DNS examination programs, and the like keeps
>> getting lost in this set of discussions, but is really,
>> seriously, important.
>
> I'm not so sure. In the end, URLs in documents tend to be
> typed by the user too, it's just a different kind of user.

But there is always something of a transformation process to get it into the document. For example, users don't type UTF-8, they type stuff that gets mapped via various procedures into UTF-8 or something else.

> You could argue that document editors should do the mapping
> themselves, but then you're back to the old stand.

Maybe I am "back to the old stand" -- I'm just trying to explain a perspective that has some history of being useful. That history, for me and even for i18n issues specifically, extends back to the late 60s, which is, indeed, very "old stand".

However, I think there are ultimately two cases as far as document editors are concerned:

(1) The mapping that might be used is trivial -- either the ASCII cases or things like full-width East Asian characters (many ASCII characters fall into this category only if one is willing to assume that, e.g., "A" always means/maps to "a" rather than any of the decorated lower-case forms that, in various localized writing system contexts, lose their decorations when being mapped to upper case). For most or all of these cases, it ought to be trivial for document editors to simply enter the canonical forms. If there is some reason why they don't and mapping is needed, that is ok too.

(2) The more complex cases in which mappings can turn a character into a non-obvious alternative. For these cases, the document author/editor had better know what she is doing.
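The two cases can be illustrated with a minimal Python sketch. It assumes NFKC normalization plus lower-casing as a stand-in for the "trivial" mappings and Unicode case folding as a stand-in for the "non-obvious" ones; the helper name `fold_trivial` is hypothetical, and none of this is the actual IDNA or UTS46 mapping table.

```python
import unicodedata


def fold_trivial(label: str) -> str:
    """Case (1): fold full-width forms to ASCII, then lower-case."""
    return unicodedata.normalize("NFKC", label).lower()


# Case (1): full-width "ABC" (U+FF21..U+FF23) folds to plain "abc".
print(fold_trivial("\uff21\uff22\uff23"))

# Case (2): sharp-S turns into a different, non-obvious string under folding.
print("stra\u00dfe".casefold())

# Upper-to-lower mapping is not unique across writing systems:
# dotless i (U+0131) upper-cases to "I", which lower-cases to dotted "i".
dotless_i = "\u0131"
print(dotless_i.upper().lower() == dotless_i)
```

The last line shows why "A always maps to a" is an assumption rather than a given: the round trip through upper case does not return the original character.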
The reality is that those mappings may be done or not done, unpredictably and depending on environment and circumstances, and the decisions may have inadvertent blocking side-effects. If, for example, a label that contains ZWNJ is registered and (as UTS46 and other things recommend as a reasonable option) the same string without ZWNJ is blocked, then an IDN resolving engine that maps ZWNJ to nothing prevents use of the name (similarly for sharp-S, etc.).

For these cases, if the document editor knows what is going on, then specifying exactly what is intended (in A-label or at least U-label form) is the best and least risky thing she can do. We betray the trust that implies --trust that she is, in fact, smart enough to know what she is doing-- if we second-guess her canonical strings by mapping them to something else. Conversely, if the document editor doesn't have a clue, it is not clear to me that we are doing either him or his users/readers a favor by encouraging ambiguity in an identifier that they have been told is not ambiguous.

At least that is what it looks like from here. YMMD.

    john
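The ZWNJ blocking side-effect described in the message can be sketched as a toy model in Python. The label, the `REGISTERED` and `BLOCKED` sets, and the `resolve` and `strip_zwnj` helpers are all hypothetical and stand in for a registry and a resolver; this is an illustration under those assumptions, not an implementation of IDNA or UTS46.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical registry state: the label containing ZWNJ is registered,
# and the same string without ZWNJ is blocked as a variant.
LABEL_WITH_ZWNJ = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u06cc"
REGISTERED = {LABEL_WITH_ZWNJ}
BLOCKED = {LABEL_WITH_ZWNJ.replace(ZWNJ, "")}


def strip_zwnj(label: str) -> str:
    """Toy mapping step: map ZWNJ to nothing."""
    return label.replace(ZWNJ, "")


def resolve(label: str, mapper=None) -> str:
    """Look up a label, optionally applying a mapping step first."""
    if mapper is not None:
        label = mapper(label)
    if label in BLOCKED:
        return "blocked"
    if label in REGISTERED:
        return "registered"
    return "not found"


# Without mapping, the exact label the author wrote is found.
print(resolve(LABEL_WITH_ZWNJ))
# With ZWNJ mapped to nothing, the same input lands on the blocked variant.
print(resolve(LABEL_WITH_ZWNJ, mapper=strip_zwnj))
```

In this model the author who wrote the exact registered label is the one who loses: the mapping step silently redirects her string to a name that cannot be used.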
Received on Thursday, 16 January 2014 19:04:51 UTC