
Re: Feedback on Unicode Technical Standard #46

From: Mark Davis ☕ <mark@macchiato.com>
Date: Thu, 29 Oct 2009 14:20:32 -0700
Message-ID: <30b660a20910291420m314994c9y1f9b49f632ee58c1@mail.gmail.com>
To: Richard Ishida <ishida@w3.org>
Cc: public-i18n-core@w3.org
Mark


On Thu, Oct 29, 2009 at 13:40, Richard Ishida <ishida@w3.org> wrote:

>  [2] I found it occasionally confusing that the word 'map' is used in
> different ways.  In some places it is used to mean 'normalize', and in
> others 'convert to punycode'.   Eg.
>
>
>
> "Transforming (mapping) a Unicode string to remove case and other variant
> differences. " [I would prefer '(normalizing)']
>
>
>
> "Both map a Unicode for a domain name in a URL (like http://öbb.at)
> to the Punycode version "
>
>
>
> There are some places where it isn't clear in the text whether the issue
> centres around the normalization process or the mapping to punycode.  I'd
> like to see different terms used for these operations.
>
>
>
>
>
> [3] "Both map a Unicode for a domain name in a URL (like http://öbb.at)
> to the Punycode version (like http://xn--bb-eka.at). "
>
> => "Both map a non-ASCII label for a domain name ..."
>
>
>
> I think that is reasonable, but we have to stay away from "normalization",
> since that is a loaded term in a Unicode context.
>
>
>
> <RI>
>
> I think this is the answer to point [2]. (Did you see point [3]?)
>
>
>
> Aside { I have reservations about avoiding the word normalization because
> Unicode uses it for a particular type of normalization – it's a word that
> needs to mean more than Unicode in the real world and we shouldn't start
> spoiling that. The issue is the converse of the use of Xerox.  Unicode
> normalization should be clearly distinguished from normalization as a
> general concept in a document that is not specifically about Unicode
> normalization. }
>
>
>
> But having vented my aside, I was prepared for that ;-)  I was thinking
> that perhaps we could consistently refer to the Unicode->Punycode
> transformation as a 'conversion' rather than a 'mapping' ?  That may help.
>

I think we can definitely improve the consistency of the terms. Convert,
transform, map are all available, and we could use them more consistently
for different kinds of operations. This is probably best left for the
editorial committee to wrestle with.
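
To make the distinction concrete, here is a minimal Python sketch that keeps
the two operations in separate functions. The names map_label, convert_label
and to_ascii_domain are hypothetical, and the mapping step is a deliberate
simplification (case folding plus NFC), not the mapping table defined in the
draft; only the Punycode step uses a real standard-library codec.

```python
import unicodedata

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label

def map_label(label: str) -> str:
    """Mapping step (simplified assumption): case-fold, then apply NFC."""
    return unicodedata.normalize("NFC", label.casefold())

def convert_label(label: str) -> str:
    """Conversion step: turn a non-ASCII label into its Punycode form."""
    if label.isascii():
        return label  # ASCII labels are left untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def to_ascii_domain(domain: str) -> str:
    """Map each label first, then convert it."""
    return ".".join(convert_label(map_label(part)) for part in domain.split("."))

print(map_label("ÖBB"))           # öbb
print(convert_label("öbb"))       # xn--bb-eka
print(to_ascii_domain("ÖBB.at"))  # xn--bb-eka.at
```

Keeping the steps apart also shows why the ø/ö slip in point [4] matters:
under these assumptions convert_label("øbb") yields xn--bb-kka, a different
label from xn--bb-eka.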

BTW, I submitted the working draft to the UTC, so you can take a look at
that for further comments.



> </RI>
>
>
>
>
>
>
>
> <RI>
>
> Actually there was another comment that I forgot to make…
>
>
>
> C1 says "Given a version of Unicode and a Unicode String…" It wasn't clear
> to me why the implementor needs to worry about the Unicode version info.
>
> </RI>
>

The reason is that you'll get different answers depending on the version.
That is, with a newer version of Unicode, some characters that were formerly
unassigned will become valid, mapped, or ignored.
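
As a rough illustration of the version dependence, Python's standard
unicodedata module ships both a current database and a frozen Unicode 3.2
one (unicodedata.ucd_3_2_0). The helper name status and the choice of
U+1E9E as the sample character are my own assumptions for this sketch; it
only checks assignment, not the valid/mapped/ignored status from the draft.

```python
import unicodedata

def status(ch: str, db) -> str:
    """Report whether ch is assigned in the given Unicode database.

    db is anything with a category() function: the unicodedata module
    itself, or the frozen unicodedata.ucd_3_2_0 object.
    """
    return "unassigned" if db.category(ch) == "Cn" else "assigned"

sample = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S, added after Unicode 3.2

print(unicodedata.unidata_version)             # version of the current database
print(status(sample, unicodedata.ucd_3_2_0))   # unassigned
print(status(sample, unicodedata))             # assigned
```

An implementation built on the older database would have to reject or pass
through the sample character, while one built on a newer database can give
it a definite status, which is why the conformance clause names the version.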

>
> *From:* mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] *On
> Behalf Of *Mark Davis ☕
> *Sent:* 29 October 2009 18:33
> *To:* Richard Ishida
> *Cc:* public-i18n-core@w3.org
> *Subject:* Re: Feedback on Unicode Technical Standard #46
>
>
>
> Thanks for the feedback.
>
> Mark
>
>
> [2] I found it occasionally confusing that the word 'map' is used in
> different ways.  In some places it is used to mean 'normalize', and in
> others 'convert to punycode'.   Eg.
>
> "Transforming (mapping) a Unicode string to remove case and other variant
> differences. " [I would prefer '(normalizing)']
>
> "Both map a Unicode for a domain name in a URL (like http://öbb.at)
> to the Punycode version "
>
> There are some places where it isn't clear in the text whether the issue
> centres around the normalization process or the mapping to punycode.  I'd
> like to see different terms used for these operations.
>
>
> [3] "Both map a Unicode for a domain name in a URL (like http://öbb.at)
> to the Punycode version (like http://xn--bb-eka.at). "
> => "Both map a non-ASCII label for a domain name ..."
>
>
> I think that is reasonable, but we have to stay away from "normalization",
> since that is a loaded term in a Unicode context.
>
>
>
>
>
>
> [4] "Map http://ÖBB.at to http://øbb.at"
> I think the ø should be ö
>
>
> Got it.
>
>
> [5] "For more information, see the Mapping document in [IDNA2008]."
>
> Please provide a more direct link.  I couldn't find this quickly.
>
>
> The links for those documents are not final yet.
>
>
> [6] "IDNA2008 does define a particular mapping, but it is not normative,
> and does not attempt to be compatible with IDNA2003."
>
> My initial reaction to reading that is that this document ought to discuss
> how that mapping is different from that proposed in this document, and why
> this is better.
>
>
> The compatibility *is* the reason that it is better. If that isn't clear
> from other parts of the document, then we need at least some pointers.
>
>
> [7] "The label must not begin with a combining mark, that is: [:gc=M:]"
>
> The notation at the end of the sentence has not been introduced, and for
> some will be obscure.  I suggest replacing it with text for this section.
>  Same for "[:Join_Control:] "
>
>
> I'll move the notation info up.
>
>
> [8] "Major improvement in making process of updating to future Unicode
> versions mostly-automatically"
> => ".. mostly automatic"
>
>
> got it.
>
>
> [9] I think it would be useful to have a section in the earlier part of the
> document that explains how subtractions are dealt with, and what are the
> implications of that.  There is a tangential reference in the faq to
> symbols, but not much else, as far as I can see.
>
>
> I'm not quite clear what you mean by this.
>
>
>
> Hope that helps,
>
>
> Yes, thanks!
>
>
> RI
>
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/International/
> http://rishida.net/
>
>
Received on Thursday, 29 October 2009 21:21:21 GMT
