
RE: Feedback on Unicode Technical Standard #46

From: Richard Ishida <ishida@w3.org>
Date: Thu, 29 Oct 2009 20:40:38 -0000
To: 'Mark Davis ☕' <mark@macchiato.com>
Cc: <public-i18n-core@w3.org>
Message-ID: <008001ca58d8$13989f30$3ac9dd90$@org>
[2] I found it occasionally confusing that the word 'map' is used in different ways.  In some places it is used to mean 'normalize', and in others 'convert to punycode'.   Eg.

 

"Transforming (mapping) a Unicode string to remove case and other variant differences. " [I would prefer '(normalizing)']

 

"Both map a Unicode for a domain name in a URL (like  http://öbb.at) to the Punycode version "

 

There are some places where it isn't clear in the text whether the issue centres around the normalization process or the mapping to punycode.  I'd like to see different terms used for these operations.

 

 

[3] "Both map a Unicode for a domain name in a URL (like  http://öbb.at) to the Punycode version (like http://xn--bb-eka.at). "

=> "Both map a non-ASCII label for a domain name ..."

 

I think that is reasonable, but we have to stay away from "normalization", since that is a loaded term in a Unicode context.

 

<RI> 

I think this is the answer to point [2]. (Did you see point [3]?)  

 

Aside { I have reservations about avoiding the word normalization because Unicode uses it for a particular type of normalization – it's a word that needs to mean more than Unicode in the real world and we shouldn't start spoiling that. The issue is the converse of the use of Xerox.  Unicode normalization should be clearly distinguished from normalization as a general concept in a document that is not specifically about Unicode normalization. }

 

But having vented my aside, I was prepared for that ;-)  I was thinking that perhaps we could consistently refer to the Unicode->Punycode transformation as a 'conversion' rather than a 'mapping' ?  That may help.

</RI>
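
To make the two operations concrete, here is a minimal Python sketch that keeps them as separate steps. The function names (map_label, convert_label, to_ascii) are made up for illustration, and the mapping step is only a rough stand-in (case folding plus NFKC), not the actual mapping table from the draft. The conversion step uses Python's built-in "punycode" codec and needs Python 3.7 or later for str.isascii.

```python
import unicodedata


def map_label(label: str) -> str:
    """Mapping step (illustrative assumption): remove case and
    compatibility variants. NOT the UTS #46 mapping table."""
    return unicodedata.normalize("NFKC", label.casefold())


def convert_label(label: str) -> str:
    """Conversion step: Punycode-encode a non-ASCII label and
    prepend the ACE prefix. ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii(domain: str) -> str:
    """Apply mapping, then conversion, to each dot-separated label."""
    return ".".join(convert_label(map_label(part)) for part in domain.split("."))


if __name__ == "__main__":
    print(map_label("ÖBB"))      # mapping only
    print(convert_label("öbb"))  # conversion only
    print(to_ascii("ÖBB.at"))    # both steps
```

With the two steps named separately, it is easier to say which one a given sentence in the text is talking about.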

 

 

 

<RI>

Actually there was another comment that I forgot to make…  

 

C1 says "Given a version of Unicode and a Unicode String…" It wasn't clear to me why the implementor needs to worry about the Unicode version info.

</RI>

 

 

 

From: mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] On Behalf Of Mark Davis ☕
Sent: 29 October 2009 18:33
To: Richard Ishida
Cc: public-i18n-core@w3.org
Subject: Re: Feedback on Unicode Technical Standard #46

 

Thanks for the feedback.

Mark




[2] I found it occasionally confusing that the word 'map' is used in different ways.  In some places it is used to mean 'normalize', and in others 'convert to punycode'.   Eg.

"Transforming (mapping) a Unicode string to remove case and other variant differences. " [I would prefer '(normalizing)']

"Both map a Unicode for a domain name in a URL (like  http://öbb.at) to the Punycode version "

There are some places where it isn't clear in the text whether the issue centres around the normalization process or the mapping to punycode.  I'd like to see different terms used for these operations.


[3] "Both map a Unicode for a domain name in a URL (like  http://öbb.at) to the Punycode version (like http://xn--bb-eka.at). "
=> "Both map a non-ASCII label for a domain name ..."


I think that is reasonable, but we have to stay away from "normalization", since that is a loaded term in a Unicode context. 

 

 


[4] "Map http://ÖBB.at to http://øbb.at"
I think the ø should be ö


Got it.


[5] "For more information, see the Mapping document in [IDNA2008]."

Please provide a more direct link.  I couldn't find this quickly.


The links for those documents are not final yet. 


[6] "IDNA2008 does define a particular mapping, but it is not normative, and does not attempt to be compatible with IDNA2003."

My initial reaction to reading that is that this document ought to discuss how that mapping is different from that proposed in this document, and why this is better.


The compatibility is the reason that it is better. If that isn't clear from other parts of the document, then we need at least some pointers.


[7] "The label must not begin with a combining mark, that is: [:gc=M:]"

The notation at the end of the sentence has not been introduced, and for some will be obscure.  I suggest replacing it with text for this section.  Same for "[:Join_Control:] "


I'll move the notation info up.  
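
For readers unfamiliar with the notation: [:gc=M:] is the set of code points whose General_Category starts with M (Mn, Mc, Me), and [:Join_Control:] is the set containing U+200C and U+200D. A minimal Python sketch of the two checks follows; the function names are hypothetical, and the results depend on the Unicode version bundled with the Python build (unicodedata.unidata_version).

```python
import unicodedata

# The Join_Control property covers exactly these two code points:
# U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER.
JOIN_CONTROLS = {"\u200c", "\u200d"}


def starts_with_combining_mark(label: str) -> bool:
    """True if the first code point is in [:gc=M:] (Mn, Mc, or Me)."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


def contains_join_control(label: str) -> bool:
    """True if any code point in the label is in [:Join_Control:]."""
    return any(ch in JOIN_CONTROLS for ch in label)


if __name__ == "__main__":
    print(starts_with_combining_mark("\u0301abc"))  # leading combining acute
    print(starts_with_combining_mark("abc"))
    print(contains_join_control("a\u200db"))
```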


[8] "Major improvement in making process of updating to future Unicode versions mostly-automatically"
=> ".. mostly automatic"


got it. 


[9] I think it would be useful to have a section in the earlier part of the document that explains how subtractions are dealt with, and what are the implications of that.  There is a tangential reference in the faq to symbols, but not much else, as far as I can see.


I'm not quite clear what you mean by this. 



Hope that helps,


Yes, thanks!
 

RI


============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/







 
Received on Thursday, 29 October 2009 20:41:04 GMT
