
Feedback: Using non-ASCII characters in Web addresses

From: Deborah Cawkwell <deborah.cawkwell@bbc.co.uk>
Date: Wed, 1 Sep 2004 14:47:08 +0100
Message-ID: <418B7E44473AC34488C9E730D09FF3CF027F8DE4@bbcxue204.national.core.bbc.co.uk>
To: <public-i18n-geo@w3.org>

The (first) 'Step by step example' and 'Overview' sections overlap somewhat. What I want to know is:
1) why
2) how it works technically
3) its relationship to URIs
4) whether it works at all points, e.g., user agent, domain registration, etc.
The suggestion to register two names could be stronger and more direct:
"In practice, it would make sense to register two names for your domain. One in your native script, and one using just the regular Latin characters. The latter will be more memorable and easier to type for people who do not read and write your language. For example, as a minimum, you could additionally register a transcription of the Japanese in Latin script, such as the following:"
The .jp in the example is lower case to start with, so this note does not quite fit:
"Note how the ASCII characters 'JP' are lowercased, but otherwise just passed through."
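To illustrate the conversion step, here is a minimal Python sketch using the standard library's built-in `idna` codec, which implements IDNA 2003 (RFC 3490) and produces the ASCII-compatible "xn--" form. The domain name and the helper names `to_ascii_domain` and `to_unicode_domain` are hypothetical; this is not meant to show how any particular browser or plug-in does it.

```python
# Sketch: converting an internationalized domain name to its ASCII
# ("xn--") form with Python's built-in IDNA 2003 codec.
# The domain below and both helper names are hypothetical.

def to_ascii_domain(domain: str) -> str:
    """Return the ASCII-compatible form of an internationalized domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain: str) -> str:
    """Convert an ASCII-compatible domain name back to Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    idn = "日本語.jp"  # hypothetical Japanese domain name
    ace = to_ascii_domain(idn)
    # The Japanese label is rewritten as "xn--..."; the ASCII label "jp" is unchanged.
    print(ace)
    # Round trip back to the original Unicode name.
    print(to_unicode_domain(ace) == idn)
```

Because the `.jp` label in the input is already lower case and pure ASCII, it comes through as-is; only the Japanese label is rewritten.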
Which version of IE is meant here? According to the plug-in download page (http://www.idnnow.com/index.jsp), IE 5.0, 5.5 and 6.0 are supported.
"The conversion process was already supported natively in Mozilla 1.4 / Netscape 7.1, and Opera 7.2. It works in Internet Explorer if you download a plug-in (for example, this one)."
It worked for me with IE 6.0.
I think the 'Additional problems' section would sit better in a technical, how-it-works section. It could explain that non-ASCII characters can be represented without IRIs by escaping them, but that the escaped form depends on the encoding used in the file system, i.e., in the example, Shift-JIS or UTF-8.
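A minimal Python sketch of why the escaped form depends on the encoding: `urllib.parse.quote` percent-escapes the bytes of a string after encoding it, so the same file name yields different escapes under UTF-8 and Shift-JIS. The helper `escape_segment` and the file name are hypothetical, chosen only for illustration.

```python
# Sketch: the same non-ASCII path segment percent-escapes to different
# octets depending on which character encoding is assumed.
from urllib.parse import quote, unquote


def escape_segment(segment: str, encoding: str) -> str:
    """Percent-escape a path segment after encoding it with `encoding` (hypothetical helper)."""
    return quote(segment, safe="", encoding=encoding)


if __name__ == "__main__":
    segment = "日本語"  # hypothetical file name
    utf8 = escape_segment(segment, "utf-8")
    sjis = escape_segment(segment, "shift_jis")
    print(utf8)  # %E6%97%A5%E6%9C%AC%E8%AA%9E
    print(sjis)  # a different escape sequence (the Shift-JIS octets)
    # Unescaping only recovers the name if the matching encoding is used.
    print(unquote(sjis, encoding="shift_jis") == segment)
```

A server that stores the file name in Shift-JIS will therefore not match a link escaped from UTF-8, even though both escapes stand for the same characters.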
The first line of the current 'Additional problems' section, i.e.:
"An IRI is defined as a sequence of characters, not bytes - so the fact that the IRI might be represented in documents or protocols using different encodings is irrelevant."
does not get to the heart of one of the problems: the dependence on encoding is precisely why the escaping solution can be a problem. The other problem is human readability and memorability. Still, I think it is useful to include the statement that URIs and IRIs are represented as sequences of characters, not sequences of octets.
What is the relationship between URIs and IRIs?
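On that question: RFC 3987 defines IRIs as a complement to URIs that may contain non-ASCII characters, and specifies a mapping from IRIs to URIs (section 3.1) in which non-ASCII characters are converted to UTF-8 and each resulting octet is percent-encoded. The Python sketch below applies that mapping character by character; the helper `iri_to_uri` and the example IRI are hypothetical, other details of the RFC are omitted, and the example host is kept ASCII because hosts are usually converted with IDNA instead.

```python
# Sketch of the IRI-to-URI mapping in RFC 3987, section 3.1:
# every non-ASCII character is encoded as UTF-8 and each resulting octet
# is percent-encoded; ASCII characters are copied unchanged.
# iri_to_uri is a hypothetical helper; it does not special-case the host.


def iri_to_uri(iri: str) -> str:
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)  # ASCII: copied as-is
        else:
            # Non-ASCII: write each UTF-8 octet as %HH
            parts.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(parts)


if __name__ == "__main__":
    iri = "http://example.org/日本語.html"  # hypothetical IRI
    print(iri_to_uri(iri))
    # http://example.org/%E6%97%A5%E6%9C%AC%E8%AA%9E.html
```

Because the mapping always goes through UTF-8, an IRI has a single URI form, which is one reason the characters-not-bytes definition matters.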

Received on Wednesday, 1 September 2004 13:47:12 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 20:28:01 UTC