Re: Feedback: Using non-ASCII characters in Web addresses from Najib Tounsi on 2004-09-09 (public-i18n-geo@w3.org from September 2004)

From: Najib Tounsi <ntounsi@emi.ac.ma>
Date: Thu, 09 Sep 2004 13:51:45 +0000
To: Deborah Cawkwell <deborah.cawkwell@bbc.co.uk>
CC: public-i18n-geo@w3.org
Message-ID: <41405FF1.80801@emi.ac.ma>
Just a remark about "ASCII CHARACTERS".

It may be worth specifying what the expressions "ASCII CHARACTERS" 
and "NON-ASCII CHARACTERS" cover.
In common usage, ASCII may refer either to the US-ASCII characters [00..7F] 
(7 bits only) or to the "PC-ASCII" characters (the accented-character 
extension, using the whole 8 bits). 
I speak as a French speaker and thus an AZERTY keyboard user.
Example:
&eacute; (é) is coded
-   'E9' in western-ISO-8859-1 (PC-ASCII extension)
-   'C3 A9' in UTF-8
Which one is NON-ASCII: 'E9', 'C3 A9', or both?
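
To make the example concrete, here is a minimal sketch in Python (chosen only for illustration; the helper names hex_bytes and is_us_ascii are hypothetical). It prints the bytes of é in each encoding and checks them against the US-ASCII range [00..7F]. If "ASCII" is taken to mean US-ASCII only, every byte in both forms lies above 7F.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


def is_us_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit US-ASCII range 00..7F."""
    return all(b <= 0x7F for b in data)


e_acute = "\u00e9"  # the character é

print(hex_bytes(e_acute, "iso-8859-1"))  # E9
print(hex_bytes(e_acute, "utf-8"))       # C3 A9

# Neither byte sequence fits in 7 bits.
print(is_us_ascii(e_acute.encode("iso-8859-1")))  # False
print(is_us_ascii(e_acute.encode("utf-8")))       # False
```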


Btw: is there a difference between "octet" and "byte"?


Deborah Cawkwell wrote:

>USING NON-ASCII CHARACTERS IN WEB ADDRESSES/AN INTRODUCTION TO MULTILINGUAL WEB ADDRESSES (ROUGH DRAFT !)
>http://www.w3.org/International/articles/idn-and-iri/
>------------------
>(First) 'Step by step example' & 'Overview' sections duplicate a bit. What I want to know is:
>1) why
>2) how it works technically
>3) relationship to URI
>4) does it work at all points, eg, UA, domain reg, etc
>------------------
>Could be stronger & more direct suggestion to register two names:
>"In practise, it would make sense to register two names for your domain. One in your native script, and one using just the regular Latin characters. The latter will be more memorable and easier to type for people who do not read and write your language. For example, as a minimum, you could additionally register a transcription of the Japanese in Latin script, such as the following:"
>------------------
>.jp is lower case to start with.
>"Note how the ASCII characters 'JP' are lowercased, but otherwise just passed through ."
>------------------
>Which version of IE? 
>IE 5.0, 5.5, & 6.0 according to download page (http://www.idnnow.com/index.jsp)
>"The conversion process was already supported natively in Mozilla 1.4 / Netscape 7.1, and Opera 7.2. It works in Internet Explorer if you download a plug-in (for example, this one)."
>Worked for me with IE 6.0
>------------------
>I think the 'Additional problems' section would sit better in a technical how-it-works section, saying that, by escaping, non-ASCII characters can be represented without IRIs, but that this depends on the encoding used in the file system, i.e., in the example case, Shift-JIS or UTF-8.
>The first line of the current 'Additional problems' section, ie:
>"An IRI is defined as a sequence of characters, not bytes - so the fact that the IRI might be represented in documents or protocols using different encodings is irrelevant."
>does not go to the heart of one problem; it is the reason why the escape solution can be a problem. The additional problem is human readability and memorability. But I think it's useful to include the statement that a URI or IRI is represented as a sequence of characters, not as a sequence of octets.
>What is the relationship between URI & IRI?
>------------------
>
>
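
On the escaping point quoted above, a small sketch in Python (illustrative only: the sample strings, the helper name escape, and the use of urllib.parse.quote are assumptions, not taken from the article). It shows that the escaped form of the same characters differs depending on which encoding is applied before escaping, which is why a reader of the escaped address must know that encoding.

```python
from urllib.parse import quote, unquote


def escape(text: str, encoding: str) -> str:
    """Percent-escape `text` after encoding it with `encoding`."""
    return quote(text, safe="", encoding=encoding)


e_acute = "\u00e9"  # é
print(escape(e_acute, "utf-8"))       # %C3%A9
print(escape(e_acute, "iso-8859-1"))  # %E9

japanese = "\u65e5\u672c\u8a9e"  # 日本語, a sample string
sjis_form = escape(japanese, "shift_jis")
utf8_form = escape(japanese, "utf-8")

# Same characters, two different escaped byte sequences.
print(sjis_form != utf8_form)  # True

# Decoding only recovers the text when the matching encoding is used.
print(unquote(utf8_form, encoding="utf-8") == japanese)      # True
print(unquote(sjis_form, encoding="shift_jis") == japanese)  # True
```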


-- 
Najib TOUNSI (mailto:tounsi@w3.org)
Bureau W3C au Maroc (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
Phone : +212 (0) 37 68 71 74  Fax : +212 (0) 37 77 88 53
Mobile: +212 (0) 61 22 00 30
Received on Thursday, 9 September 2004 14:28:47 UTC