RE: Feedback: Using non-ASCII characters in Web addresses

From: Richard Ishida <ishida@w3.org>
Date: Tue, 21 Sep 2004 18:50:39 +0100
To: "'Najib Tounsi'" <ntounsi@emi.ac.ma>, "'Deborah Cawkwell'" <deborah.cawkwell@bbc.co.uk>
Cc: <public-i18n-geo@w3.org>
Message-Id: <20040921175039.D2BA34F21A@homer.w3.org>

Hi Najib,

See notes below...

> -----Original Message-----
> From: public-i18n-geo-request@w3.org 
> [mailto:public-i18n-geo-request@w3.org] On Behalf Of Najib Tounsi
> Sent: 09 September 2004 14:52
> To: Deborah Cawkwell
> Cc: public-i18n-geo@w3.org
> Subject: Re: Feedback: Using non-ASCII characters in Web addresses
> 
> 
> Just a note about ASCII CHARACTERS.
> 
> It may be worth specifying what the expressions "ASCII CHARACTERS" 
> and "NON-ASCII CHARACTERS" cover.

The use of "ASCII" is a little loose here since, as mentioned at the beginning of the article, the specifications for the characters allowed in URIs and in domain names differ slightly.
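
As a rough illustration of the two mechanisms, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical address with "café" in both the host name and the path. In the path, the character is converted to UTF-8 and each byte is percent-encoded. In the host name, the label is converted to an ASCII-compatible "xn--" (Punycode) form. Note that Python's built-in `idna` codec implements the older IDNA 2003 rules; this is a sketch, not a complete IRI-to-URI converter.

```python
from urllib.parse import quote

# Hypothetical example address: non-ASCII characters in both host and path.

# Path: UTF-8 encode, then percent-encode each non-ASCII byte.
path = "/" + quote("café")  # é -> UTF-8 bytes C3 A9 -> %C3%A9

# Host: IDNA converts each non-ASCII label to an ASCII "xn--" (Punycode) label.
# Python's built-in "idna" codec follows IDNA 2003.
host = "café.example".encode("idna").decode("ascii")

print("http://" + host + path)
```

The same character thus ends up as `%C3%A9` in the path but as part of `xn--caf-dma` in the host, which is why the two cases are governed by different specifications.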


> In common usage, ASCII may refer either to US-ASCII characters
> [00..7F] (only 7 bits) or to PC-ASCII characters (the accented
> extension, using the whole 8 bits).

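This is usually referred to as "ANSI" rather than ASCII. ASCII is a 7-bit encoding, covering only the values 00 to 7F. "ANSI" is the name commonly (if loosely) given to the 8-bit Windows code pages, such as Windows-1252, which add accented characters in the range 80 to FF. The standard 8-bit ISO encoding for Western European languages, ISO-8859-1, is also known as Latin-1.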
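
A small Python sketch can make the difference between the 7-bit and 8-bit encodings concrete. It assumes, as is usual on Western European Windows systems, that "ANSI" means the Windows-1252 code page (`cp1252` in Python). ASCII cannot encode é at all, while ISO-8859-1 and Windows-1252 both give the single byte E9; the two 8-bit encodings only disagree in the 80 to 9F range.

```python
text = "é"

# ASCII only covers 0x00-0x7F, so encoding é must fail.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Both 8-bit encodings map é to the single byte 0xE9.
latin1 = text.encode("iso-8859-1")
cp1252 = text.encode("cp1252")  # assumed meaning of "ANSI"

# They differ in 0x80-0x9F: Windows-1252 puts printable characters there
# (0x80 is the euro sign), while ISO-8859-1 has C1 control codes.
euro_cp1252 = bytes([0x80]).decode("cp1252")
c1_latin1 = bytes([0x80]).decode("iso-8859-1")

print(ascii_ok, latin1.hex(), cp1252.hex())
print(repr(euro_cp1252), repr(c1_latin1))
```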
 
> I speak as a French speaker and thus an AZERTY keyboard user.
> Example:
> &eacute; (é) is coded
> -   'E9' in western-ISO-8859-1 (PC-ASCII extension)
> -   'C3 A9' in UTF-8
> Which one is NON-ASCII  'E9', 'C3 A9' or both ?

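Both. ASCII covers only the byte values 00 to 7F, so any byte of 80 or above is non-ASCII. E9 (in ISO-8859-1) and both C3 and A9 (in UTF-8) are all above 7F, so é is a non-ASCII character whichever encoding is used.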
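
To check this byte by byte, here is a minimal Python sketch (the helper name `is_ascii` is illustrative; Python 3.7+ also offers the built-in `bytes.isascii()`). It encodes é in ISO-8859-1 and in UTF-8 and tests whether every resulting byte falls in the 7-bit range.

```python
char = "\u00e9"  # é

iso = char.encode("iso-8859-1")  # one byte
utf8 = char.encode("utf-8")      # two bytes

def is_ascii(data: bytes) -> bool:
    """True only if every byte is in the 7-bit ASCII range 0x00-0x7F."""
    return all(b <= 0x7F for b in data)

print(iso.hex(" ").upper(), is_ascii(iso))
print(utf8.hex(" ").upper(), is_ascii(utf8))
```

Both lines report `False`: neither the single Latin-1 byte nor either of the two UTF-8 bytes is ASCII.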

hth
RI
Received on Tuesday, 21 September 2004 17:50:40 UTC
