RE: [SumsaultRT #212] iso-8859-1 vs. utf-8

> -----Original Message-----
> From: public-evangelist-request@w3.org 
> [mailto:public-evangelist-request@w3.org] On Behalf Of Tristan Nitot
> Sent: 25 September 2003 07:51
> To: public-evangelist@w3.org
> Subject: Re: [SumsaultRT #212] iso-8859-1 vs. utf-8
> 
> Mark Stosberg wrote:
> 
> >>>provided that you serve them as text/html (but this applies
> >>>trivially to application/xhtml+xml), you can add (or amend an 
> >>>existing entry) the following directive:
> >>>AddType text/html;charset=iso-8859-1	html
> >>>      
> >>>
> >
> >Thanks for the responses. So both "iso-8859-1" and "utf-8" have been 
> >recommended for us as the default character encoding type. Is there a 
> >particular rule of thumb about which to use, or is the central issue:  
> >"either one is better than none"?
> >  
> >
> Each directive has good and bad sides.
> 
> UTF-8 is quite universal, but you'll have to use HTML entities (such as 
> "&eacute;" for "é") instead of accented (non-ASCII) characters. This 
> makes accented text painful to read and edit.
> Using 8859-1 allows you to insert accented characters in your HTML, 
> provided they belong to the Latin-1 charset. This is the 


Hmm.  I think you somehow have this the wrong way round.  UTF-8 means
you have no need to use character entities, since it covers the whole
Unicode repertoire.  As you say, it's because ISO 8859-1 only covers
characters used in Western European languages (plus a few other simple
Latin languages) that, if you wanted to represent, say, the Czech c
caron, you'd have to use a Numeric Character Reference or entity.
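
To make the difference concrete, here is a minimal Python sketch. The
function name encode_for_html is my own invention for illustration; the
"xmlcharrefreplace" error handler is a standard Python codec option that
substitutes a Numeric Character Reference for any character the target
encoding cannot represent.

```python
def encode_for_html(text: str, charset: str) -> bytes:
    """Encode text in the given charset, falling back to NCRs
    (e.g. &#269;) for characters the charset cannot represent."""
    return text.encode(charset, errors="xmlcharrefreplace")


e_acute = "\u00e9"  # e with acute accent, part of ISO 8859-1
c_caron = "\u010d"  # Czech c with caron, not part of ISO 8859-1

# ISO 8859-1 stores the e acute directly as one byte...
print(encode_for_html(e_acute, "iso-8859-1"))  # b'\xe9'
# ...but has no code for the c caron, so an NCR is needed.
print(encode_for_html(c_caron, "iso-8859-1"))  # b'&#269;'
# UTF-8 stores both directly, with no entities or NCRs.
print(encode_for_html(e_acute, "utf-8"))       # b'\xc3\xa9'
print(encode_for_html(c_caron, "utf-8"))       # b'\xc4\x8d'
```

Whichever encoding is used, the charset declared for the page has to
match the bytes actually written, or the non-ASCII characters will be
displayed incorrectly.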

For an example, see
http://people.w3.org/rishida/scripts/samples/wrapping-no-dhtml.html (10
scripts on one page and no NCRs or entities other than &amp;)

You do, of course, need an editor that supports utf-8, but that's not
difficult to find these days.

You are right, though, when you say that it's better not to use
character entities, but rather the actual characters.
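
If you have existing pages full of entities, something along these
lines could turn them back into actual characters before you save the
file as UTF-8. This is only a sketch: entities_to_chars and
MARKUP_CHARS are hypothetical names, and the regular expression is a
simplification that assumes well-formed references. It leaves
references to markup-significant characters (such as &amp;) escaped.

```python
import html
import re

# Characters that must stay escaped so the markup is not broken.
MARKUP_CHARS = set("&<>\"'")


def entities_to_chars(fragment: str) -> str:
    """Replace entities and NCRs with the characters they stand for,
    except those that are significant in HTML markup."""
    def replace(match):
        ref = match.group(0)
        char = html.unescape(ref)
        # Keep the reference if it decodes to a markup character.
        return ref if char in MARKUP_CHARS else char

    return re.sub(r"&#?\w+;", replace, fragment)


print(entities_to_chars("caf&eacute; &amp; &#269;aj"))
```

The result would then be written out with an explicit encoding, for
example open(path, "w", encoding="utf-8"), and served with a matching
charset declaration.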



> solution that I use, because I write in French and in English only. 
> If I wanted to use characters outside of the iso-8859-1 charset (e.g. 
> for Czech), I'd have to use HTML entities or switch to UTF-8.
> 
> 
> --Tristan
> PS: I'm not an I18n guru, so feel free to correct me if I'm wrong.
> 
> -- 
> Mozilla and OpenWebGroup contributor
> http://mozilla.org/    : Efficiency, safety and liberty for browsing.
> http://openweb.eu.org/ : for learning the standards. 
> http://standblog.com/  : a blog about standards.
> http://pompage.net/    : wholesome reading about standards.



============
Richard Ishida
W3C

contact info: http://www.w3.org/People/Ishida/ 

http://www.w3.org/International/ 
http://www.w3.org/International/geo/ 

See the W3C Internationalization FAQ page
http://www.w3.org/International/questions.html

Received on Thursday, 25 September 2003 06:44:09 UTC