
RE: [SumsaultRT #212] iso-8859-1 vs. utf-8

From: Richard Ishida <ishida@w3.org>
Date: Thu, 25 Sep 2003 17:29:52 +0100
To: "'Karl Dubost'" <karl@la-grange.net>
Cc: <public-evangelist@w3.org>, "'Tristan Nitot'" <tristan@nitot.com>
Message-ID: <003901c38382$3f60c700$e801000a@w3c40upc3ma3j2>

Yep.  I think I alluded to this further down in my message.  However,
I'd like to encourage the view that finding a utf-8 capable editor is a
much better plan than just falling back on entities.

In my experience, many English speakers don't think this is a big deal,
but as Tristan said, it can seriously affect the readability and
maintainability of the source in a language like French (not to mention
Chinese or Russian, or even Czech [see below]). As Tristan said, using
&eacute; is to be avoided if at all possible.

RI


Here's an example of Czech text where the accented characters are
written as numeric character references (NCRs). It's almost impossible
to read.

Jako efektivn&#x115;j&#x161;&#xED; se n&#xE1;m jev&#xED;
po&#x159;&#xE1;d&#xE1;n&#xED; tzv. Road Show prost&#x159;ednictv&#xED;m
na&#x161;ich autorizovan&#x1FD;ch dealer&#x16F; v &#x10C;ech&#xE1;ch a
na Morav&#x11B;, kter&#xE9; prob&#x11B;hnou v pr&#x16F;b&#x16F;hu
z&#xE1; &#x159;&#xED; a &#x159;&#xED;jna.
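
To make the readability cost concrete, here is a minimal sketch that
decodes one line of the sample back into characters. It assumes Python
and its standard html module; the variable names are illustrative only
and are not part of the original message.

```python
import html

# Second line of the Czech sample above, with the NCRs copied as written.
encoded = "po&#x159;&#xE1;d&#xE1;n&#xED; tzv. Road Show prost&#x159;ednictv&#xED;m"

# html.unescape() resolves both numeric references and named entities.
decoded = html.unescape(encoded)

print(decoded)                     # pořádání tzv. Road Show prostřednictvím
print(len(encoded), len(decoded))  # the escaped form is far longer

# The decoded text can be saved directly as utf-8, with no escapes needed.
utf8_bytes = decoded.encode("utf-8")
```

A utf-8 capable editor lets the author read and edit the decoded form
directly, which is the point being argued here.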



============
Richard Ishida
W3C

contact info: http://www.w3.org/People/Ishida/ 

http://www.w3.org/International/ 
http://www.w3.org/International/geo/ 

See the W3C Internationalization FAQ page
http://www.w3.org/International/questions.html



> -----Original Message-----
> From: Karl Dubost [mailto:karl@la-grange.net] 
> Sent: 25 September 2003 16:58
> To: ishida@w3.org
> Cc: public-evangelist@w3.org; 'Tristan Nitot'
> Subject: Re: [SumsaultRT #212] iso-8859-1 vs. utf-8
> 
> 
> 
> On Thursday, 25 Sep 2003, at 06:43 America/Montreal, Richard
> Ishida wrote:
> >>
> >> UTF-8 is quite universal, but you'll have to use html entities (such
> >> as "&eacute;" for "é") instead of accented (non-ascii) characters.
> >> This
> 
> >
> > Hmm.  I think you somehow have this the wrong way round.  UTF-8 means
> > you have no need to use character entities, since it covers the whole
> > Unicode repertoire.  As you say, it's because ISO 8859-1 only covers
> 
> :) Let's clear this up a bit. Both of you are right, each in some context.
> 
> * If you have an editor (authoring tool) which can NOT input utf-8
> text and you still want to use utf-8 for your document, you can use
> this low-tech method:
> 	é -> &eacute; for example,
> so that you have only us-ascii characters in your document. Since
> us-ascii is a subset of utf-8, the document is still valid utf-8.
> 
> * If you have an editor which can input utf-8. Just type your accents.
> 
> 
> BTW, it would be good if someone on the mailing list made a list of
> all editors and their support for utf-8.
> 
> 
> 
> 
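
The low-tech method described in the quoted message can be sketched as
follows. This is only an illustration, assuming Python and its standard
html.entities table; the function name to_ascii_entities is hypothetical.
It assumes the input is already markup, so it leaves "&" and "<" alone.

```python
from html.entities import codepoint2name

def to_ascii_entities(text: str) -> str:
    """Replace every non-ascii character with an entity or hex NCR."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                       # plain us-ascii, keep as is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named HTML 4 entity
        else:
            out.append(f"&#x{cp:X};")            # fall back to a hex NCR
    return "".join(out)

escaped = to_ascii_entities("café à Montréal")
print(escaped)  # caf&eacute; &agrave; Montr&eacute;al

# The result is pure us-ascii, so the same bytes are also valid utf-8.
ascii_bytes = escaped.encode("ascii")
assert ascii_bytes.decode("utf-8") == escaped
```

The output is safe to serve as utf-8, but as the Czech sample shows, it
is much harder to read and maintain than the unescaped text.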
Received on Thursday, 25 September 2003 12:30:34 GMT
