RE: ASCII value upper 127 in  from Jon Hanna on 2005-01-28 (www-international@w3.org from January to March 2005)

From: Jon Hanna <jon@hackcraft.net>
Date: Fri, 28 Jan 2005 14:05:58 -0000
To: "'by way of Martin Duerst <duerst@w3.org>'" <Laurent_Martin@pch.gc.ca>, <www-international@w3.org>
Message-Id: <20050128140602.95CC7AE854F6@postie2.hosting365.ie>

> My encoding type is for now 
> "iso-8859-1" but I 
> also plan to switch to "UTF-8" sometimes.
> 
> Meanwhile, I'm restricted to lower 127 ASCII value within the 
> HTML code and 
> use the &***; for equivalent. But within my personnal comment 
> tag into the 
> code, I wrote in french and this contain upper 127 ASCII 

You can use any character that the encoding in use can represent and that
is in the UCS, which means any character in iso-8859-1 or utf-8. I'm guessing
you're restricting yourself to the US-ASCII range so that the same file will
work with a reported encoding of either iso-8859-1 or utf-8 without changes.

Your problem could therefore be caused by octet sequences that are not valid
utf-8 when the file is read as utf-8 - after all, the comments are still part
of the text file and are decoded along with everything else.
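
To illustrate, here is a minimal Python sketch (the sample comment and the
function name is_valid_utf8 are my own, purely for illustration). A French
comment saved as iso-8859-1 stores "é" as the single octet 0xE9, which is not
a valid utf-8 sequence on its own:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the octets decode cleanly as utf-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: an HTML comment written in French.
comment = "<!-- défini par l'éditeur -->"

latin1_octets = comment.encode("iso-8859-1")  # "é" becomes the octet 0xE9
utf8_octets = comment.encode("utf-8")         # "é" becomes 0xC3 0xA9

print(is_valid_utf8(latin1_octets))  # False
print(is_valid_utf8(utf8_octets))    # True
```

A file restricted to the US-ASCII range passes either way, which is why only
the comments would trigger the problem.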

If this is indeed the case you could try editing the files as utf-8 since
all utf-8 sequences are legal iso-8859-1 (albeit possibly containing control
characters and definitely containing nonsense).
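
As a rough Python sketch of what that looks like (the sample characters are
my own choice): decoding utf-8 octets as iso-8859-1 never fails, because
every octet maps to some character, but the result is not what was written.

```python
# Every octet value is a character in iso-8859-1, so this decode cannot fail.
misread_lower = "é".encode("utf-8").decode("iso-8859-1")
print(misread_lower)  # Ã©

# "É" is 0xC3 0x89 in utf-8; 0x89 falls in the C1 control range of iso-8859-1.
misread_upper = "É".encode("utf-8").decode("iso-8859-1")
print(repr(misread_upper))  # 'Ã\x89'
```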

Really though, I'd recommend just fully using the encoding you are currently
using (iso-8859-1); when you come to move to utf-8 it is relatively easy
to re-encode all the files (writing a program that re-encodes every file
in a folder, or every file with a given extension, etc. is pretty easy).
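
A minimal sketch of such a program in Python, assuming every matching file
really is iso-8859-1 to begin with (the function name reencode_folder, the
"*.html" pattern and the folder name in the usage comment are illustrative
assumptions, not anything from your setup):

```python
from pathlib import Path


def reencode_folder(folder, pattern="*.html",
                    source="iso-8859-1", target="utf-8"):
    """Re-encode every file in `folder` matching `pattern`, in place.

    Returns the names of the files that were rewritten.
    """
    converted = []
    for path in sorted(Path(folder).glob(pattern)):
        # Work on raw octets so line endings are left untouched.
        text = path.read_bytes().decode(source)
        path.write_bytes(text.encode(target))
        converted.append(path.name)
    return converted


# Usage (hypothetical folder name):
# reencode_folder("site")
```

Run it once only, on a backup copy: running it a second time would treat the
new utf-8 octets as iso-8859-1 and double-encode them. Any charset declared
inside the files or sent by the server still has to be changed separately.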

Regards,
Jon Hanna
Work: <http://www.selkieweb.com/>
Play: <http://www.hackcraft.net/>
Chat: <irc://irc.freenode.net/selkie>

Received on Friday, 28 January 2005 14:06:05 UTC