- From: Jon Hanna <jon@hackcraft.net>
- Date: Fri, 28 Jan 2005 14:05:58 -0000
- To: "'by way of Martin Duerst <duerst@w3.org>'" <Laurent_Martin@pch.gc.ca>, <www-international@w3.org>
> My encoding type is for now "iso-8859-1" but I also plan to switch
> to "UTF-8" sometimes.
>
> Meanwhile, I'm restricted to lower 127 ASCII value within the HTML
> code and use the &***; for equivalent. But within my personnal comment
> tag into the code, I wrote in french and this contain upper 127 ASCII

You can use any character that is in the UCS and that the document's encoding can represent, and that means any character in ISO 8859-1 and UTF-8.

I'm guessing you're restricting yourself to the US-ASCII range so that the same file will work with a reported encoding of either iso-8859-1 or utf-8 without changes. Your problem could therefore be caused by octet sequences that are not valid UTF-8 when the file is read as utf-8; after all, the comments are still part of the text file. For example, "é" is the single octet 0xE9 in ISO 8859-1, and that octet on its own is not a valid UTF-8 sequence.

If this is indeed the case, you could try editing the files as UTF-8, since every UTF-8 octet sequence is also a legal ISO 8859-1 sequence (although, read as ISO 8859-1, the non-ASCII characters will definitely appear as nonsense and may include control characters).

Really though, I'd recommend just fully using the encoding you are currently using (ISO 8859-1). When you come to move to UTF-8, it is relatively easy to re-encode all the files (writing a program that re-encodes every file in a folder, or every file with a given extension, etc. is pretty easy).

Regards,
Jon Hanna
Work: <http://www.selkieweb.com/>
Play: <http://www.hackcraft.net/>
Chat: <irc://irc.freenode.net/selkie>
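
As a rough illustration of the "re-encode every file in a folder" idea mentioned above, here is a minimal Python sketch. It is only one possible approach, not a tested migration tool: the function name `reencode_folder`, the default `.html` extension, and the "skip anything that already decodes as UTF-8" check are all assumptions made for the example.

```python
from pathlib import Path


def reencode_folder(folder, extension=".html", src="iso-8859-1", dst="utf-8"):
    """Re-encode files under `folder` from `src` to `dst`; return the paths rewritten.

    Hypothetical helper for illustration only.
    """
    changed = []
    for path in sorted(Path(folder).rglob("*" + extension)):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        try:
            # Heuristic (an assumption): if the octets already decode as the
            # target encoding, leave the file alone. Pure US-ASCII files land
            # here too, and running the script twice does not double-encode.
            raw.decode(dst)
            continue
        except UnicodeDecodeError:
            pass
        # Interpret the octets as the source encoding, then write them back
        # in the target encoding.
        path.write_bytes(raw.decode(src).encode(dst))
        changed.append(path)
    return changed
```

Calling `reencode_folder("site")` would rewrite only those `.html` files that are not already valid UTF-8. Note that it does not touch any charset declared inside the files or sent by the server, so that would still need to be changed separately, and it is worth running on a copy of the folder first.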
Received on Friday, 28 January 2005 14:06:05 UTC