Re: UTF-8 vs. ISO Latin 1


This setting (files are served as utf-8 by default) applies only to the 
/TR web space.

In our web space, for example in the WebCGM web space, this setting does 
not apply.

You can stick with 8859-1 for your editing and publish as is in the 
WebCGM WG space.

I will make the necessary encoding changes when publishing the document in TR space.
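
For illustration only, here is a minimal Python sketch of what such a pre-publication conversion might look like. It is not the actual process used for /TR; the function name latin1_to_utf8 and the two regular expressions are assumptions. It re-encodes the bytes and updates the encoding label in the XML declaration and in the <meta> element, which are the kinds of differences Lofton describes below.

```python
import re


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode an ISO-8859-1 document as UTF-8 and update its encoding labels.

    Hypothetical helper: assumes the labels are spelled 'iso-8859-1'.
    """
    # Every ISO-8859-1 byte maps to exactly one code point, so this cannot fail.
    text = data.decode("iso-8859-1")
    # Update the XML declaration: encoding="iso-8859-1" -> encoding="utf-8".
    text = re.sub(r'(encoding=["\'])iso-8859-1(["\'])', r"\1utf-8\2",
                  text, flags=re.IGNORECASE)
    # Update the <meta> content attribute: charset=iso-8859-1 -> charset=utf-8.
    text = re.sub(r"(charset=)iso-8859-1", r"\1utf-8",
                  text, flags=re.IGNORECASE)
    # Characters above 0x7F become multi-byte sequences here.
    return text.encode("utf-8")
```

Only the labels and the non-ASCII characters change; plain ASCII text is byte-for-byte identical in both encodings.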

FYI, the Public Working Draft (30 January 2009) was using encoding "8859-1"

and the second Last Call Working Draft (04 June 2009) is using  encoding 

Note that the Technical Report Publication Policy (Pubrules) does not 
require the use of UTF-8 for TR publications.

Now, why UTF-8 and not UTF-16?  I don't know who decided this or why 
UTF-8 was chosen; probably the Systems Team and the Webmaster.
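
On the size point raised in the question, a small Python sketch can show the octet counts. It says nothing about why the choice was made; the helper name encoded_sizes is an assumption, and "utf-16-be" is used so that no byte order mark is counted.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of octets needed for one character in each encoding."""
    return {enc: len(ch.encode(enc))
            for enc in ("iso-8859-1", "utf-8", "utf-16-be")}


# An ASCII letter and a character from the upper half of 8859-1.
for ch in ("a", "\u00e9"):
    print(repr(ch), encoded_sizes(ch))
```

An ASCII character takes one octet in both 8859-1 and UTF-8 but two in UTF-16, while a character from the upper half of 8859-1 takes two octets in both UTF-8 and UTF-16.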



Lofton Henderson wrote:
> Hi Thierry,
> I would like to ask you a question about the below message from 
> Philippe.  You might recall it, tho' it happened when you were away.
> Specifically:  his comments about the utf-8 change.  I have verified 
> that the only differences between my last /current-editor/ text and his 
> published 2nd LCWD are indeed around 8859-1 and utf-8.  Specifically, 
> the XML declaration and the <meta content...> element, *AND*, everywhere 
> that there is a character from 8859 RHS it has been mapped to UTF-8.  
> (Mostly the European names in WebCGM21-Intro.html.)
> He said, "you don't necessarily need to change your copy, but ... 
> suggest ... some point in the future".
> I would like to stick with 8859-1 for now, for my editing.  Two reasons:
> 1.)  I'd like to verify that UTF-8 won't create an issue when I move the 
> text back to the OASIS servers.  Else it would be more changes to make 
> the move, whereas now it is just a swapping of style sheets and cover page.
> 2.)  I have found at least one tool that I use (an old, free text editor 
> called PFE that has many nice features) that does not handle UTF-8 
> nicely.  (It will read and save the multi-byte UTF-8 characters okay, 
> but I cannot meaningfully edit them.  It handles all of 8859-1 just fine.)
> QUESTION.  So is it okay for me to stick with 8859-1 for my editing work 
> and the /current-editor/ directory?  It looks like Philippe has a quick 
> automated process to make the change before publication, if that is 
> necessary.  (But note that we previously published webcgm21 twice in W3C 
> as 8859-1, and had no problems).
> (Another question.  Why UTF-8 and not UTF-16?  The Latin1/UTF problem 
> goes away for UTF-16.  Is it because UTF-8 is only one octet for ASCII, 
> whereas UTF-16 is two octets for both LHS (ascii) and RHS of 8859-1?)
> Cheers,
> -Lofton.
>> Subject: Re: Almost ready to publish 2nd LCWD
>> From: Philippe Le Hegaret <>
>> To: Lofton Henderson <>
>> Cc: Thierry Michel <>
>> On Wed, 2009-05-27 at 13:57 -0600, Lofton Henderson wrote:
>> > Philippe,
>> >
>> > The zip file is there:
>> > 
>> >
>> > That should be all you need for publication.
>> I checked the document and everything seems fine. I did change the
>> encoding of the files to use utf-8 instead of iso latin 1 (our /TR
>> directory has a weird configuration by default where the files are
>> served as utf-8 and iso latin 1 depending on how one access them). You
>> don't necessarily need to change your copy but, if you have the
>> opportunity at some point in the future, I suggest that you switch to
>> utf-8 as well. I used a non-intrusive method to change the encoding, to
>> guarantee that no other change would be applied. I made also a very
>> minor fix for the subject parameter in the mailing list link in the
>> status.
>> I attached a diff file for your eyes in case you wish to check the
>> changes. The diff contains both iso latin 1 and utf 8 characters so
>> don't be surprised if the utf-8 characters don't appear alright. I
>> assure you they're fine in the final version.
>> So, unless Thierry knows something that I don't, we should be all set
>> for the documents.
>> Philippe

Received on Wednesday, 19 August 2009 09:46:42 UTC