
Re: Reviewed charmod fundamentals

From: Tim Bray <tbray@textuality.com>
Date: Fri, 5 Mar 2004 15:08:32 -0800
Message-Id: <0606CBCE-6EFA-11D8-95ED-000A95A51C9E@textuality.com>
Cc: www-tag@w3.org <www-tag@w3.org>
To: Tim Bray <tbray@textuality.com>
>  http://www.w3.org/TR/2004/WD-charmod-20040225
> I'm sending a bunch of corrections but almost all are editorial, or 
> minor errors of fact, and not worthy of the TAG's time.  I really only 
> found one thing

Now that I've sent 'em off to the feedback address, I've changed my mind and 
think that there may be two more issues in here with architectural implications.


C016 [S] When designing a new protocol, format or API, 
specifications SHOULD mandate a unique character encoding.

This is controversial.  I think in general this is reasonable, with the 
single exception of doing what XML did and blessing both UTF-8 and 
UTF-16.  The problem with a single encoding is that it forces people to 
choose between being Java/C# friendly (UTF-16) and C/C++ friendly 
(UTF-8).  Later on, you in fact seem to agree with this point.  
Furthermore, it's trivially easy to distinguish between UTF-8 and UTF-16 
if you require a BOM.  So if I were defining the next CSS or equivalent, 
I'd like to be able to say "UTF-8 or UTF-16" without feeling guilty.
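A minimal Python sketch of that detection, assuming a hypothetical format that permits only UTF-8 or UTF-16 and requires a BOM on UTF-16 (the function names are illustrative, not from charmod). It works because the UTF-16 BOM bytes FE FF and FF FE can never appear in well-formed UTF-8, so a stream without one can safely be read as UTF-8.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess UTF-8 vs UTF-16 from a leading byte order mark.

    Assumes (hypothetically) that the format requires a BOM on UTF-16,
    so anything without a UTF-16 BOM is treated as UTF-8.
    """
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"
    # An optional UTF-8 signature (EF BB BF) still means UTF-8.
    return "utf-8"


def decode(data: bytes) -> str:
    """Decode bytes using the encoding picked by sniff_encoding."""
    if sniff_encoding(data) == "utf-8":
        # "utf-8-sig" strips a leading EF BB BF if one is present.
        return data.decode("utf-8-sig")
    # The "utf-16" codec reads the BOM to pick byte order, then drops it.
    return data.decode("utf-16")
```

For instance, `decode(b"\xfe\xff\x00A")` and `decode(b"\xef\xbb\xbfA")` both yield `"A"`; the reader never has to guess byte order or consult out-of-band metadata.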


I don't see anywhere that it recommends always using a BOM with UTF-16, 
and that seems like basic good practice, particularly if you're going 
to allow either UTF-8 or UTF-16.
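On the writing side, the practice amounts to always prefixing UTF-16 output with U+FEFF. A minimal Python sketch (the helper name is hypothetical): Python's plain "utf-16" codec already emits a BOM, but in the platform's native byte order, so this version makes the byte order an explicit choice while still labelling it.

```python
import codecs


def encode_utf16_with_bom(text: str, big_endian: bool = True) -> bytes:
    """Encode text as UTF-16 with an explicit leading BOM (U+FEFF)."""
    if big_endian:
        # "utf-16-be" writes no BOM itself, so prepend FE FF.
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    # "utf-16-le" writes no BOM itself, so prepend FF FE.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

Either byte order round-trips through a BOM-aware reader, e.g. `encode_utf16_with_bom("A", big_endian=False).decode("utf-16")` gives back `"A"`.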

Received on Friday, 5 March 2004 18:08:37 UTC
