Re: Reviewed charmod fundamentals from Tim Bray on 2004-03-05 (www-tag@w3.org from March 2004)

From: Tim Bray <tbray@textuality.com>
Date: Fri, 5 Mar 2004 15:08:32 -0800
To: Tim Bray <tbray@textuality.com>
Cc: www-tag@w3.org <www-tag@w3.org>
Message-Id: <0606CBCE-6EFA-11D8-95ED-000A95A51C9E@textuality.com>

>  http://www.w3.org/TR/2004/WD-charmod-20040225
>
> I'm sending a bunch of corrections but almost all are editorial, or 
> minor errors of fact, and not worthy of the TAG's time.  I really only 
> found one thing

Now that I've sent 'em off to the feedback address, I've changed my mind 
and think that there may be two more issues in here with architectural 
weight:

=======================================================

C016   [S]   When designing a new protocol, format or API, 
specifications SHOULD mandate a unique character encoding.

This is controversial.  I think in general this is reasonable, with the 
single exception of doing what XML did and blessing both UTF-8 and 
UTF-16.  The problem with a single encoding is that it forces people to 
choose between being Java/C# friendly (UTF-16) and C/C++ friendly 
(UTF-8).  Later on, you in fact seem to agree with this point.  
Furthermore, it's trivially easy to distinguish between UTF-8 and UTF-16 
if you specify a BOM.  But I think that if I were defining the next CSS 
or equivalent I'd like to be able to say "UTF-8 or UTF-16" without 
feeling guilty.
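
To make the BOM point concrete, here is a minimal Python sketch of how a 
receiver might tell the two apart.  The function name sniff_encoding is 
hypothetical, and the sketch assumes the format permits only UTF-8 and 
UTF-16, with BOM-less input treated as UTF-8; none of that comes from the 
charmod draft.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Pick a Python codec name from a leading byte order mark.

    Assumption: the format allows only UTF-8 and UTF-16, and input
    without a BOM is taken to be UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and drops the leading BOM.
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM to choose byte order.
        return "utf-16"
    return "utf-8"


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_BE + "caf\u00e9".encode("utf-16-be")
    print(sample.decode(sniff_encoding(sample)))
```

The check is a comparison of at most three leading bytes, which is why 
allowing both encodings costs a receiver very little when a BOM is present.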

========================================================

I don't see anywhere that the draft recommends that if you're using 
UTF-16 you always use a BOM, and that seems like a basic good practice, 
particularly if you're going to allow either UTF-8 or UTF-16.
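
On the producing side, the practice might look like the following Python 
sketch.  The helper name encode_utf16_with_bom and its big_endian 
parameter are hypothetical; the point is only that the BOM is written 
explicitly rather than left to chance, since Python's endian-specific 
codecs ("utf-16-le", "utf-16-be") do not emit one.

```python
import codecs


def encode_utf16_with_bom(text: str, big_endian: bool = False) -> bytes:
    """Encode text as UTF-16 and always prepend the matching BOM."""
    if big_endian:
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


if __name__ == "__main__":
    payload = encode_utf16_with_bom("hello", big_endian=True)
    # The generic "utf-16" codec uses the BOM to recover the byte order.
    print(payload.decode("utf-16"))
```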

Received on Friday, 5 March 2004 18:08:37 UTC