Re: Reviewed charmod fundamentals from Jon Hanna on 2004-03-08 (www-tag@w3.org from March 2004)

From: Jon Hanna <jon@hackcraft.net>
Date: Mon, 8 Mar 2004 12:04:36 +0000
To: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Cc: Tim Bray <tbray@textuality.com>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <1078747476.404c615493343@82.195.128.192>

Quoting Elliotte Rusty Harold <elharo@metalab.unc.edu>:

> At 3:08 PM -0800 3/5/04, Tim Bray wrote:
> 
> >This is controversial.  I think in general this is reasonable, with 
> >the single exception of doing what XML did and blessing both UTF-8 
> >and UTF-16.  The problem with a single encoding is that it forces 
> >people to choose between being Java/C# friendly (UTF-16) and C/C++ 
> >friendly (UTF-8).  Later on, you in fact seem to agree with this 
> >point.  Furthermore it's trivially easy to distinguish between UTF-8 
> >and UTF-16 if you specify a BOM.  But I think that if I were 
> >defining the next CSS or equivalent I'd like to be able to say 
> >"UTF-8 or UTF-16" without feeling guilty.
> 
> Speaking as a Java programmer, I do not find UTF-8 to be less Java 
> friendly than UTF-16. Both UTF-8 and UTF-16 need to be passed through 
> a Reader on input and a Writer on output for any sort of robustness 
> to apply.  Which one I choose to use is almost never based on Java's 
> internal storage format for Strings.

Similarly, speaking as a C++ programmer, I do not find UTF-16 to be less C++
friendly than UTF-8.

However, I agree with Tim that letting the author or producing application
choose between UTF-8 and UTF-16 (and hence requiring the consuming application
to distinguish between the two and handle both) is good practice, and the
charmod rules should allow it.
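
As a rough illustration of the differentiation a consuming application would
have to do, here is a minimal Python sketch of the BOM check Tim mentions. It
assumes the input is restricted to UTF-8 or UTF-16 and that data without a BOM
is UTF-8; that fallback is my assumption for the example, not something
charmod states. The names sniff_utf_bom and decode_text are made up for this
sketch.

```python
import codecs


def sniff_utf_bom(data: bytes):
    """Guess the encoding from a leading byte-order mark.

    Returns (encoding_name, bom_length). Only UTF-8 and UTF-16 are
    considered, so FF FE is always read as UTF-16 little-endian.
    """
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8", len(codecs.BOM_UTF8)
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be", len(codecs.BOM_UTF16_BE)
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    # Assumption for this sketch: no BOM means UTF-8.
    return "utf-8", 0


def decode_text(data: bytes) -> str:
    """Strip the BOM, if any, and decode with the sniffed encoding."""
    encoding, bom_length = sniff_utf_bom(data)
    return data[bom_length:].decode(encoding)
```

The check is a handful of byte comparisons at the start of the stream, so the
cost to the consumer of accepting both encodings is small as long as the
producer writes the BOM.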

-- 
Jon Hanna
<http://www.hackcraft.net/>
"…it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt

Received on Monday, 8 March 2004 07:04:39 UTC