Re: BOCU-1, SCSU, etc.

Henri Sivonen wrote:

>> UTF-8 is significantly less compact than SCSU/BOCU for most 
>> peoples' native languages.
 
> For such arguments, gzip should always be considered and the  
> compatibility benefits of UTF-8 + gzip be appreciated.

+1  For details see http://unicode.org/notes/tn14/ 
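A minimal sketch of how one might check the "UTF-8 + gzip" argument on one's own text, assuming Python 3.9+ and its standard `gzip` module. The helper name `encoded_sizes` and the Greek sample are illustrative only. Python's standard library has no SCSU or BOCU-1 codec, so the sketch compares UTF-8 with UTF-16 instead, both raw and gzip-compressed. Real numbers depend heavily on the text and its length; gzip adds a fixed header and trailer that dominate on very short strings.

```python
import gzip

def encoded_sizes(text: str) -> dict[str, int]:
    """Byte counts for text in UTF-8 and UTF-16LE, raw and gzip-compressed."""
    result: dict[str, int] = {}
    for enc in ("utf-8", "utf-16-le"):
        raw = text.encode(enc)
        result[enc] = len(raw)
        # gzip.compress adds a small fixed header/trailer to the deflate data
        result[enc + "+gzip"] = len(gzip.compress(raw))
    return result

# Hypothetical sample: a short Greek phrase repeated to get a non-trivial length.
sample = "Καλημέρα κόσμε. " * 50
for name, size in encoded_sizes(sample).items():
    print(f"{name:16}{size:6d} bytes")
```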

Similarly, UTF-8 is less compact than "UTF-4" for all languages
roughly covered by Latin-1 (excluding C1 controls), and arguably
"UTF-4" could be considered "better than windows-1252".

But that's beside the point for XHTML, where I can simply use
Latin-1 or windows-1252 and write any other code point as an NCR;
many browsers support this.  Some very old browsers insist on
decimal NCRs, and how far their fonts cover other code points
is a different question, but to some degree it works.
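As a rough illustration of the Latin-1/windows-1252 plus NCR approach, Python's built-in `xmlcharrefreplace` error handler turns every character the target encoding cannot represent into a decimal NCR, the form even very old browsers accept. The helper name `encode_with_ncrs` is hypothetical; this is a sketch of the idea, not a complete XHTML serializer (it does not escape `<` or `&`, for instance).

```python
def encode_with_ncrs(text: str, encoding: str = "windows-1252") -> bytes:
    # Characters the encoding lacks become decimal NCRs such as &#937;
    return text.encode(encoding, errors="xmlcharrefreplace")

print(encode_with_ncrs("Café Ω → €"))
# → b'Caf\xe9 &#937; &#8594; \x80'
```

With `"latin-1"` as the encoding, the euro sign also becomes an NCR (`&#8364;`), since Latin-1 has no euro sign while windows-1252 puts it at 0x80.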

"UTF-4" would work nowhere today, and if it's ever published  
formally it would come with a MUST NOT for XML.

Which brings us back to the MUST NOT about BOCU-1 and SCSU:

The HTML5 spec needs compelling reasons for a MUST NOT, and
also for a SHOULD NOT.  I understand SHOULD NOT as "you need a
very good excuse to ignore it"; one standard good excuse is
"implemented before the SHOULD NOT".  If there are other good
excuses, the spec has to say what they could be; otherwise
folks could claim that "I want it" is a good enough excuse.

 Frank

Received on Sunday, 27 January 2008 14:10:32 UTC