- From: Brian Smith <brian@briansmith.org>
- Date: Mon, 28 Jan 2008 10:38:36 -0800
- To: <public-html-comments@w3.org>
Maybe we have a misunderstanding. Does the qualification "If the document does not start with a BOM, and if its encoding is not explicitly given by Content-Type metadata..." apply to the restriction against BOCU-1 and SCSU? If so, then I have no argument with it, except that the wording should be clearer. However, right now the wording reads as though an HTML 5 document must never be in one of these encodings, even if the encoding is declared in the Content-Type header or the BOM.

Henri Sivonen wrote:
> The most common claims about the non-compactness of UTF-8 turn out to
> be false when measured.

I agree that for Wikipedia and news sites there isn't a huge advantage in using SCSU or BOCU. But web browsers, Wikipedia, and news sites are not the only applications of HTML.

> > Right now, there are a lot of systems where it is cheaper/faster to
> > implement SCSU-like encodings than it is to implement UTF-8+gzip,
> > because gzip is expensive. J2ME is one example that is currently
> > widely deployed.
>
> J2ME HTML5 UAs are most likely to use the Opera Mini architecture in
> which case the origin server doesn't talk to the J2ME thin client, so
> the point would be moot even if gzip were prohibitively expensive on
> J2ME.

Not all applications that use HTML are general-purpose web browsers. If an HTML document or fragment is not going to be processed directly by a general-purpose web browser, why do we need to restrict its encoding?

When I was writing Thai-language software on J2ME phones, GZIP compression was too expensive (in code size, memory, and time), and UTF-8 significantly expanded the size of the text (see the postscript below). I used TIS-620 for the prototype so I could cache more data on the phone, with the intention of migrating to SCSU or BOCU later. Since the only encoding I could rely on with J2ME was UTF-8, I had to write my own encoders/decoders, but that was still easier than implementing gzip compression and decompression for severely memory-constrained devices. Once I had the encoder and decoder written, I decided to use it for everything, since it made everything smaller without requiring compression.

Beyond everything I have said, I don't see how it is practical for HTML 5 to maintain a blacklist of encodings that should not be supported. Even after Unicode and the UTF encodings, new encodings are still being created. A list of encodings that HTML processors are required to support would make more sense. Then the restriction could be rewritten as "Don't use encodings that are not supported by your software."

- Brian
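P.S. A rough sketch of the size difference I mean, not code from my application, just an illustration; it assumes a runtime where String.getBytes("UTF-8") is available (the class name is made up for the example):

    import java.io.UnsupportedEncodingException;

    // Rough byte-count comparison for a six-character Thai string.
    // TIS-620 stores each Thai character in one byte; UTF-8 needs three
    // bytes for every character in the Thai block (U+0E01..U+0E5B).
    public class ThaiSizeSketch {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String thai = "\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35"; // "sawatdee"
            int tis620Bytes = thai.length();                  // 6: one byte per character
            int utf8Bytes = thai.getBytes("UTF-8").length;    // 18: three bytes per character
            System.out.println("TIS-620: " + tis620Bytes + " bytes, UTF-8: " + utf8Bytes + " bytes");
        }
    }

Markup and ASCII punctuation still take one byte each in UTF-8, so a whole document expands by less than a factor of three, but mostly-Thai text comes close to it.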
Received on Monday, 28 January 2008 18:38:48 UTC