RE: BOCU-1, SCSU, etc.

Henri Sivonen wrote:
> My understanding is that HTML 5 bans these post-UTF-8 
> second-system Unicode encodings no matter where you might 
> declare the use.

The ban is in section 3.7.5 (the META element), not in section 8 (The
HTML Syntax), and the reference to section 3.7.5 in section 8 says that
the restrictions apply (only) in a (<META>) character encoding
declaration. So, it seems the real issue is just clarifying the text in
3.7.5.4 to indicate that those restrictions apply only when the META
charset override mechanism is being used.

> Brian Smith wrote:
> > Not all applications that use HTML are general purpose web 
> > browsers.  

> The purpose of the HTML 5 spec is to improve interoperability 
> between Web browsers as used with content and Web apps 
> published on the one public Web. The normative language in 
> the spec is concerned with publishing and consuming content 
> and apps on the Web. The purpose of the spec isn't to lower 
> the R&D cost of private and proprietary systems by producing 
> reusable bits.

Then why doesn't the specification list the encodings that conformant
web browsers are required to support, instead of listing the encodings
that document authors are forbidden from using?

> Clearly, such a device cannot host a useful HTML 5-enabled 
> Web browser (only a thin display client like Opera Mini), so 
> the point is moot as far as HTML 5 goes.

I don't think the charter intends the specification to only apply to web
browsers. In fact, many parts of the specification already acknowledge
that HTML 5 will be used in applications other than web browsers, and
make some provisions for doing so.

> > Even after Unicode and the UTF encodings, new encodings are still 
> > being created.
> 
> Deploying such encodings on the public network is a 
> colossally bad idea. (My own nation has engaged in this folly 
> with ISO-8859-15, so I've seen the bad consequences at home, too.)

That is exactly my point. If the intention is that BOCU-1 should be
prohibited, then shouldn't ISO-8859-15 be prohibited for the same
reason? Why one and not the other?

Anyway, I am pretty sure that the restriction against BOCU and similar
encodings is just to make it possible to correctly parse the <META>
charset override, not to prevent their use altogether. The language just
needs to be made clearer. 
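
To illustrate the parsing concern, here is a rough Python sketch. It is
not the algorithm from the spec: the regular expression and the
sniff_meta_charset name are simplifications made up for this example,
and UTF-16 stands in for BOCU-1/SCSU only because Python ships no codec
for those. The point is just that a byte-level scan for the <META>
declaration can only find it when the markup's bytes match ASCII.

```python
import re

# Simplified byte-level scan for a charset declaration.
# Assumption: this is an illustration, NOT the HTML 5 prescan algorithm.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def sniff_meta_charset(raw: bytes):
    """Return the charset label found by scanning raw bytes as ASCII, or None."""
    match = META_CHARSET.search(raw[:1024])
    return match.group(1).decode("ascii") if match else None


doc = '<html><head><meta charset="{label}"></head><body>text</body></html>'

# ISO-8859-15 encodes the markup with the same bytes as ASCII.
ascii_compatible = doc.format(label="iso-8859-15").encode("iso-8859-15")

# UTF-16 (stand-in for a non-ASCII-compatible encoding) interleaves
# zero bytes, so the scan never sees the literal bytes "<meta".
not_ascii_compatible = doc.format(label="utf-16").encode("utf-16-le")

print(sniff_meta_charset(ascii_compatible))
print(sniff_meta_charset(not_ascii_compatible))
```

The first call finds the declared label; the second finds nothing,
because the declaration is not visible to a scanner that has not yet
been told the encoding by some other means.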

- Brian

Received on Tuesday, 29 January 2008 16:28:53 UTC