- From: Mike Brown <mike@skew.org>
- Date: Fri, 22 Jun 2007 01:23:46 -0600 (MDT)
- To: Ian Hickson <ian@hixie.ch>
- CC: Mike Brown <mike@skew.org>, public-html@w3.org
Ian Hickson wrote:
> On Mon, 18 Jun 2007, Mike Brown wrote:
> >
> > HTML 5 seems to now allow the entire U+0001..U+001F range, whereas HTML
> > 4.x, 3.2, and I think 2.0, as defined by their "document character set"
> > and SGML profile, have long forbidden all of that range except for tab,
> > LF, CR, and, inexplicably, FF.
> >
> > Why is HTML 5 different, and what are the expectations for the
> > processing of the now-allowed BEL, BS, VT, DEL, and so on? If it was
> > deliberate, why not put a note of explanation in the spec?
>
> It was deliberate only insofar as I didn't come across any reason to
> disallow them. The expectations for their processing are unaffected by
> whether they are allowed or not.
>
> What would the note explain?

The note would explain why you feel it's important to include those codes in HTML 5, and the fact that there are no expectations of how they're interpreted; they're just no longer disallowed. Perhaps I'm just spoiled by the HTML 4 spec, which mentions things like that.

I'm guessing those control codes were previously disallowed out of concern, at the time, for console-based browsers: you don't want such a browser to blindly pass control codes to the user's terminal. Arguably, that'd be the browser's mistake if it did, but why let the language permit it?

It also perhaps makes more sense to just disallow such codes; they shouldn't be applicable in a modern document language that operates on a descriptive level of abstraction, rather than on a level that implies direct control of a terminal.

I imagine it may also have been an effort to further deprecate the codes, to keep them from finding new life after all the technologies that most of them had been invented for were relegated to the recycling heap. It's my understanding that they were only included in the UCS, and that the UCS is organized the way it is, to placate people who were concerned about compatibility.
Why prolong the life of these things that should die?

There are some who feel that such deprecation of codes really makes their lives difficult, though, so XML 1.1's compromise was to go ahead and allow them, but discourage their use. See the note in section 2.2 of XML 1.1.

If you do allow all of U+0001..U+001F, then you might as well allow the U+0080..U+009F range as well, no? Do you have any plans to acknowledge the Windows-1252 confusion for NCRs in that range, such as &#128; being treated as Euro by many (most?) browsers?
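
To make the two ranges in question concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not anything taken from a spec or a browser: the names `HTML4_ALLOWED_C0`, `newly_allowed_c0`, and `resolve_ncr_leniently` are hypothetical, the allowed set is simply the four codes named in the message (tab, LF, FF, CR), and the Windows-1252 lookup relies on Python's own `cp1252` codec, which may not match any particular browser byte for byte.

```python
# Sketch only: helper names are hypothetical, not from any spec or browser.

# C0 controls that, per the message above, HTML 4.x permitted: tab, LF, FF, CR.
HTML4_ALLOWED_C0 = {0x09, 0x0A, 0x0C, 0x0D}


def newly_allowed_c0():
    """Code points in U+0001..U+001F that HTML 4.x forbade, per the message."""
    return [cp for cp in range(0x01, 0x20) if cp not in HTML4_ALLOWED_C0]


def resolve_ncr_leniently(code_point):
    """Resolve a numeric character reference, reading 0x80..0x9F as Windows-1252.

    Assumption: this imitates the lenient browser behavior the message
    describes; it is not a statement of what any specification requires.
    """
    if 0x80 <= code_point <= 0x9F:
        try:
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves a few bytes (e.g. 0x81) unmapped;
            # fall back to the C1 control character itself.
            return chr(code_point)
    return chr(code_point)
```

Under these assumptions, `resolve_ncr_leniently(0x80)` yields the Euro sign (U+20AC) rather than the C1 control character U+0080, and `newly_allowed_c0()` includes BEL (0x07), BS (0x08), and VT (0x0B).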
Received on Friday, 22 June 2007 07:24:24 UTC