- From: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
- Date: Sat, 26 Jan 2008 22:14:19 +0100
- To: "Henri Sivonen" <hsivonen@iki.fi>
- Cc: <public-html-comments@w3.org>
Henri Sivonen wrote:

[disclaimer]
> The chairs are Chris Wilson and Dan Connolly. Dan Connolly
> specifically instructed people who reply to emails on
> public-html-comments to include a disclaimer.

Okay, W3C magic, I'm not going to check what it is good for. In the IETF they sometimes "hum" in a desperate attempt to avoid anything that could be misinterpreted as "voting".

> The main concern of the spec is what kind of encoding
> support in browsers is necessary and good for the Web.

Necessary is MUST, good is SHOULD. We are mainly talking about SHOULD NOT at the opposite end of the spectrum.

>> I considered it as a (mildly) pointless proposal [...]

> I'd omit "(mildly)".

Forcing "windows-1252" for something else is only a crude workaround; the effects for "show source" (or for showing errors) must be odd. Validators and (X)HTML aside, I'd expect that browsers can display text/plain "850" (858). <a type="text/plain" charset="PC-Multilingual-850+euro" etc. is about as much as I can do in this case (as a user; as a server admin I would do more), with an "850" icon for human visitors explaining what "850" is. IOW I have a bunch of "850" text/plain files, and I have links to them in XHTML files.

> UTF-8 can express all Unicode characters

Nobody wants or understands *all* Unicode characters; it's a cute reference charset. When we are down to sending short messages from a mobile device with 1 MB RAM at a cost of 10 cents per 50 octets, UTF-8 is not necessarily the first choice for a hypothetical poor Shavian community. That is admittedly beside the point for HTML5, but your statement was apparently general, not limited to HTML5.

> Communications on the public Web affect other people,
> so developers who implement pointless stuff waste the
> time of other developers as well when they need to
> interoperate with the pointlessness.

What's a waste of time for you might be a feature for others, and vice versa. E.g. I considered all tags in HTML 4 which did not work with older browsers as pointless, harmful, and a waste of time (two extreme examples: thead + ins were okay, tfoot + del were ugly).

> And UCS2 was never supposed to turn into UTF-16. ;-)

Okay, some folks don't believe in "ought to be enough for everybody". ("UTF-4" could do 15*4=60 bits, don't try that with UTF-8, 16, 32 :-)

> In some cases UTF-32 might be preferable in RAM. UTF-32
> is never preferable as an encoding for transferring
> over the network.

I would not dare say never (for this point).

> HTML5 encoded as UTF-8 is *always* more compact than
> the same document encoded as UTF-16 or UTF-32 regardless
> of the script of the content.

In a mathematical sense we could push the "more compact" margin as near to zero as we want if we agree on some use for the code points where UTF-8 needs four octets. And as a human user used to hex (but not modulo 64), I'd prefer plain UTF-32 for code points that make no sense to me anyway. It's easy to determine the number of code points in a UTF-32 string; "compact" is not always everything.

JFTR, we completely agree that it is usually a bad idea. A SHOULD NOT for XHTML producers as in HTML5 3.7.5.4 is okay, although I fail to see why that's limited to HTML5; this could be more general advice, also for XHTML 1 etc.

Digression: I don't believe for a second that HTML5 can "dictate" what other XHTML versions or XML do. If 3.7.5.4 is a "SHOULD NOT generate", then 8.2.2.2 is apparently a "SHOULD NOT accept", and that's IMO wrong.

> the spec requirements about UTF-32 took their current
> form in response to a real developer request:
> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-May/011310.html

Confusing a UTF-32 BOM with a UTF-16 BOM is certainly bad. But I'm no fan of a BOM outside of text/plain, and UTF-32BE and UTF-32LE have no BOM.
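The BOM confusion discussed above comes from a byte overlap. The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so a sniffer that tests for UTF-16LE first misreads UTF-32LE input. Below is a minimal Python sketch of the usual fix, which is to test longer BOMs before shorter ones. It is not the HTML5 draft's own detection algorithm, and `sniff_bom` is a hypothetical helper name. One ambiguity remains: a genuine UTF-16LE document that starts with a BOM followed by U+0000 is indistinguishable from UTF-32LE.

```python
import codecs

# Longest BOMs first: BOM_UTF32_LE (FF FE 00 00) starts with
# BOM_UTF16_LE (FF FE), so checking UTF-16LE first would misdetect UTF-32LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

For example, `sniff_bom(codecs.BOM_UTF32_LE + "<p>".encode("utf-32-le"))` returns `("utf-32-le", 4)`. If `BOM_UTF16_LE` were checked first, the same bytes would be reported as UTF-16LE, and the rest would decode as text interleaved with U+0000 characters.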
If the HTML5 WG is unable to fix the broken table in 4.9.1 (3), it should be disbanded, in the spirit of "don't waste the time of developers".

Frank
Received on Saturday, 26 January 2008 21:14:15 UTC