
Re: [whatwg] Is EBCDIC support needed for not breaking the Web?

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 29 Aug 2008 09:34:30 +0000 (UTC)
To: Henri Sivonen <hsivonen@iki.fi>
Cc: "public-html@w3.org WG" <public-html@w3.org>, public-i18n-core@w3.org
Message-ID: <Pine.LNX.4.62.0808290930330.20254@hixie.dreamhostps.com>

On Sun, 1 Jun 2008, Henri Sivonen wrote:
> The HTML5 draft says that authors should not use EBCDIC-based encodings.
> This is more lax than saying that authors must not use and user agents
> must not support CESU-8, UTF-7, BOCU-1 and SCSU.
>
> In general, now that UTF-8 exists and is ubiquitously supported,
> proliferation of encodings is costly and doesn't expand the
> expressiveness of HTML, which is parsed into a Unicode DOM anyway.
> Moreover, encodings that are not ASCII supersets are potential security
> risks, since the string "<script>" may be represented by different bytes
> than in ASCII, leading to potential privilege escalation if a server-side
> gatekeeper and a user agent give different meanings to the bytes.
>
> For these reasons, if EBCDIC-based encodings don't need to be supported
> in order to Support Existing Content, it would be beneficial never to
> add support for them and, thus, ban them like CESU-8, UTF-7, BOCU-1 and
> SCSU.
>
> I asked Hixie for examples of sites or browsers that require/support
> EBCDIC-based encodings. He had none. I examined the encoding menus of
> Firefox 3b5, Safari 3.1 and Opera 9.5 beta (on Leopard) and IE8 beta 1
> (on English XP SP3). None of them expose EBCDIC-based encodings in the
> UI. (All the IBM encodings Firefox exposes turn out to be ASCII-based.)
>
> This makes me wonder: Do the top browsers support any EBCDIC-based
> encodings but just without exposing them in the UI? If not, can there be
> any notable EBCDIC-based Web content?
>
> I suspect that EBCDIC isn't actually Web-relevant.
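
To illustrate the gatekeeper concern raised in the quoted text, here is a 
minimal sketch. It assumes Python's "cp037" codec as a representative 
EBCDIC code page and a purely hypothetical byte-level filter 
(naive_gatekeeper_allows); neither comes from the HTML5 draft or from any 
particular server.

```python
# Hypothetical gatekeeper: scans raw bytes for the ASCII form of "<script".
def naive_gatekeeper_allows(payload: bytes) -> bool:
    return b"<script" not in payload

markup = "<script>alert(1)</script>"

ascii_bytes = markup.encode("ascii")
# cp037 is one EBCDIC code page that ships with Python; it is not an
# ASCII superset, so "<" and the letters map to different byte values.
ebcdic_bytes = markup.encode("cp037")

print(ascii_bytes.hex(" "))
print(ebcdic_bytes.hex(" "))

# The filter catches the ASCII payload but not the EBCDIC one.
print(naive_gatekeeper_allows(ascii_bytes))
print(naive_gatekeeper_allows(ebcdic_bytes))

# A consumer that decodes the same bytes as cp037 recovers the markup.
print(ebcdic_bytes.decode("cp037"))
```

The filter and the decoder disagree about what the same bytes mean, which 
is the mismatch described above.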

I've made EBCDIC and UTF-32 have the same level of support -- "should not" 
on both authoring and implementation sides, no explicit support. (For 
example, the sniffing algorithms intentionally don't detect EBCDIC or 
UTF-32, despite doing so being relatively easy.)
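
As a rough illustration of why such detection would be easy, the sketch 
below checks a few leading bytes. It is not the HTML5 sniffing algorithm; 
the function name sniff_exotic_encoding, the choice of cp037 as the EBCDIC 
code page, and the list of markup prefixes are all assumptions made for 
this example.

```python
import codecs
from typing import Optional

def sniff_exotic_encoding(head: bytes) -> Optional[str]:
    """Hypothetical helper; not part of any HTML5 sniffing algorithm."""
    # UTF-32 byte order marks. These must be tested before any UTF-16
    # check, because the UTF-32LE BOM (FF FE 00 00) begins with the
    # UTF-16LE BOM (FF FE).
    if head.startswith(codecs.BOM_UTF32_BE):
        return "utf-32-be"
    if head.startswith(codecs.BOM_UTF32_LE):
        return "utf-32-le"
    # BOM-less UTF-32: a leading "<" padded out to four bytes.
    if head.startswith(b"\x00\x00\x00<"):
        return "utf-32-be"
    if head.startswith(b"<\x00\x00\x00"):
        return "utf-32-le"
    # EBCDIC (cp037 assumed): decode a short prefix and look for
    # typical document openers.
    text = head[:16].decode("cp037", errors="replace").lower()
    if text.startswith(("<!doctype", "<html", "<?xml")):
        return "cp037"
    return None

print(sniff_exotic_encoding("<!DOCTYPE html>".encode("cp037")))
print(sniff_exotic_encoding("<html>".encode("utf-32-be")))
print(sniff_exotic_encoding(b"<html>"))
```

Ordinary ASCII or UTF-8 input falls through to None, so a check like this 
would only ever add detections, which is exactly what the draft chooses 
not to do.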

(This message was originally cross-posted to the WHATWG list as well. I 
have trimmed the cc list to avoid excessive cross-posting.)

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 29 August 2008 09:34:39 UTC
