
Is EBCDIC support needed for not breaking the Web?

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sun, 1 Jun 2008 16:45:26 +0300
Message-Id: <64E4A9A2-86F2-4BD9-B2FA-E409DC7F183B@iki.fi>
To: "public-html@w3.org WG" <public-html@w3.org>, whatwg List <whatwg@whatwg.org>, public-i18n-core@w3.org

The HTML5 draft says that authors should not use EBCDIC-based
encodings. This is laxer than its treatment of CESU-8, UTF-7, BOCU-1
and SCSU, which authors must not use and user agents must not support.

In general, now that UTF-8 exists and is ubiquitously supported,
proliferation of encodings is costly and doesn't expand the
expressiveness of HTML, which is parsed into a Unicode DOM anyway.
Moreover, encodings that are not ASCII supersets are potential
security risks: the string "<script>" may be represented by
different bytes than in ASCII, which can lead to privilege
escalation if a server-side gatekeeper and a user agent give different
meanings to the bytes.
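
To make the gatekeeper mismatch concrete, here is a minimal Python sketch. It assumes cp037 (one EBCDIC code page that ships with Python) as the example encoding. The function naive_gatekeeper_allows is a hypothetical stand-in for a server-side filter that scans for ASCII bytes, not a description of any real product.

```python
# Hypothetical illustration: cp037 is just one EBCDIC code page, and the
# gatekeeper below is invented for this sketch.

def naive_gatekeeper_allows(body: bytes) -> bool:
    """Return True if an ASCII-minded byte filter sees no script tag."""
    return b"<script" not in body


markup = "<script>alert(1)</script>"

ascii_payload = markup.encode("ascii")
ebcdic_payload = markup.encode("cp037")

# The filter rejects the ASCII bytes but lets the EBCDIC bytes through.
print(naive_gatekeeper_allows(ascii_payload))
print(naive_gatekeeper_allows(ebcdic_payload))

# A user agent that honours the EBCDIC label recovers the original markup.
print(ebcdic_payload.decode("cp037"))
```

The same characters reach the user agent in both cases. Only the byte-level view differs, and the byte-level view is all such a filter inspects.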

For these reasons, if EBCDIC-based encodings don't need to be  
supported in order to Support Existing Content, it would be beneficial  
never to add support for them and, thus, ban them like CESU-8, UTF-7,  
BOCU-1 and SCSU.

I asked Hixie for examples of sites or browsers that require/support  
EBCDIC-based encodings. He had none. I examined the encoding menus of  
Firefox 3b5, Safari 3.1 and Opera 9.5 beta (on Leopard) and IE8 beta 1  
(on English XP SP3). None of them expose EBCDIC-based encodings in the  
UI. (All the IBM encodings Firefox exposes turn out to be ASCII-based.)
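
One rough way to check whether an encoding is ASCII-based is sketched below in Python, using Python's own codec tables rather than any browser's. The helper name is_ascii_superset is hypothetical. The check is per-byte only, so it says nothing reliable about stateful or multi-byte encodings.

```python
def is_ascii_superset(codec: str) -> bool:
    """Return True if bytes 0x00-0x7F each decode to the same code point."""
    for value in range(128):
        try:
            if bytes([value]).decode(codec) != chr(value):
                return False
        except UnicodeDecodeError:
            # A byte that cannot be decoded on its own fails this rough test.
            return False
    return True


# cp037 and cp500 are EBCDIC code pages; the other two are ASCII supersets.
for name in ("utf-8", "latin-1", "cp037", "cp500"):
    print(name, is_ascii_superset(name))
```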

This makes me wonder: Do the top browsers support any EBCDIC-based  
encodings but just without exposing them in the UI? If not, can there  
be any notable EBCDIC-based Web content?

I suspect that EBCDIC isn't actually Web-relevant.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 1 June 2008 13:46:11 GMT
