
RE: Encoding Standard at F2F

From: Ishii, Koji a | Koji | EBJB <koji.a.ishii@mail.rakuten.com>
Date: Sun, 4 Nov 2012 19:23:20 +0000
To: Jungshik SHIN (신정식) <jshin1987@gmail.com>, Anne van Kesteren <annevk@annevk.nl>
CC: "www-international@w3.org" <www-international@w3.org>
Message-ID: <42B5352A6034154CBE9379DF4ADF1A321095BD27@HKXPRD0310MB365.apcprd03.prod.outlook.com>
I think the spec should cover all relevant technologies around the W3C, not only web pages. I know little about how often ISO-2022-KR is used in places other than the Web, but you should also pay attention to e-mail and other carriers of W3C technologies.

Microsoft once disabled automatic detection of ISO-2022-JP in MS10-090<http://support.microsoft.com/kb/2416400/en-us> because of security concerns, but turned it on again in MS11-003<http://support.microsoft.com/kb/2482017/en-us> because of the bad impact of disabling it. As you said and as Kuro confirmed, ISO-2022-JP is still an important encoding for the W3C to support.

Are you sure ISO-2022-KR and GB-HZ are not, considering all the places where W3C technologies are used, including e-mail, TV, etc.?


Regards,
Koji

From: Jungshik SHIN (신정식) [mailto:jshin1987@gmail.com]
Sent: Saturday, November 03, 2012 12:20 PM
To: Anne van Kesteren
Cc: www-international@w3.org
Subject: Re: Encoding Standard at F2F


Hi,

Thank you for the note.

I wonder what consideration has been given to the inclusion of ISO-2022-KR and GB-HZ, two 7-bit encodings that are extremely rare on the web (if used at all) and are 'security risks' (in a sense) like other 7-bit encodings (e.g. UTF-7 that is not included).
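To make the "7-bit" point concrete, here is a minimal sketch in Python. It assumes the standard-library codec names `iso2022_kr`, `hz`, and `euc_kr`; the sample strings and the helper names `is_seven_bit` and `summarize` are made up for illustration and do not come from the Encoding Standard. It only shows that the two 7-bit encodings produce bytes that are all valid ASCII, so a scanner that reads the bytes as ASCII sees different text than a decoder that honours the label.

```python
# Illustrative only: sample strings and helper names are assumptions.
SAMPLES = {
    "iso2022_kr": "한국어",  # ISO-2022-KR (RFC 1557)
    "hz": "中文",            # GB-HZ
    "euc_kr": "한국어",      # 8-bit encoding, for contrast
}


def is_seven_bit(data: bytes) -> bool:
    """True when every byte is below 0x80, i.e. the data is pure ASCII."""
    return all(byte < 0x80 for byte in data)


def summarize(codec: str, text: str) -> dict:
    """Encode text with codec and report what an ASCII-only reader would see."""
    encoded = text.encode(codec)
    seven_bit = is_seven_bit(encoded)
    return {
        "codec": codec,
        "seven_bit": seven_bit,
        # A 7-bit encoding always decodes cleanly as ASCII, but to other text.
        "ascii_view": encoded.decode("ascii") if seven_bit else None,
        "round_trip": encoded.decode(codec) == text,
    }


if __name__ == "__main__":
    for codec, text in SAMPLES.items():
        print(summarize(codec, text))
```

The mismatch between `ascii_view` and the real text is the property that makes 7-bit encodings awkward for anything that inspects bytes before decoding them.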

We cannot drop ISO-2022-JP lightly because it's still used somewhere even though it's much less widely used than EUC-JP or Shift-JIS.

OTOH, ISO-2022-KR was never meant for the web, and it's safe to say that virtually no web page uses it. It was designed for email (RFC 1557) in the early 1990s and fell out of favor even for email in the late 1990s, because either EUC-KR (later UTF-8) with 8-bit ESMTP or EUC-KR with base64/qp worked just fine. For web pages, there has never been any reason to use ISO-2022-KR, and it's not used.
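As a rough illustration of why EUC-KR with base64/qp covered the same ground for mail, the sketch below uses Python's standard `base64` and `quopri` modules. The sample text and the helper name `transfer_forms` are assumptions for illustration. It shows that EUC-KR bytes wrapped in either transfer encoding are just as 7-bit clean as ISO-2022-KR, which is the only property a 7-bit mail transport needs.

```python
import base64
import quopri

TEXT = "한국어"  # sample text, chosen only for illustration


def is_seven_bit(data: bytes) -> bool:
    """True when every byte is below 0x80."""
    return all(byte < 0x80 for byte in data)


def transfer_forms(text: str) -> dict:
    """Return the byte forms of text that the paragraph above compares."""
    raw = text.encode("euc_kr")
    return {
        "euc_kr_raw": raw,                       # needs an 8-bit clean path
        "euc_kr_base64": base64.b64encode(raw),  # 7-bit safe
        "euc_kr_qp": quopri.encodestring(raw),   # 7-bit safe
        "iso2022_kr": text.encode("iso2022_kr"), # 7-bit by design
    }


if __name__ == "__main__":
    for name, data in transfer_forms(TEXT).items():
        print(f"{name}: seven_bit={is_seven_bit(data)} bytes={data!r}")
```

Only the raw EUC-KR form needs an 8-bit clean channel; the other three survive a 7-bit one.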

For the last 20 years, I've seen web pages (other than test pages) in that encoding only once or twice. I'm a Korean speaker and I've visited numerous web pages.

To a slightly lesser extent, the same should hold for GB-HZ. It started its life on Usenet (and in email), but using it on the web does not make much sense. I can't speak about GB-HZ as strongly as about ISO-2022-KR, but my experience with Chrome development (below) is an indication that it's virtually unused.

Chrome didn't support either of them until about 2 years ago. They were added mainly because of http://encoding.spec.whatwg.org/ IIRC. While neither was supported, I didn't receive any complaints from Chrome users.

Jungshik


On Nov 3, 2012, at 7:31 AM, "Anne van Kesteren" <annevk@annevk.nl> wrote:
I joined the I18N WG for an hour or so at their F2F in TPAC to discuss
http://encoding.spec.whatwg.org/


We basically went through the document for a high-level overview of
what it attempts to do. We also concluded it is good enough to publish
as a FPWD, provided someone in the I18N WG has the time to do the
switch in style (from green to blue).

Based on feedback from Richard Ishida and Kawabata Taichi during that
meeting I filed these bugs:

* https://www.w3.org/Bugs/Public/show_bug.cgi?id=19816

* https://www.w3.org/Bugs/Public/show_bug.cgi?id=19817


If there was any other feedback during that session that I failed to
capture, I would appreciate it if you could help me out. Issues with the
specification are best recorded in Bugzilla:
https://www.w3.org/Bugs/Public/enter_bug.cgi?product=WHATWG&component=Encoding



--
http://annevankesteren.nl/

Received on Sunday, 4 November 2012 19:23:55 GMT
