- From: <bugzilla@jessica.w3.org>
- Date: Tue, 14 Dec 2010 02:46:43 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11540

--- Comment #9 from Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com> 2010-12-14 02:46:42 UTC ---
(In reply to comment #8)
> I'm not sure it is appropriate for any of us to tell each other we're off-topic
> or not.

"Only one issue: please use separate bugs for separate issues."

http://dev.w3.org/html5/decision-policy/decision-policy.html

> If Laura is concerned about the phrase "willful violation", then
> hearing more details about what drives the use of this phrase in this
> bug could then lead her to decide against posting another bug, or to
> post a bug that is more likely to generate a useful response.

Optimising some theoretical other bug is a poor rationale for swamping discussion in _this_ bug.

> The original bug is fairly generic. The example seems to be more of
> an example of one specific mention of the "willful violation".

The rationale might be applicable to other willful violations, but the report applied it to /a/ clause (singular), not multiple clauses. It doesn't say anything about it being a mere example.

> So, erring on the side of question, the bug could be broken into two parts:
>
> Is the use of willful violation justified?

The bug report posits /a priori/ that willful violations can never be justified. I think that's an indefensible position since, while it's reasonable to expect groups working on different standards to try to work together:

1. It's ultimately unrealistic to expect a group in charge of formulating Standard X to be able to force a group in charge of formulating Standard Y to reformulate Standard Y as required for the target audience of Standard X.

2. It's inhuman to expect the group in charge of formulating Standard X to sacrifice the human needs of its target audience (e.g. access to information and services over the world wide web, protection of their privacy and security) on the altar of technical consistency with Standard Y.

To put it another way: free agents are free agents. :)

Do you have any arguments or information to add on this?

> Is this specific use of willful justification justified?

This is always a good question to ask. :)

I claim no expertise in the subject of character encodings, so take the following with a pinch of salt.

HTML5 character mappings need to enable access to the deployed web corpus interoperably with major user agents. Not the least of the advantages of standardizing such mappings is that it helps protect users from security problems like:

http://shiflett.org/blog/2005/dec/google-xss-example
http://code.google.com/p/chromium/issues/detail?id=15701

For general background see:

Web encodings page on the WHATWG wiki:
http://wiki.whatwg.org/wiki/Web_Encodings

"Internal character encoding declaration" thread at WHATWG:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-March/006000.html

"Superset encodings" thread at WHATWG:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019322.html

"charset name matching rules" thread at W3C:
http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html

Some test cases:

http://hsivonen.iki.fi/test/wa10/encoding-detection/
http://www.hixie.ch/tests/adhoc/html/parsing/encoding/all.html
http://coq.no/character-tables/en

I've taken the trouble to search the archives for rationales specific to each violation. I make no guarantee that this information is complete or accurate; read the links and make up your own minds. "Popular browsers" here is shorthand for the big four engines (Trident, Gecko, WebKit, Presto).

* Popular browsers and Google Web Search map EUC-KR to Windows-949.
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019322.html
  http://mail.apps.ietf.org/ietf/charsets/msg01834.html
  http://code.google.com/p/chromium/issues/detail?id=15701
  http://trac.webkit.org/browser/trunk/WebCore/platform/text/TextCodecICU.cpp
* Popular browsers map EUC-JP to CP51932.
  http://www.w3.org/Bugs/Public/show_bug.cgi?id=7444
  http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-September/023208.html
  http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
* Popular browsers (but not Google Web Search) map GB2312 and GB_2312-80 to the superset GBK.
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-March/014219.html
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019322.html
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-July/020846.html
  http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
  http://mail.apps.ietf.org/ietf/charsets/msg01834.html
* Popular browsers and Google Web Search map ISO-8859-1 to the superset windows-1252.
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-March/006000.html
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-November/007737.html
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-November/007882.html
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2007-June/011650.html
  http://wiki.whatwg.org/wiki/Web_Encodings
  http://mail.apps.ietf.org/ietf/charsets/msg01835.html
  http://mail.apps.ietf.org/ietf/charsets/msg01834.html
* WebKit and Google Web Search map ISO-8859-9 to the superset windows-1254. Adopting this behavior has support from an Opera rep.
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2007-June/011648.html
  http://lists.w3.org/Archives/Public/public-html-comments/2009Aug/0047.html
  http://wiki.whatwg.org/wiki/Web_Encodings
  http://lists.w3.org/Archives/Public/public-html-comments/2009Aug/0041.html
  http://mail.apps.ietf.org/ietf/charsets/msg01834.html
  http://trac.webkit.org/browser/trunk/WebCore/platform/text/TextCodecICU.cpp
* Popular browsers and Google Web Search map ISO-8859-11 to the superset windows-874.
  http://lists.w3.org/Archives/Public/public-html/2008Mar/0183.html
  http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
  http://mail.apps.ietf.org/ietf/charsets/msg01834.html
  http://trac.webkit.org/browser/trunk/WebCore/platform/text/TextCodecICU.cpp
* Popular browsers and Google Web Search map TIS-620 to the superset windows-874.
  http://lists.w3.org/Archives/Public/public-html/2008Mar/0183.html
  http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
  http://wiki.whatwg.org/wiki/Web_Encodings
  http://mail.apps.ietf.org/ietf/charsets/msg01834.html
  http://trac.webkit.org/browser/trunk/WebCore/platform/text/TextCodecICU.cpp
* Popular browsers and Google Web Search map KS_C_5601-1987 to windows-949.
  http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0030.html
  http://lists.w3.org/Archives/Public/www-archive/2008Jun/0155.html
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-July/021207.html
  http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
  http://mail.apps.ietf.org/ietf/charsets/msg01834.html
* Popular browsers and Google Web Search map Shift_JIS to its superset Windows-31J.
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019322.html
  http://mail.apps.ietf.org/ietf/charsets/msg01834.html
  http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
  Note ongoing discussion at the IETF:
  http://mail.apps.ietf.org/ietf/charsets/msg01942.html
* Popular browsers and Google Web Search map TIS-620 to its superset windows-874. WebKit S60 made this change back in 2006 because of a bug report.
  http://trac.webkit.org/changeset/15974
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2007-June/011651.html
  http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
  http://wiki.whatwg.org/wiki/Web_Encodings
  http://mail.apps.ietf.org/ietf/charsets/msg01834.html
  http://trac.webkit.org/browser/trunk/WebCore/platform/text/TextCodecICU.cpp
  http://www.opera.com/docs/specs/presto27/encodings/
* Opera, Firefox, Safari, and Google Web Search map US-ASCII to its superset windows-1252, while IE7 drops the high bit. Ian judged the latter behavior to be a security risk.
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-July/015455.html
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-September/016170.html
* Popular browsers treat UTF-16 without a BOM as little-endian (UTF-16LE). Content found in the wild depends on this behavior.
  http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-June/020552.html

Fixing these willful violations by pushing them upstream into the IANA registry is non-trivial. Consider the problem of globally mapping Shift_JIS to windows-31J at the registry level, as expressed by a Microsoft rep:

"Problem is that there are 4+ implementations of shift_jis in 'common' use, and none of them are likely to change, since it'd break their customers. :(

"So I don't see a perfect solution here. HTML5 is fairly clear about browser behavior, but in other environments, I think the best we can do is point to the variants and allow the clients to decide which version they'd like to use."

http://mail.apps.ietf.org/ietf/charsets/msg01966.html

Once the principle of munging encodings is accepted, there's clearly room for updating the details based on new data.

Do you have any new data to add?

Can you persuade major user agent vendors to commit to a different implementation strategy than the one described in the spec?
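
A minimal Python sketch of the label-to-superset mappings listed above, under stated assumptions. It is an illustration only, not code from any browser or from the linked threads. The table name, `decode_as_browser` and `strip_high_bit` are hypothetical, and Python's `cp949`, `cp932` and `cp874` codecs are assumed to be close stand-ins for windows-949, Windows-31J and windows-874. Python has no CP51932 codec, so the EUC-JP mapping is left out. `strip_high_bit` models the IE7 US-ASCII behaviour; one plausible reason it is risky is that masking off bit 7 turns bytes 0xBC and 0xBE into `<` and `>`.

```python
"""Hypothetical sketch of decoding declared labels via superset encodings.

Assumption: Python codec names approximate the browser encodings
(cp949 ~ windows-949, cp932 ~ Windows-31J, cp874 ~ windows-874).
EUC-JP -> CP51932 is not modelled: Python has no CP51932 codec.
"""

# Hypothetical table: declared label -> codec actually used to decode.
SUPERSET_DECODERS = {
    "euc-kr": "cp949",
    "ks_c_5601-1987": "cp949",
    "gb2312": "gbk",
    "gb_2312-80": "gbk",
    "iso-8859-1": "cp1252",
    "us-ascii": "cp1252",
    "iso-8859-9": "cp1254",
    "iso-8859-11": "cp874",
    "tis-620": "cp874",
    "shift_jis": "cp932",
}


def decode_as_browser(data: bytes, label: str) -> str:
    """Decode data following the superset mappings (hypothetical helper)."""
    key = label.strip().lower()
    if key == "utf-16":
        # A BOM decides the byte order; without one, assume little-endian.
        if data.startswith(b"\xfe\xff"):
            return data[2:].decode("utf-16-be", errors="replace")
        if data.startswith(b"\xff\xfe"):
            data = data[2:]
        return data.decode("utf-16-le", errors="replace")
    # Labels not in the table are passed to Python unchanged
    # (an unknown label raises LookupError).
    codec = SUPERSET_DECODERS.get(key, key)
    # errors="replace": Python's cp1252/cp874 leave a few bytes undefined.
    return data.decode(codec, errors="replace")


def strip_high_bit(data: bytes) -> str:
    """Hypothetical model of decoding US-ASCII by dropping bit 7 of each byte."""
    return bytes(b & 0x7F for b in data).decode("ascii")
```

For example, `decode_as_browser(b"\x93quoted\x94", "ISO-8859-1")` yields curly quotes (U+201C, U+201D) where a strict ISO-8859-1 decoder yields C1 control characters, and `strip_high_bit(b"\xbcscript\xbe")` yields `<script>` where the windows-1252 mapping yields `¼script¾`.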
Received on Tuesday, 14 December 2010 02:46:46 UTC