[Bug 11540] The willful violation clause is most unwise. A standard should not violate another standard for any reason. This would lead to 2 things: 1) Correctly encoded content would never be displayed correctly. 2) All future standards would need to includ

http://www.w3.org/Bugs/Public/show_bug.cgi?id=11540

--- Comment #9 from Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com> 2010-12-14 02:46:42 UTC ---
(In reply to comment #8)
> I'm not sure it is appropriate for any of us to tell each other we're off-topic
> or not.

"Only one issue—please use separate bugs for separate issues."

http://dev.w3.org/html5/decision-policy/decision-policy.html

> If Laura is concerned about the phrase "willful violation", then
> hearing more details about what drives the use of this phrase in this
> bug could then lead her to decide against posting another bug, or to
> post a bug that is more likely to generate a useful response.

Optimising some theoretical other bug is a poor rationale for swamping
discussion in _this_ bug.

> The original bug is fairly generic. The example seems to be more of a
> an example of one specific mention of the "willful violation".

The rationale might be potentially applicable to other willful
violations, but the report applied it to /a/ clause (singular), not
multiple clauses. It doesn't say anything about it being a mere example.

> So, erring on the side of question, the bug could be broken into two parts:
> 
> Is the use of willful violation justified? 

The bug report posits /a priori/ that willful violations can never be
justified. I think that's an indefensible position since, while it's
reasonable to expect groups working on different standards to try to
work together:

    1. It's ultimately unrealistic to expect a group in charge of
formulating Standard X to be able to force a group in charge of
formulating Standard Y to reformulate Standard Y as required for the
target audience of Standard X.

    2. It's inhuman to expect the group in charge of formulating
Standard X to sacrifice the human needs of its target audience (e.g.
access to information and services over the world wide web,
protection of their privacy and security) on the altar of technical
consistency with Standard Y.

To put it another way: free agents are free agents. :)

Do you have any arguments or information to add on this?

> Is this specific use of willful justification justified?

This is always a good question to ask. :)

I claim no expertise in the subject of character encodings, so take the
following with a pinch of salt.

HTML5 character mappings need to enable access to the deployed web
corpus interoperably with major user agents.

Not least of the advantages of standardizing such mappings is to help
protect users from security problems like:

http://shiflett.org/blog/2005/dec/google-xss-example

http://code.google.com/p/chromium/issues/detail?id=15701

For general background see:

Web encodings page on the WHATWG wiki:

http://wiki.whatwg.org/wiki/Web_Encodings

"Internal character encoding declaration" thread at WHATWG

http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-March/006000.html

"Superset encodings" thread at WHATWG

http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019322.html

"charset name matching rules" thread at W3C:

http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html

Some test cases:

http://hsivonen.iki.fi/test/wa10/encoding-detection/

http://www.hixie.ch/tests/adhoc/html/parsing/encoding/all.html

http://coq.no/character-tables/en

I've taken the trouble to search the archives for rationales specific
to each violation. I make no guarantee that this information is complete
or accurate; read the links and make up your own minds.

"Popular browsers" here is shorthand for the big four engines (Trident,
Gecko, WebKit, Presto).

   * Popular browsers and Google Web Search map EUC-KR to Windows-949.
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019322.html
     http://mail.apps.ietf.org/ietf/charsets/msg01834.html
     http://code.google.com/p/chromium/issues/detail?id=15701
     http://trac.webkit.org/browser/trunk/WebCore/platform/text/TextCodecICU.cpp

   * Popular browsers map EUC-JP to CP51932.
     http://www.w3.org/Bugs/Public/show_bug.cgi?id=7444
     http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-September/023208.html
     http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html

   * Popular browsers (but not Google Web Search) map GB2312 and GB_2312-80 to
     the superset GBK.
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-March/014219.html
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019322.html
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-July/020846.html
     http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
     http://mail.apps.ietf.org/ietf/charsets/msg01834.html 

   * Popular browsers and Google Web Search map ISO-8859-1 to the superset
     windows-1252.
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-March/006000.html
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-November/007737.html
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-November/007882.html
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2007-June/011650.html
     http://wiki.whatwg.org/wiki/Web_Encodings
     http://mail.apps.ietf.org/ietf/charsets/msg01835.html
     http://mail.apps.ietf.org/ietf/charsets/msg01834.html

   * WebKit and Google Web Search map ISO-8859-9 to the superset
     windows-1254. Adopting this behavior has support from an Opera rep.
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2007-June/011648.html
     http://lists.w3.org/Archives/Public/public-html-comments/2009Aug/0047.html
     http://wiki.whatwg.org/wiki/Web_Encodings
     http://lists.w3.org/Archives/Public/public-html-comments/2009Aug/0041.html
     http://mail.apps.ietf.org/ietf/charsets/msg01834.html
     http://trac.webkit.org/browser/trunk/WebCore/platform/text/TextCodecICU.cpp

   * Popular browsers and Google Web Search map ISO-8859-11 to the
     superset windows-874.
     http://lists.w3.org/Archives/Public/public-html/2008Mar/0183.html
     http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
     http://mail.apps.ietf.org/ietf/charsets/msg01834.html
     http://trac.webkit.org/browser/trunk/WebCore/platform/text/TextCodecICU.cpp

   * Popular browsers and Google Web Search map TIS-620 to the
     superset windows-874.
     http://lists.w3.org/Archives/Public/public-html/2008Mar/0183.html
     http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
     http://wiki.whatwg.org/wiki/Web_Encodings
     http://mail.apps.ietf.org/ietf/charsets/msg01834.html
     http://trac.webkit.org/browser/trunk/WebCore/platform/text/TextCodecICU.cpp

   * Popular browsers and Google Web Search map KS_C_5601-1987 to windows-949.
     http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0030.html
     http://lists.w3.org/Archives/Public/www-archive/2008Jun/0155.html
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-July/021207.html
     http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
     http://mail.apps.ietf.org/ietf/charsets/msg01834.html

   * Popular browsers and Google Web Search map Shift_JIS to its superset
     Windows-31J.
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019322.html
     http://mail.apps.ietf.org/ietf/charsets/msg01834.html
     http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
     Note ongoing discussion at the IETF:
     http://mail.apps.ietf.org/ietf/charsets/msg01942.html

   * Popular browsers and Google Web Search map TIS-620 to its superset
     windows-874. WebKit S60 made this change back in 2006 because of a
     bug report.
     http://trac.webkit.org/changeset/15974
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2007-June/011651.html
     http://lists.w3.org/Archives/Public/public-html-comments/2009Sep/0050.html
     http://wiki.whatwg.org/wiki/Web_Encodings
     http://mail.apps.ietf.org/ietf/charsets/msg01834.html
     http://trac.webkit.org/browser/trunk/WebCore/platform/text/TextCodecICU.cpp
     http://www.opera.com/docs/specs/presto27/encodings/

   * Opera, Firefox, Safari, and Google Web Search map US-ASCII to its
     superset windows-1252, while IE7 drops the high bit. Ian judged the
     latter behavior to be a security risk.
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-July/015455.html
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-September/016170.html

   * Popular browsers treat UTF-16 without a BOM as little-endian
     (UTF-16LE). Content found in the wild depends on this behavior.
     http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-June/020552.html
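
To make the list above concrete, here is a minimal Python sketch of
what such an override table amounts to. It is an illustration only,
not the spec's algorithm: the names WEB_OVERRIDES, decode_web and
drop_high_bit are hypothetical, the codec names are Python's, and the
EUC-JP to CP51932 mapping is left out because Python ships no codec
under that name. Python's cp1252 also leaves a few bytes undefined, so
the sketch decodes with errors="replace" rather than claiming to match
any browser byte for byte.

```python
import codecs

# Hypothetical override table: declared label -> Python codec used instead.
# Entries follow the list above; the codec names are Python's own.
WEB_OVERRIDES = {
    "euc-kr": "cp949",
    "ks_c_5601-1987": "cp949",
    "gb2312": "gbk",
    "gb_2312-80": "gbk",
    "iso-8859-1": "cp1252",
    "iso-8859-9": "cp1254",
    "iso-8859-11": "cp874",
    "tis-620": "cp874",
    "shift_jis": "cp932",
    "us-ascii": "cp1252",
}


def decode_web(data: bytes, label: str) -> str:
    """Decode bytes using the declared label, applying the overrides."""
    key = label.strip().lower()
    if key == "utf-16":
        # Honour a BOM if present; otherwise assume little-endian.
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le", errors="replace")
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be", errors="replace")
        return data.decode("utf-16-le", errors="replace")
    codec = WEB_OVERRIDES.get(key, key)
    return data.decode(codec, errors="replace")


def drop_high_bit(data: bytes) -> str:
    """Clear bit 7 of every byte, then read the result as ASCII."""
    return bytes(b & 0x7F for b in data).decode("ascii")


# Bytes 0x93/0x94 are C1 controls in strict ISO-8859-1 but curly
# quotes in windows-1252.
print(decode_web(b"\x93hi\x94", "ISO-8859-1"))

# Dropping the high bit turns 0xBC/0xBE into '<' and '>', so markup can
# appear after any filter that only looked at the original bytes.
print(drop_high_bit(b"\xbcscript\xbe"))
```

The second call shows why dropping the high bit is risky: the input
contains no ASCII angle brackets, yet the output is "<script>".
Decoding the same bytes as windows-1252 yields the harmless fraction
characters instead.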

Fixing these willful violations by pushing them upstream into the IANA
registry is non-trivial.

Consider the problem of globally mapping Shift_JIS to windows-31J at the
registry level, as expressed by a Microsoft rep:

"Problem is that there are 4+ implementations of shift_jis in 'common'
use, and none of them are likely to change, since it'd break their
customers. :(

"So I don't see a perfect solution here.  HTML5 is fairly clear about
browser behavior, but in other environments, I think the best we can do
is point to the variants and allow the clients to decide which version
they'd like to use."

http://mail.apps.ietf.org/ietf/charsets/msg01966.html
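
For a small illustration of what "variants" means in practice, the
following Python sketch decodes the same two bytes with two of the
codecs Python ships: its JIS-based shift_jis codec and cp932, which
is Python's name for the Microsoft variant. The helper name
decode_variants is hypothetical, and these two codecs are only
stand-ins; I'm not claiming they correspond to the specific
implementations the quote has in mind.

```python
def decode_variants(data: bytes) -> dict:
    """Decode the same bytes with two Shift_JIS variants.

    Returns a codec-name -> text mapping; a value of None means that
    codec rejected the bytes.
    """
    results = {}
    for codec in ("shift_jis", "cp932"):
        try:
            results[codec] = data.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None
    return results


# 0x81 0x60 is the classic trouble spot: one variant yields WAVE DASH
# (U+301C), the other FULLWIDTH TILDE (U+FF5E).
for name, text in decode_variants(b"\x81\x60").items():
    print(name, None if text is None else f"U+{ord(text):04X}")
```

Both decodings succeed, so nothing flags the disagreement; a registry
that picked one mapping would silently change the output of every
implementation that uses the other.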

Once the principle of munging encodings is accepted, there's clearly
room for updating the details based on new data. Do you have any
new data to add?

Can you persuade major user agent vendors to commit to a different
implementation strategy than the one described in the spec?


Received on Tuesday, 14 December 2010 02:46:46 UTC