- From: poot <cvsmail@w3.org>
- Date: Sat, 24 Oct 2009 07:13:16 +0900 (JST)
- To: public-html-diffs@w3.org
hixie: Reword the stuff about authors not using encodings to make more sense. (whatwg r4307)

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.3442&r2=1.3443&f=h
http://html5.org/tools/web-apps-tracker?from=4306&to=4307

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.3442
retrieving revision 1.3443
diff -u -d -r1.3442 -r1.3443
--- Overview.html 23 Oct 2009 22:02:53 -0000 1.3442
+++ Overview.html 23 Oct 2009 22:12:59 -0000 1.3443
@@ -1728,12 +1728,11 @@
to support do things outside that range? -->, ignoring bytes that are the second and later bytes of multibyte sequences, all correspond to single-byte sequences that map to the same Unicode - characters as those bytes in ANSI_X3.4-1968 (US-ASCII). <a href="#refsRFC1345">[RFC1345]</a><p class="note">This includes such encodings as Shift_JIS and - variants of ISO-2022, even though it is possible in these encodings - for bytes like 0x70 to be part of longer sequences that are - unrelated to their interpretation as ASCII. It excludes such - encodings as UTF-7, UTF-16, HZ-GB-2312, GSM03.38, and EBCDIC - variants.</p><!-- + characters as those bytes in ANSI_X3.4-1968 (US-ASCII). <a href="#refsRFC1345">[RFC1345]</a><p class="note">This includes such encodings as Shift_JIS, + HZ-GB-2312, and variants of ISO-2022, even though it is possible in + these encodings for bytes like 0x70 to be part of longer sequences + that are unrelated to their interpretation as ASCII. It excludes + such encodings as UTF-7, UTF-16, GSM03.38, and EBCDIC variants.</p><!-- We'll have to change that if anyone comes up with a way to have a document that is valid as two different encodings at once, with different <meta charset> elements applying in each case. 
@@ -10405,13 +10404,31 @@ <code><a href="#meta">meta</a></code> element with an <code title="attr-meta-http-equiv"><a href="#attr-meta-http-equiv">http-equiv</a></code> attribute in the <a href="#attr-meta-http-equiv-content-type" title="attr-meta-http-equiv-content-type">Encoding declaration state</a>, then the character encoding used must be an - <a href="#ascii-compatible-character-encoding">ASCII-compatible character encoding</a>.<p>Authors should not use JIS_C6226-1983<!-- aka JIS-X-0208, - x-JIS0208 -->, JIS_X0212-1990<!-- aka JIS-X-0212 -->, HZ-GB-2312<!-- - has crazy handling of ASCII "~" -->, encodings based on ISO-2022<!-- + <a href="#ascii-compatible-character-encoding">ASCII-compatible character encoding</a>.<p>Authors are encouraged to use UTF-8. Conformance checkers may + advise authors against using legacy encodings.<div class="impl"> + + <p>Authoring tools should default to using UTF-8 for newly-created + documents.</p> + + </div><p>Encodings in which a series of bytes in the range 0x20 to 0x7E + can encode characters other than the corresponding characters in the + range U+0020 to U+007E represent a potential security vulnerability: + a user agent that does not support the encoding (or does not support + the label used to declare the encoding, or does not use the same + mechanism to detect the encoding of unlabelled content as another + user agent) might end up interpreting technically benign plain text + content as HTML tags and JavaScript. In particular, this applies to + encodings in which the bytes corresponding to "<code title=""><script></code>" in ASCII can encode a different + string. 
Authors should not use such encodings, which are known to + include JIS_C6226-1983<!-- aka JIS-X-0208, x-JIS0208 -->, + JIS_X0212-1990<!-- aka JIS-X-0212 -->, HZ-GB-2312<!-- has crazy + handling of ASCII "~" -->, encodings based on ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422 and http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-October/023797.html - -->, and encodings based on EBCDIC. Authors should not use UTF-32. - Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings. + -->, and encodings based on EBCDIC. Furtermore, authors must not use + the CESU-8, UTF-7, BOCU-1 and SCSU encodings, which also fall into + this category, because these encodings were never intended for use + for Web content. <a href="#refsRFC1345">[RFC1345]</a><!-- for the JIS types --> <a href="#refsRFC1842">[RFC1842]</a><!-- HZ-GB-2312 --> <a href="#refsRFC1468">[RFC1468]</a><!-- ISO-2022-JP --> @@ -10419,27 +10436,13 @@ <a href="#refsRFC1554">[RFC1554]</a><!-- ISO-2022-JP-2 --> <a href="#refsRFC1922">[RFC1922]</a><!-- ISO-2022-CN and ISO-2022-CN-EXT --> <a href="#refsRFC1557">[RFC1557]</a><!-- ISO-2022-KR --> - <a href="#refsUNICODE">[UNICODE]</a> <a href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a href="#refsSCSU">[SCSU]</a> <!-- no idea what to reference for EBCDIC, so... --> - <p class="note">Most of these encodings are discouraged because of - security concerns. If a hostile user can contribute text to a site - using these encodings, bugs in the site's whitelisting filter or in - a user agent can easily lead to the filter interpreting the - contribution as "safe" while the user agent interprets the same - contribution as containing a <code><a href="#script">script</a></code> element. This would - enable cross-site scripting attacks. 
By avoiding these encodings, - and always providing a <a href="#character-encoding-declaration">character encoding declaration</a>, - an author is less likely to run into this kind of problem.<p>Authors are encouraged to use UTF-8. Conformance checkers may - advise authors against using legacy encodings.<div class="impl"> - - <p>Authoring tools should default to using UTF-8 for newly-created - documents.</p> - - </div><p class="note">Using non-UTF-8 encodings can have unexpected + <p>Authors should not use UTF-32, as the HTML5 encoding detection + algorithms intentionally do not distinguish it from UTF-16. <a href="#refsUNICODE">[UNICODE]</a><p class="note">Using non-UTF-8 encodings can have unexpected results on form submission and URL encodings, which use the <a href="#document-s-character-encoding">document's character encoding</a> by default.<p>In XHTML, the XML declaration should be used for inline character encoding information, if necessary.<div class="example">
Received on Friday, 23 October 2009 22:13:46 UTC
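The UTF-32 sentence in the second hunk rests on the byte order mark: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE). Assuming, as a simplification, a detector that recognizes only the UTF-8, UTF-16BE and UTF-16LE marks, a UTF-32LE document gets labelled UTF-16LE. The Python sketch below models only that BOM step; the name `sniff_bom` is hypothetical, and everything else an HTML encoding sniffer does is ignored.

```python
import codecs
from typing import Optional, Tuple

# Byte order marks this simplified detector knows about.
# There is deliberately no UTF-32 entry.
KNOWN_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def sniff_bom(data: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (encoding, body without the BOM) for a recognized BOM, else None."""
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None


if __name__ == "__main__":
    # A UTF-32LE document with an explicit BOM: FF FE 00 00 3C 00 00 00 ...
    doc = codecs.BOM_UTF32_LE + "<p>".encode("utf-32-le")
    name, body = sniff_bom(doc)
    print(name)                     # utf-16-le
    print(repr(body.decode(name)))  # '\x00<\x00p\x00>\x00'
```

Decoding the remainder as UTF-16LE interleaves NUL characters with the markup, which is one concrete way a UTF-32 page goes wrong when it is treated as UTF-16.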