html5/spec Overview.html,1.3422,1.3423

Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv7773

Modified Files:
	Overview.html 
Log Message:
discourage use of HZ-GB-2312; explain why. (whatwg r4282)
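
The new spec comment says HZ-GB-2312 "has crazy handling of ASCII '~'". The minimal Python sketch below shows what that means and how it leads to the filter/user-agent mismatch the new note describes. It is an illustration only, not the spec's or any browser's algorithm. `hz_decode` is a hypothetical, simplified RFC 1842 decoder written for this example; Python also ships a built-in `hz` codec. `naive_filter_is_safe` is a hypothetical whitelisting filter that has the kind of bug the note warns about.

```python
def hz_decode(data: bytes) -> str:
    """Simplified RFC 1842 (HZ) decoder, written for illustration only.

    ASCII mode: "~~" -> "~", "~\\n" -> nothing (line continuation),
    "~{" -> switch to GB mode. GB mode: "~}" -> back to ASCII mode;
    otherwise each pair of 7-bit bytes is a GB2312 character whose
    EUC-CN form has the high bit set on both bytes.
    """
    out = []
    i = 0
    gb_mode = False  # HZ text starts in ASCII mode
    while i < len(data):
        byte = data[i]
        if byte == 0x7E:  # "~" always starts a two-byte escape
            nxt = data[i + 1:i + 2]
            if not gb_mode and nxt == b"~":
                out.append("~")      # "~~" encodes a literal tilde
            elif not gb_mode and nxt == b"\n":
                pass                 # "~\n" is a line continuation: both bytes vanish
            elif not gb_mode and nxt == b"{":
                gb_mode = True       # "~{" switches to GB2312 mode
            elif gb_mode and nxt == b"}":
                gb_mode = False      # "~}" switches back to ASCII mode
            else:
                raise ValueError(f"invalid HZ escape at byte {i}")
            i += 2
        elif byte & 0x80:
            raise ValueError(f"HZ is a 7-bit encoding; byte {i} has the high bit set")
        elif gb_mode:
            # Restore the high bits of the pair and decode it as GB2312.
            pair = bytes(b | 0x80 for b in data[i:i + 2])
            out.append(pair.decode("gb2312"))
            i += 2
        else:
            out.append(chr(byte))
            i += 1
    return "".join(out)


def naive_filter_is_safe(data: bytes) -> bool:
    # Hypothetical filter with a bug: it inspects the raw bytes as if they
    # were ASCII instead of decoding them the way the user agent will.
    return b"<script" not in data.lower()


if __name__ == "__main__":
    payload = b"<scr~\nipt>alert(1)</scr~\nipt>"
    print(naive_filter_is_safe(payload))  # True: the raw bytes never spell "<script"
    print(hz_decode(payload))             # <script>alert(1)</script>
```

The filter approves the contribution, but a user agent that decodes the page as HZ-GB-2312 removes each "~\n" and sees a `script` element. If the page is instead served as UTF-8 with a character encoding declaration, the user agent reads the "~" literally, just as the filter did, and the two no longer disagree.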

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.3422
retrieving revision 1.3423
diff -u -d -r1.3422 -r1.3423
--- Overview.html	23 Oct 2009 02:21:19 -0000	1.3422
+++ Overview.html	23 Oct 2009 03:00:29 -0000	1.3423
@@ -10425,12 +10425,13 @@
   <a href="#attr-meta-http-equiv-content-type" title="attr-meta-http-equiv-content-type">Encoding declaration
   state</a>, then the character encoding used must be an
   <a href="#ascii-compatible-character-encoding">ASCII-compatible character encoding</a>.<p>Authors should not use JIS-X-0208 <!-- x-JIS0208 -->
-  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), encodings based on
-  ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422
-  -->, and encodings based on EBCDIC. Authors should not use
-  UTF-32. Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
-  encodings.
+  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), HZ-GB-2312<!-- has
+  crazy handling of ASCII "~" -->, encodings based on ISO-2022<!--
+  http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422 -->, and
+  encodings based on EBCDIC. Authors should not use UTF-32.
+  Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings.
   <a href="#refsRFC1345">[RFC1345]</a><!-- for the JIS types -->
+  <a href="#refsRFC1842">[RFC1842]</a><!-- HZ-GB-2312 -->
   <a href="#refsRFC1468">[RFC1468]</a><!-- ISO-2022-JP -->
   <a href="#refsRFC2237">[RFC2237]</a><!-- ISO-2022-JP-1 -->
   <a href="#refsRFC1554">[RFC1554]</a><!-- ISO-2022-JP-2 -->
@@ -10442,8 +10443,16 @@
   <a href="#refsBOCU1">[BOCU1]</a>
   <a href="#refsSCSU">[SCSU]</a>
   <!-- no idea what to reference for EBCDIC, so... -->
-  <p>Authors are encouraged to use UTF-8. Conformance checkers may
-  advise against authors using legacy encodings.<div class="impl">
+  <p class="note">Most of these encodings are discouraged because of
+  security concerns. If a hostile user can contribute text to a site
+  using these encodings, bugs in the site's whitelisting filter or in
+  a user agent can easily lead to the filter interpreting the
+  contribution as "safe" while the user agent interprets the same
+  contribution as containing a <code><a href="#script">script</a></code> element. This would
+  enable cross-site scripting attacks. By avoiding these encodings,
+  and always providing a <a href="#character-encoding-declaration">character encoding declaration</a>,
+  an author is less likely to run into this kind of problem.<p>Authors are encouraged to use UTF-8. Conformance checkers may
+  advise authors against using legacy encodings.<div class="impl">
 
   <p>Authoring tools should default to using UTF-8 for newly-created
   documents.</p>
@@ -71071,6 +71080,13 @@
    Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF,
    December 1993.</dd>
 
+   <dt id="refsRFC1842">[RFC1842]</dt>
+
+   <dd><cite><a href="http://www.ietf.org/rfc/rfc1842.txt">ASCII
+   Printable Characters-Based Chinese Character Encoding for Internet
+   Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang.
+   IETF, August 1995.</dd>
+
    <dt id="refsRFC1922">[RFC1922]</dt>
    <dd><cite><a href="http://www.ietf.org/rfc/rfc1922.txt">Chinese Character
    Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao,

Received on Friday, 23 October 2009 03:00:35 UTC