hixie: Move the Content-Type encoding parsing hack of an algorithm back into HTML5 from MIMESNIFF. (whatwg r5042)

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.4056&r2=1.4057&f=h
http://html5.org/tools/web-apps-tracker?from=5041&to=5042

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.4056
retrieving revision 1.4057
diff -u -d -r1.4056 -r1.4057
--- Overview.html 13 Apr 2010 22:57:06 -0000 1.4056
+++ Overview.html 14 Apr 2010 03:06:58 -0000 1.4057
@@ -285,7 +285,7 @@
    <h1>HTML5</h1>
    <h2 class="no-num no-toc" id="a-vocabulary-and-associated-apis-for-html-and-xhtml">A vocabulary and associated APIs for HTML and XHTML</h2>
 
-   <h2 class="no-num no-toc" id="editor-s-draft-13-april-2010">Editor's Draft 13 April 2010</h2>
+   <h2 class="no-num no-toc" id="editor-s-draft-14-april-2010">Editor's Draft 14 April 2010</h2>
    <dl><dt>Latest Published Version:</dt>
     <dd><a href="http://www.w3.org/TR/html5/">http://www.w3.org/TR/html5/</a></dd>
     <dt>Latest Editor's Draft:</dt>
@@ -392,7 +392,7 @@
   specification's progress along the W3C Recommendation
   track.
 
-  This specification is the 13 April 2010 Editor's Draft.
+  This specification is the 14 April 2010 Editor's Draft.
   </p><!-- UNDER NO CIRCUMSTANCES IS THE PRECEDING PARAGRAPH TO BE REMOVED OR EDITED WITHOUT TALKING TO IAN FIRST --><!-- relationship to other work (required) --><p>The contents of this specification are also part of <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/">a
   specification</a> published by the <a href="http://www.whatwg.org/">WHATWG</a>, which is available under a
   license that permits reuse of the specification text.</p><!-- UNDER NO CIRCUMSTANCES IS THE FOLLOWING PARAGRAPH TO BE REMOVED OR EDITED WITHOUT TALKING TO IAN FIRST --><!-- required patent boilerplate --><p>This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5
@@ -5545,12 +5545,6 @@
   with the requirements of the Content-Type Processing Model
   specification. <a href="#refsMIMESNIFF">[MIMESNIFF]</a></p>
 
-  <p>The <dfn id="algorithm-for-extracting-an-encoding-from-a-content-type">algorithm for extracting an encoding from a
-  Content-Type</dfn>, given a string <var title="">s</var>, is given
-  in the Content-Type Processing Model specification. It either
-  returns an encoding or nothing. <a href="#refsMIMESNIFF">[MIMESNIFF]</a></p>
-  <p class="XXX">The above is out of date now that the relevant section has been removed from MIMESNIFF. Stay tuned; I'll bring it back here soon.</p>
-
   <p>The <dfn id="content-type-sniffing-0" title="Content-Type sniffing">sniffed type of a
   resource</dfn> must be found in a manner consistent with the
   requirements given in the Content-Type Processing Model
@@ -5571,6 +5565,50 @@
   occur. For more details, see the Content-Type Processing Model
   specification. <a href="#refsMIMESNIFF">[MIMESNIFF]</a></p>
 
+  <p>The <dfn id="algorithm-for-extracting-an-encoding-from-a-content-type">algorithm for extracting an encoding from a
+  Content-Type</dfn>, given a string <var title="">s</var>, is as
+  follows. It either returns an encoding or nothing.</p>
+
+  <ol><li><p>Find the first seven characters in <var title="">s</var>
+   that are an <a href="#ascii-case-insensitive">ASCII case-insensitive</a> match for the word
+   "<code title="">charset</code>".  If no such match is found, return
+   nothing.</li>
+
+   <li><p>Skip any U+0009, U+000A, U+000C, U+000D, or U+0020
+   characters that immediately follow the word "<code title="">charset</code>" (there might not be any).</li>
+
+   <li><p>If the next character is not a U+003D EQUALS SIGN ('='),
+   return nothing and abort these steps.</li>
+
+   <li><p>Skip any U+0009, U+000A, U+000C, U+000D, or U+0020
+   characters that immediately follow the equals sign (there might not
+   be any).</li>
+
+   <li>
+
+    <p>Process the next character as follows:</p>
+
+    <dl class="switch"><dt>If it is a U+0022 QUOTATION MARK ('"') and there is a later U+0022 QUOTATION MARK ('"') in <var title="">s</var></dt>
+     <dt>If it is a U+0027 APOSTROPHE ("'") and there is a later U+0027 APOSTROPHE ("'") in <var title="">s</var></dt>
+     <dd>Return the encoding corresponding to the string between this character and the next earliest occurrence of this character.</dd>
+
+     <dt>If it is an unmatched U+0022 QUOTATION MARK ('"')</dt>
+     <dt>If it is an unmatched U+0027 APOSTROPHE ("'")</dt>
+     <dt>If there is no next character</dt>
+     <dd>Return nothing.</dd>
+
+     <dt>Otherwise</dt>
+     <dd>Return the encoding corresponding to the string from this
+     character to the first U+0009, U+000A, U+000C, U+000D, U+0020, or
+     U+003B character or the end of <var title="">s</var>, whichever
+     comes first.</dd>
+
+    </dl></li>
+
+  </ol><p class="note">This requirement is a <a href="#willful-violation">willful violation</a>
+  of the HTTP specification, motivated by the need for backwards
+  compatibility with legacy content. <a href="#refsHTTP">[HTTP]</a></p>
+
   </div><h3 id="common-dom-interfaces"><span class="secno">2.7 </span>Common DOM interfaces</h3><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><h4 id="reflecting-content-attributes-in-idl-attributes"><span class="secno">2.7.1 </span>Reflecting content attributes in IDL attributes</h4><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p>Some IDL attributes are defined to <dfn id="reflect">reflect</dfn> a
   particular content attribute. This means that on getting, the IDL
   attribute returns the current value of the content attribute, and on
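
For readers who want to see how the steps added in the hunk above fit together, here is a minimal Python sketch of the "algorithm for extracting an encoding from a Content-Type". It is an illustration only, not text from the specification. The function name extract_charset and the WHITESPACE constant are hypothetical, and the sketch returns the raw label string rather than an encoding object, on the assumption that the caller maps the label to an actual encoding.

```python
import re
from typing import Optional

# U+0009, U+000A, U+000C, U+000D, U+0020: the characters the steps skip.
WHITESPACE = "\t\n\f\r "


def extract_charset(s: str) -> Optional[str]:
    """Return the charset label found in s, or None for "nothing".

    Hypothetical helper: mapping the label to a real encoding is left
    to the caller.
    """
    # Step 1: first ASCII case-insensitive match for "charset".
    # re.ASCII keeps the case folding ASCII-only.
    match = re.search("charset", s, re.IGNORECASE | re.ASCII)
    if match is None:
        return None
    pos = match.end()

    # Step 2: skip whitespace after the word "charset".
    while pos < len(s) and s[pos] in WHITESPACE:
        pos += 1

    # Step 3: the next character must be "=".
    if pos >= len(s) or s[pos] != "=":
        return None
    pos += 1

    # Step 4: skip whitespace after the equals sign.
    while pos < len(s) and s[pos] in WHITESPACE:
        pos += 1

    # Step 5: process the next character.
    if pos >= len(s):
        return None  # there is no next character
    quote = s[pos]
    if quote in "\"'":
        end = s.find(quote, pos + 1)
        if end == -1:
            return None  # unmatched quote
        return s[pos + 1:end]

    # Otherwise: read up to the first whitespace, ";" or end of string.
    end = pos
    while end < len(s) and s[end] not in WHITESPACE + ";":
        end += 1
    return s[pos:end]
```

Note that, as in the steps above, only the first occurrence of "charset" in the string is considered; if that occurrence is not followed by an equals sign, the sketch returns None without looking further.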

Received on Wednesday, 14 April 2010 03:07:48 UTC