hixie: Mention and encourage UTF-8 detection specifically. (whatwg r3882)

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.3044&r2=1.3045&f=h
http://html5.org/tools/web-apps-tracker?from=3881&to=3882

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.3044
retrieving revision 1.3045
diff -u -d -r1.3044 -r1.3045
--- Overview.html 17 Sep 2009 10:05:54 -0000 1.3044
+++ Overview.html 17 Sep 2009 22:36:13 -0000 1.3045
@@ -6636,7 +6636,7 @@
   for purposes other than their appropriate intended semantic
   purpose. Authors must not use elements, attributes, and attribute
   values that are not permitted by this specification or other
-  applicable specifications.<div class="example">
+  applicable specifications.</p><!-- http://www.w3.org/mid/17E341CD-E790-422C-9F9A-69347EE01CEB@iki.fi --><div class="example">
    <p>For example, the following document is non-conforming, despite
    being syntactically correct:</p>
 
@@ -55748,11 +55748,22 @@
    visited, then return that encoding, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
    <i>tentative</i>, and abort these steps.</li>
 
-   <li><p>The user agent may attempt to autodetect the character
-   encoding from applying frequency analysis or other algorithms to
-   the data stream. If autodetection succeeds in determining a
-   character encoding, then return that encoding, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
-   <i>tentative</i>, and abort these steps. <a href="#refsUNIVCHARDET">[UNIVCHARDET]</a></li>
+   <li>
+
+    <p>The user agent may attempt to autodetect the character encoding
+    from applying frequency analysis or other algorithms to the data
+    stream. If autodetection succeeds in determining a character
+    encoding, then return that encoding, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
+    <i>tentative</i>, and abort these steps. <a href="#refsUNIVCHARDET">[UNIVCHARDET]</a></p>
+
+    <p class="note">The UTF-8 encoding has a highly detectable bit
+    pattern. Documents that contain bytes with values greater than
+    0x7F which match the UTF-8 pattern are very likely to be UTF-8,
+    while documents with byte sequences that do not match it are very
+    likely not. User-agents are therefore encouraged to search for
+    this common encoding.</p>
+
+   </li>
 
    <li><p>Otherwise, return an implementation-defined or
    user-specified default character encoding, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
@@ -63961,7 +63972,6 @@
      <tr> <td> <code title="">rAtail;</code> </td> <td> U+0291C </td> </tr>
      <tr> <td> <code title="">rBarr;</code> </td> <td> U+0290F </td> </tr>
      <tr> <td> <code title="">rHar;</code> </td> <td> U+02964 </td> </tr>
-     <tr> <td> <code title="">race;</code> </td> <td> U+029DA </td> </tr>
      <tr> <td> <code title="">racute;</code> </td> <td> U+00155 </td> </tr>
      <tr> <td> <code title="">radic;</code> </td> <td> U+0221A </td> </tr>
      <tr> <td> <code title="">raemptyv;</code> </td> <td> U+029B3 </td> </tr>

Received on Thursday, 17 September 2009 22:37:23 UTC
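
The note added in this revision says that documents containing bytes above 0x7F that match the UTF-8 pattern are very likely UTF-8, and that documents with byte sequences that do not match it are very likely not. A minimal sketch of that check in Python follows. It is not taken from the spec or from any user agent: the function name looks_like_utf8 is hypothetical, and relying on Python's strict UTF-8 decoder as the pattern matcher is an assumption made for illustration.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic sketch: True if data contains at least one byte above 0x7F
    and the whole buffer is well-formed UTF-8."""
    # Pure ASCII is valid UTF-8 but gives no evidence for or against it,
    # so this sketch reports no detection in that case.
    if all(b <= 0x7F for b in data):
        return False
    try:
        # Strict decoding rejects stray continuation bytes, truncated
        # sequences, and overlong forms.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("caf\u00e9".encode("utf-8")))    # True
print(looks_like_utf8("caf\u00e9".encode("latin-1")))  # False
print(looks_like_utf8(b"plain ascii"))                 # False
```

A real detector working on a prefix of the stream would also need to tolerate a multi-byte sequence cut off at the end of the buffer, which this sketch treats as a mismatch. A positive result would still only be returned with the confidence "tentative", as in the step quoted in the diff.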