hixie: Clean up how we refer to UTF-16. (whatwg r6498)

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.5198&r2=1.5199&f=h
http://html5.org/tools/web-apps-tracker?from=6497&to=6498

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.5198
retrieving revision 1.5199
diff -u -d -r1.5198 -r1.5199
--- Overview.html 17 Aug 2011 22:21:05 -0000 1.5198
+++ Overview.html 17 Aug 2011 22:28:35 -0000 1.5199
@@ -2702,7 +2702,9 @@
   HZ-GB-2312, and variants of ISO-2022, even though it is possible in
   these encodings for bytes like 0x70 to be part of longer sequences
   that are unrelated to their interpretation as ASCII. It excludes
-  such encodings as UTF-7, UTF-16, GSM03.38, and EBCDIC variants.</p><p>The term <dfn id="unicode-character">Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
+  such encodings as UTF-7, UTF-16, GSM03.38, and EBCDIC variants.</p><p>The term <dfn id="a-utf-16-encoding">a UTF-16 encoding</dfn> refers to any variant of
+  UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
+  a BOM, raw UTF-16LE, and raw UTF-16BE. <a href="#refsRFC2781">[RFC2781]</a><p>The term <dfn id="unicode-character">Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
   is not a surrogate code point). <a href="#refsUNICODE">[UNICODE]</a><h3 id="conformance-requirements"><span class="secno">2.2 </span>Conformance requirements</h3><p>All diagrams, examples, and notes in this specification are
   non-normative, as are all sections explicitly marked non-normative.
   Everything else in this specification is normative.<p>The key words "MUST", "MUST NOT", "REQUIRED",  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
@@ -5493,7 +5495,8 @@
    component contains no unescaped non-ASCII characters. <a href="#refsRFC3987">[RFC3987]</a></li>
 
    <li><p>The <a href="#url">URL</a> is a valid IRI reference and the <a href="#document-s-character-encoding" title="document's character encoding">character encoding</a> of
-   the URL's <code><a href="#document">Document</a></code> is UTF-8 or UTF-16. <a href="#refsRFC3987">[RFC3987]</a></li>
+   the URL's <code><a href="#document">Document</a></code> is UTF-8 or <a href="#a-utf-16-encoding">a UTF-16
+   encoding</a>. <a href="#refsRFC3987">[RFC3987]</a></li>
 
   </ul><p>A string is a <dfn id="valid-non-empty-url">valid non-empty URL</dfn> if it is a
   <a href="#valid-url">valid URL</a> but it is not the empty string.<p>A string is a <dfn id="valid-url-potentially-surrounded-by-spaces">valid URL potentially surrounded by
@@ -5664,8 +5667,8 @@
 
     </dl></li>
 
-   <li><p>If <var title="">encoding</var> is a UTF-16 encoding, then
-   change the value of <var title="">encoding</var> to UTF-8.</li>
+   <li><p>If <var title="">encoding</var> is <a href="#a-utf-16-encoding">a UTF-16
+   encoding</a>, then change the value of <var title="">encoding</var> to UTF-8.</li>
 
    <li>
 
@@ -56866,9 +56869,8 @@
          <li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the second
          step of the overall "two step" algorithm.</li>
 
-         <li><p>If <var title="">charset</var> is a UTF-16 encoding,
-         change the value of <var title="">charset</var> to
-         UTF-8.</li>
+         <li><p>If <var title="">charset</var> is <a href="#a-utf-16-encoding">a UTF-16
+         encoding</a>, change the value of <var title="">charset</var> to UTF-8.</li>
 
          <li><p>If <var title="">charset</var> is not a supported
          character encoding, then jump to the second step of the
@@ -57298,12 +57300,14 @@
   violation</a> of the W3C Character Model specification, motivated
   by a desire for compatibility with legacy content. <a href="#refsCHARMOD">[CHARMOD]</a></p>
 
-  <p>When a user agent is to use the UTF-16 encoding but no BOM has
-  been found, user agents must default to UTF-16LE.</p>
+  <p>When a user agent is to use the self-describing UTF-16 encoding
+  but no BOM has been found, user agents must default to little-endian
+  UTF-16.</p>
 
-  <p class="note">The requirement to default UTF-16 to LE rather than
-  BE is a <a href="#willful-violation">willful violation</a> of RFC 2781, motivated by a
-  desire for compatibility with legacy content. <a href="#refsRFC2781">[RFC2781]</a></p>
+  <p class="note">The requirement to default UTF-16 to little-endian
+  rather than big-endian is a <a href="#willful-violation">willful violation</a> of RFC
+  2781, motivated by a desire for compatibility with legacy content.
+  <a href="#refsRFC2781">[RFC2781]</a></p>
 
   <hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
   encodings. <a href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a href="#refsSCSU">[SCSU]</a></p>
@@ -57415,13 +57419,13 @@
    earlier section failed to find the right encoding.</li>
 
    <li>If the encoding that is already being used to interpret the
-   input stream is a UTF-16 encoding, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+   input stream is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
    <i>certain</i> and abort these steps. The new encoding is ignored;
    if it was anything but the same encoding, then it would be clearly
    incorrect.</li>
 
-   <li>If the new encoding is a UTF-16 encoding, change it to
-   UTF-8.</li>
+   <li>If the new encoding is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, change
+   it to UTF-8.</li>
 
    <li>If all the bytes up to the last byte converted by the current
    decoder have the same Unicode interpretations in both the current
@@ -60765,7 +60769,7 @@
 
     <p id="meta-charset-during-parse">If the element has a <code title="attr-meta-charset"><a href="#attr-meta-charset">charset</a></code> attribute, and its value
     is either a supported <a href="#ascii-compatible-character-encoding">ASCII-compatible character
-    encoding</a> or a UTF-16 encoding, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
+    encoding</a> or <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
     <i>tentative</i>, then <a href="#change-the-encoding">change the encoding</a> to the
     encoding given by the value of the <code title="attr-meta-charset"><a href="#attr-meta-charset">charset</a></code> attribute.</p>
 
@@ -60775,8 +60779,8 @@
     <code title="attr-meta-content"><a href="#attr-meta-content">content</a></code> attribute, and
     applying the <a href="#algorithm-for-extracting-an-encoding-from-a-meta-element">algorithm for extracting an encoding from a
     <code>meta</code> element</a> to that attribute's value returns
-    a supported <a href="#ascii-compatible-character-encoding">ASCII-compatible character encoding</a> or a
-    UTF-16 encoding, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
+    a supported <a href="#ascii-compatible-character-encoding">ASCII-compatible character encoding</a> or
+    <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
     <i>tentative</i>, then <a href="#change-the-encoding">change the encoding</a> to the
     extracted encoding.</p>

Received on Wednesday, 17 August 2011 22:28:58 UTC
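
For illustration only, here is a minimal Python sketch of three rules touched by this change: the definition of "a UTF-16 encoding", the step that changes such an encoding to UTF-8, and the little-endian default when no BOM is found. It is not part of the commit or the spec. The function names and the label set are hypothetical, and the label matching (trim and lowercase) is a simplifying assumption.

```python
import codecs

# Assumption: these three labels stand in for the four variants named in the
# new definition (UTF-16 with a BOM, UTF-16 without a BOM, raw UTF-16LE,
# raw UTF-16BE). The spec text in the diff does not enumerate labels.
UTF16_LABELS = {"utf-16", "utf-16le", "utf-16be"}


def is_utf16_encoding(label: str) -> bool:
    """Return True if the label names any variant of UTF-16 (hypothetical helper)."""
    return label.strip().lower() in UTF16_LABELS


def normalize_declared_charset(label: str) -> str:
    """Mirror the step 'if charset is a UTF-16 encoding, change it to UTF-8'."""
    if is_utf16_encoding(label):
        return "utf-8"
    return label.strip().lower()


def decode_self_describing_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, honouring a BOM and defaulting to little-endian."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM found: the spec text requires little-endian here, which the
    # diff itself notes is a willful violation of RFC 2781.
    return data.decode("utf-16-le")
```

With this sketch, a declared charset of "UTF-16BE" is normalized to "utf-8", while a label such as "windows-1252" passes through unchanged apart from lowercasing.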