- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 17 Aug 2011 22:28:40 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv747
Modified Files:
Overview.html
Log Message:
Clean up how we refer to UTF-16. (whatwg r6498)
Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.5198
retrieving revision 1.5199
diff -u -d -r1.5198 -r1.5199
--- Overview.html 17 Aug 2011 22:21:05 -0000 1.5198
+++ Overview.html 17 Aug 2011 22:28:35 -0000 1.5199
@@ -2702,7 +2702,9 @@
HZ-GB-2312, and variants of ISO-2022, even though it is possible in
these encodings for bytes like 0x70 to be part of longer sequences
that are unrelated to their interpretation as ASCII. It excludes
- such encodings as UTF-7, UTF-16, GSM03.38, and EBCDIC variants.</p><p>The term <dfn id="unicode-character">Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
+ such encodings as UTF-7, UTF-16, GSM03.38, and EBCDIC variants.</p><p>The term <dfn id="a-utf-16-encoding">a UTF-16 encoding</dfn> refers to any variant of
+ UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
+ a BOM, raw UTF-16LE, and raw UTF-16BE. <a href="#refsRFC2781">[RFC2781]</a><p>The term <dfn id="unicode-character">Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
is not a surrogate code point). <a href="#refsUNICODE">[UNICODE]</a><h3 id="conformance-requirements"><span class="secno">2.2 </span>Conformance requirements</h3><p>All diagrams, examples, and notes in this specification are
non-normative, as are all sections explicitly marked non-normative.
Everything else in this specification is normative.<p>The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
@@ -5493,7 +5495,8 @@
component contains no unescaped non-ASCII characters. <a href="#refsRFC3987">[RFC3987]</a></li>
<li><p>The <a href="#url">URL</a> is a valid IRI reference and the <a href="#document-s-character-encoding" title="document's character encoding">character encoding</a> of
- the URL's <code><a href="#document">Document</a></code> is UTF-8 or UTF-16. <a href="#refsRFC3987">[RFC3987]</a></li>
+ the URL's <code><a href="#document">Document</a></code> is UTF-8 or <a href="#a-utf-16-encoding">a UTF-16
+ encoding</a>. <a href="#refsRFC3987">[RFC3987]</a></li>
</ul><p>A string is a <dfn id="valid-non-empty-url">valid non-empty URL</dfn> if it is a
<a href="#valid-url">valid URL</a> but it is not the empty string.<p>A string is a <dfn id="valid-url-potentially-surrounded-by-spaces">valid URL potentially surrounded by
@@ -5664,8 +5667,8 @@
</dl></li>
- <li><p>If <var title="">encoding</var> is a UTF-16 encoding, then
- change the value of <var title="">encoding</var> to UTF-8.</li>
+ <li><p>If <var title="">encoding</var> is <a href="#a-utf-16-encoding">a UTF-16
+ encoding</a>, then change the value of <var title="">encoding</var> to UTF-8.</li>
<li>
@@ -56866,9 +56869,8 @@
<li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the second
step of the overall "two step" algorithm.</li>
- <li><p>If <var title="">charset</var> is a UTF-16 encoding,
- change the value of <var title="">charset</var> to
- UTF-8.</li>
+ <li><p>If <var title="">charset</var> is <a href="#a-utf-16-encoding">a UTF-16
+ encoding</a>, change the value of <var title="">charset</var> to UTF-8.</li>
<li><p>If <var title="">charset</var> is not a supported
character encoding, then jump to the second step of the
@@ -57298,12 +57300,14 @@
violation</a> of the W3C Character Model specification, motivated
by a desire for compatibility with legacy content. <a href="#refsCHARMOD">[CHARMOD]</a></p>
- <p>When a user agent is to use the UTF-16 encoding but no BOM has
- been found, user agents must default to UTF-16LE.</p>
+ <p>When a user agent is to use the self-describing UTF-16 encoding
+ but no BOM has been found, user agents must default to little-endian
+ UTF-16.</p>
- <p class="note">The requirement to default UTF-16 to LE rather than
- BE is a <a href="#willful-violation">willful violation</a> of RFC 2781, motivated by a
- desire for compatibility with legacy content. <a href="#refsRFC2781">[RFC2781]</a></p>
+ <p class="note">The requirement to default UTF-16 to little-endian
+ rather than big-endian is a <a href="#willful-violation">willful violation</a> of RFC
+ 2781, motivated by a desire for compatibility with legacy content.
+ <a href="#refsRFC2781">[RFC2781]</a></p>
<hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
encodings. <a href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a href="#refsSCSU">[SCSU]</a></p>
@@ -57415,13 +57419,13 @@
earlier section failed to find the right encoding.</li>
<li>If the encoding that is already being used to interpret the
- input stream is a UTF-16 encoding, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ input stream is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
<i>certain</i> and abort these steps. The new encoding is ignored;
if it was anything but the same encoding, then it would be clearly
incorrect.</li>
- <li>If the new encoding is a UTF-16 encoding, change it to
- UTF-8.</li>
+ <li>If the new encoding is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, change
+ it to UTF-8.</li>
<li>If all the bytes up to the last byte converted by the current
decoder have the same Unicode interpretations in both the current
@@ -60765,7 +60769,7 @@
<p id="meta-charset-during-parse">If the element has a <code title="attr-meta-charset"><a href="#attr-meta-charset">charset</a></code> attribute, and its value
is either a supported <a href="#ascii-compatible-character-encoding">ASCII-compatible character
- encoding</a> or a UTF-16 encoding, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
+ encoding</a> or <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
<i>tentative</i>, then <a href="#change-the-encoding">change the encoding</a> to the
encoding given by the value of the <code title="attr-meta-charset"><a href="#attr-meta-charset">charset</a></code> attribute.</p>
@@ -60775,8 +60779,8 @@
<code title="attr-meta-content"><a href="#attr-meta-content">content</a></code> attribute, and
applying the <a href="#algorithm-for-extracting-an-encoding-from-a-meta-element">algorithm for extracting an encoding from a
<code>meta</code> element</a> to that attribute's value returns
- a supported <a href="#ascii-compatible-character-encoding">ASCII-compatible character encoding</a> or a
- UTF-16 encoding, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
+ a supported <a href="#ascii-compatible-character-encoding">ASCII-compatible character encoding</a> or
+ <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
<i>tentative</i>, then <a href="#change-the-encoding">change the encoding</a> to the
extracted encoding.</p>
Received on Wednesday, 17 August 2011 22:28:42 UTC
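
Illustrative note (not part of the original commit): the revised wording treats every variant of UTF-16 as "a UTF-16 encoding", replaces such an encoding with UTF-8 when it is declared via a meta element, and defaults BOM-less UTF-16 to little-endian. The Python sketch below is one possible reading of those three rules, not the spec's algorithm. The names UTF16_LABELS, is_utf16_encoding, override_declared_encoding and decode_utf16 are hypothetical, and the label set is an assumption chosen for illustration rather than a list taken from the spec.

```python
import codecs

# Assumed set of labels standing in for "a UTF-16 encoding"
# (self-describing/ambiguous UTF-16, raw UTF-16LE, raw UTF-16BE).
UTF16_LABELS = {"utf-16", "utf-16le", "utf-16be"}


def is_utf16_encoding(label: str) -> bool:
    """Return True if the label names any UTF-16 variant (hypothetical helper)."""
    return label.strip().lower() in UTF16_LABELS


def override_declared_encoding(label: str) -> str:
    """Mirror the step 'if charset is a UTF-16 encoding, change it to UTF-8'."""
    normalized = label.strip().lower()
    return "utf-8" if is_utf16_encoding(normalized) else normalized


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, defaulting to little-endian when no BOM is present."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM found: the diff above requires little-endian as the default.
    return data.decode("utf-16-le")
```

For example, override_declared_encoding("UTF-16BE") yields "utf-8", while a label such as "windows-1252" passes through unchanged apart from lowercasing. The sketch strips the BOM by hand and uses Python's "utf-16-le" and "utf-16-be" codecs so that the little-endian default is explicit rather than left to the platform-dependent "utf-16" codec.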