- From: poot <cvsmail@w3.org>
- Date: Wed, 17 Aug 2011 18:28:56 -0400
- To: public-html-diffs@w3.org
hixie: Clean up how we refer to UTF-16. (whatwg r6498)

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.5198&r2=1.5199&f=h
http://html5.org/tools/web-apps-tracker?from=6497&to=6498

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.5198
retrieving revision 1.5199
diff -u -d -r1.5198 -r1.5199
--- Overview.html 17 Aug 2011 22:21:05 -0000 1.5198
+++ Overview.html 17 Aug 2011 22:28:35 -0000 1.5199
@@ -2702,7 +2702,9 @@
 HZ-GB-2312, and variants of ISO-2022, even though it is possible in these encodings for bytes like 0x70 to be part of longer sequences that are unrelated to their interpretation as ASCII. It excludes
- such encodings as UTF-7, UTF-16, GSM03.38, and EBCDIC variants.</p><p>The term <dfn id="unicode-character">Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
+ such encodings as UTF-7, UTF-16, GSM03.38, and EBCDIC variants.</p><p>The term <dfn id="a-utf-16-encoding">a UTF-16 encoding</dfn> refers to any variant of
+ UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
+ a BOM, raw UTF-16LE, and raw UTF-16BE. <a href="#refsRFC2781">[RFC2781]</a><p>The term <dfn id="unicode-character">Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
 is not a surrogate code point). <a href="#refsUNICODE">[UNICODE]</a><h3 id="conformance-requirements"><span class="secno">2.2 </span>Conformance requirements</h3><p>All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.<p>The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
@@ -5493,7 +5495,8 @@
 component contains no unescaped non-ASCII characters.
 <a href="#refsRFC3987">[RFC3987]</a></li> <li><p>The <a href="#url">URL</a> is a valid IRI reference and the <a href="#document-s-character-encoding" title="document's character encoding">character encoding</a> of
- the URL's <code><a href="#document">Document</a></code> is UTF-8 or UTF-16. <a href="#refsRFC3987">[RFC3987]</a></li>
+ the URL's <code><a href="#document">Document</a></code> is UTF-8 or <a href="#a-utf-16-encoding">a UTF-16
+ encoding</a>. <a href="#refsRFC3987">[RFC3987]</a></li>
 </ul><p>A string is a <dfn id="valid-non-empty-url">valid non-empty URL</dfn> if it is a <a href="#valid-url">valid URL</a> but it is not the empty string.<p>A string is a <dfn id="valid-url-potentially-surrounded-by-spaces">valid URL potentially surrounded by
@@ -5664,8 +5667,8 @@
 </dl></li>
- <li><p>If <var title="">encoding</var> is a UTF-16 encoding, then
- change the value of <var title="">encoding</var> to UTF-8.</li>
+ <li><p>If <var title="">encoding</var> is <a href="#a-utf-16-encoding">a UTF-16
+ encoding</a>, then change the value of <var title="">encoding</var> to UTF-8.</li>
 <li>
@@ -56866,9 +56869,8 @@
 <li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the second step of the overall "two step" algorithm.</li>
- <li><p>If <var title="">charset</var> is a UTF-16 encoding,
- change the value of <var title="">charset</var> to
- UTF-8.</li>
+ <li><p>If <var title="">charset</var> is <a href="#a-utf-16-encoding">a UTF-16
+ encoding</a>, change the value of <var title="">charset</var> to UTF-8.</li>
 <li><p>If <var title="">charset</var> is not a supported character encoding, then jump to the second step of the
@@ -57298,12 +57300,14 @@
 violation</a> of the W3C Character Model specification, motivated by a desire for compatibility with legacy content.
 <a href="#refsCHARMOD">[CHARMOD]</a></p>
- <p>When a user agent is to use the UTF-16 encoding but no BOM has
- been found, user agents must default to UTF-16LE.</p>
+ <p>When a user agent is to use the self-describing UTF-16 encoding
+ but no BOM has been found, user agents must default to little-endian
+ UTF-16.</p>
- <p class="note">The requirement to default UTF-16 to LE rather than
- BE is a <a href="#willful-violation">willful violation</a> of RFC 2781, motivated by a
- desire for compatibility with legacy content. <a href="#refsRFC2781">[RFC2781]</a></p>
+ <p class="note">The requirement to default UTF-16 to little-endian
+ rather than big-endian is a <a href="#willful-violation">willful violation</a> of RFC
+ 2781, motivated by a desire for compatibility with legacy content.
+ <a href="#refsRFC2781">[RFC2781]</a></p>
 <hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU encodings. <a href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a href="#refsSCSU">[SCSU]</a></p>
@@ -57415,13 +57419,13 @@
 earlier section failed to find the right encoding.</li> <li>If the encoding that is already being used to interpret the
- input stream is a UTF-16 encoding, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ input stream is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
 <i>certain</i> and abort these steps.
 The new encoding is ignored; if it was anything but the same encoding, then it would be clearly incorrect.</li>
- <li>If the new encoding is a UTF-16 encoding, change it to
- UTF-8.</li>
+ <li>If the new encoding is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, change
+ it to UTF-8.</li>
 <li>If all the bytes up to the last byte converted by the current decoder have the same Unicode interpretations in both the current
@@ -60765,7 +60769,7 @@
 <p id="meta-charset-during-parse">If the element has a <code title="attr-meta-charset"><a href="#attr-meta-charset">charset</a></code> attribute, and its value is either a supported <a href="#ascii-compatible-character-encoding">ASCII-compatible character
- encoding</a> or a UTF-16 encoding, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
+ encoding</a> or <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
 <i>tentative</i>, then <a href="#change-the-encoding">change the encoding</a> to the encoding given by the value of the <code title="attr-meta-charset"><a href="#attr-meta-charset">charset</a></code> attribute.</p>
@@ -60775,8 +60779,8 @@
 <code title="attr-meta-content"><a href="#attr-meta-content">content</a></code> attribute, and applying the <a href="#algorithm-for-extracting-an-encoding-from-a-meta-element">algorithm for extracting an encoding from a <code>meta</code> element</a> to that attribute's value returns
- a supported <a href="#ascii-compatible-character-encoding">ASCII-compatible character encoding</a> or a
- UTF-16 encoding, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
+ a supported <a href="#ascii-compatible-character-encoding">ASCII-compatible character encoding</a> or
+ <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is currently
 <i>tentative</i>, then <a href="#change-the-encoding">change the encoding</a> to the extracted encoding.</p>
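
The two behaviours this change touches can be illustrated with a minimal Python sketch; it is not taken from the spec or from any user agent. It assumes a hypothetical label set (UTF16_LABELS) standing in for the new term "a UTF-16 encoding", and the function names are made up for illustration. The first part mirrors the steps that change a declared UTF-16 encoding to UTF-8; the second mirrors the rule that self-describing UTF-16 without a BOM defaults to little-endian.

```python
import codecs

# Assumption: these labels stand in for "a UTF-16 encoding" as defined in
# the diff (self-describing, ambiguous, raw LE, raw BE). The set is
# illustrative, not an exhaustive list of labels.
UTF16_LABELS = {"utf-16", "utf-16le", "utf-16be"}


def is_utf16_encoding(label: str) -> bool:
    """Return True if the label names any UTF-16 variant (hypothetical helper)."""
    return label.strip().lower() in UTF16_LABELS


def encoding_from_declaration(label: str) -> str:
    """Mirror the step 'if charset is a UTF-16 encoding, change it to UTF-8'."""
    normalized = label.strip().lower()
    return "utf-8" if is_utf16_encoding(normalized) else normalized


def decode_self_describing_utf16(data: bytes) -> str:
    """Decode UTF-16 using the BOM if present, else default to little-endian."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM found: the spec text in this diff requires little-endian.
    return data.decode("utf-16-le")
```

Note that Python's own "utf-16" codec also falls back to the platform's native byte order when no BOM is present, which is little-endian on most current hardware but not guaranteed; the sketch picks "utf-16-le" explicitly so the result does not depend on the platform.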
Received on Wednesday, 17 August 2011 22:28:58 UTC