- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Thu, 06 Oct 2011 23:33:25 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv20901

Modified Files:
    Overview.html
Log Message:
Define 'code unit'. (whatwg r6649)

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.5330
retrieving revision 1.5331
diff -u -d -r1.5330 -r1.5331
--- Overview.html 6 Oct 2011 23:27:46 -0000 1.5330
+++ Overview.html 6 Oct 2011 23:33:20 -0000 1.5331
@@ -2712,18 +2712,21 @@
 that are unrelated to their interpretation as ASCII. It excludes such encodings as UTF-7, UTF-16, GSM03.38, and EBCDIC variants.</p><p>The term <dfn id="a-utf-16-encoding">a UTF-16 encoding</dfn> refers to any variant of UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
- a BOM, raw UTF-16LE, and raw UTF-16BE. <a href="#refsRFC2781">[RFC2781]</a><p>The term <dfn id="unicode-character">Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
+ a BOM, raw UTF-16LE, and raw UTF-16BE. <a href="#refsRFC2781">[RFC2781]</a><p>The term <dfn id="code-unit">code unit</dfn> is used as defined in the Web IDL
+ specification: a 16 bit unsigned integer, the smallest atomic
+ component of a <code>DOMString</code>. (This is a narrower
+ definition than the one used in Unicode.) <a href="#refsWEBIDL">[WEBIDL]</a><p>The term <dfn id="unicode-character">Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
 is not a surrogate code point).
 <a href="#refsUNICODE">[UNICODE]</a><p>The term <dfn id="character">character</dfn>, when not qualified as <em>Unicode</em> character, means a <a href="#unicode-character">Unicode character</a> where possible, or a surrogate code point when not: when an algorithm that processes strings is defined in terms of characters,
- a pair of <span title="code unit">code units</span> consisting of a
+ a pair of <a href="#code-unit" title="code unit">code units</a> consisting of a
 high surrogate followed by a low surrogate must be treated as a single character, but isolated surrogates must each be treated as a single character also.<p>The <dfn id="code-point-length">code-point length</dfn> of a string is the number of
- <span title="code unit">code units</span> in that string. <a href="#refsWEBIDL">[WEBIDL]</a><p class="note">This complexity results from the historical decision
- to define the DOM API in terms of 16 bit (UTF-16) <span title="code
- unit">code units</span>, rather than in terms of <a href="#unicode-character" title="Unicode character">Unicode characters</a>.<h3 id="conformance-requirements"><span class="secno">2.2 </span>Conformance requirements</h3><p>All diagrams, examples, and notes in this specification are
+ <a href="#code-unit" title="code unit">code units</a> in that string.<p class="note">This complexity results from the historical decision
+ to define the DOM API in terms of 16 bit (UTF-16) <a href="#code-unit" title="code
+ unit">code units</a>, rather than in terms of <a href="#unicode-character" title="Unicode character">Unicode characters</a>.<h3 id="conformance-requirements"><span class="secno">2.2 </span>Conformance requirements</h3><p>All diagrams, examples, and notes in this specification are
 non-normative, as are all sections explicitly marked non-normative.
 Everything else in this specification is normative.<p>The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be
Received on Thursday, 6 October 2011 23:33:30 UTC
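
The definitions touched by this change can be illustrated with a short sketch. This is not taken from the specification or from Web IDL. It assumes a JavaScript-style string whose length counts 16 bit code units, and the names codeUnitLength, isHighSurrogate, isLowSurrogate, and characters are hypothetical helpers made up for this example. It shows the number of code units in a string (which the patched text calls the "code-point length") next to the "character" rule: a high surrogate followed by a low surrogate counts as one character, and an isolated surrogate also counts as one character.

```typescript
// Hypothetical helpers for illustration only; none of these names appear in
// the specification or in Web IDL.

// Number of 16 bit code units in the string. The patched text calls this the
// "code-point length" of the string.
function codeUnitLength(s: string): number {
  return s.length;
}

const isHighSurrogate = (u: number): boolean => u >= 0xd800 && u <= 0xdbff;
const isLowSurrogate = (u: number): boolean => u >= 0xdc00 && u <= 0xdfff;

// Split a string into "characters" as described in the patched text: a high
// surrogate followed by a low surrogate is one character; any other code
// unit, including an isolated surrogate, is one character by itself.
function characters(s: string): string[] {
  const out: string[] = [];
  let i = 0;
  while (i < s.length) {
    const unit = s.charCodeAt(i);
    const paired =
      isHighSurrogate(unit) &&
      i + 1 < s.length &&
      isLowSurrogate(s.charCodeAt(i + 1));
    const width = paired ? 2 : 1;
    out.push(s.slice(i, i + width));
    i += width;
  }
  return out;
}
```

For example, under these assumptions the string "a\uD83D\uDE00" has three code units but two characters, while "\uDE00\uD83D" (a low surrogate followed by a high surrogate) has two code units and two characters, because neither surrogate forms a valid pair.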