- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Fri, 12 Sep 2008 23:25:49 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec In directory hutz:/tmp/cvs-serv32484 Modified Files: Overview.html Log Message: WF2: <form accept-charset> definition (but not the processing model yet). (whatwg r2172) Index: Overview.html =================================================================== RCS file: /sources/public/html5/spec/Overview.html,v retrieving revision 1.1344 retrieving revision 1.1345 diff -u -d -r1.1344 -r1.1345 --- Overview.html 12 Sep 2008 10:08:00 -0000 1.1344 +++ Overview.html 12 Sep 2008 23:25:47 -0000 1.1345 @@ -300,6 +300,9 @@ <li><a href="#plugins"><span class=secno>2.1.4 </span>Plugins</a> + + <li><a href="#character"><span class=secno>2.1.5 </span>Character + encodings</a> </ul> <li><a href="#conformance"><span class=secno>2.2 </span>Conformance @@ -1896,7 +1899,7 @@ </span>Newlines</a> </ul> - <li><a href="#character"><span class=secno>8.1.4 </span>Character + <li><a href="#character0"><span class=secno>8.1.4 </span>Character references</a> <li><a href="#cdata"><span class=secno>8.1.5 </span>CDATA sections</a> @@ -1917,7 +1920,7 @@ <li><a href="#determining"><span class=secno>8.2.2.1. </span>Determining the character encoding</a> - <li><a href="#character0"><span class=secno>8.2.2.2. + <li><a href="#character1"><span class=secno>8.2.2.2. </span>Character encoding requirements</a> <li><a href="#preprocessing"><span class=secno>8.2.2.3. @@ -1951,7 +1954,7 @@ <li><a href="#data-state"><span class=secno>8.2.4.1. </span>Data state</a> - <li><a href="#character1"><span class=secno>8.2.4.2. + <li><a href="#character2"><span class=secno>8.2.4.2. </span>Character reference data state</a> <li><a href="#tag-open"><span class=secno>8.2.4.3. </span>Tag open @@ -1984,7 +1987,7 @@ <li><a href="#attribute2"><span class=secno>8.2.4.12. </span>Attribute value (unquoted) state</a> - <li><a href="#character2"><span class=secno>8.2.4.13. + <li><a href="#character3"><span class=secno>8.2.4.13. 
</span>Character reference in attribute value state</a> <li><a href="#after0"><span class=secno>8.2.4.14. </span>After @@ -2685,6 +2688,16 @@ agent itself, vulnerabilities in the third-party software become as dangerous as those in the user agent. + <h4 id=character><span class=secno>2.1.5 </span>Character encodings</h4> + + <p>An <dfn id=ascii-compatible>ASCII-compatible character encoding</dfn> is + one that is a superset of US-ASCII (specifically, ANSI_X3.4-1968) for + bytes in the set 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - + 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any + character sets we want to support do things outside that range? + -->. + <!-- XXX #refs RFC1345 ? --> + <h3 id=conformance><span class=secno>2.2 </span>Conformance requirements</h3> <p>All diagrams, examples, and notes in this specification are @@ -4871,7 +4884,7 @@ <li> <p>The <a href="#url">URL</a> is a valid IRI reference and the <a - href="#character3" title="document's character encoding">character + href="#character4" title="document's character encoding">character encoding</a> of the URL's <code>Document</code> is UTF-8 or UTF-16. <a href="#references">[RFC3987]</a> </ul> @@ -5086,7 +5099,7 @@ href="#urldoc">associated with</a> <var title="">url</var>. <li> - <p>Let <var title="">encoding</var> be the <a href="#character3" + <p>Let <var title="">encoding</var> be the <a href="#character4" title="document's character encoding">character encoding</a> of <var title="">document</var>. @@ -7342,9 +7355,9 @@ </ul> </div> - <p>Documents have an associated <dfn id=character3 title="document's + <p>Documents have an associated <dfn id=character4 title="document's character encoding">character encoding</dfn>. When a <code>Document</code> - object is created, the <a href="#character3">document's character + object is created, the <a href="#character4">document's character encoding</a> must be initialized to UTF-16. 
Various algorithms during page loading affect this value, as does the <code title=dom-document-charset><a href="#charset0">charset</a></code> setter. <a @@ -7354,15 +7367,15 @@ <p>The <dfn id=charset0 title=dom-document-charset><code>charset</code></dfn> DOM attribute must, on getting, return the preferred MIME name of the <a - href="#character3">document's character encoding</a>. On setting, if the + href="#character4">document's character encoding</a>. On setting, if the new value is an IANA-registered alias for a character encoding, the <a - href="#character3">document's character encoding</a> must be set to that + href="#character4">document's character encoding</a> must be set to that character encoding. (Otherwise, nothing happens.) <p>The <dfn id=characterset title=dom-document-characterSet><code>characterSet</code></dfn> DOM attribute must, on getting, return the preferred MIME name of the <a - href="#character3">document's character encoding</a>. + href="#character4">document's character encoding</a>. <p>The <dfn id=defaultcharset title=dom-document-defaultCharset><code>defaultCharset</code></dfn> DOM @@ -8986,7 +8999,7 @@ <p>Remove all child nodes of the document. <li> - <p>Change the <a href="#character3">document's character encoding</a> to + <p>Change the <a href="#character4">document's character encoding</a> to UTF-16. <li> @@ -10157,7 +10170,7 @@ document-level metadata with the <code title=attr-meta-name><a href="#name">name</a></code> attribute, pragma directives with the <code title=attr-meta-http-equiv><a href="#http-equiv">http-equiv</a></code> - attribute, and the file's <a href="#character4">character encoding + attribute, and the file's <a href="#character5">character encoding declaration</a> when an HTML document is serialized to string form (e.g. for transmission over the network or for disk storage) with the <code title=attr-meta-charset><a href="#charset1">charset</a></code> attribute. 
@@ -10176,7 +10189,7 @@ <p>The <dfn id=charset1 title=attr-meta-charset><code>charset</code></dfn> attribute specifies the character encoding used by the document. This is - called a <a href="#character4">character encoding declaration</a>. + called a <a href="#character5">character encoding declaration</a>. <p>The <code title=attr-meta-charset><a href="#charset1">charset</a></code> attribute may be specified in <a href="#html5" title=HTML5>HTML @@ -10515,7 +10528,7 @@ user agent requirements are all handled by the parsing section of the specification. The state is just an alternative form of setting the <code title=meta-charset>charset</code> attribute: it is a <a - href="#character4">character encoding declaration</a>.</p> + href="#character5">character encoding declaration</a>.</p> <p>For <code><a href="#meta0">meta</a></code> elements in the <a href="#encoding" title=attr-meta-http-equiv-content-type>Encoding @@ -10724,7 +10737,7 @@ though if we do then we have to duplicate the requirements in the parsing section for conformance checkers --> - <p>A <dfn id=character4>character encoding declaration</dfn> is a mechanism + <p>A <dfn id=character5>character encoding declaration</dfn> is a mechanism by which the character encoding used to store or transmit a document is specified. @@ -10740,7 +10753,7 @@ http://www.iana.org/assignments/character-sets --> <li>The character encoding declaration must be serialized without the use - of <a href="#character5" title=syntax-charref>character references</a> or + of <a href="#character6" title=syntax-charref>character references</a> or character escapes of any kind. </ul> @@ -10764,14 +10777,6 @@ then the character encoding used must be an <a href="#ascii-compatible">ASCII-compatible character encoding</a>. 
- <p>An <dfn id=ascii-compatible>ASCII-compatible character encoding</dfn> is - one that is a superset of US-ASCII (specifically, ANSI_X3.4-1968) for - bytes in the set 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - - 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any - character sets we want to support do things outside that range? - -->. - <!-- XXX #refs RFC1345 ? --> - <p>Authors should not use JIS_X0212-1990, x-JIS0208, and encodings based on EBCDIC. Authors should not use UTF-32. Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings. <a href="#references">[CESU8]</a> <a @@ -26576,7 +26581,8 @@ <dt>Element-specific attributes: - <dd><code title=attr-form-accept-charset>accept-charset</code> + <dd><code title=attr-form-accept-charset><a + href="#accept-charset">accept-charset</a></code> <dd><code title=attr-form-action>action</code> @@ -26593,7 +26599,7 @@ <dd> <pre class=idl>interface <dfn id=htmlformelement>HTMLFormElement</dfn> : <a href="#htmlelement">HTMLElement</a> { - attribute DOMString <span title=dom-form-accept-charset>accept-charset</span>; + attribute DOMString <a href="#accept-charset0" title=dom-form-accept-charset>accept-charset</a>; attribute DOMString <span title=dom-form-action>action</span>; attribute DOMString <span title=dom-form-enctype>enctype</span>; attribute DOMString <span title=dom-form-method>method</span>; @@ -26614,8 +26620,25 @@ };</pre> </dl> + <p>The <code><a href="#form">form</a></code> element represents a + collection of <a href="#field" title=category-field>data fields</a> that + can be submitted to a server for processing. + + <p>The <dfn id=accept-charset + title=attr-form-accept-charset><code>accept-charset</code></dfn> attribute + gives the character encodings that are to be used for the submission. 
If + specified, the value must be an <span>ordered set of space-separated + tokens</span>, and each token must be the preferred name of an <a + href="#ascii-compatible">ASCII-compatible character encoding</a>. <a + href="#references">[IANACHARSET]</a> + <p class=big-issue>... + <p>The <dfn id=accept-charset0 + title=dom-form-accept-charset><code>accept-charset</code></dfn> DOM + attribute must <a href="#reflect">reflect</a> the content attribute of the + same name. + <p>The <dfn id=elements3 title=dom-form-elements><code>elements</code></dfn> DOM attribute must return an <code><a @@ -28354,7 +28377,7 @@ <p>Otherwise, let <var><a href="#the-scripts0">the script's character encoding</a></var> for this <code><a href="#script1">script</a></code> - element be the same as <a href="#character3" title="document's character + element be the same as <a href="#character4" title="document's character encoding">the encoding of the document itself</a>.</p> <li> @@ -33510,7 +33533,7 @@ XXXDOCURL --> is <code><a href="#aboutblank">about:blank</a></code><!-- XXX xref -->, which is marked as being an <a href="#html-" title="HTML documents">HTML - document</a>, and whose <a href="#character3" title="document's character + document</a>, and whose <a href="#character4" title="document's character encoding">character encoding</a> is UTF-8. The <code>Document</code> must have a single child <code><a href="#html">html</a></code> node, which itself has a single child <code><a href="#body0">body</a></code> node. If @@ -38678,7 +38701,7 @@ or implied by the algorithms given in this specification, are the ones that must be used when determining the character encoding according to the rules given in the above specifications. Once the character encoding is - established, the <a href="#character3">document's character encoding</a> + established, the <a href="#character4">document's character encoding</a> must be set to that character encoding. 
<p>If the root element, as parsed according to the XML specifications cited @@ -38744,7 +38767,7 @@ versions thereof. <a href="#references">[RFC2046]</a> <a href="#references">[RFC2646]</a> - <p>The <a href="#character3">document's character encoding</a> must be set + <p>The <a href="#character4">document's character encoding</a> must be set to the character encoding used to decode the document. <p>Upon creation of the <code>Document</code> object, the user agent must @@ -47102,7 +47125,7 @@ described below. <p>RCDATA elements can have <a href="#text2" title=syntax-text>text</a> and - <a href="#character5" title=syntax-charref>character references</a>, but + <a href="#character6" title=syntax-charref>character references</a>, but the text must not contain an <a href="#ambiguous" title=syntax-ambiguous-ampersand>ambiguous ampersand</a>. There are also <a href="#cdata-rcdata-restrictions">further restrictions</a> described @@ -47112,7 +47135,7 @@ any contents (since, again, as there's no end tag, no content can be put between the start tag and the end tag). Foreign elements whose start tag is <em>not</em> marked as self-closing can have <a href="#text2" - title=syntax-text>text</a>, <a href="#character5" + title=syntax-text>text</a>, <a href="#character6" title=syntax-charref>character references</a>, <a href="#cdata1" title=syntax-cdata>CDATA sections</a>, other <a href="#elements5" title=syntax-elements>elements</a>, and <a href="#comments0" @@ -47122,7 +47145,7 @@ ampersand</a>. 
<p>Normal elements can have <a href="#text2" title=syntax-text>text</a>, <a - href="#character5" title=syntax-charref>character references</a>, other <a + href="#character6" title=syntax-charref>character references</a>, other <a href="#elements5" title=syntax-elements>elements</a>, and <a href="#comments0" title=syntax-comments>comments</a>, but the text must not contain the character U+003C LESS-THAN SIGN (<code><</code>) or an @@ -47218,7 +47241,7 @@ <p><dfn id=attribute4 title=syntax-attribute-value>Attribute values</dfn> are a mixture of <a href="#text2" title=syntax-text>text</a> and <a - href="#character5" title=syntax-charref>character references</a>, except + href="#character6" title=syntax-charref>character references</a>, except with the additional restriction that the text cannot contain an <a href="#ambiguous" title=syntax-ambiguous-ampersand>ambiguous ampersand</a>. @@ -47609,7 +47632,7 @@ that is not itself in an <a href="#escaping" title=syntax-escape>escaping text span</a>, and ends at the next <a href="#escaping1" title=syntax-escape-end>escaping text span end</a>. There cannot be any <a - href="#character5" title=syntax-charref>character references</a> inside an + href="#character6" title=syntax-charref>character references</a> inside an <a href="#escaping" title=syntax-escape>escaping text span</a>. <p>An <dfn id=escaping0 title=syntax-escape-start>escaping text span @@ -47651,10 +47674,10 @@ FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR), U+000A LINE FEED (LF) characters in that order. - <h4 id=character><span class=secno>8.1.4 </span>Character references</h4> + <h4 id=character0><span class=secno>8.1.4 </span>Character references</h4> <p>In certain cases described in other sections, <a href="#text2" - title=syntax-text>text</a> may be mixed with <dfn id=character5 + title=syntax-text>text</a> may be mixed with <dfn id=character6 title=syntax-charref>character references</dfn>. 
These can be used to escape characters that couldn't otherwise legally be included in <a href="#text2" title=syntax-text>text</a>. @@ -48265,12 +48288,12 @@ heuristically decide which to use as a default. </ol> - <p>The <a href="#character3">document's character encoding</a> must + <p>The <a href="#character4">document's character encoding</a> must immediately be set to the value returned from this algorithm, at the same time as the user agent uses the returned value to select the decoder to use for the input stream. - <h5 id=character0><span class=secno>8.2.2.2. </span>Character encoding + <h5 id=character1><span class=secno>8.2.2.2. </span>Character encoding requirements</h5> <p>User agents must at a minimum support the UTF-8 and Windows-1252 @@ -48282,7 +48305,11 @@ <p>User agents must support the preferred MIME name of every character encoding they support that has a preferred MIME name, and should support all the IANA-registered aliases. <a - href="#references">[IANACHARSET]</a> + href="#references">[IANACHARSET]</a></p> + <!-- XXX should all this be abstracted out so it can be used for + <script charset=""> and <form accept-charset="">? Maybe move this + stuff and the 'character encodings' section of the terminology + section into its own infrastructure subsection? --> <p>When comparing a string specifying a character encoding with the name or alias of a character encoding to determine if they are equal, user agents @@ -48533,7 +48560,7 @@ have the same Unicode interpretations in both the current encoding and the new encoding, and if the user agent supports changing the converter on the fly, then the user agent may change to the new converter for the - encoding on the fly. Set the <a href="#character3">document's character + encoding on the fly. 
Set the <a href="#character4">document's character encoding</a> and the encoding used to convert the input stream to the new encoding, set the <a href="#confidence" title=concept-encoding-confidence>confidence</a> to <i>confident</i>, and @@ -49140,7 +49167,7 @@ <dd>When the <a href="#content4">content model flag</a> is set to one of the PCDATA or RCDATA states and the <a href="#escape">escape flag</a> is - false: switch to the <a href="#character6">character reference data + false: switch to the <a href="#character7">character reference data state</a>. <dd>Otherwise: treat it as per the "anything else" entry below. @@ -49197,8 +49224,8 @@ href="#data-state0">data state</a>. </dl> - <h5 id=character1><span class=secno>8.2.4.2. </span><dfn - id=character6>Character reference data state</dfn></h5> + <h5 id=character2><span class=secno>8.2.4.2. </span><dfn + id=character7>Character reference data state</dfn></h5> <p><em>(This cannot happen if the <a href="#content4">content model flag</a> is set to the CDATA state.)</em> @@ -49631,7 +49658,7 @@ <dt>U+0026 AMPERSAND (&) - <dd>Switch to the <a href="#character7">character reference in attribute + <dd>Switch to the <a href="#character8">character reference in attribute value state</a>, with the <a href="#additional">additional allowed character</a> being U+0022 QUOTATION MARK ("). @@ -49660,7 +49687,7 @@ <dt>U+0026 AMPERSAND (&) - <dd>Switch to the <a href="#character7">character reference in attribute + <dd>Switch to the <a href="#character8">character reference in attribute value state</a>, with the <a href="#additional">additional allowed character</a> being U+0027 APOSTROPHE ('). @@ -49695,7 +49722,7 @@ <dt>U+0026 AMPERSAND (&) - <dd>Switch to the <a href="#character7">character reference in attribute + <dd>Switch to the <a href="#character8">character reference in attribute value state</a>, with no <a href="#additional">additional allowed character</a>. 
@@ -49724,8 +49751,8 @@ Stay in the <a href="#attribute8">attribute value (unquoted) state</a>. </dl> - <h5 id=character2><span class=secno>8.2.4.13. </span><dfn - id=character7>Character reference in attribute value state</dfn></h5> + <h5 id=character3><span class=secno>8.2.4.13. </span><dfn + id=character8>Character reference in attribute value state</dfn></h5> <p>Attempt to <a href="#consume">consume a character reference</a>. @@ -50470,8 +50497,8 @@ <p>This section defines how to <dfn id=consume>consume a character reference</dfn>. This definition is used when parsing character references - <a href="#character6" title="character reference data state">in text</a> - and <a href="#character7" title="character reference in attribute value + <a href="#character7" title="character reference data state">in text</a> + and <a href="#character8" title="character reference in attribute value state">in attributes</a>. <p>The behavior depends on the identity of the next character (the one @@ -50828,7 +50855,7 @@ <p>If the last character matched is not a U+003B SEMICOLON (<code title="">;</code>), there is a <a href="#parse2">parse error</a>.</p> - <p>If the character reference is being consumed <a href="#character7" + <p>If the character reference is being consumed <a href="#character8" title="character reference in attribute value state">as part of an attribute</a>, and the last character matched is not a U+003B SEMICOLON (<code title="">;</code>), and the next character is in the range U+0030
Received on Friday, 12 September 2008 23:26:25 UTC
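
Illustrative note (not part of the original commit message): the new section 2.1.5 in the diff above defines an "ASCII-compatible character encoding" by a specific set of bytes, and the accept-charset text requires each space-separated token to name such an encoding. The following is a minimal Python sketch of one way to read that definition, assuming that decoding each listed byte on its own and comparing it with the same US-ASCII code point is an acceptable approximation. The names ASCII_CHECK_BYTES, is_ascii_compatible and split_accept_charset are hypothetical and do not come from the specification. Python's codec registry stands in for the IANA registry, and the check does not verify that a token is the preferred name rather than an alias.

```python
import codecs

# Byte set copied from the definition quoted in the diff:
# 0x09, 0x0A, 0x0C, 0x0D, 0x20-0x22, 0x26, 0x27, 0x2C-0x3F, 0x41-0x5A, 0x61-0x7A.
ASCII_CHECK_BYTES = (
    [0x09, 0x0A, 0x0C, 0x0D]
    + list(range(0x20, 0x23))
    + [0x26, 0x27]
    + list(range(0x2C, 0x40))
    + list(range(0x41, 0x5B))
    + list(range(0x61, 0x7B))
)


def is_ascii_compatible(encoding_name):
    """Approximate check (an assumption, not the spec's algorithm):
    every listed byte, decoded alone, must yield the same code point
    that US-ASCII assigns to it."""
    try:
        codecs.lookup(encoding_name)
    except LookupError:
        return False  # unknown to Python's codec registry
    for byte in ASCII_CHECK_BYTES:
        try:
            if bytes([byte]).decode(encoding_name) != chr(byte):
                return False
        except (UnicodeError, LookupError):
            return False  # byte is not decodable on its own
    return True


def split_accept_charset(value):
    """Split an accept-charset value on whitespace, keeping order and
    dropping duplicates (a simplified reading of 'ordered set of
    space-separated tokens')."""
    seen = []
    for token in value.split():
        if token not in seen:
            seen.append(token)
    return seen
```

Under these assumptions, split_accept_charset("UTF-8 ISO-8859-1") yields the two tokens in order, and each can then be passed to is_ascii_compatible. A per-byte check like this is deliberately coarse: it cannot detect stateful encodings whose single-byte behaviour happens to match ASCII for the listed bytes.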