html5/spec Overview.html,1.1344,1.1345 from Ian Hickson via cvs-syncmail on 2008-09-12 (public-html-commits@w3.org from September 2008)

From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
Date: Fri, 12 Sep 2008 23:25:49 +0000
To: public-html-commits@w3.org
Message-Id: <E1KeI1N-0008SE-TC@lionel-hutz.w3.org>
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv32484

Modified Files:
	Overview.html 
Log Message:
WF2: <form accept-charset> definition (but not the processing model yet). (whatwg r2172)

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.1344
retrieving revision 1.1345
diff -u -d -r1.1344 -r1.1345
--- Overview.html	12 Sep 2008 10:08:00 -0000	1.1344
+++ Overview.html	12 Sep 2008 23:25:47 -0000	1.1345
@@ -300,6 +300,9 @@
         
 
        <li><a href="#plugins"><span class=secno>2.1.4 </span>Plugins</a>
+
+       <li><a href="#character"><span class=secno>2.1.5 </span>Character
+        encodings</a>
       </ul>
 
      <li><a href="#conformance"><span class=secno>2.2 </span>Conformance
@@ -1896,7 +1899,7 @@
           </span>Newlines</a>
         </ul>
 
-       <li><a href="#character"><span class=secno>8.1.4 </span>Character
+       <li><a href="#character0"><span class=secno>8.1.4 </span>Character
         references</a>
 
        <li><a href="#cdata"><span class=secno>8.1.5 </span>CDATA sections</a>
@@ -1917,7 +1920,7 @@
          <li><a href="#determining"><span class=secno>8.2.2.1.
           </span>Determining the character encoding</a>
 
-         <li><a href="#character0"><span class=secno>8.2.2.2.
+         <li><a href="#character1"><span class=secno>8.2.2.2.
           </span>Character encoding requirements</a>
 
          <li><a href="#preprocessing"><span class=secno>8.2.2.3.
@@ -1951,7 +1954,7 @@
          <li><a href="#data-state"><span class=secno>8.2.4.1. </span>Data
           state</a>
 
-         <li><a href="#character1"><span class=secno>8.2.4.2.
+         <li><a href="#character2"><span class=secno>8.2.4.2.
           </span>Character reference data state</a>
 
          <li><a href="#tag-open"><span class=secno>8.2.4.3. </span>Tag open
@@ -1984,7 +1987,7 @@
          <li><a href="#attribute2"><span class=secno>8.2.4.12.
           </span>Attribute value (unquoted) state</a>
 
-         <li><a href="#character2"><span class=secno>8.2.4.13.
+         <li><a href="#character3"><span class=secno>8.2.4.13.
           </span>Character reference in attribute value state</a>
 
          <li><a href="#after0"><span class=secno>8.2.4.14. </span>After
@@ -2685,6 +2688,16 @@
    agent itself, vulnerabilities in the third-party software become as
    dangerous as those in the user agent.
 
+  <h4 id=character><span class=secno>2.1.5 </span>Character encodings</h4>
+
+  <p>An <dfn id=ascii-compatible>ASCII-compatible character encoding</dfn> is
+   one that is a superset of US-ASCII (specifically, ANSI_X3.4-1968) for
+   bytes in the set 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C -
+   0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any
+  character sets we want to support do things outside that range?
+  -->.
+   <!-- XXX #refs RFC1345 ? -->
+
   <h3 id=conformance><span class=secno>2.2 </span>Conformance requirements</h3>
 
   <p>All diagrams, examples, and notes in this specification are
@@ -4871,7 +4884,7 @@
 
    <li>
     <p>The <a href="#url">URL</a> is a valid IRI reference and the <a
-     href="#character3" title="document's character encoding">character
+     href="#character4" title="document's character encoding">character
      encoding</a> of the URL's <code>Document</code> is UTF-8 or UTF-16. <a
      href="#references">[RFC3987]</a>
   </ul>
@@ -5086,7 +5099,7 @@
      href="#urldoc">associated with</a> <var title="">url</var>.
 
    <li>
-    <p>Let <var title="">encoding</var> be the <a href="#character3"
+    <p>Let <var title="">encoding</var> be the <a href="#character4"
      title="document's character encoding">character encoding</a> of <var
      title="">document</var>.
 
@@ -7342,9 +7355,9 @@
    </ul>
   </div>
 
-  <p>Documents have an associated <dfn id=character3 title="document's
+  <p>Documents have an associated <dfn id=character4 title="document's
    character encoding">character encoding</dfn>. When a <code>Document</code>
-   object is created, the <a href="#character3">document's character
+   object is created, the <a href="#character4">document's character
    encoding</a> must be initialized to UTF-16. Various algorithms during page
    loading affect this value, as does the <code title=dom-document-charset><a
    href="#charset0">charset</a></code> setter. <a
@@ -7354,15 +7367,15 @@
   <p>The <dfn id=charset0
    title=dom-document-charset><code>charset</code></dfn> DOM attribute must,
    on getting, return the preferred MIME name of the <a
-   href="#character3">document's character encoding</a>. On setting, if the
+   href="#character4">document's character encoding</a>. On setting, if the
    new value is an IANA-registered alias for a character encoding, the <a
-   href="#character3">document's character encoding</a> must be set to that
+   href="#character4">document's character encoding</a> must be set to that
    character encoding. (Otherwise, nothing happens.)
 
   <p>The <dfn id=characterset
    title=dom-document-characterSet><code>characterSet</code></dfn> DOM
    attribute must, on getting, return the preferred MIME name of the <a
-   href="#character3">document's character encoding</a>.
+   href="#character4">document's character encoding</a>.
 
   <p>The <dfn id=defaultcharset
    title=dom-document-defaultCharset><code>defaultCharset</code></dfn> DOM
@@ -8986,7 +8999,7 @@
     <p>Remove all child nodes of the document.
 
    <li>
-    <p>Change the <a href="#character3">document's character encoding</a> to
+    <p>Change the <a href="#character4">document's character encoding</a> to
      UTF-16.
 
    <li>
@@ -10157,7 +10170,7 @@
    document-level metadata with the <code title=attr-meta-name><a
    href="#name">name</a></code> attribute, pragma directives with the <code
    title=attr-meta-http-equiv><a href="#http-equiv">http-equiv</a></code>
-   attribute, and the file's <a href="#character4">character encoding
+   attribute, and the file's <a href="#character5">character encoding
    declaration</a> when an HTML document is serialized to string form (e.g.
    for transmission over the network or for disk storage) with the <code
    title=attr-meta-charset><a href="#charset1">charset</a></code> attribute.
@@ -10176,7 +10189,7 @@
 
   <p>The <dfn id=charset1 title=attr-meta-charset><code>charset</code></dfn>
    attribute specifies the character encoding used by the document. This is
-   called a <a href="#character4">character encoding declaration</a>.
+   called a <a href="#character5">character encoding declaration</a>.
 
   <p>The <code title=attr-meta-charset><a href="#charset1">charset</a></code>
    attribute may be specified in <a href="#html5" title=HTML5>HTML
@@ -10515,7 +10528,7 @@
      user agent requirements are all handled by the parsing section of the
      specification. The state is just an alternative form of setting the
      <code title=meta-charset>charset</code> attribute: it is a <a
-     href="#character4">character encoding declaration</a>.</p>
+     href="#character5">character encoding declaration</a>.</p>
 
     <p>For <code><a href="#meta0">meta</a></code> elements in the <a
      href="#encoding" title=attr-meta-http-equiv-content-type>Encoding
@@ -10724,7 +10737,7 @@
   though if we do then we have to duplicate the requirements in the
   parsing section for conformance checkers -->
 
-  <p>A <dfn id=character4>character encoding declaration</dfn> is a mechanism
+  <p>A <dfn id=character5>character encoding declaration</dfn> is a mechanism
    by which the character encoding used to store or transmit a document is
    specified.
 
@@ -10740,7 +10753,7 @@
    http://www.iana.org/assignments/character-sets -->
 
    <li>The character encoding declaration must be serialized without the use
-    of <a href="#character5" title=syntax-charref>character references</a> or
+    of <a href="#character6" title=syntax-charref>character references</a> or
     character escapes of any kind.
   </ul>
 
@@ -10764,14 +10777,6 @@
    then the character encoding used must be an <a
    href="#ascii-compatible">ASCII-compatible character encoding</a>.
 
-  <p>An <dfn id=ascii-compatible>ASCII-compatible character encoding</dfn> is
-   one that is a superset of US-ASCII (specifically, ANSI_X3.4-1968) for
-   bytes in the set 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C -
-   0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any
-  character sets we want to support do things outside that range?
-  -->.
-   <!-- XXX #refs RFC1345 ? -->
-
   <p>Authors should not use JIS_X0212-1990, x-JIS0208, and encodings based on
    EBCDIC. Authors should not use UTF-32. Authors must not use the CESU-8,
    UTF-7, BOCU-1 and SCSU encodings. <a href="#references">[CESU8]</a> <a
@@ -26576,7 +26581,8 @@
 
    <dt>Element-specific attributes:
 
-   <dd><code title=attr-form-accept-charset>accept-charset</code>
+   <dd><code title=attr-form-accept-charset><a
+    href="#accept-charset">accept-charset</a></code>
 
    <dd><code title=attr-form-action>action</code>
 
@@ -26593,7 +26599,7 @@
    <dd>
     <pre
      class=idl>interface <dfn id=htmlformelement>HTMLFormElement</dfn> : <a href="#htmlelement">HTMLElement</a> {
-           attribute DOMString <span title=dom-form-accept-charset>accept-charset</span>;
+           attribute DOMString <a href="#accept-charset0" title=dom-form-accept-charset>accept-charset</a>;
            attribute DOMString <span title=dom-form-action>action</span>;
            attribute DOMString <span title=dom-form-enctype>enctype</span>;
            attribute DOMString <span title=dom-form-method>method</span>;
@@ -26614,8 +26620,25 @@
 };</pre>
   </dl>
 
+  <p>The <code><a href="#form">form</a></code> element represents a
+   collection of <a href="#field" title=category-field>data fields</a> that
+   can be submitted to a server for processing.
+
+  <p>The <dfn id=accept-charset
+   title=attr-form-accept-charset><code>accept-charset</code></dfn> attribute
+   gives the character encodings that are to be used for the submission. If
+   specified, the value must be an <span>ordered set of space-separated
+   tokens</span>, and each token must be the preferred name of an <a
+   href="#ascii-compatible">ASCII-compatible character encoding</a>. <a
+   href="#references">[IANACHARSET]</a>
+
   <p class=big-issue>...
 
+  <p>The <dfn id=accept-charset0
+   title=dom-form-accept-charset><code>accept-charset</code></dfn> DOM
+   attribute must <a href="#reflect">reflect</a> the content attribute of the
+   same name.
+
   <p>The <dfn id=elements3
    title=dom-form-elements><code>elements</code></dfn> DOM attribute must
    return an <code><a
@@ -28354,7 +28377,7 @@
 
     <p>Otherwise, let <var><a href="#the-scripts0">the script's character
      encoding</a></var> for this <code><a href="#script1">script</a></code>
-     element be the same as <a href="#character3" title="document's character
+     element be the same as <a href="#character4" title="document's character
      encoding">the encoding of the document itself</a>.</p>
 
    <li>
@@ -33510,7 +33533,7 @@
   XXXDOCURL -->
    is <code><a href="#aboutblank">about:blank</a></code><!-- XXX xref -->,
    which is marked as being an <a href="#html-" title="HTML documents">HTML
-   document</a>, and whose <a href="#character3" title="document's character
+   document</a>, and whose <a href="#character4" title="document's character
    encoding">character encoding</a> is UTF-8. The <code>Document</code> must
    have a single child <code><a href="#html">html</a></code> node, which
    itself has a single child <code><a href="#body0">body</a></code> node. If
@@ -38678,7 +38701,7 @@
    or implied by the algorithms given in this specification, are the ones
    that must be used when determining the character encoding according to the
    rules given in the above specifications. Once the character encoding is
-   established, the <a href="#character3">document's character encoding</a>
+   established, the <a href="#character4">document's character encoding</a>
    must be set to that character encoding.
 
   <p>If the root element, as parsed according to the XML specifications cited
@@ -38744,7 +38767,7 @@
    versions thereof. <a href="#references">[RFC2046]</a> <a
    href="#references">[RFC2646]</a>
 
-  <p>The <a href="#character3">document's character encoding</a> must be set
+  <p>The <a href="#character4">document's character encoding</a> must be set
    to the character encoding used to decode the document.
 
   <p>Upon creation of the <code>Document</code> object, the user agent must
@@ -47102,7 +47125,7 @@
    described below.
 
   <p>RCDATA elements can have <a href="#text2" title=syntax-text>text</a> and
-   <a href="#character5" title=syntax-charref>character references</a>, but
+   <a href="#character6" title=syntax-charref>character references</a>, but
    the text must not contain an <a href="#ambiguous"
    title=syntax-ambiguous-ampersand>ambiguous ampersand</a>. There are also
    <a href="#cdata-rcdata-restrictions">further restrictions</a> described
@@ -47112,7 +47135,7 @@
    any contents (since, again, as there's no end tag, no content can be put
    between the start tag and the end tag). Foreign elements whose start tag
    is <em>not</em> marked as self-closing can have <a href="#text2"
-   title=syntax-text>text</a>, <a href="#character5"
+   title=syntax-text>text</a>, <a href="#character6"
    title=syntax-charref>character references</a>, <a href="#cdata1"
    title=syntax-cdata>CDATA sections</a>, other <a href="#elements5"
    title=syntax-elements>elements</a>, and <a href="#comments0"
@@ -47122,7 +47145,7 @@
    ampersand</a>.
 
   <p>Normal elements can have <a href="#text2" title=syntax-text>text</a>, <a
-   href="#character5" title=syntax-charref>character references</a>, other <a
+   href="#character6" title=syntax-charref>character references</a>, other <a
    href="#elements5" title=syntax-elements>elements</a>, and <a
    href="#comments0" title=syntax-comments>comments</a>, but the text must
    not contain the character U+003C LESS-THAN SIGN (<code>&lt;</code>) or an
@@ -47218,7 +47241,7 @@
 
   <p><dfn id=attribute4 title=syntax-attribute-value>Attribute values</dfn>
    are a mixture of <a href="#text2" title=syntax-text>text</a> and <a
-   href="#character5" title=syntax-charref>character references</a>, except
+   href="#character6" title=syntax-charref>character references</a>, except
    with the additional restriction that the text cannot contain an <a
    href="#ambiguous" title=syntax-ambiguous-ampersand>ambiguous
    ampersand</a>.
@@ -47609,7 +47632,7 @@
    that is not itself in an <a href="#escaping" title=syntax-escape>escaping
    text span</a>, and ends at the next <a href="#escaping1"
    title=syntax-escape-end>escaping text span end</a>. There cannot be any <a
-   href="#character5" title=syntax-charref>character references</a> inside an
+   href="#character6" title=syntax-charref>character references</a> inside an
    <a href="#escaping" title=syntax-escape>escaping text span</a>.
 
   <p>An <dfn id=escaping0 title=syntax-escape-start>escaping text span
@@ -47651,10 +47674,10 @@
    FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR), U+000A LINE
    FEED (LF) characters in that order.
 
-  <h4 id=character><span class=secno>8.1.4 </span>Character references</h4>
+  <h4 id=character0><span class=secno>8.1.4 </span>Character references</h4>
 
   <p>In certain cases described in other sections, <a href="#text2"
-   title=syntax-text>text</a> may be mixed with <dfn id=character5
+   title=syntax-text>text</a> may be mixed with <dfn id=character6
    title=syntax-charref>character references</dfn>. These can be used to
    escape characters that couldn't otherwise legally be included in <a
    href="#text2" title=syntax-text>text</a>.
@@ -48265,12 +48288,12 @@
      heuristically decide which to use as a default.
   </ol>
 
-  <p>The <a href="#character3">document's character encoding</a> must
+  <p>The <a href="#character4">document's character encoding</a> must
    immediately be set to the value returned from this algorithm, at the same
    time as the user agent uses the returned value to select the decoder to
    use for the input stream.
 
-  <h5 id=character0><span class=secno>8.2.2.2. </span>Character encoding
+  <h5 id=character1><span class=secno>8.2.2.2. </span>Character encoding
    requirements</h5>
 
   <p>User agents must at a minimum support the UTF-8 and Windows-1252
@@ -48282,7 +48305,11 @@
   <p>User agents must support the preferred MIME name of every character
    encoding they support that has a preferred MIME name, and should support
    all the IANA-registered aliases. <a
-   href="#references">[IANACHARSET]</a>
+   href="#references">[IANACHARSET]</a></p>
+  <!-- XXX should all this be abstracted out so it can be used for
+  <script charset=""> and <form accept-charset="">? Maybe move this
+  stuff and the 'character encodings' section of the terminology
+  section into its own infrastructure subsection? -->
 
   <p>When comparing a string specifying a character encoding with the name or
    alias of a character encoding to determine if they are equal, user agents
@@ -48533,7 +48560,7 @@
     have the same Unicode interpretations in both the current encoding and
     the new encoding, and if the user agent supports changing the converter
     on the fly, then the user agent may change to the new converter for the
-    encoding on the fly. Set the <a href="#character3">document's character
+    encoding on the fly. Set the <a href="#character4">document's character
     encoding</a> and the encoding used to convert the input stream to the new
     encoding, set the <a href="#confidence"
     title=concept-encoding-confidence>confidence</a> to <i>confident</i>, and
@@ -49140,7 +49167,7 @@
 
    <dd>When the <a href="#content4">content model flag</a> is set to one of
     the PCDATA or RCDATA states and the <a href="#escape">escape flag</a> is
-    false: switch to the <a href="#character6">character reference data
+    false: switch to the <a href="#character7">character reference data
     state</a>.
 
    <dd>Otherwise: treat it as per the "anything else" entry below.
@@ -49197,8 +49224,8 @@
     href="#data-state0">data state</a>.
   </dl>
 
-  <h5 id=character1><span class=secno>8.2.4.2. </span><dfn
-   id=character6>Character reference data state</dfn></h5>
+  <h5 id=character2><span class=secno>8.2.4.2. </span><dfn
+   id=character7>Character reference data state</dfn></h5>
 
   <p><em>(This cannot happen if the <a href="#content4">content model
    flag</a> is set to the CDATA state.)</em>
@@ -49631,7 +49658,7 @@
 
    <dt>U+0026 AMPERSAND (&amp;)
 
-   <dd>Switch to the <a href="#character7">character reference in attribute
+   <dd>Switch to the <a href="#character8">character reference in attribute
     value state</a>, with the <a href="#additional">additional allowed
     character</a> being U+0022 QUOTATION MARK (&quot;).
 
@@ -49660,7 +49687,7 @@
 
    <dt>U+0026 AMPERSAND (&amp;)
 
-   <dd>Switch to the <a href="#character7">character reference in attribute
+   <dd>Switch to the <a href="#character8">character reference in attribute
     value state</a>, with the <a href="#additional">additional allowed
     character</a> being U+0027 APOSTROPHE (').
 
@@ -49695,7 +49722,7 @@
 
    <dt>U+0026 AMPERSAND (&amp;)
 
-   <dd>Switch to the <a href="#character7">character reference in attribute
+   <dd>Switch to the <a href="#character8">character reference in attribute
     value state</a>, with no <a href="#additional">additional allowed
     character</a>.
 
@@ -49724,8 +49751,8 @@
     Stay in the <a href="#attribute8">attribute value (unquoted) state</a>.
   </dl>
 
-  <h5 id=character2><span class=secno>8.2.4.13. </span><dfn
-   id=character7>Character reference in attribute value state</dfn></h5>
+  <h5 id=character3><span class=secno>8.2.4.13. </span><dfn
+   id=character8>Character reference in attribute value state</dfn></h5>
 
   <p>Attempt to <a href="#consume">consume a character reference</a>.
 
@@ -50470,8 +50497,8 @@
 
   <p>This section defines how to <dfn id=consume>consume a character
    reference</dfn>. This definition is used when parsing character references
-   <a href="#character6" title="character reference data state">in text</a>
-   and <a href="#character7" title="character reference in attribute value
+   <a href="#character7" title="character reference data state">in text</a>
+   and <a href="#character8" title="character reference in attribute value
    state">in attributes</a>.
 
   <p>The behavior depends on the identity of the next character (the one
@@ -50828,7 +50855,7 @@
     <p>If the last character matched is not a U+003B SEMICOLON (<code
      title="">;</code>), there is a <a href="#parse2">parse error</a>.</p>
 
-    <p>If the character reference is being consumed <a href="#character7"
+    <p>If the character reference is being consumed <a href="#character8"
      title="character reference in attribute value state">as part of an
      attribute</a>, and the last character matched is not a U+003B SEMICOLON
      (<code title="">;</code>), and the next character is in the range U+0030
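
This commit adds two rules. The new terminology section defines an ASCII-compatible character encoding by a fixed list of bytes. The new `accept-charset` paragraph requires each space-separated token to name such an encoding. The commit itself contains no code. What follows is a minimal Python sketch of a checker for those two rules, written under several stated assumptions:

- It consults Python's `codecs` registry instead of the IANA registry. It therefore does not verify that a token is the *preferred* name, which the commit requires.
- It treats U+0020, U+0009, U+000A, U+000C and U+000D as the separating space characters.
- It treats a repeated token, compared case-insensitively, as non-conforming, because the attribute is described as a set.

The helper names `is_ascii_compatible`, `parse_accept_charset` and `accept_charset_is_valid` are hypothetical.

```python
import codecs
import re
from typing import List

# Byte values named in the ASCII-compatible definition; each must decode to,
# and encode from, the US-ASCII character with the same code point.
ASCII_COMPAT_BYTES = (
    [0x09, 0x0A, 0x0C, 0x0D]
    + list(range(0x20, 0x23))  # 0x20 - 0x22
    + [0x26, 0x27]
    + list(range(0x2C, 0x40))  # 0x2C - 0x3F
    + list(range(0x41, 0x5B))  # 0x41 - 0x5A
    + list(range(0x61, 0x7B))  # 0x61 - 0x7A
)

# Assumption: the "space characters" that separate tokens.
_SPACES = re.compile(r"[ \t\n\f\r]+")


def is_ascii_compatible(name: str) -> bool:
    """Approximate the definition with Python's codec registry (not IANA's)."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False  # Python does not know this name, so it cannot be checked
    for b in ASCII_COMPAT_BYTES:
        try:
            if bytes([b]).decode(name) != chr(b):
                return False
            if chr(b).encode(name) != bytes([b]):
                return False
        except (UnicodeError, LookupError):
            # UnicodeError: the byte or character is not representable on its
            # own (e.g. UTF-16); LookupError: a non-text codec such as base64.
            return False
    return True


def parse_accept_charset(value: str) -> List[str]:
    """Split an accept-charset value into tokens, preserving their order."""
    return [token for token in _SPACES.split(value) if token]


def accept_charset_is_valid(value: str) -> bool:
    """Hypothetical conformance check; does NOT confirm preferred MIME names."""
    tokens = parse_accept_charset(value)
    folded = [token.lower() for token in tokens]
    if len(set(folded)) != len(folded):
        return False  # assumption: a "set" may not repeat a token
    return all(is_ascii_compatible(token) for token in tokens)
```

With this sketch, `accept_charset_is_valid("UTF-8 ISO-8859-1")` holds. `"UTF-8 UTF-16"` is rejected because UTF-16 cannot represent the listed characters as the same single bytes. EBCDIC-based encodings such as `cp037` also fail, because they assign different meanings to those byte values.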
Received on Friday, 12 September 2008 23:26:25 UTC