spec/Overview.html 1.1099 1910 Make the coercions section not invent a

Make the coercions section not invent a new syntax. (Bug 5808) (credit:
hs) (whatwg r1910) (changed by: Ian Hickson)
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5808

Diffs for this change per section: 
  8.2.7 Coercing an HTML DOM into an infoset
  http://people.w3.org/mike/diffs/html5/spec/Overview.1.1099.html#coercing
  8.3 Namespaces
  http://people.w3.org/mike/diffs/html5/spec/Overview.1.1099.html#namespaces

Current content per affected section: 
  http://dev.w3.org/html5/spec/Overview.html#coercing
  http://dev.w3.org/html5/spec/Overview.html#namespaces

Previously published WD content per affected section: 
  http://www.w3.org/TR/2008/WD-html5-20080610/single-page/#coercing
  http://www.w3.org/TR/2008/WD-html5-20080610/single-page/#namespaces

Cumulative diff: http://people.w3.org/mike/diffs/html5/spec/Overview.diff.html

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.1098&r2=1.1099&f=h

http://html5.org/tools/web-apps-tracker?from=1909&to=1910

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.1098
retrieving revision 1.1099
diff -u -d -r1.1098 -r1.1099
--- Overview.html 23 Jul 2008 07:41:52 -0000 1.1098
+++ Overview.html 23 Jul 2008 08:41:40 -0000 1.1099
@@ -51085,121 +51085,79 @@
    is not compatible with the XML tool chain in certain subtle ways. For
    example, an XML toolchain might not be able to represent attributes with
    the name <code title="">xmlns</code>, since they conflict with the
-   Namespaces in XML syntax. <a href="#references">[XMLNS]</a>
-
-  <p>There is also some data that the <a href="#html-0">HTML parser</a>
-   generates that isn't included in the DOM itself.
-
-  <p>To allow tools to apply a consistent set of adjustments to the output of
-   their <a href="#html-0">HTML parser</a> to allow for compatibility with
-   the rest of their XML toolchain, this section documents a set of mutations
-   and conventions that will convert the output of the <a href="#html-0">HTML
-   parser</a> for any arbitrary input into an XML Infoset that doesn't have
-   any problematic characteristics.
-
-  <p>Tools that cannot convey the out-of-band information using out-of-band
-   mechanisms, or that cannot convey the DOM exactly as prescribed by this
-   specification, may either ignore the offending information or DOM feature,
-   or may represent it internally in the DOM using the conventions described
-   below.
-
-  <p>These conventions are not conforming HTML, and user agents must not
-   output such syntax outside of their XML pipeline.
-
-  <dl>
-   <dt>The <code>DocumentType</code> node's <code title="">name</code>, <code
-    title="">publicId</code>, and <code title="">systemId</code> attributes
-
-   <dd>If the XML API being used doesn't support DOCTYPEs, tools may drop
-    DOCTYPEs altogether or create a set of three attributes on the root
-    element, named <code title="">__doctype_name__</code>, <code
-    title="">__doctype_publicid__</code>, and <code
-    title="">__doctype_systemid__</code>, respectively, whose values are the
-    values that would have been put on the <code>DocumentType</code> node.
-
-   <dt>The document being set to <i><a href="#no-quirks">no quirks
-    mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or
-    <i><a href="#quirks">quirks mode</a></i>
-
-   <dd>To convey this information, create an attribute <code
-    title="">__mode__</code> on the root element, with values "noquirks",
-    "limitedquirks", or "quirks" respectively.
-
-   <dt>Elements that have a namespace without appropriate <code
-    title="">xmlns</code> attributes being in scope
-
-   <dd>Construct the DOM as if appropriate namespace declarations were in
-    scope.
-
-   <dt>Elements whose names contain U+003A COLON (:) characters or characters
-    that cannot be represented in XML element names
-
-   <dd>Drop the element and all its children, or replace any offending
-    characters with a U+005F LOW LINE (_) character.
-
-   <dt>Attributes named <code title="">xmlns</code> whose values match the
-    namespace of the element node
-
-   <dd>Construct the DOM as if these were default namespace declarations.
-
-   <dt>Attributes named <code title="">xmlns:xlink</code> whose values match
-    the <a href="#xlink">XLink namespace</a>, on elements whose namespace is
-    not the <a href="#html-namespace0">HTML namespace</a>
-
-   <dd>Construct the DOM as if these were namespace prefix declarations.
-
-   <dt>Other attributes whose names are <code title="">xmlns</code> or start
-    with <code title="">xmlns:</code>
-
-   <dd>Drop the attributes or add two U+005F LOW LINE (_) characters to the
-    start of the attributes' names and replace any U+003A COLON (:)
-    characters with a U+005F LOW LINE (_) character.
+   Namespaces in XML syntax. There is also some data that the <a
+   href="#html-0">HTML parser</a> generates that isn't included in the DOM
+   itself. This section specifies some rules for handling these issues.
 
-   <dt>Other attributes in no namespace whose names contain U+003A COLON (:)
-    characters
+  <p>If the XML API being used doesn't support DOCTYPEs, tools may drop
+   DOCTYPEs altogether.
 
-   <dt>Attributes whose names contain characters that cannot be represented
-    in XML attribute names
+  <p>If the XML API doesn't support attributes in no namespace that are named
+   "<code title="">xmlns</code>", attributes whose names start with "<code
+   title="">xmlns:</code>", or attributes in the <a href="#xmlns">XMLNS
+   namespace</a>, then the tool may drop such attributes.
 
-   <dd>Drop the attributes or replace any offending characters with a U+005F
-    LOW LINE (_) character, dropping any attributes where doing this would
-    cause an attribute name clash.
+  <p>The tool may annotate the output with any namespace declarations
+   required for proper operation.
 
-   <dt>Form controls associated with forms that aren't their nearest ancestor
-    (use of the <a href="#form-element"><code>form</code> element
-    pointer</a>)
+  <p>If the XML API being used restricts the allowable characters in the
+   local names of elements and attributes, then the tool may map all element
+   and attribute local names that the API wouldn't support to a set of names
+   that <em>are</em> allowed, by replacing any character that isn't supported
+   with the upper case letter U and the five digits of the character's
+   Unicode codepoint when expressed in hexadecimal.
 
-   <dd>Create an attribute <code title="">__formid__</code> on the form, with
-    a value unique amongst <code title="">__formid__</code> attributes in the
-    document, and create an attribute <code title="">__form__</code> on the
-    form control, whose value matches the unique identifier given to the
-    form.
+  <p class=example>For example, the element name <code
+   title="">.foo&lt;bar</code>, which can be output by the <a
+   href="#html-0">HTML parser</a>, though it is neither a legal HTML element
+   name nor a well-formed XML element name, would be converted into <code
+   title="">U0002EfooU0003Cbar</code>, which <em>is</em> a well-formed XML
+   element name (though it's still not legal in HTML by any means).
 
-   <dt>Any U+000C FORM FEED (FF) character
+  <p class=example>As another example, consider the attribute
+   <code>xlink:href</code>. Used on a MathML element, it becomes, after being
+   <span title="adjust foreign attributes">adjusted</span>, an attribute with
+   a prefix "<code title="">xlink</code>" and a local name "<code
+   title="">href</code>". However, used on an HTML element, it becomes an
+   attribute with no prefix and the local name "<code
+   title="">xlink:href</code>", which is not a valid NCName, and thus might
+   not be accepted by an XML API. It could thus get converted, becoming
+   "<code title="">xlinkU0003Ahref</code>".
 
-   <dd>Replace the character with a U+0020 SPACE character.
+  <p class=note>The resulting names from this conversion conveniently can't
+   clash with any attribute generated by the <a href="#html-0">HTML
+   parser</a>, since those are all either lowercase or those listed in the <a
+   href="#adjust">adjust foreign attributes</a> algorithm's table.
 
-   <dt>Any other literal non-XML character
+  <p>If the XML API restricts comments from having two consecutive U+002D
+   HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE
+   character between any such offending characters.
 
-   <dd>Replace the character with a U+FFFD REPLACEMENT CHARACTER.
+  <p>If the XML API restricts allowed characters in character data, the tool
+   may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE
+   character, and any other literal non-XML character with a U+FFFD
+   REPLACEMENT CHARACTER.
 
-   <dt>A comment that contains two adjacent U+002D HYPHEN-MINUS characters
-    (--).
+  <p>If the tool has no way to convey out-of-band information, then the tool
+   may drop the following information:
 
-   <dd>Insert a U+0020 SPACE character between them.
-  </dl>
+  <ul>
+   <li>Whether the document is set to <i><a href="#no-quirks">no quirks
+    mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or
+    <i><a href="#quirks">quirks mode</a></i>
 
-  <p>Tools that use these conventions should guard against documents that
-   include markup that clashes with them by always dropping all attributes in
-   the document that start with two U+005F LOW LINE (_) characters.
+   <li>The association between form controls and forms that aren't their
+    nearest <code>form</code> element ancestor (use of the <a
+    href="#form-element"><code>form</code> element pointer</a> in the parser)
+  </ul>
 
-  <p class=note>These conventions apply <em>after</em> the <a
-   href="#html-0">HTML parser</a>'s rules have been applied. For example, a
-   <code title="">&lt;a::></code> start tag will be closed by a <code
-   title="">&lt;/a::></code> end tag, and never by a <code
-   title="">&lt;/a__></code> end tag, even if the user agent is using the
-   rules above to then generate an actual element in the DOM with the name
-   <code title="">a__</code> for that start tag.
+  <p class=note>The mutations allowed by this section apply <em>after</em>
+   the <a href="#html-0">HTML parser</a>'s rules have been applied. For
+   example, a <code title="">&lt;a::></code> start tag will be closed by a
+   <code title="">&lt;/a::></code> end tag, and never by a <code
+   title="">&lt;/aU0003AU0003A></code> end tag, even if the user agent is
+   using the rules above to then generate an actual element in the DOM with
+   the name <code title="">aU0003AU0003A</code> for that start tag.
 
   <h3 id=namespaces><span class=secno>8.3 </span>Namespaces</h3>
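
Illustration (not part of the commit): the coercions that the new text
permits can be sketched in Python 3 roughly as follows. The function names
are hypothetical, and the sketch assumes the XML API accepts exactly the
NCName characters of XML 1.0 (Fifth Edition) and the XML 1.0 Char
production; a real API may be stricter or looser. The escape is zero-padded
to five hexadecimal digits as the text describes, so code points above
U+FFFFF would come out with six digits here.

```python
import re

# Assumption: NCName ranges from XML 1.0 (Fifth Edition), i.e. Name without ":".
_NAME_START = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6),
    (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional characters allowed after the first position: "-", ".", digits, etc.
_NAME_EXTRA = [(0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
               (0x300, 0x36F), (0x203F, 0x2040)]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def coerce_local_name(name):
    """Replace each unsupported character with "U" plus its hex code point."""
    out = []
    for i, ch in enumerate(name):
        cp = ord(ch)
        ok = _in_ranges(cp, _NAME_START) or (i > 0 and _in_ranges(cp, _NAME_EXTRA))
        out.append(ch if ok else "U%05X" % cp)
    return "".join(out)


def coerce_comment(text):
    """Insert one U+0020 SPACE between any two consecutive hyphens."""
    return re.sub(r"-(?=-)", "- ", text)


def coerce_character_data(text):
    """Map FORM FEED to SPACE and other non-XML characters to U+FFFD."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0x0C:
            out.append(" ")
        elif (cp in (0x09, 0x0A, 0x0D) or 0x20 <= cp <= 0xD7FF
              or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF):
            out.append(ch)
        else:
            out.append("\uFFFD")
    return "".join(out)


print(coerce_local_name(".foo<bar"))    # U0002EfooU0003Cbar
print(coerce_local_name("xlink:href"))  # xlinkU0003Ahref
print(coerce_local_name("a::"))         # aU0003AU0003A
print(coerce_comment("a--b"))           # a- -b
```

The three names printed above match the examples given in the diff. As the
note in the diff says, these functions would run on the parser's output
only; they play no part in matching start and end tags during parsing.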

Received on Wednesday, 23 July 2008 11:03:35 UTC