- From: poot <cvsmail@w3.org>
- Date: Wed, 23 Jul 2008 11:05:58 +0900 (JST)
- To: public-html-diffs@w3.org
Provide a way to mutate the DOM into an infoset. (Bug 5808) (credit: hs)
(whatwg r1907) (changed by: Ian Hickson)
Diffs for this change per section:
HTML namespace
http://people.w3.org/mike/diffs/html5/spec/Overview.1.1096.html#html-namespace0
delays the load event
http://people.w3.org/mike/diffs/html5/spec/Overview.1.1096.html#delays
8.2.7 Coercing an HTML DOM into an infoset
http://people.w3.org/mike/diffs/html5/spec/Overview.1.1096.html#coercing
8.3 Namespaces
http://people.w3.org/mike/diffs/html5/spec/Overview.1.1096.html#namespaces
Current content per affected section:
http://dev.w3.org/html5/spec/Overview.html#html-namespace0
http://dev.w3.org/html5/spec/Overview.html#delays
http://dev.w3.org/html5/spec/Overview.html#coercing
http://dev.w3.org/html5/spec/Overview.html#namespaces
Previously published WD content per affected section:
http://www.w3.org/TR/2008/WD-html5-20080610/single-page/#html-namespace0
http://www.w3.org/TR/2008/WD-html5-20080610/single-page/#delays
http://www.w3.org/TR/2008/WD-html5-20080610/single-page/#coercing
http://www.w3.org/TR/2008/WD-html5-20080610/single-page/#namespaces
Cumulative diff: http://people.w3.org/mike/diffs/html5/spec/Overview.diff.html
http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.1095&r2=1.1096&f=h
http://html5.org/tools/web-apps-tracker?from=1906&to=1907
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.1095
retrieving revision 1.1096
diff -u -d -r1.1095 -r1.1096
--- Overview.html 23 Jul 2008 01:05:20 -0000 1.1095
+++ Overview.html 23 Jul 2008 02:02:55 -0000 1.1096
@@ -2018,6 +2018,9 @@
</ul>
<li><a href="#the-end"><span class=secno>8.2.6 </span>The end</a>
+
+ <li><a href="#coercing"><span class=secno>8.2.7 </span>Coercing an
+ HTML DOM into an infoset</a>
</ul>
<li><a href="#namespaces"><span class=secno>8.3 </span>Namespaces</a>
@@ -51069,6 +51072,130 @@
/parser/htmlparser/src/nsElementTable.cpp, line 1901 - // Ex: <H1><LI><H1><LI>. Inner LI has the potential of getting nested
-->
+ <h4 id=coercing><span class=secno>8.2.7 </span>Coercing an HTML DOM into an
+ infoset</h4>
+
+ <p>When an application uses an <a href="#html-0">HTML parser</a> in
+ conjunction with an XML pipeline, it is possible that the constructed DOM
+ is not compatible with the XML toolchain in certain subtle ways. For
+ example, an XML toolchain might not be able to represent attributes with
+ the name <code title="">xmlns</code>, since they conflict with the
+ Namespaces in XML syntax. <a href="#references">[XMLNS]</a>
+
+ <p>There is also some data that the <a href="#html-0">HTML parser</a>
+ generates that isn't included in the DOM itself.
+
+ <p>So that tools can apply a consistent set of adjustments to the output of
+ their <a href="#html-0">HTML parser</a> and remain compatible with
+ the rest of their XML toolchain, this section documents a set of mutations
+ and conventions that will convert the output of the <a href="#html-0">HTML
+ parser</a> for any arbitrary input into an XML Infoset that doesn't have
+ any problematic characteristics.
+
+ <p>Tools that cannot convey the out-of-band information using out-of-band
+ mechanisms, or that cannot convey the DOM exactly as prescribed by this
+ specification, may either ignore the offending information or DOM feature,
+ or may represent it internally in the DOM using the conventions described
+ below.
+
+ <p>These conventions are not conforming HTML, and user agents must not
+ output such syntax outside of their XML pipeline.
+
+ <dl>
+ <dt>The <code>DocumentType</code> node's <code title="">name</code>, <code
+ title="">publicId</code>, and <code title="">systemId</code> attributes
+
+ <dd>If the XML API being used doesn't support DOCTYPEs, tools may drop
+ DOCTYPEs altogether or create a set of three attributes on the root
+ element, named <code title="">__doctype_name__</code>, <code
+ title="">__doctype_publicid__</code>, and <code
+ title="">__doctype_systemid__</code>, respectively, whose values are the
+ values that would have been put on the <code>DocumentType</code> node.
+
+ <dt>The document being set to <i><a href="#no-quirks">no quirks
+ mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or
+ <i><a href="#quirks">quirks mode</a></i>
+
+ <dd>To convey this information, create an attribute <code
+ title="">__mode__</code> on the root element, with values "noquirks",
+ "limitedquirks", or "quirks" respectively.
+
+ <dt>Elements that have a namespace without appropriate <code
+ title="">xmlns</code> attributes being in scope
+
+ <dd>Construct the DOM as if appropriate namespace declarations were in
+ scope.
+
+ <dt>Elements whose names contain U+003A COLON (:) characters or characters
+ that cannot be represented in XML element names
+
+ <dd>Drop the element and all its children, or replace any offending
+ characters with a U+005F LOW LINE (_) character.
+
+ <dt>Attributes named <code title="">xmlns</code> whose values match the
+ namespace of the element node
+
+ <dd>Construct the DOM as if these were default namespace declarations.
+
+ <dt>Attributes named <code title="">xmlns:xlink</code> whose values match
+ the <a href="#xlink">XLink namespace</a>, on elements whose namespace is
+ not the <a href="#html-namespace0">HTML namespace</a>
+
+ <dd>Construct the DOM as if these were namespace prefix declarations.
+
+ <dt>Other attributes whose names are <code title="">xmlns</code> or start
+ with <code title="">xmlns:</code>
+
+ <dd>Drop the attributes or add two U+005F LOW LINE (_) characters to the
+ start of the attributes' names and replace any U+003A COLON (:)
+ characters with a U+005F LOW LINE (_) character.
+
+ <dt>Other attributes in no namespace whose names contain U+003A COLON (:)
+ characters
+
+ <dt>Attributes whose names contain characters that cannot be represented
+ in XML attribute names
+
+ <dd>Drop the attributes or replace any offending characters with a U+005F
+ LOW LINE (_) character, dropping any attributes where doing this would
+ cause an attribute name clash.
+
+ <dt>Form controls being associated with forms that aren't their nearest
+ ancestor (use of the <a href="#form-element"><code>form</code> element
+ pointer</a>)
+
+ <dd>Create an attribute <code title="">__formid__</code> on the form, with
+ a value unique amongst <code title="">__formid__</code> attributes in the
+ document, and create an attribute <code title="">__form__</code> on the
+ form control, whose value matches the unique identifier given to the
+ form.
+
+ <dt>Any U+000C FORM FEED (FF) character
+
+ <dd>Replace the character with a U+0020 SPACE character.
+
+ <dt>Any other literal non-XML character
+
+ <dd>Replace the character with a U+FFFD REPLACEMENT CHARACTER.
+
+ <dt>A comment that contains two adjacent U+002D HYPHEN-MINUS characters
+ (--)
+
+ <dd>Insert a U+0020 SPACE character between them.
+ </dl>
+
+ <p>Tools that use these conventions should guard against documents that
+ include markup that clashes with them by always dropping all attributes in
+ the document that start with two U+005F LOW LINE (_) characters.
+
+ <p class=note>These conventions apply <em>after</em> the <a
+ href="#html-0">HTML parser</a>'s rules have been applied. For example, a
+ <code title=""><a::></code> start tag will be closed by a <code
+ title=""></a::></code> end tag, and never by a <code
+ title=""></a__></code> end tag, even if the user agent is using the
+ rules above to then generate an actual element in the DOM with the name
+ <code title="">a__</code> for that start tag.
+
<h3 id=namespaces><span class=secno>8.3 </span>Namespaces</h3>
<p>The <dfn id=html-namespace0>HTML namespace</dfn> is:
Received on Wednesday, 23 July 2008 02:06:35 UTC
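
The name, attribute, character, and comment rules added in section 8.2.7 above can be illustrated with a short Python sketch. This is not part of the change and not a definitive implementation: the function names (coerce_name, coerce_text, coerce_comment, coerce_attributes) are hypothetical, the name check is an ASCII-only approximation of the XML Name production, and every xmlns attribute is treated as falling under the "other attributes" rule, on the assumption that real namespace declarations have already been handled elsewhere.

```python
import re

# Assumption: ASCII-only approximation of XML 1.0 name characters. The real
# Name production also permits many non-ASCII characters.
_NAME_START = re.compile(r"[A-Za-z_]")
_NAME_CHAR = re.compile(r"[A-Za-z0-9_.\-]")


def coerce_name(name):
    """Replace colons and other unrepresentable characters with U+005F (_)."""
    out = []
    for i, ch in enumerate(name):
        pattern = _NAME_START if i == 0 else _NAME_CHAR
        out.append(ch if pattern.match(ch) else "_")
    return "".join(out)


def is_xml_char(ch):
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def coerce_text(text):
    """Map U+000C to U+0020 and any other non-XML character to U+FFFD."""
    return "".join(
        " " if ch == "\f" else (ch if is_xml_char(ch) else "\ufffd")
        for ch in text
    )


def coerce_comment(data):
    """Insert a space between each pair of adjacent hyphens."""
    return re.sub(r"-(?=-)", "- ", coerce_text(data))


def _coerced_attr_name(name):
    # Assumption: xmlns attributes reaching this point are the "other" kind,
    # so they get the two-underscore prefix and colon replacement.
    if name == "xmlns" or name.startswith("xmlns:"):
        return "__" + name.replace(":", "_")
    return coerce_name(name)


def coerce_attributes(attrs):
    """Apply the attribute rules to a dict mapping names to values."""
    # Guard: drop attributes that already start with two underscores, so
    # document markup cannot clash with the conventions.
    kept = {n: v for n, v in attrs.items() if not n.startswith("__")}
    result = {}
    # First pass: names that need no rewriting keep their place.
    for name, value in kept.items():
        if _coerced_attr_name(name) == name:
            result[name] = coerce_text(value)
    # Second pass: rewritten names are dropped if they would clash.
    for name, value in kept.items():
        new_name = _coerced_attr_name(name)
        if new_name != name and new_name not in result:
            result[new_name] = coerce_text(value)
    return result
```

For instance, coerce_name("a::") gives "a__", which matches the element name used in the note at the end of the new section. The two-pass structure in coerce_attributes is one way to read "dropping any attributes where doing this would cause an attribute name clash": an attribute whose name is already valid wins over one that would only collide after rewriting.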