- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 23 Jul 2008 08:41:43 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv16085
Modified Files:
Overview.html
Log Message:
Make the coercions section not invent a new syntax. (Bug 5808) (credit: hs) (whatwg r1910)
Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.1098
retrieving revision 1.1099
diff -u -d -r1.1098 -r1.1099
--- Overview.html 23 Jul 2008 07:41:52 -0000 1.1098
+++ Overview.html 23 Jul 2008 08:41:40 -0000 1.1099
@@ -51085,121 +51085,79 @@
is not compatible with the XML tool chain in certain subtle ways. For
example, an XML toolchain might not be able to represent attributes with
the name <code title="">xmlns</code>, since they conflict with the
- Namespaces in XML syntax. <a href="#references">[XMLNS]</a>
-
- <p>There is also some data that the <a href="#html-0">HTML parser</a>
- generates that isn't included in the DOM itself.
-
- <p>To allow tools to apply a consistent set of adjustments to the output of
- their <a href="#html-0">HTML parser</a> to allow for compatibility with
- the rest of their XML toolchain, this section documents a set of mutations
- and conventions that will convert the output of the <a href="#html-0">HTML
- parser</a> for any arbitrary input into an XML Infoset that doesn't have
- any problematic characteristics.
-
- <p>Tools that cannot convey the out-of-band information using out-of-band
- mechanisms, or that cannot convey the DOM exactly as prescribed by this
- specification, may either ignore the offending information or DOM feature,
- or may represent it internally in the DOM using the conventions described
- below.
-
- <p>These conventions are not conforming HTML, and user agents must not
- output such syntax outside of their XML pipeline.
-
- <dl>
- <dt>The <code>DocumentType</code> node's <code title="">name</code>, <code
- title="">publicId</code>, and <code title="">systemId</code> attributes
-
- <dd>If the XML API being used doesn't support DOCTYPEs, tools may drop
- DOCTYPEs altogether or create a set of three attributes on the root
- element, named <code title="">__doctype_name__</code>, <code
- title="">__doctype_publicid__</code>, and <code
- title="">__doctype_systemid__</code>, respectively, whose values are the
- values that would have been put on the <code>DocumentType</code> node.
-
- <dt>The document being set to <i><a href="#no-quirks">no quirks
- mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or
- <i><a href="#quirks">quirks mode</a></i>
-
- <dd>To convey this information, create an attribute <code
- title="">__mode__</code> on the root element, with values "noquirks",
- "limitedquirks", or "quirks" respectively.
-
- <dt>Elements that have a namespace without appropriate <code
- title="">xmlns</code> attributes being in scope
-
- <dd>Construct the DOM as if appropriate namespace declarations were in
- scope.
-
- <dt>Elements whose names contain U+003A COLON (:) characters or characters
- that cannot be represented in XML element names
-
- <dd>Drop the element and all its children, or replace any offending
- characters with a U+005F LOW LINE (_) character.
-
- <dt>Attributes named <code title="">xmlns</code> whose values match the
- namespace of the element node
-
- <dd>Construct the DOM as if these were default namespace declarations.
-
- <dt>Attributes named <code title="">xmlns:xlink</code> whose values match
- the <a href="#xlink">XLink namespace</a>, on elements whose namespace is
- not the <a href="#html-namespace0">HTML namespace</a>
-
- <dd>Construct the DOM as if these were namespace prefix declarations.
-
- <dt>Other attributes whose names are <code title="">xmlns</code> or start
- with <code title="">xmlns:</code>
-
- <dd>Drop the attributes or add two U+005F LOW LINE (_) characters to the
- start of the attributes' names and replace any U+003A COLON (:)
- characters with a U+005F LOW LINE (_) character.
+ Namespaces in XML syntax. There is also some data that the <a
+ href="#html-0">HTML parser</a> generates that isn't included in the DOM
+ itself. This section specifies some rules for handling these issues.
- <dt>Other attributes in no namespace whose names contain U+003A COLON (:)
- characters
+ <p>If the XML API being used doesn't support DOCTYPEs, tools may drop
+ DOCTYPEs altogether.
- <dt>Attributes whose names contain characters that cannot be represented
- in XML attribute names
+ <p>If the XML API doesn't support attributes in no namespace that are named
+ "<code title="">xmlns</code>", attributes whose names start with "<code
+ title="">xmlns:</code>", or attributes in the <a href="#xmlns">XMLNS
+ namespace</a>, then the tool may drop such attributes.
- <dd>Drop the attributes or replace any offending characters with a U+005F
- LOW LINE (_) character, dropping any attributes where doing this would
- cause an attribute name clash.
+ <p>The tool may annotate the output with any namespace declarations
+ required for proper operation.
- <dt>Form controls associated with forms that aren't their nearest ancestor
- (use of the <a href="#form-element"><code>form</code> element
- pointer</a>)
+ <p>If the XML API being used restricts the allowable characters in the
+ local names of elements and attributes, then the tool may map all element
+ and attribute local names that the API wouldn't support to a set of names
+ that <em>are</em> allowed, by replacing any character that isn't supported
+ with the upper case letter U and the five digits of the character's
+ Unicode codepoint when expressed in hexadecimal.
- <dd>Create an attribute <code title="">__formid__</code> on the form, with
- a value unique amongst <code title="">__formid__</code> attributes in the
- document, and create an attribute <code title="">__form__</code> on the
- form control, whose value matches the unique identifier given to the
- form.
+ <p class=example>For example, the element name <code
+ title="">.foo<bar</code>, which can be output by the <a
+ href="#html-0">HTML parser</a>, though it is neither a legal HTML element
+ name nor a well-formed XML element name, would be converted into <code
+ title="">U0002EfooU0003Cbar</code>, which <em>is</em> a well-formed XML
+ element name (though it's still not legal in HTML by any means).
- <dt>Any U+000C FORM FEED (FF) character
+ <p class=example>As another example, consider the attribute
+ <code>xlink:href</code>. Used on a MathML element, it becomes, after being
+ <span title="adjust foreign attributes">adjusted</span>, an attribute with
+ a prefix "<code title="">xlink</code>" and a local name "<code
+ title="">href</code>". However, used on an HTML element, it becomes an
+ attribute with no prefix and the local name "<code
+ title="">xlink:href</code>", which is not a valid NCName, and thus might
+ not be accepted by an XML API. It could thus get converted, becoming
+ "<code title="">xlinkU0003Ahref</code>".
- <dd>Replace the character with a U+0020 SPACE character.
+ <p class=note>The resulting names from this conversion conveniently can't
+ clash with any attribute generated by the <a href="#html-0">HTML
+ parser</a>, since those are all either lowercase or those listed in the <a
+ href="#adjust">adjust foreign attributes</a> algorithm's table.
- <dt>Any other literal non-XML character
+ <p>If the XML API restricts comments from having two consecutive U+002D
+ HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE
+ character between any such offending characters.
- <dd>Replace the character with a U+FFFD REPLACEMENT CHARACTER.
+ <p>If the XML API restricts allowed characters in character data, the tool
+ may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE
+ character, and any other literal non-XML character with a U+FFFD
+ REPLACEMENT CHARACTER.
- <dt>A comment that contains two adjacent U+002D HYPHEN-MINUS characters
- (--).
+ <p>If the tool has no way to convey out-of-band information, then the tool
+ may drop the following information:
- <dd>Insert a U+0020 SPACE character between them.
- </dl>
+ <ul>
+ <li>Whether the document is set to <i><a href="#no-quirks">no quirks
+ mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or
+ <i><a href="#quirks">quirks mode</a></i>
- <p>Tools that use these conventions should guard against documents that
- include markup that clashes with them by always dropping all attributes in
- the document that start with two U+005F LOW LINE (_) characters.
+ <li>The association between form controls and forms that aren't their
+ nearest <code>form</code> element ancestor (use of the <a
+ href="#form-element"><code>form</code> element pointer</a> in the parser)
+ </ul>
- <p class=note>These conventions apply <em>after</em> the <a
- href="#html-0">HTML parser</a>'s rules have been applied. For example, a
- <code title=""><a::></code> start tag will be closed by a <code
- title=""></a::></code> end tag, and never by a <code
- title=""></a__></code> end tag, even if the user agent is using the
- rules above to then generate an actual element in the DOM with the name
- <code title="">a__</code> for that start tag.
+ <p class=note>The mutations allowed by this section apply <em>after</em>
+ the <a href="#html-0">HTML parser</a>'s rules have been applied. For
+ example, a <code title=""><a::></code> start tag will be closed by a
+ <code title=""></a::></code> end tag, and never by a <code
+ title=""></aU0003AU0003A></code> end tag, even if the user agent is
+ using the rules above to then generate an actual element in the DOM with
+ the name <code title="">aU0003AU0003A</code> for that start tag.
<h3 id=namespaces><span class=secno>8.3 </span>Namespaces</h3>
Received on Wednesday, 23 July 2008 08:42:17 UTC
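
Editorial sketch (not part of the commit): the coercion rules added in this revision can be illustrated roughly as follows. This is a minimal Python sketch under stated assumptions, not the specification's algorithm. The function names (coerce_local_name, coerce_comment, coerce_text) are hypothetical, and the set of characters "the API wouldn't support" is assumed here to be whatever falls outside the XML 1.0 (Fifth Edition) NCName and Char productions; a real XML API may restrict names differently.

```python
import re

# Assumption: "supported" means valid per XML 1.0 (Fifth Edition) NCName.
# NameStartChar ranges, with the colon left out (NCName forbids it).
NAME_START_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6),
    (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Characters additionally allowed after the first position ("-", ".", digits, ...).
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def coerce_local_name(name):
    """Replace each unsupported character with 'U' plus its code point
    as upper-case hexadecimal, zero-padded to five digits."""
    out = []
    for i, ch in enumerate(name):
        cp = ord(ch)
        ok = _in_ranges(cp, NAME_START_RANGES) or (
            i > 0 and _in_ranges(cp, NAME_EXTRA_RANGES)
        )
        out.append(ch if ok else "U%05X" % cp)
    return "".join(out)


def coerce_comment(data):
    """Insert a U+0020 SPACE between any two consecutive hyphens."""
    return re.sub(r"-(?=-)", "- ", data)


def _is_xml_char(cp):
    # XML 1.0 Char production.
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def coerce_text(data):
    """Map U+000C FORM FEED to a space and any other non-XML character
    to U+FFFD REPLACEMENT CHARACTER."""
    return "".join(
        " " if ch == "\f" else ch if _is_xml_char(ord(ch)) else "\uFFFD"
        for ch in data
    )


if __name__ == "__main__":
    print(coerce_local_name(".foo<bar"))    # example name from the diff
    print(coerce_local_name("xlink:href"))  # example attribute from the diff
    print(coerce_comment("a--b"))
```

With these assumptions the sketch reproduces the names given in the diff's own examples (U0002EfooU0003Cbar, xlinkU0003Ahref, aU0003AU0003A). Note that "." is escaped only in first position, since it is a valid name character elsewhere.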