- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 23 Jul 2008 08:41:43 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec In directory hutz:/tmp/cvs-serv16085 Modified Files: Overview.html Log Message: Make the coercions section not invent a new syntax. (Bug 5808) (credit: hs) (whatwg r1910) Index: Overview.html =================================================================== RCS file: /sources/public/html5/spec/Overview.html,v retrieving revision 1.1098 retrieving revision 1.1099 diff -u -d -r1.1098 -r1.1099 --- Overview.html 23 Jul 2008 07:41:52 -0000 1.1098 +++ Overview.html 23 Jul 2008 08:41:40 -0000 1.1099 @@ -51085,121 +51085,79 @@ is not compatible with the XML tool chain in certain subtle ways. For example, an XML toolchain might not be able to represent attributes with the name <code title="">xmlns</code>, since they conflict with the - Namespaces in XML syntax. <a href="#references">[XMLNS]</a> - - <p>There is also some data that the <a href="#html-0">HTML parser</a> - generates that isn't included in the DOM itself. - - <p>To allow tools to apply a consistent set of adjustments to the output of - their <a href="#html-0">HTML parser</a> to allow for compatibility with - the rest of their XML toolchain, this section documents a set of mutations - and conventions that will convert the output of the <a href="#html-0">HTML - parser</a> for any arbitrary input into an XML Infoset that doesn't have - any problematic characteristics. - - <p>Tools that cannot convey the out-of-band information using out-of-band - mechanisms, or that cannot convey the DOM exactly as prescribed by this - specification, may either ignore the offending information or DOM feature, - or may represent it internally in the DOM using the conventions described - below. - - <p>These conventions are not conforming HTML, and user agents must not - output such syntax outside of their XML pipeline. 
- - <dl> - <dt>The <code>DocumentType</code> node's <code title="">name</code>, <code - title="">publicId</code>, and <code title="">systemId</code> attributes - - <dd>If the XML API being used doesn't support DOCTYPEs, tools may drop - DOCTYPEs altogether or create a set of three attributes on the root - element, named <code title="">__doctype_name__</code>, <code - title="">__doctype_publicid__</code>, and <code - title="">__doctype_systemid__</code>, respectively, whose values are the - values that would have been put on the <code>DocumentType</code> node. - - <dt>The document being set to <i><a href="#no-quirks">no quirks - mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or - <i><a href="#quirks">quirks mode</a></i> - - <dd>To convey this information, create an attribute <code - title="">__mode__</code> on the root element, with values "noquirks", - "limitedquirks", or "quirks" respectively. - - <dt>Elements that have a namespace without appropriate <code - title="">xmlns</code> attributes being in scope - - <dd>Construct the DOM as if appropriate namespace declarations were in - scope. - - <dt>Elements whose names contain U+003A COLON (:) characters or characters - that cannot be represented in XML element names - - <dd>Drop the element and all its children, or replace any offending - characters with a U+005F LOW LINE (_) character. - - <dt>Attributes named <code title="">xmlns</code> whose values match the - namespace of the element node - - <dd>Construct the DOM as if these were default namespace declarations. - - <dt>Attributes named <code title="">xmlns:xlink</code> whose values match - the <a href="#xlink">XLink namespace</a>, on elements whose namespace is - not the <a href="#html-namespace0">HTML namespace</a> - - <dd>Construct the DOM as if these were namespace prefix declarations. 
- - <dt>Other attributes whose names are <code title="">xmlns</code> or start - with <code title="">xmlns:</code> - - <dd>Drop the attributes or add two U+005F LOW LINE (_) characters to the - start of the attributes' names and replace any U+003A COLON (:) - characters with a U+005F LOW LINE (_) character. + Namespaces in XML syntax. There is also some data that the <a + href="#html-0">HTML parser</a> generates that isn't included in the DOM + itself. This section specifies some rules for handling these issues. - <dt>Other attributes in no namespace whose names contain U+003A COLON (:) - characters + <p>If the XML API being used doesn't support DOCTYPEs, tools may drop + DOCTYPEs altogether. - <dt>Attributes whose names contain characters that cannot be represented - in XML attribute names + <p>If the XML API doesn't support attributes in no namespace that are named + "<code title="">xmlns</code>", attributes whose names start with "<code + title="">xmlns:</code>", or attributes in the <a href="#xmlns">XMLNS + namespace</a>, then the tool may drop such attributes. - <dd>Drop the attributes or replace any offending characters with a U+005F - LOW LINE (_) character, dropping any attributes where doing this would - cause an attribute name clash. + <p>The tool may annotate the output with any namespace declarations + required for proper operation. - <dt>Form controls associated with forms that aren't their nearest ancestor - (use of the <a href="#form-element"><code>form</code> element - pointer</a>) + <p>If the XML API being used restricts the allowable characters in the + local names of elements and attributes, then the tool may map all element + and attribute local names that the API wouldn't support to a set of names + that <em>are</em> allowed, by replacing any character that isn't supported + with the upper case letter U and the five digits of the character's + Unicode codepoint when expressed in hexadecimal. 
- <dd>Create an attribute <code title="">__formid__</code> on the form, with - a value unique amongst <code title="">__formid__</code> attributes in the - document, and create an attribute <code title="">__form__</code> on the - form control, whose value matches the unique identifier given to the - form. + <p class=example>For example, the element name <code + title="">.foo<bar</code>, which can be output by the <a + href="#html-0">HTML parser</a>, though it is neither a legal HTML element + name nor a well-formed XML element name, would be converted into <code + title="">U0002EfooU0003Cbar</code>, which <em>is</em> a well-formed XML + element name (though it's still not legal in HTML by any means). - <dt>Any U+000C FORM FEED (FF) character + <p class=example>As another example, consider the attribute + <code>xlink:href</code>. Used on a MathML element, it becomes, after being + <span title="adjust foreign attributes</span>, an attribute with a prefix + "><code title="">xlink</code>" and a local name "<code + title="">href</code>". However, used on an HTML element, it becomes an + attribute with no prefix and the local name "<code + title="">xlink:href</code>", which is not a valid NCName, and thus might + not be accepted by an XML API. It could thus get converted, becoming + "<code title="">xlinkU0003Ahref</code>".</span> - <dd>Replace the character with a U+0020 SPACE character. + <p class=note>The resulting names from this conversion conveniently can't + clash with any attribute generated by the <a href="#html-0">HTML + parser</a>, since those are all either lowercase or those listed in the <a + href="#adjust">adjust foreign attributes</a> algorithm's table. - <dt>Any other literal non-XML character + <p>If the XML API restricts comments from having two consecutive U+002D + HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE + character between any such offending characters. - <dd>Replace the character with a U+FFFD REPLACEMENT CHARACTER. 
+ <p>If the XML API restricts allowed characters in character data, the tool + may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE + character, and any other literal non-XML character with a U+FFFD + REPLACEMENT CHARACTER. - <dt>A comment that contains two adjacent U+002D HYPHEN-MINUS characters - (--). + <p>If the tool has no way to convey out-of-band information, then the tool + may drop the following information: - <dd>Insert a U+0020 SPACE character between them. - </dl> + <ul> + <li>Whether the document is set to <i><a href="#no-quirks">no quirks + mode</a></i>, <i><a href="#limited1">limited quirks mode</a></i>, or + <i><a href="#quirks">quirks mode</a></i> - <p>Tools that use these conventions should guard against documents that - include markup that clashes with them by always dropping all attributes in - the document that start with two U+005F LOW LINE (_) characters. + <li>The association between form controls and forms that aren't their + nearest <code>form</code> element ancestor (use of the <a + href="#form-element"><code>form</code> element pointer</a> in the parser) + </ul> - <p class=note>These conventions apply <em>after</em> the <a - href="#html-0">HTML parser</a>'s rules have been applied. For example, a - <code title=""><a::></code> start tag will be closed by a <code - title=""></a::></code> end tag, and never by a <code - title=""></a__></code> end tag, even if the user agent is using the - rules above to then generate an actual element in the DOM with the name - <code title="">a__</code> for that start tag. + <p class=note>The mutatiosn allowed by this section apply <em>after</em> + the <a href="#html-0">HTML parser</a>'s rules have been applied. 
For + example, a <code title=""><a::></code> start tag will be closed by a + <code title=""></a::></code> end tag, and never by a <code + title=""></aU0003AU0003A></code> end tag, even if the user agent is + using the rules above to then generate an actual element in the DOM with + the name <code title="">aU0003AU0003A</code> for that start tag. <h3 id=namespaces><span class=secno>8.3 </span>Namespaces</h3>
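The coercions added in this revision can be sketched in a few lines of Python. This is only an illustration, not part of the commit or the spec: the function names (coerce_name, coerce_comment, coerce_text) are hypothetical, and the set of characters treated as "supported" in names is an ASCII-only simplification of the XML NCName rules, assumed here for brevity. A real tool would ask its XML API what it accepts.

```python
import string

# Assumption: an ASCII-only stand-in for what the XML API accepts in local
# names. Real XML names also allow many non-ASCII characters.
_NAME_START = set(string.ascii_letters + "_")
_NAME_CHARS = _NAME_START | set(string.digits + "-.")


def coerce_name(name):
    """Replace each unsupported character with 'U' plus its code point
    in upper-case hexadecimal, zero-padded to at least five digits."""
    out = []
    for i, ch in enumerate(name):
        allowed = ch in (_NAME_START if i == 0 else _NAME_CHARS)
        out.append(ch if allowed else "U%05X" % ord(ch))
    return "".join(out)


def coerce_comment(data):
    """Insert a single space between consecutive hyphen-minus characters."""
    while "--" in data:
        data = data.replace("--", "- -")
    return data


def _is_xml_char(cp):
    # The Char production of XML 1.0.
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def coerce_text(data):
    """Map FORM FEED to SPACE and any other non-XML character to U+FFFD."""
    return "".join(
        " " if ch == "\f" else ch if _is_xml_char(ord(ch)) else "\uFFFD"
        for ch in data)
```

Under these assumptions, coerce_name reproduces the names used in the diff's own examples: ".foo<bar" becomes "U0002EfooU0003Cbar", "xlink:href" becomes "xlinkU0003Ahref", and "a::" becomes "aU0003AU0003A". Code points above U+FFFFF need six hexadecimal digits; the sketch simply emits all six rather than truncating.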
Received on Wednesday, 23 July 2008 08:42:17 UTC