- From: CVS User egraff <cvsmail@w3.org>
- Date: Wed, 08 Jan 2014 00:44:58 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/html-polyglot
In directory roscoe:/tmp/cvs-serv7358

Modified Files:
  html-polyglot.html
Log Message:
More language edits, up to but not including 4.6.2.1 The safe text content option

--- /sources/public/html5/html-polyglot/html-polyglot.html 2014/01/07 22:41:46 1.20
+++ /sources/public/html5/html-polyglot/html-polyglot.html 2014/01/08 00:44:58 1.21
@@ -218,7 +218,7 @@
  <section id="PI-and-xml" class="section">
  <h3>Processing instructions and the XML declaration</h3>
  <p>
- Processing Instructions and the XML Declaration are both forbidden in <a>polyglot markup</a>.
+ Processing instructions and the XML declaration are both forbidden in <a>polyglot markup</a>.
  </p>
  <!--End section: Processing Instructions and the XML Declaration-->
  </section>
@@ -226,13 +226,14 @@
  <h3>Specifying a document’s character encoding</h3>
  <p>
  <a title="polyglot markup">Polyglot markup</a> uses the UTF-8 character encoding, the only character encoding for which both HTML and XML require support.
- HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a> [[!HTML5]].
- For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>.
- As such, character encoding MAY be left undeclared in XML with the result that UTF-8 is still supported [[!XML10]].
+ HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a>. [[!HTML5]]
+ </p>
+ <p> For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>.
+ As such, character encoding MAY be left undeclared in XML with the result that UTF8 is still supported [[!XML10]].
  </p>
  <p>
  <a title="polyglot markup">Polyglot markup</a> declares the UTF-8 character encoding in the following ways, which may be used separately or
- in combination (but note that here can only be a <em>single</em> <a title="HTML encoding declaration">HTML encoding declaration</a>):
+ in combination (but note that there can only be a <em>single</em> <a title="HTML encoding declaration">HTML encoding declaration</a>):
  </p>
  <ul>
  <li>Within the document
@@ -316,7 +317,7 @@
  <p>
  [[!HTML5]] introduces undeclared (native) default namespaces for the root HTML element, <code>html</code>, the root SVG element, <code>svg</code>, and the root MathML element, <code>math</code>.
- <a title="polyglot markup">Polyglot markup</a> declares the following default namespaces, when the markup languages are included in the document, to maintain XML-compatibility [[!XML10]]:</p>
+ <a title="polyglot markup">Polyglot markup</a> declares the following default namespaces, when the markup languages are included in the document, to maintain XML compatibility [[!XML10]]:</p>
  <ul class="inline-list">
  <li><code><html xmlns="http://www.w3.org/1999/xhtml"></code></li>
  <li><code><math xmlns="http://www.w3.org/1998/Math/MathML"></code></li>
@@ -354,13 +355,13 @@
  </ul>
  <p>
  Note that there are other prefixed attributes that can be used beyond <code>xlink:href</code> (such as <code>xml:base</code>).
- <a title="polyglot markup">Polyglot markup</a> does not declare these prefixes via xmlns. The prefixes are implicitly declared
+ <a title="polyglot markup">Polyglot markup</a> does not declare these prefixes via <code>xmlns</code>. The prefixes are implicitly declared
  in XML and are automatically applied to the appropriate attributes in HTML.
  </p>
  <p>
  The namespaced attributes, such as <code>xml:lang=""</code> and <code>xmlns=""</code>, are "namespaced" within XHTML, SVG and MathML.
- Thus, the rules for how they can be sued as CSS selectors is governed by CSS namespaces. [[!CSS3NAMESPACE]]
- For more on the issues related to attribute selectors and namespaces, with and without prefix, see the section on <a
+ Thus, the rules for how they can be used as CSS selectors is governed by CSS namespaces. [[!CSS3NAMESPACE]]
+ For more about the issues related to attribute selectors and namespaces, with and without prefixes, see the section on <a
  href="#scripting-and-styling-polyglot-markup">Scripting and styling polyglot markup</a>.
  <p>
@@ -374,19 +375,21 @@
  <section id="required-elements" class="section">
  <h6>Required elements and tags</h6>
- <p> HTML5’s concept of <dfn>optional tags</dfn> – start tags and/or end tags – covers <a
- href="http://www.w3.org/TR/html5/syntax.html#optional-tags">elements that the
- HTML parser itself automatically adds to the DOM</a> if the code doesn’t contain the tags for
- them. However, since XML does not have a feature whereby elements with one or both tags that have been
- omitted from the code (such as when start and end tags of <code>html</code> are omitted) are added to the DOM,
- omitting a tag in <a>polyglot markup</a> is equivalent of producing a not well-formed document or,
- if both tags are omotted, equivalent of not adding the element at all. Therefore, <a>polyglot markup</a> does not
- operate with <a>optional tags</a>.</p>
-
- <p>That <a>polyglot markup</a> doesn’t operate with optional tags, may create surprises e.g. for someone not used
- to adding e.g. the <code>tbody</code> tags in their code or to someone accustomed to omitting the end tag of the
- <code>p</code> element. However, the requirement to be complete with regard to tags, is a key feature of <a>polyglot
- markup</a> that makes the code <a title="robustness">robust</a> against subpar parsers and authoring surprises.</p>
+ <p> <a title="polyglot markup">Polyglot markup</a> does not employ <a>optional tags</a>.
+ HTML5’s concept of <dfn>optional tags</dfn> – missing start tags and/or end tags – covers
+ <a href="http://www.w3.org/TR/html5/syntax.html#optional-tags">
+ elements that the HTML parser itself automatically adds to the DOM</a>
+ if the code doesn’t contain the tags for them.
+ Because XML does not have such a feature that adds missing start and/or end tags to the DOM,
+ omitting a tag in <a>polyglot markup</a> is equivalent to producing a document that is not well-formed or,
+ if both tags are omitted, equivalent to not adding the element at all. </p>
+
+ <p>That <a>polyglot markup</a> doesn’t operate with optional tags, may create surprises for an author not used
+ to adding the <code>tbody</code> tags in their code, for example,
+ or to someone accustomed to omitting the end tag of the <code>p</code> element.
+ However, the requirement to be well-formed with regard to tags is a key feature of <a>polyglot markup</a>
+ that makes the code <a title="robustness">robust</a> against subpar parsers and authoring surprises.
+ </p>
  <section id="minimal-polyglot-html-document">
  <h4>A minimal HTML document</h4>
  <p>
@@ -645,7 +648,7 @@
  <a>polyglot markup</a> uses both the <code>lang</code> and the <code>xml:lang attributes</code> (see <a href="#language-attributes">Language attributes</a>); however, the <a href="http://www.w3.org/TR/css3-selectors/#lang-pseudo">CSS3 Selectors specification</a> stipulates that
- language attributes, including <code>xml:lang</code>, are matched in a case-insensitive way. [[!SELECT]]
+ language attributes, including <code>xml:lang</code>, are matched in a case insensitive way. [[!SELECT]]
  </p>
  <!--End section: Attribute values-->
  </section>
@@ -704,15 +707,17 @@
  </figure>
  <p>
- In the HTML syntax, the contents of raw text elements is raw text, by which it is referred to the fact
- that the HTML parser will not treat contained code that look like tags (element tags and comment tags), character references,
- CDATA etc as tags, character references, CDATA etc, but as raw text. (See HTML5 for the exact rules.)
+ In HTML syntax, the content of raw text elements is raw text.
+ In other words, the HTML parser does not treat contained code that looks like tags (element tags and comment tags,
+ character references, CDATA, etc.) as tags, character references, CDATA, etc., but as raw text.
+ (See HTML5 for the exact rules.)
  In the XHTML syntax, however, the same constructs <em>will</em> be treated as tags, character references, CDATA etc.
  </p>
- <p>As result, in HTML, it is simpler than it is in XHTML, for authors to comply with the requirement of the default MIME
- types of the raw text elements. On the other side, by the use of <code class="CDATA">CDATA</code>, the raw text contents
- parsed as XHTML, can be made ven less semantic than the raw text data of HTML, leading to potential harms if the document
- is parsed as HTML
+ <p>As result, it is simpler for authors to comply with the requirement of the default MIME
+ types of the raw text elements in HTML than it is in XHTML.
+ On the other hand, with <code class="CDATA">CDATA</code>, the raw text contents
+ parsed as XHTML can be made even less semantic than the raw text data of HTML,
+ leading to potential harms if the document is parsed as HTML.
  </p>
  <figure id="ambiguous-table">
@@ -740,9 +745,9 @@
  <tr><td><code>cdata content</code></td><td>the content of CDATA sections</td><td></td><td>uninterpreted</td><td>—</td></tr>
  <tr><td><code></script</code> </td><td>if occuring inside <code>script</code> element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F)</td><td>terminates parent</td><td>uninterpreted</td><td>interpreted</td></tr>
  <tr><td><code></style</code></td><td>if occuring inside <code>style</code> element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F)</td><td>terminates parent</td><td>uninterpreted</td><td>interpreted</td></tr>
- <tr><td><code><foo></bar></code></td><td>all other tags, wellformed or not</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr>
+ <tr><td><code><foo></bar></code></td><td>all other tags, well-formed or not</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr>
  <tr><td><code>&#foo;</code></td><td>character references</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr>
  </tbody>
  <tbody>
- <tr><th><code>none of the above strings</code></th><td>Any other string</td><td>uninterpreted</td><td>uninterpreted</td><td>uninterpreted</td></tr>
+ <tr><td><code>none of the above strings</code></td><td>Any other string</td><td>uninterpreted</td><td>uninterpreted</td><td>uninterpreted</td></tr>
  </tbody>
  </table>
  </figure>
@@ -750,7 +755,7 @@
  <p>Syntactically, the polyglot subset is found by</p>
  <ul><li><em>either</em> <strong>limiting the content to <dfn>safe content</dfn></strong>, that
- is: text that gets interpreted the same way in HTML and in XML.</li>
+ is, text that gets interpreted the same way in HTML and in XML.</li>
  <li><em>or</em> trying to <strong>even out the constraints differences</strong> by wrapping the contents in a <code>CDATA</code> section. The <code>CDATA</code> code is then seen as text by the HTML parser (and can thus interfere with the scripting or styling language!), while the XML parser sees the
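The hunks above revise the explanation of optional tags, raw text elements and "safe content". As an illustration only, here is a minimal Python sketch that assumes the standard library's `html.parser` tokenizer and `xml.etree.ElementTree` can stand in for "the HTML parser" and "the XML parser" (neither is a complete HTML5 or XHTML implementation). The helper names and the `<script>` snippets are hypothetical. It shows character references staying literal inside `script` for the HTML tokenizer while the XML parser decodes them, a `CDATA` wrapper leaving different leftovers for each parser, and omitted end tags making markup not well-formed XML.

```python
# Sketch only: html.parser and ElementTree are stand-ins for the HTML and XML
# parsers discussed in the spec text; helper names and snippets are hypothetical.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ScriptTextCollector(HTMLParser):
    """Collects the character data html.parser reports inside <script>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, but script content stays raw
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def html_script_text(markup):
    """Script content as the HTML tokenizer sees it (references left as-is)."""
    collector = ScriptTextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


def is_well_formed(markup):
    """True if the XML parser accepts the markup (one root, every tag closed)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def xml_script_text(markup):
    """Script content as the XML parser sees it; assumes <script> is the root."""
    return ET.fromstring(markup).text or ""


def is_safe_content(markup):
    """Rough proxy for "safe content": well-formed, and both parsers agree."""
    return is_well_formed(markup) and html_script_text(markup) == xml_script_text(markup)


if __name__ == "__main__":
    samples = {
        "safe content": "<script>if (a) go();</script>",
        "character references": "<script>if (a &amp;&amp; b) go();</script>",
        "CDATA wrapper": "<script>//<![CDATA[\nif (a && b) go();\n//]]></script>",
    }
    for name, markup in samples.items():
        print(name)
        print("  HTML sees:", repr(html_script_text(markup)))
        print("  XML sees: ", repr(xml_script_text(markup)))
        print("  safe:     ", is_safe_content(markup))

    # Omitted end tags: accepted by an HTML parser, not well-formed XML.
    print(is_well_formed("<div><p>one<p>two</div>"))          # False
    print(is_well_formed("<div><p>one</p><p>two</p></div>"))  # True
```

Under these assumptions, `is_safe_content` accepts only the first sample. The `CDATA` sample is rejected by design: the two parsers see different text, and the `//` prefixes only turn the leftover markers into JavaScript comments, which corresponds to the "even out the constraints differences" option rather than to safe content.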
Received on Wednesday, 8 January 2014 00:44:59 UTC