- From: CVS User egraff <cvsmail@w3.org>
- Date: Thu, 16 Jan 2014 16:40:53 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/html-polyglot
In directory roscoe:/tmp/cvs-serv28373

Modified Files:
	WD-html-polyglot-20140117.html
Log Message:
Updated with content from the latest ED of the spec, version published 16 January.

--- /sources/public/html5/html-polyglot/WD-html-polyglot-20140117.html 2014/01/13 00:10:17 1.3
+++ /sources/public/html5/html-polyglot/WD-html-polyglot-20140117.html 2014/01/16 16:40:53 1.4
@@ -46,10 +46,10 @@
 </section>
 <section id="sotd">
 <p>
- This specification summarizes design guidelines for authors who wish their XHTML or HTML documents to be conforming whether parsed as HTML or as XML.
- The document is intended to be useful to web authors, in particular those who want to serve receivers without concern for whether they have XML or HTML parsers available.
- Such concerns may, for instance, arise in content syndication or when receivers are on legacy systems.
- HTML polyglots <a href="http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#charset-0">facilitate migration to and from XHTML</a>,
+ This specification summarizes design guidelines for authors who wish their XHTML or HTML documents to be conforming whether parsed as HTML or as XML.
+ The document is intended to be useful to web authors, in particular those who want to serve receivers without concern for whether they have XML or HTML parsers available.
+ Such concerns may, for instance, arise in content syndication or when receivers are on legacy systems.
+ HTML polyglots <a href="http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#charset-0">facilitate migration to and from XHTML</a>,
 including transition from legacy XML to HTML5, and this document serves to accurately specify the requirements of a UTF-8 based profile for such documents.
</p> <p> @@ -71,7 +71,7 @@ </p> <!--End section: Status of This Document--> </section> - <section id="conformance"></section> +<section id="conformance"></section> <!-- note: for principle section In <a>polyglot markup</a>, the strings that XML and HTML interpret differently are considered <dfn>ambiguous strings</dfn> and MUST NOT be used except when they are explicitly permitted @@ -90,9 +90,9 @@ <!--end general--> <section id="scope"> <h3>Scope</h3> - <p>Polylglot markup is a <em><a title="robustness">robust</a></em> – but entirely <em>optional</em> – profile of the HTML vocabulary. + <p>Polylglot markup is a <em><a href="#dfn-robust-syntax">robust</a></em> – but entirely <em>optional</em> – profile of the HTML vocabulary. All web content need not be authored in <a>polyglot markup</a> and it is primarily an option - for authors wanting to increase the <a title="robustness">robustness</a> of their documents. + for authors wanting increased <a href="#dfn-robust-syntax">robustness</a> of their documents. <a title="polyglot markup">Polyglot markup</a> works best, and can be a beneficial option, in controlled environments and for authoring tools.</p> <p><a title="polyglot markup">Polyglot markup</a> is ideal for publishing when there's a strong desire to serve both HTML and XML tool chains without simultaneously having to maintain dual copies of the content: one in HTML and a second in XHTML. @@ -108,72 +108,22 @@ <section id="robust"> <h3>Robustness</h3> - <p> Polyglot markup is a means to an end – <dfn id="dfn-robustness">robustness</dfn>. 
- <a title="polyglot markup">Polyglot markup</a> embraces the principle of <a>robustness</a> - as it is defined in Web Content Accessibility Guidelines (WCAG) 2.0: - <em>Content must be robust enough that it can be interpreted reliably by a wide variety of user agents, - including assistive technologies.</em> [[!WCAG20]] - </p> - <p><a title="robustness">Robustness</a> is not a goal in itself, nor do authors need - to understand the benefits of <a>robustness</a> in order to use and benefit from the syntax of polyglot markup. - Nor does anyone need to exaggerate the benefits of polyglot markup. - For instance, <a title="polyglot markup">polyglot markup</a> does not add semantics. - Polyglot markup does, however, work to <em>preserve</em> semantics, including during the authoring process. - Polyglot markup does not ensure accessibility,as it does not add any accessibility requirements - that other relevant specifications have not already added. - But <a>polyglot markup</a> can work to <em>preserve</em> accessibility through adherence to required practices.</p> - - <p>The motivation behind, and reason for <a title="polyglot markup">polyglot markup</a> is support for <a title="robustness">robustness</a>. - With <a title="robustness">robust</a> (sometimes known as conservative) markup, authors can - <q cite="http://www.w3.org/TR/WCAG20/#robust">maximize compatibility with current and future user agents</q> and authoring tools. [[!WCAG20]]</p> - - <p>Polyglot markup approaches <a>robustness</a> by defining constraints on the serialization of a DOM tree in a manner that - is likely to retain semantics when that serialization is reparsed using a variety of parsers, be - they full featured and bug free HTML5 parsers, somewhat HTML-aware parsers, and even XML parsers. 
- </p> - - <p> For the most part, <a title="polyglot markup">polyglot markup</a> is just a pure deduction of the validity constraints and syntax requirements that - HTML and XHTML dictate, many of which took "polyglotness" into consideration when they were added to HTML5. - However, for reasons of <a title="robustness">robustness</a>, this specification sometimes goes a little further than the principle of the lowest common - denominator would have required.</p> - - <p> For instance, included in the set of constraints on the serialization is the requirement to use the UTF-8 encoding. - This requirement is not only because of the documented benefits - that in turn have lead the HTML5 specification to recommend that all new documents use UTF-8, - but also because it is the sole encoding that <em>every</em> parser, - be it an HTML parser or an XML parser, is required to support. - Note that the HTML-specific benefits are described in HTML5 [[!HTML5]]. - </p> - <p> Also, UTF-8 might in some situations be the sole <em>HTML-conforming</em> option, since it is one of - only two encodings (the other being UTF-16, with its own, separate set of well-known issues) for which XML well-formed - rules doesn’t require the encoding to be explicitly declared. - This in turn has the benefit that any HTML-invalid XML - encoding declaration can reliably be skipped without causing any side-effects. - For example, if someone opted to use the <code>KOI8-R</code>, - encoding, then, as a side-effect of HTML-conformance and XML well-formedness requirements, - the author would be forced to rely on a higher protocol (such as MIME <code>Content-Type</code>) - in order to support XML parsers. - By requiring UTF-8, this side-effect is avoided. - And so, while not the only theoretical possibility, - the choice of UTF-8 as the sole option is justified by the underlying principle of <a title="robustness">robustness</a>. 
- </p> - - <p>Using <a title="robustness">robust</a> syntax can enable documents to be parsed more reliable in less capable parsers. - But even if the document can be expected to be parsed and validated by tools that fully conform to HTML5, - <a title="polyglot markup">polyglot markup</a> adds <a title="robustness">robustness</a>. - As an example, when serialized as HTML, the closing tag for - the <code>p</code> element is entirely optional and will be inferred if not present. But inclusion of - closings tags, as required by XML and, thus, by <a title="polyglot markup">polyglot markup</a>, - cause no harm beyond a minor increase in transfer size (an increase often mitigated by compression), but does - allow validators to detect situations where the implicit closing rules - don't match what the author intended. - </p> +<p>The goal of <a title="polyglot markup">polyglot markup</a> is a syntax that is <a href="#dfn-robust-syntax">robust</a> the way the Web Content Accessibility Guidelines (WCAG) 2.0 describes it: ”<q cite="http://www.w3.org/TR/WCAG20/#ensure-compat">Maximize compatibility with current and future user agents, including assistive technologies.</q> [[WCAG20]] </p> + +<p>Authors need not understand the benefits of <a href="#dfn-robust-syntax">robustness</a> in order to benefit from the syntax of polyglot markup. However, in order to promote its benefits, it is necessary to understand that <a title="polyglot markup">polyglot markup</a> does not add semantics, and as such is not any more or less semantic than other flavors of HTML. Polyglot markup does, however, work to <em>preserve</em> semantics, including during the authoring process. Polyglot markup also does not ensure accessibility,as it does not add any accessibility requirements that other relevant specifications have not already added. 
But <a>polyglot markup</a> can work to <em>preserve</em> accessibility through adherence to required practices.</p> + +<p>Polyglot markup approaches <a href="#dfn-robust-syntax">robustness</a> by defining constraints on the serialization of a DOM tree in a manner that is likely to retain semantics when that serialization is reparsed using a variety of parsers, be they full featured and bug free HTML5 parsers, somewhat HTML-aware parsers, and even XML parsers.</p> + +<p> For the most part, <a title="polyglot markup">polyglot markup</a> is just a pure deduction of the validity constraints and syntax requirements that HTML and XHTML each dictate, many of which took "polyglotness" into consideration when they were added to HTML5. However, for reasons of <a href="#dfn-robust-syntax">robustness</a>, this specification sometimes goes further than the principle of the lowest common denominator would have required.</p> + +<p> For instance, included in the set of constraints on the serialization is the requirement to use the UTF-8 encoding. While not the only theoretical possibility, the choice of UTF-8 as the sole option is justified by the underlying principle of <a href="#dfn-robust-syntax">robustness</a>. E.g. if someone opted to use the <code>KOI8-R</code>, encoding, then, as a side-effect of HTML-conformance and XML well-formedness requirements, the author would be forced to rely on a higher protocol (such as MIME <code>Content-Type</code>) in order to support XML parsers. By requiring UTF-8, that side-effect is avoided.</p> + +<p>Using <a href="#dfn-robust-syntax">robust</a> syntax can enable documents to be parsed more reliable in less capable parsers. But even if the document can be expected to be parsed and validated by tools that fully conform to HTML5, <a title="polyglot markup">polyglot markup</a> adds <a href="#dfn-robust-syntax">robustness</a>. 
As an example, when serialized as HTML, the closing tag for the <code>p</code> element is entirely optional and will be inferred if not present. But inclusion of closings tags, as required by XML and, thus, by <a title="polyglot markup">polyglot markup</a>, cause no harm beyond a minor increase in transfer size (an increase often mitigated by compression), but does allow validators to detect situations where the implicit closing rules don't match what the author intended. </p> + <p class="note"> - Polyglot markup is not defined as "robust markup" because the XML-based polyglot markup - syntax is not the only way to increase <a title="robustness">robustness</a>. + Note that XML-based polyglot markup syntax is not the only way to increase <a href="#dfn-robust-syntax">robustness</a>. For instance, an HTML validator or an authoring tool could require all tags to be closed even if - this is not required by the HTML syntax. But then again, <a title="polyglot markup">polyglot markup</a>, being valid - XML, has some sometimes practical benefits which such a custom setup alone would not have. + this is not required by the HTML syntax. </p> </section> <!--end robust--> @@ -184,17 +134,24 @@ <h2>Syntax</h2> <section id="principles"><h3>Principles</h3> <p> - <dfn>Polyglot markup</dfn> results in: + <dfn id="dfn-polyglot-markup">Polyglot markup</dfn> results in: </p> <ul> <li>a valid HTML document. [[!HTML5]]</li> <li>a <a href="http://www.w3.org/TR/2008/PER-xml-20080205/#sec-well-formed">well-formed XML</a> document. [[!XML10]]</li> - <li>identical DOMs when processed as HTML and when processed as XML, with some notable exceptions: HTML and XML parsers generate different DOMs for some - <code>xml</code> (<code>xml:lang</code>, <code>xml:space</code>, and <code>xml:base</code>), - <code>xmlns</code> (<code>xmlns=""</code> and <code>xmlns:xlink=""</code>), and <code>xlink</code> (such as <code>xlink:href</code>) attributes. 
- XML requires and HTML5 permits these attributes in certain locations and the attributes are preserved by HTML parsers. The exception must not break the requirement to be a valid HTML document. + <li>identical DOMs when processed as HTML and when processed as XML, with some notable exceptions: HTML and XML parsers generate different DOMs for some <code>xml</code> (<code>xml:lang</code>, <code>xml:space</code>, and <code>xml:base</code>), <code>xmlns</code> (<code>xmlns=""</code> and <code>xmlns:xlink=""</code>), and <code>xlink</code> (such as <code>xlink:href</code>) attributes. XML requires and HTML5 permits these attributes in certain locations and the attributes are preserved by HTML parsers. The exception must not break the requirement to be a valid HTML document. </li> </ul> + <p><a>Polyglot Markup</a> specifies a <dfn id="dfn-robust-syntax">Robust Syntax</dfn>, by which it is meant a syntax that maximizes support and minimizes authoring choice.</p> +<p>Support is maximized:</p> + <ul> + <li>by supporting both HTML and XML parsing;</li> + <li>by utilizing code that, as far as possible, results in DOM equivalent parsing in generic as well as specialized parsers, including challenged parsers of various kinds;</li> + <li>because the code is ready to be reused/repurposed/redited/reparsed in any authoring tool or parser.</li></ul> +<p>Auhoring choices are minimized</p> +<ul><li>through strict syntax requirements partly dictated by the polyglot approach and partly motivated by the robust approach.</li> + </ul> + <p> <a title="polyglot markup">Polyglot markup</a> is not constrained: </p> @@ -221,20 +178,20 @@ <!--End section: principles--> </section> <section id="writing"><h2>Writing HTML documents</h2> - <section id="PI-and-xml" class="section"> +<section id="PI-and-xml" class="section"> <h3>Processing instructions and the XML declaration</h3> <p> Processing instructions and the XML declaration are both forbidden in <a>polyglot markup</a>. 
</p> <!--End section: Processing Instructions and the XML Declaration--> </section> - <section id="character-encoding" class="section"> +<section id="character-encoding" class="section"> <h3>Specifying a document’s character encoding</h3> <p> <a title="polyglot markup">Polyglot markup</a> uses the UTF-8 character encoding, the only character encoding for which both HTML and XML require support. HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a>. [[!HTML5]] </p> - <p> For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>. + <p> For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>. As such, character encoding MAY be left undeclared in XML with the result that UTF8 is still supported [[!XML10]]. </p> <p> @@ -281,7 +238,7 @@ </p> <!--End section: Specifying a Document's Character Encoding--> </section> - <section id="doctype" class="section"> +<section id="doctype" class="section"> <h3>The DOCTYPE</h3> <p> <a title="polyglot markup">Polyglot markup</a> uses a document type declaration (DOCTYPE) specified by <a href="http://www.w3.org/TR/html5/syntax.html#the-doctype">section 8.1.1</a> of [[!HTML5]]. @@ -312,7 +269,7 @@ </p> <!--End section: The DOCTYPE--> </section> - <section id="namespaces" class="section"> +<section id="namespaces" class="section"> <h3>Namespaces</h3> <p> The following rules apply to namespaces used in <a>polyglot markup</a>. @@ -368,33 +325,33 @@ The namespaced attributes, such as <code>xml:lang=""</code> and <code>xmlns=""</code>, are "namespaced" within XHTML, SVG and MathML. Thus, the rules for how they can be used as CSS selectors is governed by CSS namespaces. 
[[!CSS3NAMESPACE]] For more about the issues related to attribute selectors and namespaces, with and without prefixes, see the section on <a - href="#scripting-and-styling-polyglot-markup">Scripting and styling polyglot markup</a>. - <p> + href="#scripting-and-styling-polyglot-markup">Scripting and styling polyglot markup</a>. + </p> - <!-- End section, "Attribute-Level Namespaces" --> + <!-- End section, "Attribute-Level Namespaces" --> </section> <!--End section: Namespaces--> </section> - <section id="elements" class="section"> +<section id="elements" class="section"> <h3>Element syntax</h3> <p><a title="polyglot markup">Polyglot markup</a> conforms to the following rules regarding elements.</p> - <section id="required-elements" class="section"> +<section id="required-elements" class="section"> <h6>Required elements and tags</h6> <p> <a title="polyglot markup">Polyglot markup</a> does not employ <a>optional tags</a>. - HTML5’s concept of <dfn>optional tags</dfn> – missing start tags and/or end tags – covers + HTML5’s concept of <dfn>optional tags</dfn> – missing start tags and/or end tags – covers <a href="http://www.w3.org/TR/html5/syntax.html#optional-tags"> - elements that the HTML parser itself automatically adds to the DOM</a> - if the code doesn’t contain the tags for them. - Because XML does not have such a feature that adds missing start and/or end tags to the DOM, + elements that the HTML parser itself automatically adds to the DOM</a> + if the code doesn’t contain the tags for them. + Because XML does not have such a feature that adds missing start and/or end tags to the DOM, omitting a tag in <a>polyglot markup</a> is equivalent to producing a document that is not well-formed or, if both tags are omitted, equivalent to not adding the element at all. 
</p> <p>That <a>polyglot markup</a> doesn’t operate with optional tags, may create surprises for an author not used - to adding the <code>tbody</code> tags in their code, for example, - or to someone accustomed to omitting the end tag of the <code>p</code> element. - However, the requirement to be well-formed with regard to tags is a key feature of <a>polyglot markup</a> - that makes the code <a title="robustness">robust</a> against subpar parsers and authoring surprises. + to adding the <code>tbody</code> tags in their code, for example, + or to someone accustomed to omitting the end tag of the <code>p</code> element. + However, the requirement to be well-formed with regard to tags is a key feature of <a>polyglot markup</a> + that makes the code <a href="#dfn-robust-syntax">robust</a> against subpar parsers and authoring surprises. </p> <section id="minimal-polyglot-html-document"> <h4>A minimal HTML document</h4> @@ -452,7 +409,7 @@ </section> </section> - <section id="excluded-elements" class="section"> +<section id="excluded-elements" class="section"> <h3>Excluded elements and tags</h3> <p> @@ -466,7 +423,7 @@ </p> <!--End section: Elements that Cannot Be Used in Polyglot Markup--> </section> - <section id="case-sensitivity" class="section"> +<section id="case-sensitivity" class="section"> <h3>Case-sensitivity</h3> <p> The following apply to any usage of element names, attribute names, or attribute values in markup, script, or CSS. @@ -650,22 +607,22 @@ Note that other specifications, such as RDFa, may place additional restrictions on the allowed values of certain attributes. 
</p> <p> - Also note that because XML processors don't recognize <code>lang</code> as containing language information, - <a>polyglot markup</a> uses both the <code>lang</code> and the <code>xml:lang attributes</code> - (see <a href="#language-attributes">Language attributes</a>); however, - the <a href="http://www.w3.org/TR/css3-selectors/#lang-pseudo">CSS3 Selectors specification</a> stipulates that - language attributes, including <code>xml:lang</code>, are matched in a case insensitive way. [[!SELECT]] + Also note that because XML processors don't recognize <code>lang</code> as containing language information, + <a>polyglot markup</a> uses both the <code>lang</code> and the <code>xml:lang attributes</code> + (see <a href="#language-attributes">Language attributes</a>); however, + the <a href="http://www.w3.org/TR/css3-selectors/#lang-pseudo">CSS3 Selectors specification</a> stipulates that + language attributes, including <code>xml:lang</code>, are matched in a case insensitive way. [[!SELECT]] </p> <!--End section: Attribute values--> </section> <!--End section: Case-Sensitivity--> </section> - <!--End section: Elements --> +<!--End section: Elements --> </section> - <section id="contents-of-elements" class="section"> +<section id="contents-of-elements" class="section"> <h3>Element contents</h3> <p>For the <a href="http://www.w3.org/TR/html5/syntax.html#elements-0">different kinds of elements</a> that HTML documents contain, <a>polyglot markup</a> conforms to the following contents rules.</p> - <section id="empty-elements" class="section"> +<section id="empty-elements" class="section"> <h4>Void elements</h4> <p>In the HTML syntax, void elements are elements that always are empty and never have an end tag. 
All elements listed as void <a href="http://www.w3.org/TR/html5/syntax.html#void-elements" >in the HTML specification</a> or @@ -696,112 +653,112 @@ <!--End section: void Elements--> </section> - <section id="raw-text-elements"> - <h4>Raw text elements (<code>script</code> and <code>style</code>)</h4> - <p> - In <a>polyglot markup</a>, the contents of all elements listed as raw text elements - <a href="http://www.w3.org/TR/html5/syntax.html#raw-text-elements" >in the HTML specification</a> or - in an extension spec, MUST conform to the extra requirements defined in this section. +<section id="raw-text-elements"> +<h4>Raw text elements (<code>script</code> and <code>style</code>)</h4> +<p> + In <a>polyglot markup</a>, the contents of all elements listed as raw text elements + <a href="http://www.w3.org/TR/html5/syntax.html#raw-text-elements" >in the HTML specification</a> or + in an extension spec, MUST conform to the extra requirements defined in this section. +</p> + +<figure> + <figcaption>HTML5's list of raw text elements</figcaption> + <blockquote cite="http://www.w3.org/TR/html5/syntax.html#raw-text-elements"> + <code>script</code>, <code>style</code> + <!-- iframe and noscript don't count as raw text for syntax purposes --> + </blockquote> +</figure> + +<p> + In HTML syntax, the content of raw text elements is raw text. + In other words, the HTML parser does not treat contained code that looks like tags (element tags and comment tags, + character references, CDATA, etc.) as tags, character references, CDATA, etc., but as raw text. + (See HTML5 for the exact rules.) + In the XHTML syntax, however, the same constructs <em>will</em> be treated as tags, character references, CDATA etc. +</p> +<p>As result, it is simpler for authors to comply with the requirement of the default MIME + types of the raw text elements in HTML than it is in XHTML. 
+ On the other hand, with <code class="CDATA">CDATA</code>, the raw text contents + parsed as XHTML can be made even less semantic than the raw text data of HTML, + leading to potential harms if the document is parsed as HTML. +</p> + +<figure id="ambiguous-table"> + <figcaption>Overview over the differences in how HTML and XML parse raw text elements</figcaption> + <table class="simple" border="1" > + + <colgroup><col/><col/><col/><col/><col/></colgroup> + <thead> + <tr> + <th rowspan="2">Ambiguous string</th><th rowspan="2">Info</th><th rowspan="2">HTML interpretation</th><th colspan="2">XML interpretation</th> + </tr> + <tr><th>if inside <code><![CDATA[</code>section<code>]]></code></th><th>if outside <code><![CDATA[</code>section<code>]]></code></th> + </tr> + </thead> + <tbody> + <tr> + <td><code><</code></td> + <td>LESS-THAN SIGN</td><td>uninterpreted <small>(but see the <code></script</code> and <code></style</code> rows)</small></td> + <td>uninterpreted</td><td>interpreted <small>(commences tags, comments, CDATA)</small></td></tr> + <tr><td><code>&</code></td><td>AMPERSAND</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>commences character reference or entity</small></td></tr> + <tr><td><code><!--</code></td><td>start of comment</td><td>partly unintepreted</td><td>uninterpreted</td><td>interpreted</td></tr> + <tr><td><code>--></code></td><td>end of comment</td><td>partly unintepreted</td><td>uninterpreted</td><td>interpreted</td></tr> + <tr><td><code><![CDATA[</code></td><td>start of CDATA declaration</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>(begins CDATA block)</small></td></tr> + <tr><td><code>]]></code></td><td>end of CDATA declaration</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>(ends CDATA block)</small></td></tr> + <tr><td><code>cdata content</code></td><td>the content of CDATA sections</td><td></td><td>uninterpreted</td><td>—</td></tr> + <tr><td><code></script</code> 
</td><td>if occuring inside <code>script</code> element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F)</td><td>terminates parent</td><td>uninterpreted</td><td>interpreted</td></tr> + <tr><td><code></style</code></td><td>if occuring inside <code>style</code> element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F)</td><td>terminates parent</td><td>uninterpreted</td><td>interpreted</td></tr> + <tr><td><code><foo></bar></code></td><td>all other tags, well-formed or not</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr> + <tr><td><code>&#foo;</code></td><td>character references</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr> </tbody> <tbody> + <tr><td><code>none of the above strings</code></td><td>Any other string</td><td>uninterpreted</td><td>uninterpreted</td><td>uninterpreted</td></tr> + </tbody> + </table> +</figure> + + +<p>Syntactically, the polyglot subset is found by</p> +<ul><li><em>either</em> <strong>limiting the content to <dfn>safe text content</dfn></strong>, that + is, text that gets interpreted the same way in HTML and in XML.</li> + <li><em>or</em> trying to <strong>even out the constraints differences</strong> by + wrapping the contents in a <code>CDATA</code> section. 
The <code>CDATA</code> code is then seen as text + by the HTML parser (and can thus interfere with the scripting or styling language!), while the XML parser sees the + content as text without markup semantics.</li></ul> +<p>Limiting the contents to <a>safe text content</a> requires more planning and control over the code, but can be said to be + more <a href="#dfn-robust-syntax">robust</a> than the <code>CDATA</code> option as it requires no extra, potentially + breakable code to make the scripting or styling language work. The <code>CDATA</code> option on the + other hand, gives more freedom and robustness against various errors that can happen because the author isn’t + aware of the <a>safe text content</a> limitations or because the code is inserted by a tool that is unable to + guarantee that the content is <a title="safe text content">safe</a>.</p> + +<section id="safe-text-content"> + <h5>Options for delivering safe text content</h5> + <p><a title="polyglot markup">Polyglot markup</a> can deliver <a>safe text content</a> both externally and internally. </p> - - <figure> - <figcaption>HTML5's list of raw text elements</figcaption> - <blockquote cite="http://www.w3.org/TR/html5/syntax.html#raw-text-elements"> - <code>script</code>, <code>style</code> - <!-- iframe and noscript don't count as raw text for syntax purposes --> - </blockquote> - </figure> - - <p> - In HTML syntax, the content of raw text elements is raw text. - In other words, the HTML parser does not treat contained code that looks like tags (element tags and comment tags, - character references, CDATA, etc.) as tags, character references, CDATA, etc., but as raw text. - (See HTML5 for the exact rules.) - In the XHTML syntax, however, the same constructs <em>will</em> be treated as tags, character references, CDATA etc. - </p> [523 lines skipped]
Received on Thursday, 16 January 2014 16:40:59 UTC