- From: CVS User egraff <cvsmail@w3.org>
- Date: Mon, 13 Jan 2014 00:10:17 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/html-polyglot
In directory roscoe:/tmp/cvs-serv20611

Modified Files:
    WD-html-polyglot-20140117.html
Log Message:
Updated content based on ED of 1/12

--- /sources/public/html5/html-polyglot/WD-html-polyglot-20140117.html 2014/01/09 00:22:10 1.2
+++ /sources/public/html5/html-polyglot/WD-html-polyglot-20140117.html 2014/01/13 00:10:17 1.3
@@ -37,8 +37,8 @@
 <body>
 <section id="abstract">
 A document that uses <a title="polyglot markup">polyglot markup</a> is a document that is a stream of bytes that parses into identical document trees
- (with some exceptions, as noted in the <a href="#introduction">Introduction</a>) when processed as HTML and when processed as XML.
- Polyglot markup that meets a well-defined set of constraints is interpreted as compatible, regardless of whether they are processed as HTML or as XHTML, per the HTML5 specification.
+ (with some exceptions, as noted in the <a href="#introduction">Introduction</a>) when processed either as HTML or when processed as XML.
+ Polyglot markup that meets a well-defined set of constraints is interpreted as compatible, regardless of whether it is processed as HTML or as XHTML, per the HTML5 specification.
 Polyglot markup uses a specific DOCTYPE, namespace declarations, and a specific case—normally lower case but occasionally camel case—for element and attribute names.
 Polyglot markup uses lower case for certain attribute values.
 Further constraints include those on void elements, named entity references, and the use of scripts and style.
@@ -46,10 +46,11 @@
 </section>
 <section id="sotd">
 <p>
- This document summarizes design guidelines for authors who wish their XHTML or HTML documents to validate on both HTML and XML parsers.
- This specification is intended to be used by web authors, particularly authors who want to serve receivers which may have either (but not both) XML or HTML parsers available.
- This commonly arises in legacy systems and content syndication.
- Polyglot is one of several transition mechanisms from legacy XML to HTML5 and this document serves to describe it accurately. + This specification summarizes design guidelines for authors who wish their XHTML or HTML documents to be conforming whether parsed as HTML or as XML. + The document is intended to be useful to web authors, in particular those who want to serve receivers without concern for whether they have XML or HTML parsers available. + Such concerns may, for instance, arise in content syndication or when receivers are on legacy systems. + HTML polyglots <a href="http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#charset-0">facilitate migration to and from XHTML</a>, + including transition from legacy XML to HTML5, and this document serves to accurately specify the requirements of a UTF-8 based profile for such documents. </p> <p> No recommendation is made in this document or by the W3C regarding whether or not to publish polyglot content. @@ -107,46 +108,63 @@ <section id="robust"> <h3>Robustness</h3> - <p>Polyglot markup is a means to an end – <dfn id="dfn-robustness">robustness</dfn>. It is not a goal in itself. However, authors do not need - to understand these benefits in order to use and benefit from this syntax. But neither does anyone - need to exaggerate its benefits. For instance, <a title="polyglot markup">polyglot markup</a> does not add semantics. Polyglot markup does, - however, work to <em>preserve</em> semantics, including during the authoring process. Polyglot markup - also doesn’t ensure accessibility - as it does not add any requirements - that other relevant specs have not allready added. But it can work to <em>preserve</em> accessibility.</p> + <p> Polyglot markup is a means to an end – <dfn id="dfn-robustness">robustness</dfn>. 
+ <a title="polyglot markup">Polyglot markup</a> embraces the principle of <a>robustness</a> + as it is defined in Web Content Accessibility Guidelines (WCAG) 2.0: + <em>Content must be robust enough that it can be interpreted reliably by a wide variety of user agents, + including assistive technologies.</em> [[!WCAG20]] + </p> + <p><a title="robustness">Robustness</a> is not a goal in itself, nor do authors need + to understand the benefits of <a>robustness</a> in order to use and benefit from the syntax of polyglot markup. + Nor does anyone need to exaggerate the benefits of polyglot markup. + For instance, <a title="polyglot markup">polyglot markup</a> does not add semantics. + Polyglot markup does, however, work to <em>preserve</em> semantics, including during the authoring process. + Polyglot markup does not ensure accessibility,as it does not add any accessibility requirements + that other relevant specifications have not already added. + But <a>polyglot markup</a> can work to <em>preserve</em> accessibility through adherence to required practices.</p> - <p>The motivation behind, and reason for <a title="polyglot markup">polyglot markup</a> to exist as a specification, is its widely supported - <a title="robustness">robustness</a>. With <a title="robustness">robust</a> (also known as conservative) markup, authors can + <p>The motivation behind, and reason for <a title="polyglot markup">polyglot markup</a> is support for <a title="robustness">robustness</a>. + With <a title="robustness">robust</a> (sometimes known as conservative) markup, authors can <q cite="http://www.w3.org/TR/WCAG20/#robust">maximize compatibility with current and future user agents</q> and authoring tools. 
[[!WCAG20]]</p> - <p>Polyglot markup seeks to define constraints on the serialization of a DOM tree in a <a title="robustness">robust</a> manner that - is likely to retain semantics when said serialization is reparsed using a variety of parsers, be + <p>Polyglot markup approaches <a>robustness</a> by defining constraints on the serialization of a DOM tree in a manner that + is likely to retain semantics when that serialization is reparsed using a variety of parsers, be they full featured and bug free HTML5 parsers, somewhat HTML-aware parsers, and even XML parsers. </p> <p> For the most part, <a title="polyglot markup">polyglot markup</a> is just a pure deduction of the validity constraints and syntax requirements that - HTML and XHTML dictate, many of which took polyglotness into consideration when they were added to HTML5. - However, for reasons of <a title="robustness">robustness</a>, the spec sometimes goes a little further than the principle of the lowest common + HTML and XHTML dictate, many of which took "polyglotness" into consideration when they were added to HTML5. + However, for reasons of <a title="robustness">robustness</a>, this specification sometimes goes a little further than the principle of the lowest common denominator would have required.</p> <p> For instance, included in the set of constraints on the serialization is the requirement to use the UTF-8 encoding. - This requirement is not only because of the documented benefits (the HTML-specific benefits are described in HTML5 [[!HTML5]]) – - which in turn has lead the HTML5 specification to recommend - that all new documents use UTF-8, but also because it is the sole encoding that <em>every</em> parser, be it an HTML parser or - an XML parser, is required to support. 
Also, UTF-8 might in some situations be the sole <em>HTML-conforming</em> option, since it is one of + This requirement is not only because of the documented benefits + that in turn have lead the HTML5 specification to recommend that all new documents use UTF-8, + but also because it is the sole encoding that <em>every</em> parser, + be it an HTML parser or an XML parser, is required to support. + Note that the HTML-specific benefits are described in HTML5 [[!HTML5]]. + </p> + <p> Also, UTF-8 might in some situations be the sole <em>HTML-conforming</em> option, since it is one of only two encodings (the other being UTF-16, with its own, separate set of well-known issues) for which XML well-formed - rules doesn’t require the encoding to be explicitly declared. This in turn has the benefit that the anyhow HTML-invalid XML - encoding declaration kan reliably be skipped without causing any side-effects. E.g. if one opted to use the <code>KOI8-R</code>, - encoding, then, as a side-effect of HTML-conformance and XML well-formedness requirements, the author would have - been forced to rely on a higher protocol (such as MIME <code>Content-Type</code>) in order to support XML parsers. By requiring - UTF-8, this side-effect is avoided. And so, while not the only theoretical possibility, the choice of - UTF-8 as the sole option, is justified by the underlying principle of <a title="robustness">robustness</a>.</p> + rules doesn’t require the encoding to be explicitly declared. + This in turn has the benefit that any HTML-invalid XML + encoding declaration can reliably be skipped without causing any side-effects. + For example, if someone opted to use the <code>KOI8-R</code>, + encoding, then, as a side-effect of HTML-conformance and XML well-formedness requirements, + the author would be forced to rely on a higher protocol (such as MIME <code>Content-Type</code>) + in order to support XML parsers. + By requiring UTF-8, this side-effect is avoided. 
+ And so, while not the only theoretical possibility, + the choice of UTF-8 as the sole option is justified by the underlying principle of <a title="robustness">robustness</a>. + </p> <p>Using <a title="robustness">robust</a> syntax can enable documents to be parsed more reliable in less capable parsers. - But even if the document can be expected to be parsed and validated by fully HTML5 conforming tools, - <a title="polyglot markup">polyglot markup</a> adds <a title="robustness">robustness</a>. As an example, when serialized as HTML, the closing tag for + But even if the document can be expected to be parsed and validated by tools that fully conform to HTML5, + <a title="polyglot markup">polyglot markup</a> adds <a title="robustness">robustness</a>. + As an example, when serialized as HTML, the closing tag for the <code>p</code> element is entirely optional and will be inferred if not present. But inclusion of - closings tags, as required by XML and, thus, by <a title="polyglot markup">polyglot markup</a>, cause no harm beyond a minor increase - in transfer size (an increase often mitigated by compression), but does + closings tags, as required by XML and, thus, by <a title="polyglot markup">polyglot markup</a>, + cause no harm beyond a minor increase in transfer size (an increase often mitigated by compression), but does allow validators to detect situations where the implicit closing rules don't match what the author intended. </p> @@ -206,7 +224,7 @@ <section id="PI-and-xml" class="section"> <h3>Processing instructions and the XML declaration</h3> <p> - Processing Instructions and the XML Declaration are both forbidden in <a>polyglot markup</a>. + Processing instructions and the XML declaration are both forbidden in <a>polyglot markup</a>. 
</p> <!--End section: Processing Instructions and the XML Declaration--> </section> @@ -214,13 +232,14 @@ <h3>Specifying a document’s character encoding</h3> <p> <a title="polyglot markup">Polyglot markup</a> uses the UTF-8 character encoding, the only character encoding for which both HTML and XML require support. - HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a> [[!HTML5]]. - For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>. - As such, character encoding MAY be left undeclared in XML with the result that UTF-8 is still supported [[!XML10]]. + HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a>. [[!HTML5]] + </p> + <p> For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>. + As such, character encoding MAY be left undeclared in XML with the result that UTF8 is still supported [[!XML10]]. </p> <p> <a title="polyglot markup">Polyglot markup</a> declares the UTF-8 character encoding in the following ways, which may be used separately or - in combination (but note that here can only be a <em>single</em> <a title="HTML encoding declaration">HTML encoding declaration</a>): + in combination (but note that there can only be a <em>single</em> <a title="HTML encoding declaration">HTML encoding declaration</a>): </p> <ul> <li>Within the document @@ -304,7 +323,7 @@ <p> [[!HTML5]] introduces undeclared (native) default namespaces for the root HTML element, <code>html</code>, the root SVG element, <code>svg</code>, and the root MathML element, <code>math</code>. 
- <a title="polyglot markup">Polyglot markup</a> declares the following default namespaces, when the markup languages are included in the document, to maintain XML-compatibility [[!XML10]]:</p> + <a title="polyglot markup">Polyglot markup</a> declares the following default namespaces, when the markup languages are included in the document, to maintain XML compatibility [[!XML10]]:</p> <ul class="inline-list"> <li><code><html xmlns="http://www.w3.org/1999/xhtml"></code></li> <li><code><math xmlns="http://www.w3.org/1998/Math/MathML"></code></li> @@ -342,13 +361,13 @@ </ul> <p> Note that there are other prefixed attributes that can be used beyond <code>xlink:href</code> (such as <code>xml:base</code>). - <a title="polyglot markup">Polyglot markup</a> does not declare these prefixes via xmlns. The prefixes are implicitly declared + <a title="polyglot markup">Polyglot markup</a> does not declare these prefixes via <code>xmlns</code>. The prefixes are implicitly declared in XML and are automatically applied to the appropriate attributes in HTML. </p> <p> The namespaced attributes, such as <code>xml:lang=""</code> and <code>xmlns=""</code>, are "namespaced" within XHTML, SVG and MathML. - Thus, the rules for how they can be sued as CSS selectors is governed by CSS namespaces. [[!CSS3NAMESPACE]] - For more on the issues related to attribute selectors and namespaces, with and without prefix, see the section on <a + Thus, the rules for how they can be used as CSS selectors is governed by CSS namespaces. [[!CSS3NAMESPACE]] + For more about the issues related to attribute selectors and namespaces, with and without prefixes, see the section on <a href="#scripting-and-styling-polyglot-markup">Scripting and styling polyglot markup</a>. 
<p> @@ -362,19 +381,21 @@ <section id="required-elements" class="section"> <h6>Required elements and tags</h6> - <p> HTML5’s concept of <dfn>optional tags</dfn> – start tags and/or end tags – covers <a - href="http://www.w3.org/TR/html5/syntax.html#optional-tags">elements that the - HTML parser itself automatically adds to the DOM</a> if the code doesn’t contain the tags for - them. However, since XML does not have a feature whereby elements with one or both tags that have been - omitted from the code (such as when start and end tags of <code>html</code> are omitted) are added to the DOM, - omitting a tag in <a>polyglot markup</a> is equivalent of producing a not well-formed document or, - if both tags are omotted, equivalent of not adding the element at all. Therefore, <a>polyglot markup</a> does not - operate with <a>optional tags</a>.</p> - - <p>That <a>polyglot markup</a> doesn’t operate with optional tags, may create surprises e.g. for someone not used - to adding e.g. the <code>tbody</code> tags in their code or to someone accustomed to omitting the end tag of the - <code>p</code> element. However, the requirement to be complete with regard to tags, is a key feature of <a>polyglot - markup</a> that makes the code <a title="robustness">robust</a> against subpar parsers and authoring surprises.</p> + <p> <a title="polyglot markup">Polyglot markup</a> does not employ <a>optional tags</a>. + HTML5’s concept of <dfn>optional tags</dfn> – missing start tags and/or end tags – covers + <a href="http://www.w3.org/TR/html5/syntax.html#optional-tags"> + elements that the HTML parser itself automatically adds to the DOM</a> + if the code doesn’t contain the tags for them. + Because XML does not have such a feature that adds missing start and/or end tags to the DOM, + omitting a tag in <a>polyglot markup</a> is equivalent to producing a document that is not well-formed or, + if both tags are omitted, equivalent to not adding the element at all. 
</p> + + <p>That <a>polyglot markup</a> doesn’t operate with optional tags, may create surprises for an author not used + to adding the <code>tbody</code> tags in their code, for example, + or to someone accustomed to omitting the end tag of the <code>p</code> element. + However, the requirement to be well-formed with regard to tags is a key feature of <a>polyglot markup</a> + that makes the code <a title="robustness">robust</a> against subpar parsers and authoring surprises. + </p> <section id="minimal-polyglot-html-document"> <h4>A minimal HTML document</h4> <p> @@ -633,7 +654,7 @@ <a>polyglot markup</a> uses both the <code>lang</code> and the <code>xml:lang attributes</code> (see <a href="#language-attributes">Language attributes</a>); however, the <a href="http://www.w3.org/TR/css3-selectors/#lang-pseudo">CSS3 Selectors specification</a> stipulates that - language attributes, including <code>xml:lang</code>, are matched in a case-insensitive way. [[!SELECT]] + language attributes, including <code>xml:lang</code>, are matched in a case insensitive way. [[!SELECT]] </p> <!--End section: Attribute values--> </section> @@ -692,15 +713,17 @@ </figure> <p> - In the HTML syntax, the contents of raw text elements is raw text, by which it is referred to the fact - that the HTML parser will not treat contained code that look like tags (element tags and comment tags), character references, - CDATA etc as tags, character references, CDATA etc, but as raw text. (See HTML5 for the exact rules.) + In HTML syntax, the content of raw text elements is raw text. + In other words, the HTML parser does not treat contained code that looks like tags (element tags and comment tags, + character references, CDATA, etc.) as tags, character references, CDATA, etc., but as raw text. + (See HTML5 for the exact rules.) In the XHTML syntax, however, the same constructs <em>will</em> be treated as tags, character references, CDATA etc. 
</p> - <p>As result, in HTML, it is simpler than it is in XHTML, for authors to comply with the requirement of the default MIME - types of the raw text elements. On the other side, by the use of <code class="CDATA">CDATA</code>, the raw text contents - parsed as XHTML, can be made ven less semantic than the raw text data of HTML, leading to potential harms if the document - is parsed as HTML + <p>As result, it is simpler for authors to comply with the requirement of the default MIME + types of the raw text elements in HTML than it is in XHTML. + On the other hand, with <code class="CDATA">CDATA</code>, the raw text contents + parsed as XHTML can be made even less semantic than the raw text data of HTML, + leading to potential harms if the document is parsed as HTML. </p> <figure id="ambiguous-table"> @@ -712,7 +735,7 @@ <tr> <th rowspan="2">Ambiguous string</th><th rowspan="2">Info</th><th rowspan="2">HTML interpretation</th><th colspan="2">XML interpretation</th> </tr> - <tr><th>if inside <code><[CDATA[</code>section<code>]]></code></th><th>if outside <code><[CDATA[</code>section<code>]]></code></th> + <tr><th>if inside <code><![CDATA[</code>section<code>]]></code></th><th>if outside <code><![CDATA[</code>section<code>]]></code></th> </tr> </thead> <tbody> @@ -723,56 +746,62 @@ <tr><td><code>&</code></td><td>AMPERSAND</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>commences character reference or entity</small></td></tr> <tr><td><code><!--</code></td><td>start of comment</td><td>partly unintepreted</td><td>uninterpreted</td><td>interpreted</td></tr> <tr><td><code>--></code></td><td>end of comment</td><td>partly unintepreted</td><td>uninterpreted</td><td>interpreted</td></tr> - <tr><td><code><[CDATA[</code></td><td>start of CDATA declaration</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>(begins CDATA block)</small></td></tr> + <tr><td><code><![CDATA[</code></td><td>start of CDATA 
declaration</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>(begins CDATA block)</small></td></tr> <tr><td><code>]]></code></td><td>end of CDATA declaration</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>(ends CDATA block)</small></td></tr> <tr><td><code>cdata content</code></td><td>the content of CDATA sections</td><td></td><td>uninterpreted</td><td>—</td></tr> <tr><td><code></script</code> </td><td>if occuring inside <code>script</code> element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F)</td><td>terminates parent</td><td>uninterpreted</td><td>interpreted</td></tr> <tr><td><code></style</code></td><td>if occuring inside <code>style</code> element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F)</td><td>terminates parent</td><td>uninterpreted</td><td>interpreted</td></tr> - <tr><td><code><foo></bar></code></td><td>all other tags, wellformed or not</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr> + <tr><td><code><foo></bar></code></td><td>all other tags, well-formed or not</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr> <tr><td><code>&#foo;</code></td><td>character references</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr> </tbody> <tbody> - <tr><th><code>none of the above strings</code></th><td>Any other string</td><td>uninterpreted</td><td>uninterpreted</td><td>uninterpreted</td></tr> + <tr><td><code>none of the above strings</code></td><td>Any other string</td><td>uninterpreted</td><td>uninterpreted</td><td>uninterpreted</td></tr> </tbody> </table> </figure> <p>Syntactically, the polyglot subset is found by</p> - <ul><li><em>either</em> 
<strong>limiting the content to <dfn>safe content</dfn></strong>, that - is: text that gets interpreted the same way in HTML and in XML.</li> + <ul><li><em>either</em> <strong>limiting the content to <dfn>safe text content</dfn></strong>, that + is, text that gets interpreted the same way in HTML and in XML.</li> <li><em>or</em> trying to <strong>even out the constraints differences</strong> by wrapping the contents in a <code>CDATA</code> section. The <code>CDATA</code> code is then seen as text by the HTML parser (and can thus interfere with the scripting or styling language!), while the XML parser sees the content as text without markup semantics.</li></ul> - <p>Limiting the contents to <a>safe content</a> requires more planning and control over the code, but can be said to be + <p>Limiting the contents to <a>safe text content</a> requires more planning and control over the code, but can be said to be more <a title="robustness">robust</a> than the <code>CDATA</code> option as it requires no extra, potentially breakable code to make the scripting or styling language work. The <code>CDATA</code> option on the other hand, gives more freedom and robustness against various errors that can happen because the author isn’t - aware of the <a>safe content</a> limitations or because the code is inserted by a tool that is unable to - guarantee that the content is <a title="safe content">safe</a>.</p> + aware of the <a>safe text content</a> limitations or because the code is inserted by a tool that is unable to + guarantee that the content is <a title="safe text content">safe</a>.</p> <section id="safe-text-content"> - <h5>The safe text content option</h5> - <p>The <dfn>safe text content</dfn> option comes in two variants:</p> + <h5>Options for delivering safe text content</h5> + <p><a title="polyglot markup">Polyglot markup</a> can deliver <a>safe text content</a> both externally and internally. 
+ </p> <ul> - <li>The <strong>external <a>safe text content</a></strong> variant. This implies to include the scripts or stylesheet by linking to an - external file rather than including all the code - in-line. External files are parsed as the respective script or stylesheet, and are thus not limited - by the safe text content restrictions. + <li><strong>External <a>safe text content</a>.</strong> + <a title="polyglot markup">Polyglot markup</a> can include scripts or stylesheets + by linking to external files rather than including the code in-line. + External files are parsed as the respective script or stylesheet and are thus not limited + by the same restrictions as safe text content. <figure> - <figcaption>Using external <a title="safe content">safe content</a>.</figcaption> + <figcaption>Examples of linking to external scripts or stylesheets</figcaption> <pre class="example highlight" ><!-- Ways to link to external scripts or stylesheets --><br/><script src="external.js" ></script><br /><link href="external.css" rel="stylesheet"/><br /><style>@import "external.css";</style></pre> </figure> </li> - <li>The <strong>inline <a>safe text content</a></strong> variant. This option implies abstaining from using characters and constructs - which HTML and XML interpret differently, namely the characters <code><</code> and <code>&</code> - as well as the <code>CDATA</code> end mark string – <code>]]></code>. <a title="polyglot markup">Polyglot markup</a> - is agnostic as to whether one uses a character entity or a numeric character reference, as long as it is valid. - For <a>polyglot markup</a>, there is no difference between <code>&amp;</code> and <code>&#x3C;</code>. + <li><strong>Inline <a>safe text content</a>.</strong> + <a title="polyglot markup">Polyglot markup</a> does not use characters or constructs + that are interpreted differently in HTML and XML. 
+ This means not using the characters <code><</code> and <code>&</code> + as well as the <code>CDATA</code> end mark string – <code>]]></code>. + <a title="polyglot markup">Polyglot markup</a> + is agnostic as to whether one uses character entities or a numeric character references, + so long as they are valid. + That is, for <a>polyglot markup</a>, there is no difference between <code>&amp;</code> and <code>&#x3C;</code>. <figure> - <figcaption>Using inline <a title="safe content">safe text content</a></figcaption> + <figcaption>Examples of content that is not safe text content</figcaption> <pre class="example highlight"><!-- Unsafe content: < and & are not escaped<br /> This code is not XML well-formed. --><br/><style>q::before{content:"<";}</style><br/><script>var a = "&";</script> @@ -784,13 +813,13 @@ <p>For CSS, the inline <a>safe text content</a> option would work very well most of the time, as <code><</code> and <code>&</code> are not key parts of CSS and not very often used. But when it comes to JavaScript, the <code>&</code> and the <code><</code> are key verbs (operators) of the - language, and thus one soon runs into trouble – it is better to use <em>external</em> <a>safe content</a>.</p> + language, and thus one soon runs into trouble – it is better to use <em>external</em> <a>safe text content</a>.</p> </li> </ul> <figure> - <figcaption>An example of inline safe text content in <code>script</code></figcaption> + <figcaption>Inline content containing no ambiguous strings</figcaption> <pre class="example highlight" - ><!-- The following the example is <a>polyglot markup</a> because there are no ambiguous strings within the <code>script</code> element. --><br + ><!-- The following example of inline script is <a>polyglot markup</a> because there are no ambiguous strings within the <code>script</code> element. 
--><br /><script>document.body.appendChild(document.createElement("div"));</script></pre> </figure> @@ -800,32 +829,35 @@ </p> </section> <section id="safe-CDATA-content"> - <h5>The safe CDATA content option</h5> - <p>The safe CDATA option wraps the raw text content in <code>CDATA</code> section <strong>but(!)</strong> instead - of permitting <em>any</em> content (except the very CDATA end mark string – <code>]]></code>), only the + <h5>Safe CDATA content</h5> + <p><a title="polyglot markup">Polyglot markup</a> accepts raw text content wrapped in a <code>CDATA</code> section; + <strong>however</strong> instead of permitting <em>any</em> content (except the very CDATA end mark string – <code>]]></code>), only the subset that corresponds to the particular raw text element’s HTML constraints is permitted. See the “HTML interpretation” column in the <a href="#ambiguous-table">parsing differences table above</a> – all the cells with the text ”uninterpreted” are also uninterpreted as CDATA and thus constitutes the safe subset of CDATA.</p> - <p>But while CDATA evens out the constraints, it introduces a new problem: When consumed as HTML, the start and end mark of the + <p>Wrapping raw text in a CDATA section introduces a new problem: when consumed as HTML, the start and end mark of the CDATA section is seen by the script or stylesheet interpreter and can thus cause syntax errors or even halt the script - and stylesheet execution. The way to deal with it is to comment out the CDATA start and end mark - using the comment methods of the script or stylesheet language. Additionally, if e.g. <code>script</code> is used as a - coding block container, it may be necessary to even comment out the scripting/styling comments by hiding them - inside a XML comment.</p> + and stylesheet execution. + A solution is to comment out the CDATA start and end marks by using the comment methods of the script or stylesheet language. 
+ Additionally, such as when <code>script</code> is used as a coding block container, + it may be necessary to even comment out the scripting/styling comments by hiding them inside an XML comment.</p> <section id="CDATA-rules-raw-text"> - <h6>Safe CDATA usage rules</h6> - <p>These rules assumes that CDATA is of limited use for CSS.</p> + <h6>Safe rules for CDATA use</h6> + <p>These rules assume that CDATA is of limited use for CSS.</p> <p>General rules:</p> <ul> - <li> The CDATA section is subject to HTML’s restrictions on <code><script></code>/<code><style></code></li> - <li> Only one CDATA section permitted per raw text element</li> + <li> The CDATA section is subject to HTML’s restrictions on <code><script></code> and <code><style></code>.</li> + <li> There can be only one CDATA section per raw text element.</li> <li> Before the CDATA section there can only be one node - preferrably only one line of code, which may - consist of whitespace, or an XML comment or a construct of the scripting/styling language (usually + consist of whitespace, or an XML comment, or a construct of the scripting/styling language (usually a comment of the scripting/styling language).</li> - <li> After the CDATA section: Same rules as for before the CDATA section.</li> - </ul><p>The <code>]]></code> string:</p> + <li> After the CDATA section there can only be one node - preferrably only one line of code, which may + consist of whitespace, or an XML comment, or a construct of the scripting/styling language (usually + a comment of the scripting/styling language).</li> + </ul> + <p>The <code>]]></code> string:</p> [127 lines skipped]
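
Reader's note on the inline safe text content rule in the diff above: the revised text says that inline script and style content should avoid the characters < and & as well as the CDATA end mark string ]]>. A minimal sketch of that check follows, in Python. The names AMBIGUOUS_STRINGS, find_ambiguous_strings and is_safe_text_content are hypothetical and come from neither the specification nor any library. The sketch covers only the three strings named in that paragraph, not the full table of parsing differences, and it is not a conformance checker.

```python
# Hypothetical helper illustrating the "inline safe text content" rule.
# Assumption: only the three strings named in the revised paragraph are
# checked; the other rows of the parsing-differences table are ignored.
AMBIGUOUS_STRINGS = ("<", "&", "]]>")


def find_ambiguous_strings(raw_text):
    """Return the ambiguous strings that occur in raw_text, in a fixed order."""
    return [s for s in AMBIGUOUS_STRINGS if s in raw_text]


def is_safe_text_content(raw_text):
    """True when raw_text contains none of the ambiguous strings."""
    return not find_ambiguous_strings(raw_text)


if __name__ == "__main__":
    # The three samples are the script and style bodies quoted in the diff.
    samples = [
        'q::before{content:"<";}',
        'var a = "&";',
        'document.body.appendChild(document.createElement("div"));',
    ]
    for sample in samples:
        print(is_safe_text_content(sample), find_ambiguous_strings(sample), sample)
```

The function takes the text between the script or style tags, not the whole document. Externally linked scripts and stylesheets are outside its scope, since the diff states that external files are not limited by these restrictions.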
Received on Monday, 13 January 2014 00:10:19 UTC