- From: CVS User lsilli <cvsmail@w3.org>
- Date: Mon, 02 Sep 2013 00:04:32 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/html-xhtml-author-guide
In directory roscoe:/tmp/cvs-serv31563/html-xhtml-author-guide

Modified Files:
    html-xhtml-authoring-guide.html
Log Message:
The final changes related to bug 19925

--- /sources/public/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html 2013/09/01 20:56:19 1.120
+++ /sources/public/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html 2013/09/02 00:04:32 1.121
@@ -76,40 +76,93 @@
 (such as for the ambigous namespace prefix <code>xml:</code>, which is permitted as prefix for the <code>lang</code> in the XML namespace – <code>xml:lang</code>). -->
-
-<section id="introduction" class="informative">
-<h2>Introduction</h2>
- <section id="value">
- <h3>General</h3>
- <p>
- It is often valuable to be able to serve HTML5 documents that are also well formed XML documents.
- An author may, for example, use XML tools to generate a document, and they and others may process the document using XML tools.
- The language used to create documents that can be parsed by both HTML and XML parsers is called <a title="polyglot markkup">polyglot markup</a>.
- <a title="polyglot markup">Polyglot markup</a> is the overlap language of documents that are both HTML5 documents and XML documents.
- It is recommended that these documents be served as either <code>text/html</code> (if the content is transmitted to an HTML-aware user agent)
- or <code>application/xhtml+xml</code> (if the content is transmitted to an XHTML-aware user agent).
- Other permissible MIME types are <code>text/xml</code>, <code>application/xml</code>,
- and any MIME type whose subtype ends with the four characters "<code>+xml</code>". [[!XML-MT]]
- </p>
- </section>
+<section id="introduction" class="informative"><h2>Introduction</h2>
+<p>It is sometimes valuable to be able to serve HTML5 documents that are also well formed XML documents.
+An author may, for example, use XML tools to generate a document, and they and others may process the document using XML tools.
+The language used to create documents that can be parsed by both HTML and XML parsers is called <a title="polyglot markkup">polyglot markup</a>.
+<a title="polyglot markup">Polyglot markup</a> is the overlap language of documents that are both HTML5 documents and XML documents.
+It is recommended that these documents be served as either <code>text/html</code> (if the content is transmitted to an HTML-aware user agent)
+or <code>application/xhtml+xml</code> (if the content is transmitted to an XHTML-aware user agent).
+Other permissible MIME types are <code>text/xml</code>, <code>application/xml</code>,
+and any MIME type whose subtype ends with the four characters "<code>+xml</code>". [[!XML-MT]]</p>
+<!--end general-->
 <section id="scope">
- <h3>Scope</h3>
-<p> All web content need not be authored in <a>polyglot markup</a> and it is primarily an option for authors wanting to increase the robustness of their documents.
- <a title="polyglot markup">Polyglot markup</a> works best, and can be a beneficial option, in controlled environments and for authoring tools.</p>
- <p> <a title="polyglot markup">Polyglot markup</a> is ideal for publishing when there's a strong desire to serve both HTML and XML tool chains
+<h3>Scope</h3>
+<p>Polylglot markup is a <em><a title="robustness">robust</a></em> – but entirely <em>optional</em> – profile of the HTML vocabulary.
+ All web content need not be authored in <a>polyglot markup</a> and it is primarily an option
+ for authors wanting to increase the <a title="robustness">robustness</a> of their documents.
+<a title="polyglot markup">Polyglot markup</a> works best, and can be a beneficial option, in controlled environments and for authoring tools.</p>
+<p><a title="polyglot markup">Polyglot markup</a> is ideal for publishing when there's a strong desire to serve both HTML and XML tool chains
 without simultaneously having to maintain dual copies of the content: one in HTML and a second in XHTML. In addition, a single <a>polyglot markup</a> output requires less infrastructure to produce than to produce both HTML and XHTML output for the same content. <a title="polyglot markup">Polyglot markup</a> is also be beneficial when lightweight processes—such as quick testing or even hand-authoring—are applied to content intended to be published both as HTML and XHTML, especially if that content is not sent through a tool chain.</p>
-<p class="note">XML-based HTML tools or systems intended for the most general
- contexts of use cannot depend on polyglot input: for maximum flexibility,
- such tools should use the technique of using an HTML parser that produces
- an XML-compatible DOM or event stream.</p>
+<p class="note">XML-based HTML tools or systems intended for the most general contexts of use cannot depend on polyglot input: for maximum flexibility,
+ such tools should use the technique of using an HTML parser that produces an XML-compatible DOM or event stream.</p>
+</section>
+<!--end scope-->
+<section id="robust">
+ <h3>Robustness</h3>
+
+ <p>Polyglot markup is a means to an end – <dfn id="dfn-robustness">robustness</dfn>. It is not a goal in itself. However, authors do not need
+ to understand these benefits in order to use and benefit from this syntax. But neither does anyone
+ need to exaggerate its benefits. For instance, polyglot markup does not add semantics. Polyglot markup does,
+ however, work to <em>preserve</em> semantics, including during the authoring process. Polyglot markup
+ also doesn’t (at least not for the time being) ensure accessibility - as it does not any requirements
+ that other relevant specs do not add. But it can work to <em>preserve</em> accessibility.</p>
+
+ <p>The motivation behind, and reason for polyglot markup to exist as a specification, is its widely supported
+ <a title="robustness">robustness</a>. With <a title="robustness">robust</a> (also known as conservative) markup, authors can <q cite="http://www.w3.org/TR/WCAG20/#robust">
+ maximize compatibility with current and future user agents</q> and authoring tools. [[!WCAG20]]</p>
+
+ <p>Polyglot markup seeks to define constraints on the serialization of a DOM tree in a <a title="robustness">robust</a> manner that
+ is likely to retain semantics when said serialization is reparsed using a variety of parsers, be
+ they full featured and bug free HTML5 parsers, somewhat HTML-aware parsers, and even XML parsers.
+ </p>
+ <p> For the most part, polyglot markup is just a pure deduction of the validity constraints and syntax requirements that
+ HTML and XHTML dictate, many of which took polyglotness into considertaion when they were added to HTML5.
+ However, for reasons of <a title="robustness">robustness</a>, the spec sometimes goes a little further than the principle of the lowest common
+ would have required.</p>
+
+ <p> For instance, included in the set of constrains on the serialization is the requirement to use the UTF-8 encoding.
+ This requirement is not only because of the
+ documented benefits (the HTML-specific ones are described in HTML5 [[!HTML5]]) of this encoding - which in turn has lead the HTML5 specification to recommend
+ that all new documents use UTF-8, but also because it is the sole encoding that <em>every</em> parser, be it a HTML parser or
+ and XML parser, is required to support. Also, UTF-8 can also be the sole <em>HTML-valid</em> option, since it is one of
+ only two encodings (the other being UTF-16, with its own, separate set of well-known issues) for which XML well-formed
+ rules doesn’t require the encoding to be explicitly declared. This in turn has the benefit that the anyhow HTML-invalid XML
+ encoding declaration kan reliably be skipped without causing any side-effects. E.g. if one chose to use the <code>KOI8-r</code>,
+ encoding, then, as a side-effect, HTML-validity and XML well-formedness, the author would have to rely on a higher protocol
+ (such as MIME <code>Content-Type</code>) in order to support XML parsers. By requiring
+ UTF-8, this side-effect is avoided. And so, while not the only theoretical possibility, the choice of
+ UTF-8 as the sole option, can be justified underlying principle of <a title="robustness">robustness</a>.</p>
+
+ <p>Using <a title="robustness">robust</a> syntax can enable documents to be parsed more reliable in less capable parsers.
+ But even if the document can be expected to be parsed and validated by fully HTML5 conforming tools,
+ polyglot markup adds <a title="robustness">robustness</a>. As an example, when serialized as HTML, close tags for
+ paragraph elements are entirely optional and will be inferred if not present. Inclusion of
+ close tags, as required by XML and, thus, by polyglot markup, cause no harm beyond a minor increase
+ in transfer size (an increase often mitigated by compression), but does
+ allow validators to detect situations where the implicit closing rules
+ don't match what the author intended.
+ </p>
+ <p class="note">
+ Polyglot markup is not defined as ”robust markup” because the XML-based polyglot markup
+ syntax is not the only way to increase <a title="robustness">robustness</a>.
+ For instance, an HTML validator or an text editor could require all tags to be closed even if
+ this is not required by the HTML syntax. But then again, polyglot markup, being valid
+ XML, has some sometimes practical benefits which such a setup alone does not have.
+ </p>
 </section>
+<!--end robust-->
 </section>
+<!-- end intro-->
+
+
+
 <section id="syntax">
 <h2>The syntax of polyglot markup</h2>
 <section id="principles"><h3>Principles</h3>
@@ -852,7 +905,8 @@
 </table>
-<p>Outside CDATA declarations, the content of <code>script</code> and <code>style</code> MUST NOT use ambigious strings, as anything else results in unequal DOMs for XML or HTML or risks that the author gets stuck in hard to trace differences between XML and HTMl. This is often also the most robust and simplest coding method, and also promotes the use of external styles and scripts, which is considered a best practise. However, as some scripts and stylesheets (such as JavaScript) make use of <code>&lt;</code>, <code>&amp;</code> in their syntax or, often, contain strings of markup, authors MAY also declare CDATA sections inside <code>script</code> and <code>style</code>.</p>
+<p>Outside CDATA declarations, the content of <code>script</code> and <code>style</code> MUST NOT use ambigious strings, as anything else results in unequal DOMs for XML or HTML or risks that the author gets stuck in hard to trace differences between XML and HTMl. This is often also the most
+ <a title="robustness">robust</a> and simplest coding method, and also promotes the use of external styles and scripts, which is considered a best practise. However, as some scripts and stylesheets (such as JavaScript) make use of <code>&lt;</code>, <code>&amp;</code> in their syntax or, often, contain strings of markup, authors MAY also declare CDATA sections inside <code>script</code> and <code>style</code>.</p>
 <p>But note that while the CDATA ’tags’ will be ignored by scripts and stylesheets that operate in XML mode, the very declartion will be visible to in HTML mode, which in turn might cause the script to not work until the declaration is escaped.</p>
 <p>The use of CDATA sections MUST adhere to the following rules:</p>
 <ul>
Received on Monday, 2 September 2013 00:04:33 UTC