CVS html5/html-polyglot

Update of /sources/public/html5/html-polyglot
In directory roscoe:/tmp/cvs-serv25997

Added Files:
	WD-html-polyglot-20140117.html 
Log Message:
Initial version of LCWD


--- /sources/public/html5/html-polyglot/WD-html-polyglot-20140117.html	2014/01/07 21:23:28	NONE
+++ /sources/public/html5/html-polyglot/WD-html-polyglot-20140117.html	2014/01/07 21:23:28	1.1
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US" >
<head>
    <title>Polyglot Markup: A robust profile of the HTML5 vocabulary</title>
    <meta charset="utf-8" />
    <script class="remove" src="http://www.w3.org/Tools/respec/respec-w3c-common" async=""></script>
    <script class="remove">
        var respecConfig = {
            specStatus:   "LC",
            shortName:    "html-polyglot",
            publishDate:  "2014-01-17",
            previousPublishDate:  "2013-10-22",
            // previousDiffURI:  "http://www.w3.org/TR/2013/WD-html-polyglot-20131022//",
            previousMaturity:  "WD",
            edDraftURI:           "http://dev.w3.org/html5/html-polyglot/html-polyglot.html",
            // lcEnd: "2009-08-05",
            editors:  [
                { name: "Eliot Graff", company: "Microsoft Corporation" },
                { name: "Leif H. Silli", company: "<small>&amp;</small>ᴍᴇᴛᴏᴅɪᴜꜱ ᴅᴀ"}
            ],
            wg:           "HTML working group",
            wgURI:        "http://www.w3.org/html/wg/",
            wgPublicList: "public-html",
            wgPatentURI:  "http://www.w3.org/2004/01/pp-impl/40318/status"
        };
    </script>
    <style>table.simple tr>*:first-child{text-align:right;}
    table.simple th code{color:yellow;font-weight:bold;font-size:larger;}
    table.simple [colspan="2"]{text-align:center;}
    table.simple [colspan="3"]{text-align:center;}
    ul.inline-list {white-space:normal}
    ul.inline-list li {display:inline;}
    ul.inline-list li:after {content:",";}
    ul.inline-list li:last-child:after {content:"";}
    </style>
</head>
<body>
<section id="abstract">
    A document that uses <a title="polyglot markup">polyglot markup</a> is a document that is a stream of bytes that parses into identical document trees
    (with some exceptions, as noted in the <a href="#introduction">Introduction</a>) when processed as HTML and when processed as XML.
    Polyglot markup that meets a well-defined set of constraints is interpreted as compatible, regardless of whether it is processed as HTML or as XHTML, per the HTML5 specification.
    Polyglot markup uses a specific DOCTYPE, namespace declarations, and a specific case—normally lower case but occasionally camel case—for element and attribute names.
    Polyglot markup uses lower case for certain attribute values.
    Further constraints include those on void elements, named entity references, and the use of scripts and style.
    <!--End section: Abstract-->
</section>
<section id="sotd">
    <p>
        This document summarizes design guidelines for authors who wish their XHTML or HTML documents to validate on both HTML and XML parsers.
        This specification is intended to be used by web authors, particularly authors who want to serve receivers which may have either (but not both) XML or HTML parsers available.
        This commonly arises in legacy systems and content syndication.
        Polyglot is one of several transition mechanisms from legacy XML to HTML5 and this document serves to describe it accurately.
    </p>
    <p>
        No recommendation is made in this document or by the W3C regarding whether or not to publish polyglot content.
        In general, authors are encouraged to publish HTML content using HTML5 syntax and media types
        (either HTML syntax and <code>text/html</code>, or XHTML syntax and <code>application/xhtml+xml</code>).
    </p>
    <p>
        This document is not a specification for user agents and creates no obligations on user agents.
        Note that this recommendation does not define how HTML5-conforming user agents should process HTML documents.
        Nor does it define the meaning of the Internet Media Type <code>text/html</code>.
        For user agent guidance and for these definitions, see [[!HTML5]] and [[!RFC2854]].
    </p>
    <p>
        Please submit bugs for this document by using the W3C's public bug database (<a href="http://www.w3.org/Bugs/Public/">
        http://www.w3.org/Bugs/Public/</a>) with the product set to <kbd>HTML WG</kbd> and the component set to
        <kbd>HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)</kbd>.
        If you cannot access the bug database, submit comments by email to the mailing list noted below.
    </p>
    <!--End section: Status of This Document-->
</section>
    <section id="conformance"></section>
<!-- note: for principle section
		In <a>polyglot markup</a>, the strings that XML and HTML interpret differently are considered <dfn>ambiguous
        strings</dfn> and MUST NOT be used except when they are explicitly permitted
(such as for the ambiguous namespace prefix <code>xml:</code>, which is permitted as the prefix for the <code>lang</code> attribute in the XML namespace – <code>xml:lang</code>).
-->

<section id="introduction" class="informative"><h2>Introduction</h2>
    <p>It is sometimes valuable to be able to serve HTML5 documents that are also well formed XML documents.
        An author may, for example, use XML tools to generate a document, and they and others may process the document using XML tools.
        The language used to create documents that can be parsed by both HTML and XML parsers is called <a title="polyglot markup">polyglot markup</a>.
        <a title="polyglot markup">Polyglot markup</a> is the overlap language of documents that are both HTML5 documents and XML documents.
        It is recommended that these documents be served as either <code>text/html</code> (if the content is transmitted to an HTML-aware user agent)
        or <code>application/xhtml+xml</code> (if the content is transmitted to an XHTML-aware user agent).
        Other permissible MIME types are <code>text/xml</code>, <code>application/xml</code>,
        and any MIME type whose subtype ends with the four characters "<code>+xml</code>". [[!XML-MT]]</p>
    <!--end general-->
    <section id="scope">
        <h3>Scope</h3>
        <p>Polyglot markup is a <em><a title="robustness">robust</a></em> – but entirely <em>optional</em> – profile of the HTML vocabulary.
            Not all web content needs to be authored in <a>polyglot markup</a>; it is primarily an option
            for authors wanting to increase the <a title="robustness">robustness</a> of their documents.
            <a title="polyglot markup">Polyglot markup</a> works best, and can be a beneficial option, in controlled environments and for authoring tools.</p>
        <p><a title="polyglot markup">Polyglot markup</a> is ideal for publishing when there's a strong desire to serve both HTML and XML tool chains
            without simultaneously having to maintain dual copies of the content: one in HTML and a second in XHTML.
            In addition, producing a single <a>polyglot markup</a> output requires less infrastructure than producing both HTML and XHTML output for the same content.
            <a title="polyglot markup">Polyglot markup</a> is also beneficial when lightweight processes, such as
            quick testing or even hand-authoring, are applied to content intended to be published both as HTML and XHTML,
            especially if that content is not sent through a tool chain.</p>

        <p class="note">XML-based HTML tools or systems intended for the most general contexts of use cannot <strong><em>depend</em></strong> on polyglot input: for maximum flexibility,
            such tools should instead use an HTML parser that produces an XML-compatible DOM or event stream.</p>
    </section>
    <!--end scope-->
    <section id="robust">
        <h3>Robustness</h3>

        <p>Polyglot markup is a means to an end – <dfn id="dfn-robustness">robustness</dfn>. It is not a goal in itself. However, authors do not need
            to understand these benefits in order to use and benefit from this syntax. But neither does anyone
            need to exaggerate its benefits. For instance, <a title="polyglot markup">polyglot markup</a> does not add semantics. Polyglot markup does,
            however, work to <em>preserve</em> semantics, including during the authoring process. Polyglot markup
            also doesn’t ensure accessibility, as it does not add any requirements
            that other relevant specs have not already added. But it can work to <em>preserve</em> accessibility.</p>

        <p>The motivation behind, and reason for <a title="polyglot markup">polyglot markup</a> to exist as a specification, is its widely supported
            <a title="robustness">robustness</a>. With <a title="robustness">robust</a> (also known as conservative) markup, authors can 
            <q cite="http://www.w3.org/TR/WCAG20/#robust">maximize compatibility with current and future user agents</q> and authoring tools. [[!WCAG20]]</p>

        <p>Polyglot markup seeks to define constraints on the serialization of a DOM tree in a <a title="robustness">robust</a> manner that
            is likely to retain semantics when said serialization is reparsed using a variety of parsers, be
            they full-featured and bug-free HTML5 parsers, somewhat HTML-aware parsers, or even XML parsers.
        </p>

        <p> For the most part, <a title="polyglot markup">polyglot markup</a> is just a pure deduction of the validity constraints and syntax requirements that
            HTML and XHTML dictate, many of which took polyglotness into consideration when they were added to HTML5.
            However, for reasons of <a title="robustness">robustness</a>, the spec sometimes goes a little further than the principle of the lowest common
            denominator would have required.</p>

        <p> For instance, included in the set of constraints on the serialization is the requirement to use the UTF-8 encoding.
            This requirement is not only because of the documented benefits (the HTML-specific benefits are described in HTML5 [[!HTML5]]) –
            which in turn has led the HTML5 specification to recommend
            that all new documents use UTF-8, but also because it is the sole encoding that <em>every</em> parser, be it an HTML parser or
            an XML parser, is required to support. Also, UTF-8 might in some situations be the sole <em>HTML-conforming</em> option, since it is one of
            only two encodings (the other being UTF-16, with its own, separate set of well-known issues) for which XML well-formedness
            rules don’t require the encoding to be explicitly declared. This in turn has the benefit that the XML
            encoding declaration, which is invalid in HTML anyway, can reliably be skipped without causing any side-effects. E.g. if one opted to use the <code>KOI8-R</code>
            encoding, then, as a side-effect of HTML-conformance and XML well-formedness requirements, the author would have
            been forced to rely on a higher protocol (such as MIME <code>Content-Type</code>) in order to support XML parsers. By requiring
            UTF-8, this side-effect is avoided. And so, while not the only theoretical possibility, the choice of
            UTF-8 as the sole option, is justified by the underlying principle of <a title="robustness">robustness</a>.</p>

        <p>Using <a title="robustness">robust</a> syntax can enable documents to be parsed more reliably in less capable parsers.
            But even if the document can be expected to be parsed and validated by fully HTML5 conforming tools,
            <a title="polyglot markup">polyglot markup</a> adds <a title="robustness">robustness</a>. As an example, when serialized as HTML, the closing tag for
            the <code>p</code> element is entirely optional and will be inferred if not present. But inclusion of
            closing tags, as required by XML and, thus, by <a title="polyglot markup">polyglot markup</a>, causes no harm beyond a minor increase
            in transfer size (an increase often mitigated by compression), but does
            allow validators to detect situations where the implicit closing rules
            don't match what the author intended.
        </p>
        <p class="note">
            Polyglot markup is not defined as "robust markup" because the XML-based polyglot markup
            syntax is not the only way to increase <a title="robustness">robustness</a>.
            For instance, an HTML validator or an authoring tool could require all tags to be closed even if
            this is not required by the HTML syntax. But then again, <a title="polyglot markup">polyglot markup</a>, being well-formed
            XML, has practical benefits in some situations that such a custom setup alone would not have.
        </p>
    </section>
    <!--end robust-->
</section>
<!-- end intro-->

<section id="syntax">
    <h2>Syntax</h2>
    <section id="principles"><h3>Principles</h3>
        <p>
            <dfn>Polyglot markup</dfn> results in:
        </p>
        <ul>
            <li>a valid HTML document. [[!HTML5]]</li>
            <li>a <a href="http://www.w3.org/TR/2008/PER-xml-20080205/#sec-well-formed">well-formed XML</a> document. [[!XML10]]</li>
            <li>identical DOMs when processed as HTML and when processed as XML, with some notable exceptions: HTML and XML parsers generate different DOMs for some
                <code>xml</code> (<code>xml:lang</code>, <code>xml:space</code>, and <code>xml:base</code>),
                <code>xmlns</code> (<code>xmlns=""</code> and <code>xmlns:xlink=""</code>), and <code>xlink</code> (such as <code>xlink:href</code>) attributes.
                XML requires and HTML5 permits these attributes in certain locations and the attributes are preserved by HTML parsers. The exception must not break the requirement to be a valid HTML document.
            </li>
        </ul>
        <p>
            <a title="polyglot markup">Polyglot markup</a> is not constrained:
        </p>
        <ul>
            <li>to be <a href="http://www.w3.org/TR/2008/PER-xml-20080205/#dt-valid">valid XML</a>. [[!XML10]]</li>
            <li>by conformance to any XML DTD.</li>
        </ul>
        <p>
            <a title="polyglot markup">Polyglot markup</a> is scripted according to the rules of XML (does not use <code>document.write</code>, for example)
            and excludes HTML elements that are impossible to replicate in an XML parser (does not use the <code>noscript</code> element, for example).
            <a title="polyglot markup">Polyglot markup</a> triggers non-quirks mode in HTML parsers,
            as non-quirks mode is closest to XML-mode rendering, in regard to both DOM and CSS.
            <a title="polyglot markup">Polyglot markup</a> results in the same encoding and the same language in both HTML-mode and XML-mode.
        </p>

        <p>
            <a title="polyglot markup">Polyglot markup</a>, itself being valid HTML5,
            supports extensibility as it is defined in
            <a href="http://www.w3.org/TR/html5/infrastructure.html#extensibility">Section 2.2.3 Extensibility</a> of HTML5,
            so long as the extension does not violate the rules of <a>polyglot markup</a>. [[!HTML5]]
            In addition, being well formed XML, <a>polyglot markup</a> can be extended when it is served as <code>application/xhtml+xml</code>.
        </p>
    </section>
    <!--End section: principles-->
</section>
<section id="writing"><h2>Writing HTML documents</h2>
    <section id="PI-and-xml" class="section">
    <h3>Processing instructions and the XML declaration</h3>
    <p>
        Processing Instructions and the XML Declaration are both forbidden in <a>polyglot markup</a>.
    </p>
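    <p>
        As an illustrative sketch, neither of the following constructs would appear in a polyglot document.
        The first line is an XML declaration; the second is a processing instruction, here the common <code>xml-stylesheet</code> one,
        with <code>style.css</code> as a hypothetical target chosen only for the example:
    </p>
    <pre class="example">&lt;?xml version="1.0" encoding="UTF-8"?>
&lt;?xml-stylesheet href="style.css" type="text/css"?></pre>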
    <!--End section: Processing Instructions and the XML Declaration-->
</section>
    <section id="character-encoding" class="section">
    <h3>Specifying a document’s character encoding</h3>
    <p>
        <a title="polyglot markup">Polyglot markup</a> uses the UTF-8 character encoding, the only character encoding for which both HTML and XML require support.
        HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a> [[!HTML5]].
        For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>.
        As such, character encoding MAY be left undeclared in XML with the result that UTF-8 is still supported [[!XML10]].
    </p>
    <p>
        <a title="polyglot markup">Polyglot markup</a> declares the UTF-8 character encoding in the following ways, which may be used separately or
        in combination (but note that there can only be a <em>single</em> <a title="HTML encoding declaration">HTML encoding declaration</a>):
    </p>
    <ul>
        <li>Within the document
            <ul>
                <li>By using the Byte Order Mark (BOM) character</li>
                <li>By using the <dfn>HTML encoding declaration</dfn>
                    <ul><li><strong>either</strong> in its <code>charset</code> attribute form: <code>&lt;meta charset="UTF-8"/></code></li>
                        <li><strong>or</strong> in its alternative form: <code>&lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/></code></li>
                    </ul>
                </li>
            </ul>
        </li>
        <li>Outside the document
            <ul>
                <li>By adding <code>"charset=utf-8"</code> to the MIME/HTTP Content-Type header [[!HTTP11]], as the following examples show in HTML and XML, respectively: </li>
            </ul>
				<pre class="example">
					<code>Content-type: text/html; charset=utf-8</code>
				</pre>
				<pre class="example">
					<code>Content-type: application/xhtml+xml; charset=utf-8</code>
				</pre>
            Note that, when serving polyglot documents as XML, <code>charset=UTF-8</code> can safely be omitted, due to the UTF-8 encoding default of XML:
				<pre class="example">
					<code>Content-type: application/xhtml+xml</code>
				</pre>
        </li>
    </ul>
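    <p>
        A minimal sketch of the in-document approach, assuming the <code>charset</code> attribute form is the one chosen
        (the title text is a placeholder). XML parsers ignore the <code>meta</code> element for encoding purposes and rely on the UTF-8 default,
        so this single declaration could serve both kinds of parser:
    </p>
    <pre class="example highlight">&lt;head>
  &lt;meta charset="UTF-8"/>
  &lt;title>Example&lt;/title>
&lt;/head></pre>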

    <p class="note">
        Both XML and HTML parsers are required to support the byte order mark.
        The HTML encoding declaration has no effect in XML. When the HTML encoding declaration is
        the only encoding declaration, the encoding default from XML makes XML parsers treat content as UTF-8.
    </p>

    <p>
        The <a href="http://www.w3.org/International/questions/qa-html-encoding-declarations">W3C Internationalization (i18n) Group recommends</a> always including
        a visible encoding declaration in a document, because it helps developers, testers, or translation production managers to check the encoding of a document visually.
    </p>
    <!--End section: Specifying a Document's Character Encoding-->
</section>
    <section id="doctype" class="section">
    <h3>The DOCTYPE</h3>
    <p>
        <a title="polyglot markup">Polyglot markup</a> uses a document type declaration (DOCTYPE) specified by <a href="http://www.w3.org/TR/html5/syntax.html#the-doctype">section 8.1.1</a> of [[!HTML5]].
        In addition, the DOCTYPE conforms to the following rules:
    </p>
    <ul>
        <li>The string <code>DOCTYPE</code> is in uppercase letters.</li>
        <li>The string <code>SYSTEM</code>, if present, is in uppercase letters.</li>
        <li>The string <code>PUBLIC</code>, if present, is in uppercase letters.</li>
        <li>A Formal Public Identifier (FPI), if present, is a case-sensitive match of the registered FPI to which it points.</li>
        <li>A URI, if present in the document type declaration, is a case-sensitive match of the URI to which it points.
            <ul>
                <li>If the URI is the string <code>about:legacy-compat</code>, <a>polyglot markup</a> includes the string in lowercase letters, as required by HTML5.</li>
                <li>If the URI is an http URL, the URI points to the correct resource, using case-sensitive letters.</li>
            </ul>
        </li>
    </ul>
    <p class="note">
        The string <code>html</code> SHOULD be in lowercase letters, in order to be both well-formed and valid XML;
        however, the string MAY be in mixed case or uppercase letters and still be well-formed XML.
    </p>
    <p>
        Note that using <code>about:legacy-compat</code> in XML may yield unpredictable parsing results, depending on the XML processing pipeline.
    </p>
    <p>
        <a title="polyglot markup">Polyglot markup</a> does not use document type declarations for HTML4, HTML3, or HTML2, regardless of whether they contain a URI or not and
        regardless of their effect in HTML5 parsers, as these document type declarations are not compatible with XHTML.
    </p>
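    <p>
        For illustration, and assuming only the two DOCTYPE forms that HTML5 defines without a public identifier,
        either of the following would be consistent with the case rules listed above:
    </p>
    <pre class="example">&lt;!DOCTYPE html></pre>
    <pre class="example">&lt;!DOCTYPE html SYSTEM "about:legacy-compat"></pre>
    <p>
        By contrast, a lowercase <code>&lt;!doctype html></code> is accepted by HTML parsers but is not well-formed XML,
        which is why the string <code>DOCTYPE</code> is written in uppercase letters.
    </p>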
    <!--End section: The DOCTYPE-->
</section>
    <section id="namespaces" class="section">
    <h3>Namespaces</h3>
    <p>
        The following rules apply to namespaces used in <a>polyglot markup</a>.
    </p>

    <section id="element-level-namespaces" class="section">
        <h4>Element-level namespaces</h4>
        <p>
            [[!HTML5]] introduces undeclared (native) default namespaces for the root HTML element, <code>html</code>, the root SVG element, <code>svg</code>,
            and the root MathML element, <code>math</code>.
            <a title="polyglot markup">Polyglot markup</a> declares the following default namespaces, when the markup languages are included in the document, to maintain XML-compatibility [[!XML10]]:</p>
        <ul class="inline-list">
            <li><code>&lt;html xmlns="http://www.w3.org/1999/xhtml"></code></li>
            <li><code>&lt;math xmlns="http://www.w3.org/1998/Math/MathML"></code></li>
            <li><code>&lt;svg xmlns="http://www.w3.org/2000/svg"></code></li>
        </ul>
        <p>
            <a title="polyglot markup">Polyglot markup</a> declares the default namespaces on the root HTML element, <code>html</code>,
            the root SVG element, <code>svg</code>, and the root MathML element <code>math</code>,
            and on any HTML elements used as children of SVG or MathML elements.
            <a title="polyglot markup">Polyglot markup</a> does not declare any other default or prefixed element namespace, because
            [[!HTML5]] does not natively support the declaring of any other default or prefixed element namespace.
        </p>
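        <p>
            The following sketch shows how these declarations might look together in one document.
            The element content (a circle and a single identifier) and the title are placeholders chosen only for the example:
        </p>
        <pre class="example highlight">&lt;!DOCTYPE html>
&lt;html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  &lt;head>
    &lt;meta charset="UTF-8"/>
    &lt;title>Namespace example&lt;/title>
  &lt;/head>
  &lt;body>
    &lt;p>
      &lt;svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
        &lt;circle cx="10" cy="10" r="8">&lt;/circle>
      &lt;/svg>
      &lt;math xmlns="http://www.w3.org/1998/Math/MathML">&lt;mi>x&lt;/mi>&lt;/math>
    &lt;/p>
  &lt;/body>
&lt;/html></pre>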
        <!-- End section, "Element-Level Namespaces" -->
    </section>

    <section id="attribute-level-namespaces" class="section">
        <h4>Attribute-level namespaces</h4>
        <p>
            [[!HTML5]] introduces undeclared (native) support for attributes in the XLink namespace and with the prefix <code>xlink:</code>.
            To maintain XML-compatibility, <a title="polyglot markup">polyglot markup</a> explicitly declares the XLink namespace:
            <code>xmlns:xlink="http://www.w3.org/1999/xlink"</code>. [[!XML10]]</p>
        <p>For conformance with the HTML specification’s conformance rules, the declaration has to take place in each foreign content
            section where it is used, typically on such a section’s root element (e.g. on the <code>svg</code> start tag for an SVG
            section and on the <code>math</code> start tag for a MathML section), since the declaration must occur before using any of
            the following <code>xlink:</code> prefixed attributes:</p>

        <ul class="inline-list">
            <li><code>xlink:actuate</code></li>
            <li><code>xlink:arcrole</code></li>
            <li><code>xlink:href</code></li>
            <li><code>xlink:role</code></li>
            <li><code>xlink:show</code></li>
            <li><code>xlink:title</code></li>
            <li><code>xlink:type</code></li>
        </ul>
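        <p>
            As a small sketch of this rule, the SVG fragment below declares the XLink namespace on its own root element before using <code>xlink:href</code>.
            The link target and the text content are placeholders chosen only for the example:
        </p>
        <pre class="example highlight">&lt;svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="100" height="30">
  &lt;a xlink:href="http://www.w3.org/">
    &lt;text x="5" y="20">W3C&lt;/text>
  &lt;/a>
&lt;/svg></pre>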
        <p>
            Note that there are other prefixed attributes that can be used beyond <code>xlink:href</code> (such as <code>xml:base</code>).
            <a title="polyglot markup">Polyglot markup</a> does not declare these prefixes via xmlns. The prefixes are implicitly declared
            in XML and are automatically applied to the appropriate attributes in HTML.
        </p>
        <p>
            The namespaced attributes, such as <code>xml:lang=""</code> and <code>xmlns=""</code>, are "namespaced" within XHTML, SVG and MathML.
            Thus, the rules for how they can be used as CSS selectors are governed by CSS namespaces. [[!CSS3NAMESPACE]]
            For more on the issues related to attribute selectors and namespaces, with and without prefix, see the section on <a
            href="#scripting-and-styling-polyglot-markup">Scripting and styling polyglot markup</a>.
        </p>

        <!-- End section, "Attribute-Level Namespaces" -->
    </section>
    <!--End section: Namespaces-->
</section>
    <section id="elements" class="section">
<h3>Element syntax</h3>
<p><a title="polyglot markup">Polyglot markup</a> conforms to the following rules regarding elements.</p>
        <section id="required-elements" class="section">
    <h6>Required elements and tags</h6>

    <p> HTML5’s concept of <dfn>optional tags</dfn> – start tags and/or end tags – covers <a
            href="http://www.w3.org/TR/html5/syntax.html#optional-tags">elements that the
        HTML parser itself automatically adds to the DOM</a> if the code doesn’t contain the tags for
        them. However, since XML does not have a feature whereby elements with one or both tags
        omitted from the code (such as when start and end tags of <code>html</code> are omitted) are added to the DOM,
        omitting a tag in <a>polyglot markup</a> is equivalent to producing a document that is not well-formed or,
        if both tags are omitted, equivalent to not adding the element at all. Therefore, <a>polyglot markup</a> does not
        operate with <a>optional tags</a>.</p>

    <p>That <a>polyglot markup</a> doesn’t operate with optional tags may surprise, for example, someone not used
        to adding the <code>tbody</code> tags in their code or someone accustomed to omitting the end tag of the
        <code>p</code> element. However, the requirement to be complete with regard to tags is a key feature of <a>polyglot
            markup</a> that makes the code <a title="robustness">robust</a> against subpar parsers and authoring surprises.</p>
    <section id="minimal-polyglot-html-document">
        <h4>A minimal HTML document</h4>
        <p>
            Every <a>polyglot markup</a> document therefore contains an <code>html</code>, <code>head</code>, <code>title</code>,
            and <code>body</code> element, represented in the code with their tags.
            The <code>html</code> element is the root element.
            The <code>head</code> and <code>body</code> elements are children of the <code>html</code> element.
            The <code>title</code> element is a child of the <code>head</code> element.
            Therefore, the following source code would be the most basic <a>polyglot markup</a> document.
        </p>
		<pre class="example highlight">&lt;!DOCTYPE html>
&lt;html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
  &lt;head>
    &lt;title>&lt;/title>
  &lt;/head>
  &lt;body>
  &lt;/body>
&lt;/html>
		</pre>
    </section>
    <section id="required-tags-exampls">
        <h4>Required tags examples</h4>
        <p>

[890 lines skipped]

Received on Tuesday, 7 January 2014 21:23:31 UTC