- From: CVS User lsilli <cvsmail@w3.org>
- Date: Thu, 31 Oct 2013 09:03:09 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/html-xhtml-author-guide
In directory roscoe:/tmp/cvs-serv1966/html-xhtml-author-guide

Modified Files:
    .htaccess
Added Files:
    html-xhtml-authoring-guide.html
Log Message:
Refactoring: Finishing the move of project from ./html-xhtml-authoring-guide to ./html-polyglot.

--- /sources/public/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html 2013/10/31 08:59:50 1.143
+++ /sources/public/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html 2013/10/31 09:03:09 1.144
@@ -1,1278 +0,0 @@
-<!DOCTYPE html>
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US" >
-<head>
- <title>Polyglot Markup: A robust profile of the HTML5 vocabulary</title>
- <meta charset="utf-8" />
- <script class="remove" src="http://www.w3.org/Tools/respec/respec-w3c-common" async=""></script>
- <script class="remove">
- var respecConfig = {
- specStatus: "ED",
- shortName: "html-polyglot",
- publishDate: "2013-10-08",
- previousPublishDate: "2010-10-19",
- // previousDiffURI: "http://htmlwg.org/heartbeat/WD-html-polyglot-20131008/",
-previousMaturity: "WD",
- edDraftURI: "http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html",
- // lcEnd: "2009-08-05",
- editors: [
- { name: "Eliot Graff", company: "Microsoft Corporation" },
- { name: "Leif H. Silli", company: "<small>&</small>ᴍᴇᴛᴏᴅɪᴜꜱ ᴅᴀ"}
- ],
- wg: "HTML working group",
- wgURI: "http://www.w3.org/html/wg/",
- wgPublicList: "public-html",
- wgPatentURI: "http://www.w3.org/2004/01/pp-impl/40318/status"
- };
- </script>
- <style>table.simple tr>*:first-child{text-align:right;}
- table.simple th code{color:yellow;font-weight:bold;font-size:larger;}
- table.simple [colspan="2"]{text-align:center;}
- table.simple [colspan="3"]{text-align:center;}
- ul.inline-list {white-space:normal}
- ul.inline-list li {display:inline;}
- ul.inline-list li:after {content:",";}
- ul.inline-list li:last-child:after {content:"";}
- </style>
-</head>
-<body>
-<section id="abstract">
- A document that uses <a title="polyglot markup">polyglot markup</a> is a document that is a stream of bytes that parses into identical document trees
- (with some exceptions, as noted in the <a href="#introduction">Introduction</a>) when processed as HTML and when processed as XML.
- Polyglot markup that meets a well-defined set of constraints is interpreted as compatible, regardless of whether they are processed as HTML or as XHTML, per the HTML5 specification.
- Polyglot markup uses a specific DOCTYPE, namespace declarations, and a specific case—normally lower case but occasionally camel case—for element and attribute names.
- Polyglot markup uses lower case for certain attribute values.
- Further constraints include those on void elements, named entity references, and the use of scripts and style.
- <!--End section: Abstract-->
-</section>
-<section id="sotd">
- <p>
- This document summarizes design guidelines for authors who wish their XHTML or HTML documents to validate on both HTML and XML parsers.
- This specification is intended to be used by web authors, particularly authors who want to serve receivers which may have either (but not both) XML or HTML parsers available.
- This commonly arises in legacy systems and content syndication.
- Polyglot is one of several transition mechanisms from legacy XML to HTML5 and this document serves to describe it accurately.
- </p>
- <p>
- No recommendation is made in this document or by the W3C regarding whether or not to publish polyglot content.
- In general, authors are encouraged to publish HTML content using HTML5 syntax and media types
- (either HTML syntax and <code>text/html</code>, or XHTML syntax and <code>application/xhtml+xml</code>).
- </p>
- <p>
- This document is not a specification for user agents and creates no obligations on user agents.
- Note that this recommendation does not define how HTML5-conforming user agents should process HTML documents.
- Nor does it define the meaning of the Internet Media Type <code>text/html</code>.
- For user agent guidance and for these definitions, see [[!HTML5]] and [[!RFC2854]].
- </p>
- <p>
- Please submit bugs for this document by using the W3C's public bug database (<a href="http://www.w3.org/Bugs/Public/">
- http://www.w3.org/Bugs/Public/</a>) with the product set to <kbd>HTML WG</kbd> and the component set to
- <kbd>HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)</kbd>.
- If you cannot access the bug database, submit comments by email to the mailing list noted below.
- </p>
- <!--End section: Status of This Document-->
-</section>
-<!-- note: for principle section
- In <a>polyglot markup</a>, the strings that XML and HTML interpret differently are considered <dfn>ambiguous
- strings</dfn> and MUST NOT be used except when they are explicitly permitted
-(such as for the ambigous namespace prefix <code>xml:</code>, which is permitted as prefix for the <code>lang</code> in the XML namespace – <code>xml:lang</code>).
---->
-
-<section id="introduction" class="informative"><h2>Introduction</h2>
- <p>It is sometimes valuable to be able to serve HTML5 documents that are also well formed XML documents.
- An author may, for example, use XML tools to generate a document, and they and others may process the document using XML tools.
- The language used to create documents that can be parsed by both HTML and XML parsers is called <a title="polyglot markup">polyglot markup</a>.
- <a title="polyglot markup">Polyglot markup</a> is the overlap language of documents that are both HTML5 documents and XML documents.
- It is recommended that these documents be served as either <code>text/html</code> (if the content is transmitted to an HTML-aware user agent)
- or <code>application/xhtml+xml</code> (if the content is transmitted to an XHTML-aware user agent).
- Other permissible MIME types are <code>text/xml</code>, <code>application/xml</code>,
- and any MIME type whose subtype ends with the four characters "<code>+xml</code>". [[!XML-MT]]</p>
- <!--end general-->
- <section id="scope">
- <h3>Scope</h3>
- <p>Polylglot markup is a <em><a title="robustness">robust</a></em> – but entirely <em>optional</em> – profile of the HTML vocabulary.
- All web content need not be authored in <a>polyglot markup</a> and it is primarily an option
- for authors wanting to increase the <a title="robustness">robustness</a> of their documents.
- <a title="polyglot markup">Polyglot markup</a> works best, and can be a beneficial option, in controlled environments and for authoring tools.</p>
- <p><a title="polyglot markup">Polyglot markup</a> is ideal for publishing when there's a strong desire to serve both HTML and XML tool chains
- without simultaneously having to maintain dual copies of the content: one in HTML and a second in XHTML.
- In addition, a single <a>polyglot markup</a> output requires less infrastructure to produce than to produce both HTML and XHTML output for the same content.
- <a title="polyglot markup">Polyglot markup</a> is also be beneficial when lightweight processes—such as
- quick testing or even hand-authoring—are applied to content intended to be published both as HTML and XHTML,
- especially if that content is not sent through a tool chain.</p>
-
- <p class="note">XML-based HTML tools or systems intended for the most general contexts of use cannot <strong><em>depend</em></strong> on polyglot input: for maximum flexibility,
- such tools should use the technique of using an HTML parser that produces an XML-compatible DOM or event stream.</p>
- </section>
- <!--end scope-->
- <section id="robust">
- <h3>Robustness</h3>
-
- <p>Polyglot markup is a means to an end – <dfn id="dfn-robustness">robustness</dfn>. It is not a goal in itself. However, authors do not need
- to understand these benefits in order to use and benefit from this syntax. But neither does anyone
- need to exaggerate its benefits. For instance, <a title="polyglot markup">polyglot markup</a> does not add semantics. Polyglot markup does,
- however, work to <em>preserve</em> semantics, including during the authoring process. Polyglot markup
- also doesn’t ensure accessibility
- as it does not add any requirements
- that other relevant specs have not allready added. But it can work to <em>preserve</em> accessibility.</p>
-
- <p>The motivation behind, and reason for <a title="polyglot markup">polyglot markup</a> to exist as a specification, is its widely supported
- <a title="robustness">robustness</a>. With <a title="robustness">robust</a> (also known as conservative) markup, authors can
- <q cite="http://www.w3.org/TR/WCAG20/#robust">maximize compatibility with current and future user agents</q> and authoring tools. [[!WCAG20]]</p>
-
- <p>Polyglot markup seeks to define constraints on the serialization of a DOM tree in a <a title="robustness">robust</a> manner that
- is likely to retain semantics when said serialization is reparsed using a variety of parsers, be
- they full featured and bug free HTML5 parsers, somewhat HTML-aware parsers, and even XML parsers.
- </p>
-
- <p> For the most part, <a title="polyglot markup">polyglot markup</a> is just a pure deduction of the validity constraints and syntax requirements that
- HTML and XHTML dictate, many of which took polyglotness into consideration when they were added to HTML5.
- However, for reasons of <a title="robustness">robustness</a>, the spec sometimes goes a little further than the principle of the lowest common
- denominator would have required.</p>
-
- <p> For instance, included in the set of constraints on the serialization is the requirement to use the UTF-8 encoding.
- This requirement is not only because of the documented benefits (the HTML-specific benefits are described in HTML5 [[!HTML5]]) –
- which in turn has lead the HTML5 specification to recommend
- that all new documents use UTF-8, but also because it is the sole encoding that <em>every</em> parser, be it an HTML parser or
- an XML parser, is required to support. Also, UTF-8 might in some situations be the sole <em>HTML-conforming</em> option, since it is one of
- only two encodings (the other being UTF-16, with its own, separate set of well-known issues) for which XML well-formed
- rules doesn’t require the encoding to be explicitly declared. This in turn has the benefit that the anyhow HTML-invalid XML
- encoding declaration kan reliably be skipped without causing any side-effects. E.g. if one opted to use the <code>KOI8-R</code>,
- encoding, then, as a side-effect of HTML-conformance and XML well-formedness requirements, the author would have
- been forced to rely on a higher protocol (such as MIME <code>Content-Type</code>) in order to support XML parsers. By requiring
- UTF-8, this side-effect is avoided. And so, while not the only theoretical possibility, the choice of
- UTF-8 as the sole option, is justified by the underlying principle of <a title="robustness">robustness</a>.</p>
-
- <p>Using <a title="robustness">robust</a> syntax can enable documents to be parsed more reliable in less capable parsers.
- But even if the document can be expected to be parsed and validated by fully HTML5 conforming tools,
- <a title="polyglot markup">polyglot markup</a> adds <a title="robustness">robustness</a>. As an example, when serialized as HTML, the closing tag for
- the <code>p</code> element is entirely optional and will be inferred if not present. But inclusion of
- closings tags, as required by XML and, thus, by <a title="polyglot markup">polyglot markup</a>, cause no harm beyond a minor increase
- in transfer size (an increase often mitigated by compression), but does
- allow validators to detect situations where the implicit closing rules
- don't match what the author intended.
- </p>
- <p class="note">
- Polyglot markup is not defined as "robust markup" because the XML-based polyglot markup
- syntax is not the only way to increase <a title="robustness">robustness</a>.
- For instance, an HTML validator or an authoring tool could require all tags to be closed even if
- this is not required by the HTML syntax. But then again, <a title="polyglot markup">polyglot markup</a>, being valid
- XML, has some sometimes practical benefits which such a custom setup alone would not have.
- </p>
- </section>
- <!--end robust-->
-</section>
-<!-- end intro-->
-
-<section id="syntax">
- <h2>Syntax</h2>
- <section id="principles"><h3>Principles</h3>
- <p>
- <dfn>Polyglot markup</dfn> results in:
- </p>
- <ul>
- <li>a valid HTML document. [[!HTML5]]</li>
- <li>a <a href="http://www.w3.org/TR/2008/PER-xml-20080205/#sec-well-formed">well-formed XML</a> document. [[!XML10]]</li>
- <li>identical DOMs when processed as HTML and when processed as XML, with some notable exceptions: HTML and XML parsers generate different DOMs for some
- <code>xml</code> (<code>xml:lang</code>, <code>xml:space</code>, and <code>xml:base</code>),
- <code>xmlns</code> (<code>xmlns=""</code> and <code>xmlns:xlink=""</code>), and <code>xlink</code> (such as <code>xlink:href</code>) attributes.
- XML requires and HTML5 permits these attributes in certain locations and the attributes are preserved by HTML parsers. The exception must not break the requiremetn to be a valid HTML document.
- </li>
- </ul>
- <p>
- <a title="polyglot markup">Polyglot markup</a> is not constrained:
- </p>
- <ul>
- <li>to be <a href="http://www.w3.org/TR/2008/PER-xml-20080205/#dt-valid">valid XML</a>. [[!XML10]]</li>
- <li>by conformance to any XML DTD.</li>
- </ul>
- <p>
- <a title="polyglot markup">Polyglot markup</a> is scripted according to the rules of XML (does not use <code>document.write</code>, for example)
- and excludes HTML elements that are impossible to replicate in an XML parser (does not use the <code>noscript</code> element, for example).
- <a title="polyglot markup">Polyglot markup</a> triggers non-quirks mode in HTML parsers,
- as non-quirks mode is closest to XML-mode rendering, in regard to both DOM and CSS.
- <a title="polyglot markup">Polyglot markup</a> results in the same encoding and the same language in both HTML-mode and XML-mode.
- </p>
-
- <p>
- <a title="polyglot markup">Polyglot markup</a>, itself being valid HTML5,
- supports extensibility as it is defined in
- <a href="http://www.w3.org/TR/html5/infrastructure.html#extensibility">Section 2.2.3 Extensibility</a> of HTML5,
- so long as the extension does not violate the rules of <a>polyglot markup</a>. [[!HTML5]]
- In addition, being well formed XML, <a>polyglot markup</a> can be extended when it is served as <code>application/xhtml+xml</code>.
- </p>
- </section>
- <!--End section: principles-->
-</section>
-<section id="writing"><h2>Writing HTML documents</h2>
- <section id="PI-and-xml" class="section">
- <h3>Processing instructions and the XML declaration</h3>
- <p>
- Processing Instructions and the XML Declaration are both forbidden in <a>polyglot markup</a>.
- </p>
- <!--End section: Processing Instructions and the XML Declaration-->
-</section>
- <section id="character-encoding" class="section">
- <h3>Specifying a document’s character encoding</h3>
- <p>
- <a title="polyglot markup">Polyglot markup</a> uses the UTF-8 character encoding, the only character encoding for which both HTML and XML require support.
- HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a> [[!HTML5]].
- For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>.
- As such, character encoding MAY be left undeclared in XML with the result that UTF-8 is still supported [[!XML10]].
- </p>
- <p>
- <a title="polyglot markup">Polyglot markup</a> declares the UTF-8 character encoding in the following ways, which may be used separately or
- in combination (but note that here can only be a <em>single</em> <a title="HTML encoding declaration">HTML encoding declaration</a>):
- </p>
- <ul>
- <li>Within the document
- <ul>
- <li>By using the Byte Order Mark (BOM) character</li>
- <li>By using the <dfn>HTML encoding declaration</dfn>
- <ul><li><strong>either</strong> in its <code>charset</code> attribute form: <code><meta charset="UTF-8"/></code></li>
- <li><strong>or</strong> in its alternative form: <code><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/></code></li>
- </ul>
- </li>
- </ul>
- </li>
- <li>Outside the document
- <ul>
- <li>By adding <code>"charset=utf-8"</code> to the MIME/HTTP Content-Type header [[!HTTP11]], as the following examples show in HTML and XML, respectively: </li>
- </ul>
- <pre class="example">
- <code>Content-type: text/html; charset=utf-8</code>
- </pre>
- <pre class="example">
- <code>Content-type: application/xhtml+xml; charset=utf-8</code>
- </pre>
- Note that, when serving polyglot documents as XML, <code>charset=UTF-8</code> can safely be omitted, due to the UTF-8 encoding default of XML:
- <pre class="example">
- <code>Content-type: application/xhtml+xml</code>
- </pre>
- </li>
- </ul>
-
- <p class="note">
- Both XML and HTML parsers are required to support the byte order mark.
- The HTML encoding declaration has no effect in XML. When the HTML encoding declaration is
- the only encoding declaration, the encoding default from XML makes XML parsers treat content as UTF-8.
- </p>
-
- <p>
- The <a href="http://www.w3.org/International/questions/qa-html-encoding-declarations">W3C Internationalization (i18n) Group recommends</a> to always include
- a visible encoding declaration in a document, because it helps developers, testers, or translation production managers to check the encoding of a document visually.
- </p>
- <!--End section: Specifying a Document's Character Encoding-->
-</section>
- <section id="doctype" class="section">
- <h3>The DOCTYPE</h3>
- <p>
- <a title="polyglot markup">Polyglot markup</a> uses a document type declaration (DOCTYPE) specified by <a href="http://www.w3.org/TR/html5/syntax.html#the-doctype">section 8.1.1</a> of [[!HTML5]].
- In addition, the DOCTYPE conforms to the following rules:
- </p>
- <ul>
- <li>The string <code>DOCTYPE</code> is in uppercase letters.</li>
- <li>The string <code>SYSTEM</code>, if present, is in uppercase letters.</li>
- <li>The string <code>PUBLIC</code>, if present, is in uppercase letters.</li>
- <li>A Formal Public Identifier (FPI), if present, is a case-sensitive match of the registered FPI to which it points.</li>
- <li>A URI, if present in the document type declaration, is a case-sensitive match of the URI to which it points.
- <ul>
- <li>If the URI is the string <code>about:legacy-compat</code>, <a>polyglot markup</a> includes the string in lowercase letters, as required by HTML5.</li>
- <li>If the URI is an http URL, the URI points to the correct resource, using case-sensitive letters.</li>
- </ul>
- </li>
- </ul>
- <p class="note">
- The string <code>html</code> SHOULD be in lowercase letters, in order to be both well-formed and valid XML;
- however, the string MAY be in mixed case or uppercase letters and still be well-formed XML.
- </p>
- <p>
- Note that using <code>about:legacy-compat</code> in XML may yield unpredictable parsing results, depending on the XML processing pipeline.
- </p>
- <p>
- <a title="polyglot markup">Polyglot markup</a> does not use document type declarations for HTML4, HTML3, or HTML2, regardless of whether they contain a URI or not and
- regardless of their effect in HTML5 parsers, as these document type declarations are not compatible with XHTML.
- </p>
- <!--End section: The DOCTYPE-->
-</section>
- <section id="namespaces" class="section">
- <h3>Namespaces</h3>
- <p>
- The following rules apply to namespaces used in <a>polyglot markup</a>.
- </p>
-
- <section id="element-level-namespaces" class="section">
- <h4>Element-level namespaces</h4>
- <p>
- [[!HTML5]] introduces undeclared (native) default namespaces for the root HTML element, <code>html</code>, the root SVG element, <code>svg</code>,
- and the root MathML element, <code>math</code>.
- <a title="polyglot markup">Polyglot markup</a> declares the following default namespaces, when the markup languages are included in the document, to maintain XML-compatibility [[!XML10]]:</p>
- <ul class="inline-list">
- <li><code><html xmlns="http://www.w3.org/1999/xhtml"></code></li>
- <li><code><math xmlns="http://www.w3.org/1998/Math/MathML"></code></li>
- <li><code><svg xmlns="http://www.w3.org/2000/svg"></code></li>
- </ul>
- <p>
- <a title="polyglot markup">Polyglot markup</a> declares the default namespaces on the root HTML element, <code>html</code>,
- the root SVG element, <code>svg</code>, and the root MathML element <code>math</code>,
- and on any HTML elements used as children of SVG or MathML elements.
- <a title="polyglot markup">Polyglot markup</a> does not declare any other default or prefixed element namespace, because
- [[!HTML5]] does not natively support the declaring of any other default or prefixed element namespace.
- </p>
- <!-- End section, "Element-Level Namespaces" -->
- </section>
-
- <section id="attribute-level-namespaces" class="section">
- <h4>Attribute-level namespaces</h4>
- <p>
- [[!HTML5]] introduces undeclared (native) support for attributes in the XLink namespace and with the prefix <code>xlink:</code>.
- <a title="polyglot markup">Polyglot markup</a> declares the XLink namespace on the HTML root element (<code>html</code>) or
- once on the foreign element where it is used (<code>svg</code> or <code>math</code>), to maintain XML-compatibility [[!XML10]].
- </p>
- <p>In <a>polyglot markup</a>, the xlink prefix uses the namespace declaration <code>xmlns:xlink="http://www.w3.org/1999/xlink"</code> before using the xlink prefix for the following attributes:</p>
- <ul class="inline-list">
- <li><code>xlink:actuate</code></li>
- <li><code>xlink:arcrole</code></li>
- <li><code>xlink:href</code></li>
- <li><code>xlink:role</code></li>
- <li><code>xlink:show</code></li>
- <li><code>xlink:title</code></li>
- <li><code>xlink:type</code></li>
- </ul>
- <p>
- Note that there are other prefixed attributes that can be used beyond <code>xlink:href</code> (such as <code>xml:base</code>).
- <a title="polyglot markup">Polyglot markup</a> does not declare these prefixes via xmlns. The prefixes are implicitly declared
- in XML and are automatically applied to the appropriate attributes in HTML.
- </p>
- <p>
- The namespaced attributes, such as <code>xml:lang=""</code> and <code>xmlns=""</code>, are "namespaced" within XHTML, SVG and MathML.
- Thus, the rules for how they can be sued as CSS selectors is governed by CSS namespaces. [[!CSS3NAMESPACE]]
- For more on the issues related to attribute selectors and namespaces, with and without prefix, see the section on <a
- href="#scripting-and-styling-polyglot-markup">Scripting and styling polyglot markup</a>.
- <p>
-
- <!-- End section, "Attribute-Level Namespaces" -->
- </section>
- <!--End section: Namespaces-->
-</section>
- <section id="elements" class="section">
-<h3>Element syntax</h3>
-<p><a title="polyglot markup">Polyglot markup</a> conforms to the following rules regarding elements.</p>
- <section id="required-elements" class="section">
- <h6>Required elements and tags</h6>
-
- <p> HTML5’s concept of <dfn>optional tags</dfn> – start tags and/or end tags – covers <a
- href="http://www.w3.org/TR/html5/syntax.html#optional-tags">elements that the
- HTML parser itself automatically adds to the DOM</a> if the code doesn’t contain the tags for
- them. However, since XML does not have a feature whereby elements with one or both tags that have been
- omitted from the code (such as when start and end tags of <code>html</code> are omitted) are added to the DOM,
- omitting a tag in <a>polyglot markup</a> is equivalent of producing a not well-formed document or,
- if both tags are omotted, equivalent of not adding the element at all. Therefore, <a>polyglot markup</a> does not
- operate with <a>optional tags</a>.</p>
-
- <p>That <a>polyglot markup</a> doesn’t operate with optional tags, may create surprises e.g. for someone not used
- to adding e.g. the <code>tbody</code> tags in their code or to someone accustomed to omitting the end tag of the
- <code>p</code> element. However, the requirement to be complete with regard to tags, is a key feature of <a>polyglot
- markup</a> that makes the code <a title="robustness">robust</a> against subpar parsers and authoring surprises.</p>
- <section id="minimal-polyglot-html-document">
- <h4>A minimal HTML document</h4>
- <p>
- Every <a>polyglot markup</a> document therefore contains an <code>html</code>, <code>head</code>, <code>title</code>,
- and <code>body</code> element, represented in the code with their tags.
- The <code>html</code> element is the root element.
- The <code>head</code> and <code>body</code> elements are children of the <code>html</code> element.
- The <code>title</code> element is a child of the <code>head</code> element.
- Therefore, the following source code would be the most basic <a>polyglot markup</a> document.
- </p>
- <pre class="example highlight"><!DOCTYPE html>
-<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
- <head>
- <title></title>
- </head>
- <body>
- </body>
-</html>
- </pre>
- </section>
- <section id="required-tags-exampls">
- <h4>Required tags examples</h4>
- <p>
- Whenever it uses a <code>tr</code> element, <a>polyglot markup</a> always wraps the <code>tr</code> element inside a

[881 lines skipped]
--- /sources/public/html5/html-xhtml-author-guide/.htaccess 2013/10/31 08:59:49 1.7
+++ /sources/public/html5/html-xhtml-author-guide/.htaccess 2013/10/31 09:03:09 1.8
@@ -1,4 +1,4 @@
 AddType text/html;charset=utf-8 .html
 DirectoryIndex html-xhtml-authoring-guide.html
 //301 Redirect Old File
-Redirect 301 ./html-xhtml-authoring-guide.html http://dev.w3.org/html5/html-polyglot/html-polyglot.html
+Redirect 301 html-xhtml-authoring-guide.html http://dev.w3.org/html5/html-polyglot/html-polyglot.html
Received on Thursday, 31 October 2013 09:03:14 UTC