- From: Eliot Graff via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 23 Jun 2010 17:35:47 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/html-xhtml-author-guide In directory hutz:/tmp/cvs-serv6735 Added Files: html-xhtml-authoring-guide - WD.html Log Message: FPWD draft --- NEW FILE: html-xhtml-authoring-guide - WD.html --- <!DOCTYPE html> <html> <head> <title>Polyglot Markup: HTML-Compatible XHTML Documents</title> <meta http-equiv='Content-Type' content='text/html;charset=utf-8'/> <!-- === NOTA BENE === For the three scripts below, if your spec resides on dev.w3 you can check them out in the same tree and use relative links so that they'll work offline, --> <script src='http://dev.w3.org/2009/dap/ReSpec.js/js/respec.js' class='remove'></script> <script class='remove'> var respecConfig = { // specification status (e.g. WD, LCWD, NOTE, etc.). If in doubt use ED. specStatus: "WD", // the specification's short name, as in http://www.w3.org/TR/short-name/ // TODO: Get URL from Michael Smith shortName: "xxx-xxx", // if you wish the publication date to be other than today, set this publishDate: "2010-06-24", // TODO: Add previous pub date after 2nd publication. 
// if there is a previously published draft, uncomment this and set its YYYY-MM-DD date
// and its maturity status
previousPublishDate: "2010-06-22",
previousMaturity: "ED",
// if there is a publicly available Editor's Draft, this is the link
// TODO: Uncomment next line and add the link:
edDraftURI: "http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html",
// if this is a LCWD, uncomment and set the end of its review period
// lcEnd: "2009-08-05",
// if you want to have extra CSS, append them to this list
// it is recommended that the respec.css stylesheet be kept
extraCSS: ["http://dev.w3.org/2009/dap/ReSpec.js/css/respec.css"],
// editors, add as many as you like
// only "name" is required
editors: [
{ name: "Eliot Graff", company: "Microsoft Corporation", },
// Format for more editors
// { name: "Your Name", url: "http://example.org/",
// company: "Your Company", companyURL: "http://example.com/" },
],
// authors, add as many as you like.
// This is optional, uncomment if you have authors as well as editors.
// only "name" is required. Same format as editors.
//authors: [
// { name: "Your Name", url: "http://example.org/",
// company: "Your Company", companyURL: "http://example.com/" },
//],
// name of the WG
wg: "W3C HTML",
// URI of the public WG page
wgURI: "http://www.w3.org/html/wg/",
// name (with the @w3c.org) of the public mailing to which comments are due
wgPublicList: "public-html",
// URI of the patent status for this WG, for Rec-track documents
// !!!! IMPORTANT !!!!
// This is important for Rec-track documents, do not copy a patent URI from a random
// document unless you know what you're doing. If in doubt ask your friendly neighbourhood
// Team Contact.
wgPatentURI: "",
};
</script>
</head>
<body>
<section id="abstract">
<p> A document that uses polyglot markup is an HTML5 document which is at the same time an XML document and an HTML document, and which meets a well-defined set of constraints.
Polyglot markup that meets these constraints is interpreted as compatible, regardless of whether it is processed as HTML or as XHTML, per the HTML5 specification. Polyglot markup uses a specific doctype, namespace declarations, and a specific case (normally lower case but occasionally camel case) for element and attribute names. Polyglot markup uses lower case for certain attribute values. Further constraints include those on empty elements, named entity references, and the use of scripts and style. </p>
</section>
<section id='sotd'>
<p>This document summarizes design guidelines for authors who wish their XHTML or HTML documents to be processed correctly by either HTML or XML parsers, assuming the parsers to be HTML5-compliant. This specification is intended to be used by web authors. It is not a specification for user agents and creates no obligations on user agents. Note that this recommendation does not define how HTML5-conforming user agents should process HTML documents. Nor does it define the meaning of the Internet Media Type text/html. For user agent guidance and for these definitions, see [[!HTML5]] and [[!RFC2854]]. </p>
</section>
<section id="introduction" class="section informative">
<h2>Introduction</h2>
<p> It is often valuable to be able to serve HTML5 documents that are also valid XML documents. An author may, for example, use XML tools to generate a document, and they and others may process the document using XML tools. These documents are served as text/html. The language used to create documents that can be parsed by both HTML and XML parsers is called <dfn>polyglot markup</dfn>. Polyglot markup is the overlap language of documents which are both HTML5 documents and XML documents. </p>
</section>
<section id="PI-and-xml" class="section">
<h2>Processing Instructions and the XML Declaration</h2>
<p> Polyglot markup does not use processing instructions.
Note that the XML declaration is not a processing instruction; its parsing rules are defined separately in <a href="http://www.w3.org/TR/REC-xml/#NT-XMLDecl">Prolog and Document Type Declaration</a>. <!-- TODO: Add Normative link once generated --> </p>
</section>
<section id="character-encoding" class="section">
<h2>Character Encoding</h2>
<p> Polyglot markup uses either UTF-8 or UTF-16, although generally UTF-8 is preferred. When polyglot markup uses UTF-16, it SHOULD include the BOM indicating UTF-16LE or UTF-16BE. In that case, polyglot markup need not include the <code>meta</code> charset declaration, because a parser must already be reading the document as UTF-16 in order to reach the declaration. </p>
<p> In short, for correct character encoding, polyglot markup MUST either: <ul> <li>Use UTF-8 or UTF-16 with the appropriate BOM.</li> </ul> <strong>OR</strong> <ul> <li>Use both the XML declaration and the <code>meta</code> tag to specify the appropriate character encoding.</li> </ul> </p>
<p> If polyglot markup uses an encoding other than UTF-8 or UTF-16, it MUST include the XML declaration; in this case the document MUST also include the HTML <code>meta</code> tag specifying the character set. When polyglot markup uses both the XML declaration and the HTML <code>meta</code> tag, these MUST specify the same character encoding. </p>
</section>
<section id="doctype" class="section">
<h2>The DOCTYPE</h2>
<p> Polyglot markup uses the <code><!DOCTYPE html></code> doctype. Note that for polyglot markup the string <code>html</code> MUST be lower case. For a pure HTML document, the string is defined as case-insensitive. [[!HTML5]] </p>
</section>
<section id="namespaces" class="section">
<h2>Namespaces</h2>
<p> The following rules apply to namespaces used in polyglot markup. </p>
<ul> <li> The <code><html></code> element uses the namespace declaration <code>xmlns="http://www.w3.org/1999/xhtml"</code>.
</li> <li> All <code><math></code> elements use the namespace declaration <code>xmlns="http://www.w3.org/1998/Math/MathML"</code>. </li> <li> All <code><svg></code> elements use the namespace declaration <code>xmlns="http://www.w3.org/2000/svg"</code>. </li> <li> The xlink prefix is declared as <code>xmlns:xlink="http://www.w3.org/1999/xlink"</code> before xlink:href is used. The prefix can be declared either: <ul> <li> Once on the root <code><html></code> element. </li> <li> Once on each <code><svg></code> element that contains one or more elements with xlink:href attributes. </li> </ul> </li> <li> No other elements should have namespace declarations. </li> </ul>
</section>
<section id="elements" class="section">
<h2>Elements</h2>
<section id="required-elements">
<h3>Required Elements</h3>
<p> Each document using polyglot markup MUST have a root <code>html</code> element. The root <code>html</code> element MUST contain both a <code>head</code> and a <code>body</code> element. The <code>head</code> element MUST contain a <code>title</code> element. </p>
<section id="tables" class="section">
<h3>Tables</h3>
<p> Polyglot markup MUST explicitly include a <code>tbody</code> element surrounding groups of <code>tr</code> elements within a <code>table</code> element. HTML parsers insert the <code>tbody</code> element automatically, but XML parsers do not, so omitting it produces different DOMs. </p>
<p> Correct: <pre class="example">
<table>
  <tbody>
    <tr>...
</pre> Incorrect: <pre class="example">
<table>
  <tr>...
</pre> </p>
</section>
</section>
<section id="case-sensitivity" class="section">
<h2>Case-Sensitivity</h2>
<p> The following guidelines apply to any usage of element names, attribute names, or attribute values in markup, script, or CSS. When required, polyglot markup uses lower case letters for all ASCII letters; however, case requirements do not apply to non-ASCII letters such as Greek, Cyrillic, or non-ASCII Latin letters.
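As an illustration of these case rules, the following fragment is a sketch only: the <code>id</code> value, colors, and dimensions are invented for the example. HTML and most SVG names stay in lower case, while <code>viewBox</code> and <code>linearGradient</code> keep the mixed case that the lists in this section require.

```html
<p>
  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 50">
    <defs>
      <!-- linearGradient is one of the SVG element names that must stay in mixed case -->
      <linearGradient id="fade">
        <stop offset="0" stop-color="white"/>
        <stop offset="1" stop-color="black"/>
      </linearGradient>
    </defs>
    <!-- rect and its attributes are ordinary lower case names -->
    <rect width="100" height="50" fill="url(#fade)"/>
  </svg>
</p>
```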
</p> <section id="element-names" class="section"> <h3>Element Names</h3> <p>Polyglot markup uses the correct case for element names.</p> <ul> <li> Polyglot markup uses lowercase letters for all HTML element names. </li> <li> Polyglot markup uses lowercase letters for all MathML element names. </li> <li> Polyglot markup uses lowercase letters for all SVG element names except the following, which MUST be in mixed case: <ul> <li><code>altGlyph</code></li> <li><code>altGlyphDef</code></li> <li><code>altGlyphItem</code></li> <li><code>animateColor</code></li> <li><code>animateMotion</code></li> <li><code>animateTransform</code></li> <li><code>clipPath</code></li> <li><code>feBlend</code></li> <li><code>feColorMatrix</code></li> <li><code>feComponentTransfer</code></li> <li><code>feComposite</code></li> <li><code>feConvolveMatrix</code></li> <li><code>feDiffuseLighting</code></li> <li><code>feDisplacementMap</code></li> <li><code>feDistantLight</code></li> <li><code>feFlood</code></li> <li><code>feFuncA</code></li> <li><code>feFuncB</code></li> <li><code>feFuncG</code></li> <li><code>feFuncR</code></li> <li><code>feGaussianBlur</code></li> <li><code>feImage</code></li> <li><code>feMerge</code></li> <li><code>feMergeNode</code></li> <li><code>feMorphology</code></li> <li><code>feOffset</code></li> <li><code>fePointLight</code></li> <li><code>feSpecularLighting</code></li> <li><code>feSpotLight</code></li> <li><code>feTile</code></li> <li><code>feTurbulence</code></li> <li><code>foreignObject</code></li> <li><code>glyphRef</code></li> <li><code>linearGradient</code></li> <li><code>radialGradient</code></li> <li><code>textPath</code></li> </ul> </li> </ul> </section> <section id="attribute-names" class="section"> <h3>Attribute Names</h3> <p> Polyglot markup uses the correct case for attribute names. </p> <ul> <li> Polyglot markup uses lowercase letters in attribute names for all HTML elements. 
</li> <li> Polyglot markup uses lowercase letters in attribute names for all MathML elements except the following: <p>The lowercase <code>definitionurl</code> MUST be changed to the mixed case <code>definitionURL</code>.</p> </li> <li> Polyglot markup uses lowercase letters in attribute names for all SVG elements except the following, which MUST be in mixed case: <ul> <li><code>attributeName</code></li> <li><code>attributeType</code></li> <li><code>baseFrequency</code></li> <li><code>baseProfile</code></li> <li><code>calcMode</code></li> <li><code>clipPathUnits</code></li> <li><code>contentScriptType</code></li> <li><code>contentStyleType</code></li> <li><code>diffuseConstant</code></li> <li><code>edgeMode</code></li> <li><code>externalResourcesRequired</code></li> <li><code>filterRes</code></li> <li><code>filterUnits</code></li> <li><code>glyphRef</code></li> <li><code>gradientTransform</code></li> <li><code>gradientUnits</code></li> <li><code>kernelMatrix</code></li> <li><code>kernelUnitLength</code></li> <li><code>keyPoints</code></li> <li><code>keySplines</code></li> <li><code>keyTimes</code></li> <li><code>lengthAdjust</code></li> <li><code>limitingConeAngle</code></li> <li><code>markerHeight</code></li> <li><code>markerUnits</code></li> <li><code>markerWidth</code></li> <li><code>maskContentUnits</code></li> <li><code>maskUnits</code></li> <li><code>numOctaves</code></li> <li><code>pathLength</code></li> <li><code>patternContentUnits</code></li> <li><code>patternTransform</code></li> <li><code>patternUnits</code></li> <li><code>pointsAtX</code></li> <li><code>pointsAtY</code></li> <li><code>pointsAtZ</code></li> <li><code>preserveAlpha</code></li> <li><code>preserveAspectRatio</code></li> <li><code>primitiveUnits</code></li> <li><code>refX</code></li> <li><code>refY</code></li> <li><code>repeatCount</code></li> <li><code>repeatDur</code></li> <li><code>requiredExtensions</code></li> <li><code>requiredFeatures</code></li> <li><code>specularConstant</code></li> 
<li><code>specularExponent</code></li> <li><code>spreadMethod</code></li> <li><code>startOffset</code></li> <li><code>stdDeviation</code></li> <li><code>stitchTiles</code></li> <li><code>surfaceScale</code></li> <li><code>systemLanguage</code></li> <li><code>tableValues</code></li> <li><code>targetX</code></li> <li><code>targetY</code></li> <li><code>textLength</code></li> <li><code>viewBox</code></li> <li><code>viewTarget</code></li> <li><code>xChannelSelector</code></li> <li><code>yChannelSelector</code></li> <li><code>zoomAndPan</code></li> </ul> </li> </ul> </section> <section id="attribute-values" class="section"> <h3>Attribute Values</h3> <p> Polyglot markup uses lowercase letters for the values of the attributes in the following list when they exist on HTML elements. More specifically, where required, polyglot markup MUST use lower case letters for all ASCII letters in these attribute values; however, case requirements do not apply to non-ASCII letters such as Greek, Cyrillic, or non-ASCII Latin letters. Attributes for HTML elements other than those in the following list MAY have values made of mixed case letters. All attributes on non-HTML elements may have values made of mixed case letters. 
</p> <ul> <li><code>accept</code></li> <li><code>accept-charset</code></li> <li><code>align</code></li> <li><code>alink</code></li> <li><code>axis</code></li> <li><code>bgcolor</code></li> <li><code>charset</code></li> <li><code>checked</code></li> <li><code>clear</code></li> <li><code>codetype</code></li> <li><code>color</code></li> <li><code>compact</code></li> <li><code>declare</code></li> <li><code>defer</code></li> <li><code>dir</code></li> <li><code>direction</code></li> <li><code>disabled</code></li> <li><code>enctype</code></li> <li><code>face</code></li> <li><code>frame</code></li> <li><code>hreflang</code></li> <li><code>http-equiv</code></li> <li><code>lang</code></li> <li><code>language</code></li> <li><code>link</code></li> <li><code>media</code></li> <li><code>method</code></li> <li><code>multiple</code></li> <li><code>nohref</code></li> <li><code>noresize</code></li> <li><code>noshade</code></li> <li><code>nowrap</code></li> <li><code>readonly</code></li> <li><code>rel</code></li> <li><code>rev</code></li> <li><code>rules</code></li> <li><code>scope</code></li> <li><code>scrolling</code></li> <li><code>selected</code></li> <li><code>shape</code></li> <li><code>target</code></li> <li><code>text</code></li> <li><code>type</code></li> <li><code>valign</code></li> <li><code>valuetype</code></li> <li><code>vlink</code></li> </ul> </section> </section> <section id="empty-elements" class="section"> <h2>Empty Elements</h2> <p> Polyglot markup uses only the elements in the following list as empty elements. 
</p> <ul> <li><code>area</code></li> <li><code>base</code></li> <li><code>br</code></li> <li><code>col</code></li> <li><code>command</code></li> <li><code>embed</code></li> <li><code>hr</code></li> <li><code>img</code></li> <li><code>input</code></li> <li><code>keygen</code></li> <li><code>link</code></li> <li><code>meta</code></li> <li><code>param</code></li> <li><code>source</code></li> </ul>
<p> Polyglot markup uses the minimized tag syntax for empty elements, e.g. <code><br/></code>. The alternative syntax <code><br></br></code> allowed by XML gives uncertain results in many existing user agents. </p>
<p> Given an empty instance of an element whose content model is not EMPTY (for example, an empty title or paragraph), polyglot markup does not use the minimized form (e.g. the document uses <code><p></p></code> and not <code><p /></code>). </p>
<p> Note that MathML and SVG elements may either be self-closing or contain content. </p>
</section>
</section>
<section id="attributes" class="section">
<h2>Attributes</h2>
<p>Polyglot markup does not contain line breaks or multiple consecutive white space characters within attribute values. These are handled inconsistently by user agents.</p>
<p>Polyglot markup surrounds all attribute values with quotation marks. Attribute values MAY be surrounded either by single quotation marks or by double quotation marks.</p>
<p>See also <a href="#attribute-values">Attribute Values</a>.</p>
</section>
<section id="named-entity-references" class="section">
<h2>Named Entity References</h2>
<p> Polyglot markup uses only the following named entity references: </p>
<ul> <li><code>amp</code></li> <li><code>lt</code></li> <li><code>gt</code></li> <li><code>apos</code></li> <li><code>quot</code></li> </ul>
<p> For entities beyond the previous list, a polyglot document uses character references. For example, polyglot markup uses <code>&#160;</code> instead of <code>&nbsp;</code>.
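A minimal sketch of this rule, using an invented sentence: the <code>amp</code> and <code>lt</code> references from the list are kept, and the no-break space is written as a numeric character reference.

```html
<!-- Polyglot: only the five listed named references, plus numeric character references -->
<p>Fish &amp; chips for &lt; 5&#160;GBP</p>

<!-- Not polyglot: an XML parser has no declaration for nbsp when the doctype is <!DOCTYPE html> -->
<p>Fish &amp; chips for &lt; 5&nbsp;GBP</p>
```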
</p> </section>
<section id="script-and-style" class="section">
<h2>Script and Style</h2>
<p> Script and style commands SHOULD be included by linking to external files rather than including them in-line. However, polyglot markup MUST NOT link to an external stylesheet by using the xml-stylesheet processing instruction. See also <a href="#PI-and-xml">Processing Instructions and the XML Declaration</a>. </p>
<p>The following examples show the proper way to include external script and style, respectively:</p>
<pre class="example"> <script src="external.js"></script> </pre>
<pre class="example"> <link rel="stylesheet" href="external.css"/> </pre>
<p> Although <code>document.write()</code> and <code>document.writeln()</code> are valid in an HTML document, neither function may be used in XHTML. Therefore, neither is used in polyglot markup. Instead, use the <code>innerHTML</code> property for both HTML and XHTML. Note that the <code>innerHTML</code> property takes a string. In XHTML, an XML parser parses the string as XML. In HTML, an HTML parser parses the string as HTML. Because of this difference in parsing, if you pass content that does not follow the rules for polyglot markup, the resulting DOM will differ between a document created with an XML parser and one created with an HTML parser. </p>
<section id="external-script-and-style" class="section">
<h3>External Script and Style</h3>
<p> Polyglot markup uses external scripts and style sheets if the document's script or style sheet contains <code><</code> or <code>&</code> or <code>]]></code> or <code>--</code>. Note that XML parsers are permitted to silently remove the contents of comments; therefore, the historical practice of hiding scripts and style sheets within comments to make the documents backward compatible is unlikely to work as expected in XML-based user agents.
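The contrast can be sketched as follows. The function <code>run</code> and the variables <code>a</code> and <code>b</code> are hypothetical names used only for illustration; <code>external.js</code> is the file name from the example earlier in this section.

```html
<!-- Historical practice: an XML parser may discard the comment, and the script with it -->
<script>
<!--
  if (a < b) { run(); }
// -->
</script>

<!-- Polyglot alternative: the same code lives in an external file -->
<script src="external.js"></script>
```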
</p> </section>
<section id="in-line-script-and-style" class="section">
<h3>In-line Script and Style</h3>
<p> If polyglot markup must use script or style commands within its source code, it either uses safe content or wraps the commands in a CDATA section. However, polyglot markup does not use a <code>CDATA</code> section unless it is being used within foreign content. <ul> <li>Safe content is content that does not contain a <code><</code> or <code>&</code> character. The following example is safe because it does not contain problematic characters within the <code><script></code> tag. <pre class='example'> <script>document.body.appendChild(document.createElement("div"));</script> </pre> </li> <li>Wrap in-line script and style commands in a CDATA section.</li>
<p> Note that you cannot achieve the same DOM in both XHTML and HTML by using in-line commands in a CDATA section. However, this is not usually a problem unless the code depends on the exact number of text nodes under a <code><script></code> or <code><style></code> element. The following examples show in-line script and style commands wrapped in a <code>CDATA</code> section. </p>
<pre class="example">
<script>
//<![CDATA[
(script goes here)
//]]>
</script>
</pre>
<pre class="example">
<style>
/*<![CDATA[*/
(styles go here)
/*]]>*/
</style>
</pre>
<p> When using MathML or SVG, the parser follows the XML parsing rules. Polyglot markup does not rely on getting a CDATA instance from the DOM when using MathML or SVG, because the HTML parser does not create a CDATA instance in the DOM. </p> </ul> </p>
</section>
</section>
<section id="foreign-content" class="section">
<h2>Exceptions from the Foreign Content Parsing Rules</h2>
<p> <!-- TODO: Need to call out exceptions from the foreign content parsing rules (e.g. <foreignContent>) --> </p>
</section>
<section class='appendix'>
<h2>Acknowledgements</h2>
<p> Many thanks to Daniel Glazman, Tony Ross, Sam Ruby, Jonas Sicking, Henri Sivonen, and Philip Taylor.
Special thanks to the W3C TAG. </p> </section> </body> </html>
Received on Wednesday, 23 June 2010 17:35:49 UTC