CVS html5/html-xhtml-author-guide

Update of /sources/public/html5/html-xhtml-author-guide
In directory roscoe:/tmp/cvs-serv32485/html-xhtml-author-guide

Modified Files:
	html-xhtml-authoring-guide.html 
Log Message:
Finalizing bug 20201, fixing bug 13392. Fixing an issue where the structuring was wrong. The fixing of bug 20201 and of the structure led to a major fix of the “writing HTML documents” section; the rewrite is still not ready.

--- /sources/public/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html	2013/09/02 04:39:55	1.131
+++ /sources/public/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html	2013/09/05 03:37:47	1.132
@@ -8,7 +8,7 @@
 	      var respecConfig = {
 	          specStatus:   "ED",
 	          shortName:    "html-polyglot",
-                  publishDate:  "2013-09-01",
+                  publishDate:  "2013-09-03",
 	          previousPublishDate:  "2010-10-19",
 	          previousMaturity:  "WD",
 	          edDraftURI:           "http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html",
@@ -37,7 +37,7 @@
 <body>
 
 <section id="abstract">
-	A document that uses polyglot markup is a document that is a stream of bytes that parses into identical document trees 
+	A document that uses <a title="polyglot markup">polyglot markup</a> is a document that is a stream of bytes that parses into identical document trees 
 	(with some exceptions, as noted in the <a href="#introduction">Introduction</a>) when processed as HTML and when processed as XML.
 	Polyglot markup that meets a well-defined set of constraints is interpreted as compatible, regardless of whether it is processed as HTML or as XHTML, per the HTML5 specification. 
 	Polyglot markup uses a specific DOCTYPE, namespace declarations, and a specific case—normally lower case but occasionally camel case—for element and attribute names. 
@@ -112,12 +112,12 @@
 
     <p>Polyglot markup is a means to an end – <dfn id="dfn-robustness">robustness</dfn>. It is not a goal in itself. However, authors do not need
        to understand these benefits in order to use and benefit from this syntax. But neither does anyone
-       need to exaggerate its benefits. For instance, polyglot markup does not add semantics. Polyglot markup does,
+       need to exaggerate its benefits. For instance, <a title="polyglot markup">polyglot markup</a> does not add semantics. Polyglot markup does,
        however, work to <em>preserve</em> semantics, including during the authoring process. Polyglot markup
        also doesn’t ensure accessibility - as it does not add any requirements
        that other relevant specs have not already added. But it can work to <em>preserve</em> accessibility.</p>
 
-    <p>The motivation behind, and reason for polyglot markup to exist as a specification, is its widely supported
+    <p>The motivation behind, and reason for <a title="polyglot markup">polyglot markup</a> to exist as a specification, is its widely supported
         <a title="robustness">robustness</a>. With <a title="robustness">robust</a> (also known as conservative) markup, authors can <q cite="http://www.w3.org/TR/WCAG20/#robust">
         maximize compatibility with current and future user agents</q> and authoring tools. [[!WCAG20]]</p>
 
@@ -126,7 +126,7 @@
         they full featured and bug free HTML5 parsers, somewhat HTML-aware parsers, and even XML parsers.
     </p>
 
-    <p> For the most part, polyglot markup is just a pure deduction of the validity constraints and syntax requirements that
+    <p> For the most part, <a title="polyglot markup">polyglot markup</a> is just a pure deduction of the validity constraints and syntax requirements that
         HTML and XHTML dictate, many of which took polyglotness into consideration when they were added to HTML5.
         However, for reasons of <a title="robustness">robustness</a>, the spec sometimes goes a little further than the principle of the lowest common
         denominator would have required.</p>
@@ -146,9 +146,9 @@
 
     <p>Using <a title="robustness">robust</a> syntax can enable documents to be parsed more reliably by less capable parsers.
        But even if the document can be expected to be parsed and validated by fully HTML5 conforming tools,
-       polyglot markup adds <a title="robustness">robustness</a>.  As an example, when serialized as HTML, the closing tag for
+       <a title="polyglot markup">polyglot markup</a> adds <a title="robustness">robustness</a>.  As an example, when serialized as HTML, the closing tag for
        the <code>p</code> element is entirely optional and will be inferred if not present.  But inclusion of
-       closings tags, as required by XML and, thus, by polyglot markup, cause no harm beyond a minor increase
+       closing tags, as required by XML and, thus, by <a title="polyglot markup">polyglot markup</a>, causes no harm beyond a minor increase
        in transfer size (an increase often mitigated by compression), but does
         allow validators to detect situations where the implicit closing rules
         don't match what the author intended.
@@ -157,7 +157,7 @@
        Polyglot markup is not defined as “robust markup” because the XML-based polyglot markup
        syntax is not the only way to increase <a title="robustness">robustness</a>.
        For instance, an HTML validator or an authoring tool could require all tags to be closed even if
-       this is not required by the HTML syntax.  But then again, polyglot markup, being valid
+       this is not required by the HTML syntax.  But then again, <a title="polyglot markup">polyglot markup</a>, being valid
        XML, sometimes has practical benefits which such a custom setup alone would not have.
     </p>
 </section>
@@ -207,9 +207,9 @@
 </section>
 <!--End section: principles-->
 </section>
-<section id="writing"><h3>Writing HTML documents with polyglot markup</h3>
+<section id="writing"><h3>Writing HTML documents</h3>
 <section id="PI-and-xml" class="section">
-<h2>Processing Instructions and the XML Declaration</h2>
+<h2>Processing instructions and the XML declaration</h2>
 <p>
 	Processing Instructions and the XML Declaration are both forbidden in <a>polyglot markup</a>.
 </p>
@@ -218,7 +218,7 @@
 
 
 <section id="character-encoding" class="section">
-<h2>Specifying a Document’s Character Encoding</h2>
+<h2>Specifying a document’s character encoding</h2>
 	<p>
 		<a title="polyglot markup">Polyglot markup</a> uses the UTF-8 character encoding, the only character encoding for which both HTML and XML require support. 
 		HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a> [[!HTML5]].
@@ -226,15 +226,18 @@
 		As such, character encoding MAY be left undeclared in XML with the result that UTF-8 is still supported [[!XML10]].
 	</p>
 	<p>
-		<a title="polyglot markup">Polyglot markup</a> declares the UTF-8 character encoding in the following ways, which may be used separately or in combination:
+		<a title="polyglot markup">Polyglot markup</a> declares the UTF-8 character encoding in the following ways, which may be used separately or
+        in combination (but note that there can only be a <em>single</em> <a title="HTML encoding declaration">HTML encoding declaration</a>):
 	</p>
 		<ul>
 			<li>Within the document
 				<ul>
-					<li>By using the Byte Order Mark (BOM) character (preferred).</li>
-					<li>By using <code>&lt;meta charset="UTF-8"/></code> (the HTML encoding declaration).</li>
-                    <li>By using <code>&lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/></code> (An <code>meta</code> element with an
-                        <code>http-equiv</code> attribute in the encoding declaration state).</li>
+					<li>By using the Byte Order Mark (BOM) character</li>
+					<li>By using the <dfn>HTML encoding declaration</dfn>
+                        <ul><li><strong>either</strong> in its <code>charset</code> attribute form: <code>&lt;meta charset="UTF-8"/></code></li>
+                            <li><strong>or</strong> in its alternative form: <code>&lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/></code></li>
+                        </ul>
+                    </li>
 				</ul>
 			</li>
 			<li>Outside the document		
@@ -247,12 +250,19 @@
 				<pre class="example">
 					<code>Content-type: application/xhtml+xml; charset=utf-8</code>
 				</pre>
-			</li>
+                Note that, when serving polyglot documents as XML, <code>charset=UTF-8</code> can safely be omitted, due to the UTF-8 encoding default of XML:
+				<pre class="example">
+					<code>Content-type: application/xhtml+xml</code>
+				</pre>
+            </li>
 		</ul>
-	<p class="note">
-		The HTML encoding declaration has no effect in XML. 
-		When the HTML encoding declaration is the only encoding declaration, the encoding default from XML makes XML parsers treat content as UTF-8.
-	</p>
+
+    <p class="note">
+        Both XML and HTML parsers are required to support the byte order mark.
+        The HTML encoding declaration has no effect in XML. When the HTML encoding declaration is
+        the only encoding declaration, the encoding default from XML makes XML parsers treat content as UTF-8.
+    </p>
+
 	<p>
 		The <a href="http://www.w3.org/International/questions/qa-html-encoding-declarations">W3C Internationalization (i18n) Group recommends</a> to always include 
 		a visible encoding declaration in a document, because it helps developers, testers, or translation production managers to check the encoding of a document visually.
@@ -300,7 +310,7 @@
 	</p>
 	
 	<section id="element-level-namespaces" class="section">
-	<h3>Element-Level Namespaces</h3>
+	<h3>Element-level namespaces</h3>
 	<p>
 		[[!HTML5]] introduces undeclared (native) default namespaces for the root HTML element, <code>html</code>, the root SVG element, <code>svg</code>,
 		and the root MathML element, <code>math</code>.
@@ -321,7 +331,7 @@
 	</section>
 	
 	<section id="attribute-level-namespaces" class="section">
-	<h3>Attribute-Level Namespaces</h3>
+	<h3>Attribute-level namespaces</h3>
 		<p>
 			[[!HTML5]] introduces undeclared (native) support for attributes in the XLink namespace and with the prefix <code>xlink:</code>. 
 			<a title="polyglot markup">Polyglot markup</a> declares the XLink namespace on the HTML root element (<code>html</code>) or 
@@ -364,15 +374,30 @@
 </section>
 
 <section id="elements" class="section">
-<h2>Elements</h2>
+<h2>Element syntax</h2>
 <p><a title="polyglot markup">Polyglot markup</a> conforms to the following rules regarding elements.</p>
 
 	<section id="required-elements" class="section">
-	<h3>Required Elements</h3>
-	
+	<h3>Required elements and tags</h3>
+
+<p> HTML5’s concept of <dfn>optional tags</dfn> – start tags and/or end tags – covers <a
+    href="http://www.w3.org/TR/html5/syntax.html#optional-tags">elements that the
+    HTML parser itself automatically adds to the DOM</a> if the code doesn’t contain the tags for
+    them. However, since XML does not have a feature whereby elements with one or both tags
+    omitted from the code (such as when the start and end tags of <code>html</code> are omitted) are added to the DOM,
+    omitting a tag in <a>polyglot markup</a> is equivalent to producing a document that is not <a>well-formed</a> or,
+    if both tags are omitted, to not adding the element at all. Therefore, <a>polyglot markup</a> does not
+    operate with <a>optional tags</a>.</p>
+
+<p>That <a>polyglot markup</a> doesn’t operate with optional tags may surprise, for example, someone not used
+   to adding the <code>tbody</code> tags in their code, or someone accustomed to omitting the end tag of the
+    <code>p</code> element. However, the requirement to be complete with regard to tags is a key feature of <a>polyglot
+   markup</a> that makes the code <a title="robustness">robust</a> against subpar parsers and authoring surprises.</p>
+        <section id="minimal-polyglot-html-document">
+            <h4>A minimal HTML document</h4>
 		<p>
-			Every <a>polyglot markup</a> document contains an <code>html</code>, <code>head</code>, <code>title</code>, 
-			and <code>body</code> element. 
+			Every <a>polyglot markup</a> document therefore contains an <code>html</code>, <code>head</code>, <code>title</code>,
+			and <code>body</code> element, represented in the code with their tags.
 			The <code>html</code> element is the root element. 
 			The <code>head</code> and <code>body</code> elements are children of the <code>html</code> element.
 			The <code>title</code> element is a child of the <code>head</code> element.
@@ -387,6 +412,9 @@
   &lt;/body>
 &lt;/html>
 		</pre>
+        </section>
+        <section id="required-tags-exampls">
+           <h4>Required tags examples</h4>
 		<p>
 			Whenever it uses a <code>tr</code> element, <a>polyglot markup</a> always wraps the <code>tr</code> element inside a 
 			<code>tbody</code>, <code>thead</code>, or <code>tfoot</code> element. 
@@ -418,30 +446,37 @@
 			Incorrect:
 			<pre class="illegal-example highlight">&lt;table>
 &lt;col>...</pre>
+ </section>
+
 	</section>
 
+	<section id="excluded-eelements" class="section">
+	<h3>Excluded elements and tags</h3>
 
-	<section id="elements-that-cannot-be-used" class="section">
-	<h3>Elements that Cannot Be Used in Polyglot Markup</h3>
-		<p>
-			<a title="polyglot markup">Polyglot markup</a> does not use the <code>noscript</code> element, because 
-			the <code>noscript</code> element cannot be used in XML documents. [[!HTML5]]
-		</p>
+        <p>
+            The <code>noscript</code> element is non-conforming in XHTML, and therefore also in <a>polyglot markup</a>,
+            due to the fact that XML has no mechanism by which to produce the effect it has in HTML. [[!HTML5]]
+        </p>
+        <p class="note">
+            Elements with features designed for HTML alone are non-polyglot from the outset. Currently, all such
+            elements are legacy elements, and all but <code>noscript</code>, which HTML5 forbids in XHTML alone, are
+            also obsoleted by the HTML specification for both HTML and XHTML.
+        </p>
 	<!--End section: Elements that Cannot Be Used in Polyglot Markup-->
 	</section>
 
 
 	<section id="case-sensitivity" class="section">
-	<h3>Case-Sensitivity</h3>
+	<h3>Case-sensitivity</h3>
 		<p>
-			The following guidelines apply to any usage of element names, attribute names, or attribute values in markup, script, or CSS.
+			The following apply to any usage of element names, attribute names, or attribute values in markup, script, or CSS.
 			<a title="polyglot markup">Polyglot markup</a> uses lower case letters for all ASCII letters. 
 			For non-ASCII letters&#x2014;such as Greek, Cyrillic, or non-ASCII Latin letters&#x2014;<a>polyglot markup</a> respects case sensitivity as it is called for.
 		</p>
 		
 		
 		<section id="element-names" class="section">
-		<h4>Element Names</h4>
+		<h4>Element names</h4>
 			<p><a title="polyglot markup">Polyglot markup</a> uses the correct case for element names.</p>
 			<ul>
 				<li><a title="polyglot markup">Polyglot markup</a> uses lowercase letters for all HTML element names.</li>
@@ -493,11 +528,11 @@
 		
 
 		<section id="attribute-names" class="section">
-		<h4>Attribute Names</h4>
+		<h4>Attribute names</h4>
 			<p><a title="polyglot markup">Polyglot markup</a> uses the correct case for attribute names.</p>
 			<ul>
 	        	<li><a title="polyglot markup">Polyglot markup</a> uses lowercase letters in attribute names for all HTML elements.</li>
-	        	<li><a title="polyglot markup">Polyglot markup</a> uses lowercase letters in attribute names for all MathML elements except the lowercase <code>definitionurl</code>,  
+	        	<li><a title="polyglot markup">Polyglot markup</a> uses lowercase letters in attribute names for all MathML elements except the lowercase <code>definitionurl</code>,
 	        		which <a>polyglot markup</a> changes to the mixed case <code>definitionURL</code>.</li>
 				<li><a title="polyglot markup">Polyglot markup</a> uses lowercase letters in attribute names for all SVG elements except the following, 
 				for which <a>polyglot markup</a> uses mixed case:
@@ -572,7 +607,7 @@
 
 
 		<section id="attribute-values" class="section">
-		<h4>Attribute Values</h4>
+		<h4>Attribute values</h4>
 		<p>
 			For characters in attribute values, <a>polyglot markup</a> maintains case consistency between markup, DOM APIs, and CSS 
 			when these attributes are used on HTML elements. 
@@ -619,80 +654,281 @@
 		</section>
 <!--End section: Case-Sensitivity-->
 	</section>
+<!--End section: Elements -->
+</section>
 
+<section id="contents-of-elements" class="section">
+<h2>Element contents</h2>
+<p>For the <a href="http://www.w3.org/TR/html5/syntax.html#elements-0">different kinds of elements</a> that HTML documents contain, <a>polyglot markup</a> conforms to the following contents rules.</p>
 	<section id="empty-elements" class="section">
-	<h3>Void Elements</h3>
-	<p><a title="polyglot markup">Polyglot markup</a> uses only the elements in the following list as void elements.</p>
-		<ul class="inline-list">
-			<li><code>area</code></li>
-			<li><code>base</code></li>
-			<li><code>br</code></li>
-			<li><code>col</code></li>
-			<li><code>command</code></li>
-			<li><code>embed</code></li>
-			<li><code>hr</code></li>
-			<li><code>img</code></li>
-			<li><code>input</code></li>
-			<li><code>keygen</code></li>
-			<li><code>link</code></li>
-			<li><code>meta</code></li>
-			<li><code>param</code></li>
-			<li><code>source</code></li>
-			<li><code>track</code></li>
-			<li><code>wbr</code></li>	
-		</ul>
-	  <p><a title="polyglot markup">Polyglot markup</a> uses the minimized tag syntax for void elements, e.g. <code>&lt;br/></code>,
-	  	rather than the alternative syntax <code>&lt;br>&lt;/br></code>.
+	<h3>Void elements</h3>
+	<p>In the HTML syntax, void elements are elements that are always empty and never have an end tag. All elements
+       listed as void <a href="http://www.w3.org/TR/html5/syntax.html#void-elements" >in the HTML specification</a> or
+        in an extension spec, MUST in <a title="polyglot markup">polyglot
+       markup</a> have the syntactic form of an XML <a href="http://www.w3.org/TR/REC-xml/#dt-empty"
+      ><dfn>empty-element tag</dfn></a> (<code>&lt;foo/></code>).  Other elements MUST NOT use the XML
+        <a>empty-element tag</a> syntax.</p>
+
+        <figure>
+<figcaption>The void elements of the HTML specification at the time of writing.</figcaption>
+            <blockquote cite="http://www.w3.org/TR/html5/syntax.html#void-elements">
+                <code>area</code>, <code>base</code>, <code>br</code>, <code>col</code>, <code>embed</code>,
+                    <code>hr</code>, <code>img</code>, <code>input</code>, <code>keygen</code>, <code>link</code>,
+                    <code>meta</code>, <code>param</code>, <code>source</code>,
+                    <code>track</code>, <code>wbr</code>
+           </blockquote>
+
+        </figure>
+	  <p><b>Example:</b> <a title="polyglot markup">Polyglot markup</a> uses the minimized tag syntax for void
+          elements, e.g. <code>&lt;br/></code>, and <em>does not use</em> <code>&lt;br>&lt;/br></code>.
 	  </p>
-	  <p>
-	  	Given an empty instance of an element whose content model is not EMPTY (for example, an empty title or paragraph) 
-	  	<a>polyglot markup</a> does not use the minimized form (e.g. the document uses <code>&lt;p>&lt;/p></code> and not <code>&lt;p /></code>).
+	  <p><b>Example:</b> Given an empty instance of an element whose content model is not EMPTY (for example, an empty
+          title or paragraph) <a>polyglot markup</a> <em>does not use</em> the minimized form. E.g. the document uses
+          <code>&lt;p>&lt;/p></code> and not <code>&lt;p/></code>.
 	  </p>
-	  <p>Note that MathML and SVG elements may be either self-closing or contain content.</p>
+
+  	  <p class="note">Elements in foreign content, such as MathML and SVG elements, may be either self-closing or contain content.</p>
+
 <!--End section: void Elements-->
 	</section>
-	
-	<section id="text-parsing-gotchas" class="section">
-	<h3>Elements with text parsing gotchas</h3>
-<p>Some conforming elements are parsed as more or less plain text by the HTML parser, and polyglot markup
-    thus needs authoring restrictions to be compatible with both parsers. The elements fall in two groups:
-     the “pure” raw text elements and the escapeable raw text elements, see
-    <a href="http://www.w3.org/html/wg/drafts/html/master/syntax.html#cdata-rcdata-restrictions">HTML5</a>.</p>
-  <section id="raw-text-elements">
-   <h4>Permitted raw text elements</h4>
-    <p>Escaped text within <a href="http://www.w3.org/TR/html5/syntax.html#raw-text-elements"><dfn>raw text elements</dfn></a>, are not interpreted as escaped text by the HTML parser, whereas the XML parser do treat them as such. As result, character references cannot be
-     be used directly, inside them as these are interpreted differently in XML and HTML. The conforming
-     elements in polyglot markup, are the following:</p>
-<ul>
-    <li><code>script</code> (To allow character references, polyglot <a href="#ambiguous-strings-in-script-and-style">permits character references via <code>CDATA</code></a>)</li>
-    <li><code>style</code> </li>
-    <li><code>iframe</code> </li>
-</ul>
+
+
+<section id="raw-text-elements">
+<h4>Raw text elements (<code>script</code> and <code>style</code>)</h4>
+<p>
+    In <a>polyglot markup</a>, the contents of all elements listed as raw text elements
+    <a href="http://www.w3.org/TR/html5/syntax.html#raw-text-elements" >in the HTML specification</a> or
+    in an extension spec, MUST conform to the extra requirements defined in this section.
+</p>
+
+<figure>
+   <figcaption>HTML5’s list of raw text elements</figcaption>
+   <blockquote cite="http://www.w3.org/TR/html5/syntax.html#raw-text-elements">
+       <code>script</code>, <code>style</code>
+       <!-- iframe and noscript don't count as raw text for syntax purposes -->
+   </blockquote>
+ </figure>
+
+<p>
+In the HTML syntax, the contents of raw text elements are raw text, meaning
+that the HTML parser does not treat contained code that looks like tags (element tags and comment tags), character references,
+CDATA etc. as tags, character references, CDATA etc., but as raw text. (See HTML5 for the exact rules.)
+In the XHTML syntax, however, the same constructs <em>will</em> be treated as tags, character references, CDATA etc.
+</p>
+<p>As a result, it is simpler in HTML than in XHTML for authors to comply with the requirements of the default MIME
+types of the raw text elements. On the other hand, by the use of <code class="CDATA">CDATA</code>, the raw text contents
+parsed as XHTML can be made even less semantic than the raw text data of HTML, leading to potential harm if the document
+is parsed as HTML.
+</p>
+
+<figure id="ambiguous-table">
+    <figcaption>Overview of the differences in how HTML and XML parse raw text elements</figcaption>
+    <table class="simple" border="1" >
+
+        <colgroup><col/><col/><col/><col/><col/><col/></colgroup>
+        <thead>
+        <tr>
+            <th rowspan="2">Ambiguous string</th><th rowspan="2">Info</th><th rowspan="2">HTML interpretation</th><th colspan="2">XML interpretation</th>
+        </tr>
+        <tr><th>if inside <code>&lt;![CDATA[</code>section<code>&#x5d;]></code></th><th>if outside <code>&lt;![CDATA[</code>section<code>&#x5d;]></code></th>
+        </tr>
+        </thead>
+        <tbody>
+        <tr>
+            <td><code>&lt;</code></td>
+            <td>LESS-THAN SIGN</td><td>uninterpreted <small>(but see the <code>&lt;/script</code> and <code>&lt;/style</code> rows)</small></td>
+            <td>uninterpreted</td><td>interpreted <small>(commences tags, comments, CDATA)</small></td></tr>

[395 lines skipped]
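
The required-tags and void-element rules touched by this revision can be illustrated with a small checker. The following is only a rough sketch, not part of the commit or of the specification: the function name `check_polyglot`, the `SAMPLE` document, and the message strings are hypothetical. It assumes a document containing only HTML elements, so it ignores foreign content such as SVG and MathML, and its regular expression does not account for comments, raw text elements, or attribute values that contain `>`.

```python
import re
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Void elements as listed in the diff above (HTML5 at the time of writing).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "keygen", "link", "meta", "param", "source", "track", "wbr"}

# Matches start tags and empty-element tags; group 2 holds everything after the name.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*)>")

# Hypothetical sample document used for illustration only.
SAMPLE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta charset="UTF-8"/>
    <title>Example</title>
  </head>
  <body>
    <p>Line one<br/>line two</p>
  </body>
</html>
"""


def check_polyglot(text):
    """Return a list of problems found; an empty list means none were found."""
    # Omitted tags make the document not well-formed for an XML parser.
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []

    # Required elements: html (root), head, title (inside head), body.
    if root.tag != XHTML + "html":
        problems.append("root element is not html in the XHTML namespace")
    head = root.find(XHTML + "head")
    if head is None:
        problems.append("missing required element: head")
    elif head.find(XHTML + "title") is None:
        problems.append("missing required element: title")
    if root.find(XHTML + "body") is None:
        problems.append("missing required element: body")

    # Void elements must use <foo/>; other elements must not.
    for name, rest in START_TAG.findall(text):
        self_closing = rest.rstrip().endswith("/")
        if self_closing and name not in VOID:
            problems.append(f"<{name}/> used for non-void element")
        elif name in VOID and not self_closing:
            problems.append(f"void element <{name}> not written as <{name}/>")
    return problems


print(check_polyglot(SAMPLE))  # prints []
```

The XML parse catches omitted tags, while the textual scan is needed because an XML parser does not distinguish `<p/>` from `<p></p>` once the tree is built.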

Received on Thursday, 5 September 2013 03:37:49 UTC