CVS html5/html-xhtml-author-guide

Update of /sources/public/html5/html-xhtml-author-guide
In directory roscoe:/tmp/cvs-serv25120/html-xhtml-author-guide

Modified Files:
	html-xhtml-authoring-guide.html 
Log Message:
Initial addition of rules for use of CDATA declarations

--- /sources/public/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html	2013/05/20 11:06:33	1.107
+++ /sources/public/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html	2013/05/24 06:21:23	1.108
@@ -23,6 +23,11 @@
 	          wgPatentURI:  "http://www.w3.org/2004/01/pp-impl/40318/status",
 	      };
 	    </script>
+<style>table.simple tr>*:first-child{text-align:right;}
+table.simple th code{color:yellow;font-weight:bold;font-size:larger;}
+table.simple [colspan="2"]{text-align:center;}
+table.simple [colspan="3"]{text-align:center;}
+</style>
 </head>
 
 <body>
@@ -57,6 +62,14 @@
 <!--End section: Status of This Document-->
 </section>
 
+<!--
+note: for principle section
+		In <a>polyglot markup</a>, strings that XML and HTML interpret differently are considered <dfn>ambiguous
+        strings</dfn> and MUST NOT be used except where they are explicitly permitted
+(such as the ambiguous namespace prefix <code>xml:</code>, which is permitted as the prefix of the <code>lang</code> attribute in the XML namespace, i.e. <code>xml:lang</code>).
+-->
+
+
 <section id="introduction" class="informative">
 <h2>Introduction</h2>
 	<p>
@@ -759,8 +772,9 @@
 <section id="script-and-style" class="section">
 <h2>Script and Style</h2>
 	<p>
-		<a title="polyglot markup">Polyglot markup</a> includes script and style commands by linking to external files rather than including them in-line. 
-		<a title="polyglot markup">Polyglot markup</a> does not link to an external stylesheet by using the XML-specific xml-stylesheet processing instruction.
+		<a title="polyglot markup">Polyglot markup</a> includes script and style commands by linking to external files
+        rather than including them in-line.  <a title="polyglot markup">Polyglot markup</a> does not link to an external
+        stylesheet by using the XML-specific xml-stylesheet processing instruction.
 	    See also <a href="#PI-and-xml">Processing Instructions and the XML Declaration</a>.
 	</p>
 	<p>The following examples show how <a>polyglot markup</a> includes external script and style, respectively:</p>
@@ -771,53 +785,59 @@
 		Instead, use the <code>innerHTML</code> property for both HTML and XHTML.
     </p>
 	<p class="note">
-        The <code>innerHTML</code> property takes a string.
-		XML parsers parse the string as XML in XHTML. 
-		HTML parsers parse the string as HTML in HTML. 
-		Because of the difference in parsing, if you send the parser content that does not follow the rules for <a>polyglot markup</a> 
-		the results will differ for a DOM create with an XML parser and one created with an HTML parser.
-	</p>
-<section id="ambiguous-strings" class="section">
-<h3>Ambiguous Strings</h3>
-	<p>
-		Except for noted exceptions (such as <code>xml:lang="foo"</code>), 
-		<a>polyglot markup</a> does not use <a>ambiguous strings</a>. 
-		In <a>polyglot markup</a>, <dfn>ambiguous strings</dfn> are those strings that XML interprets differently from HTML and vice-versa. 
-		Therefore, for the content of <code>script</code> and <code>style</code> tags, <a>polyglot markup</a> does not use the following strings:
-	</p>
-	<table class="simple">
-		<thead>
-			<tr>
-				<th>String</th>
-				<th>Notes</th>
-			</tr>
-		</thead>
-		<tbody>
-			<tr>
-				<td>&lt;</td>
-				<td>
-				XML interprets the less than symbol as the commencement of a tag, comment, or CDATA block, 
-				even if the symbol occurs within <code>script</code> or <code>style</code> tags.
-				</td>
-			</tr>
-			
-			<tr>
-				<td>&amp;</td>
-			<td>
-				XML interprets the ampersand as the commencement of a reference or entity, 
-				even if the symbol occurs within <code>script</code> or <code>style</code> tags. 
-				As a consequence, <a>polyglot markup</a> does not contain <code>script</code> or <code>style</code> elements 
-				that contain HTML entities, XML entities, or character references.
-			</td>
-			</tr>
-			
-			<tr>
-			<td>&#93;&#93;&gt;</td>
-			<td>XML interprets this string as the end of a CDATA block.</td>
-			</tr>
-		</tbody>
+        The <code>innerHTML</code> property takes a string. However, in XHTML that string is parsed by the XML parser, while in HTML
+        it is parsed by the HTML parser. Because of this difference in parsing, the markup that <code>innerHTML</code> inserts
+        must follow the guidelines for <a>polyglot markup</a>; otherwise the DOM generated by the XML parser will
+        differ from the DOM generated by the HTML parser.
+	</p>
+<section id="ambiguous-strings-in-script-and-style" class="section">
+<h3>Ambiguous Strings in <code>script</code> and <code>style</code></h3>
+    <p>
+        In the HTML syntax, <code>script</code> and <code>style</code> are
+        <a href="http://www.w3.org/TR/html5/syntax.html#raw-text-elements">raw text elements</a>. As a consequence,
+        the HTML parser treats their content as a single text node. Unlike an XML parser, it does not interpret
+        element tags, comments, CDATA sections, or character references (or errors in
+        any of these) as markup, but handles them as raw, uninterpreted text.
+        </p> 
+        <p>For that reason, the following strings are considered <a>ambiguous strings</a> inside <code>script</code> and <code>style</code> and are therefore not permitted, except that CDATA sections MAY be used under the conditions given below:
+	</p>
+	<table class="simple" border="1" >
+        <caption>Ambiguous strings in <code>script</code> and <code>style</code> elements</caption>
+	<colgroup><col/><col/><col/><col/><col/><col/></colgroup>
+<thead>
+	<tr>
+	<th rowspan="2" >Ambiguous?</th><th rowspan="2">String</th><th rowspan="2">Info</th><th rowspan="2">HTML interpretation</th><th colspan="2">XML interpretation</th>
+	</tr>
+        <tr><th>inside a <code>&lt;![CDATA[</code> ... <code>&#x5d;]&gt;</code> section</th><th>outside a <code>&lt;![CDATA[</code> ... <code>&#x5d;]&gt;</code> section</th>
+       </tr>
+</thead>
+<tbody>
+<tr><td>ambiguous</td><td><code>&lt;</code></td><td>LESS-THAN SIGN</td><td>nearly uninterpreted</td><td>completely uninterpreted</td><td>interpreted <small>(commences tags, comments, CDATA)</small></td></tr>
+<tr><td>ambiguous</td><td><code>&amp;</code></td><td>AMPERSAND</td><td colspan="2">completely uninterpreted</td><td>interpreted <small>(commences a character or entity reference)</small></td></tr>
+<tr><td>ambiguous</td><td><code>&lt;&#x2d;-</code></td><td>start of comment</td><td>partly uninterpreted</td><td>completely uninterpreted</td><td>interpreted</td></tr>
+<tr><td>ambiguous</td><td><code>&#x2d;-></code></td><td>end of comment</td><td>partly uninterpreted</td><td>completely uninterpreted</td><td>interpreted</td></tr>
+<tr><td>ambiguous</td><td><code>&lt;![CDATA[</code></td><td>start of CDATA declaration</td><td colspan="2">completely uninterpreted</td><td>interpreted <small>(begins CDATA block)</small></td></tr>
+<tr><td>ambiguous</td><td><code>&#93;]></code></td><td>end of CDATA declaration</td><td colspan="2">completely uninterpreted</td><td>interpreted <small>(ends CDATA block)</small></td></tr>
+<tr><td>ambiguous</td><td><code>cdata content</code></td><td>the content of CDATA sections</td><td></td><td>completely uninterpreted</td><td>n/a</td></tr>
+<tr><td>ambiguous</td><td><code>&lt;/script</code></td><td>if occurring inside a <code>script</code> element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F)</td><td>terminates parent</td><td>completely uninterpreted</td><td>interpreted</td></tr>
+<tr><td>ambiguous</td><td><code>&lt;/style</code></td><td>if occurring inside a <code>style</code> element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F)</td><td>terminates parent</td><td>completely uninterpreted</td><td>interpreted</td></tr>
+<tr><td>ambiguous</td><td><code>&lt;foo>&lt;/bar></code></td><td>all other tags, well-formed or not</td><td colspan="2">completely uninterpreted</td><td>interpreted <small>(subject to normal parsing rules)</small></td></tr> </tbody>
+<tbody>
+<tr><td>unambiguous</td><td>any string not listed above</td><td></td><td colspan="3">completely uninterpreted</td></tr>
+</tbody>
+
 	</table>
-	<p>The following example is <a>polyglot markup</a> because there are no <a>ambiguous strings</a> within the <code>script</code> tag. </p>
+
+<p>Outside CDATA sections, the content of <code>script</code> and <code>style</code> MUST NOT contain <a>ambiguous strings</a>: using them results either in different DOMs under XML and HTML parsing or in subtle, hard-to-trace differences between the two. Avoiding them is also often the simplest and most robust approach, and it encourages the use of external scripts and stylesheets, which is considered best practice. However, because some script and style languages (such as JavaScript) use <code>&lt;</code> and <code>&amp;</code> in their syntax, and because scripts often contain strings of markup, authors MAY declare CDATA sections inside <code>script</code> and <code>style</code>.</p>
+<p>Note, however, that while an XML parser consumes the CDATA delimiters so that scripts and stylesheets never see them, an HTML parser passes them through as part of the raw text, which may cause the script or stylesheet to fail unless the delimiters are hidden from the script or style language.</p>
+<p>The use of CDATA sections MUST adhere to the following rules:</p>
+<ul>
+<li>If the script or style language does not include CDATA declarations in its syntax, then use the comment syntax (or a similar mechanism) of that language to hide the CDATA delimiters from the script or stylesheet.</li>
+<li>If a CDATA section contains the start or the end of an HTML comment, then it SHOULD contain the whole comment, so that the comment both starts and ends within that same CDATA section.</li>
+<li>Inside a CDATA section, the CDATA end delimiter, <code>&#x5D;]&gt;</code>, MUST be escaped wherever it is not meant to close the section; the exact escaping method depends on the context in which it is used.</li>
+</ul>
+
+	<p>The following example is <a>polyglot markup</a> because there are no <a>ambiguous strings</a> within the <code>script</code> element. </p>
 		<pre class="example highlight">&lt;script&gt;document.body.appendChild(document.createElement("div"));&lt;/script&gt;</pre>
 	<p class="note">
 		A workaround for using ambiguous strings is to include the properly escaped characters 

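The first CDATA rule in this commit, which hides the CDATA declaration behind the script language's own comment syntax, can be checked with a minimal Python sketch. The sketch assumes that the standard-library parsers `xml.etree.ElementTree` and `html.parser` are reasonable stand-ins for a browser's XML and HTML parsers. That is only an approximation. The snippet, the helper names, and the `run()` call are hypothetical. The sketch extracts the text of a `script` element under each parser and shows that the two texts differ only in the CDATA delimiters, which sit behind `//` line comments.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical polyglot-style script element: the CDATA delimiters are
# preceded by JavaScript line comments ("//"), so a script engine ignores
# them whether or not the parser has consumed them. run() is hypothetical.
SNIPPET = (
    "<script>//<![CDATA[\n"
    "if (a < b && b < c) { run(); }\n"
    "//]]></script>"
)


def xml_script_text(markup: str) -> str:
    """Text of the script element as an XML parser builds it.

    The XML parser consumes the CDATA delimiters and keeps only their content.
    """
    return ET.fromstring(markup).text or ""


class _ScriptTextCollector(HTMLParser):
    """Collects the raw text content of <script> as an HTML parser sees it."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.in_script = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def html_script_text(markup: str) -> str:
    """Text of the script element as an HTML parser builds it.

    script is a raw text element, so the CDATA delimiters stay in the text.
    """
    parser = _ScriptTextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def strip_hidden_cdata(script: str) -> str:
    """Remove the comment-hidden CDATA delimiters that the HTML parser keeps.

    Each delimiter is replaced by the bare "//" that precedes it.
    """
    return script.replace("//<![CDATA[", "//").replace("//]]>", "//")


if __name__ == "__main__":
    print(repr(xml_script_text(SNIPPET)))
    print(repr(html_script_text(SNIPPET)))
```

The XML parser consumes the delimiters, and the HTML parser keeps them in the raw text. Because each delimiter follows `//`, a JavaScript engine would see the same executable code in either case. That property is the reason the rule asks for the language's comment syntax.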
Received on Friday, 24 May 2013 06:21:29 UTC