- From: CVS User egraff <cvsmail@w3.org>
- Date: Wed, 08 Jan 2014 00:44:58 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/html-polyglot
In directory roscoe:/tmp/cvs-serv7358
Modified Files:
html-polyglot.html
Log Message:
More language edits, up to but not including 4.6.2.1 The safe text content option
--- /sources/public/html5/html-polyglot/html-polyglot.html 2014/01/07 22:41:46 1.20
+++ /sources/public/html5/html-polyglot/html-polyglot.html 2014/01/08 00:44:58 1.21
@@ -218,7 +218,7 @@
<section id="PI-and-xml" class="section">
<h3>Processing instructions and the XML declaration</h3>
<p>
- Processing Instructions and the XML Declaration are both forbidden in <a>polyglot markup</a>.
+ Processing instructions and the XML declaration are both forbidden in <a>polyglot markup</a>.
</p>
<!--End section: Processing Instructions and the XML Declaration-->
</section>
@@ -226,13 +226,14 @@
<h3>Specifying a document’s character encoding</h3>
<p>
<a title="polyglot markup">Polyglot markup</a> uses the UTF-8 character encoding, the only character encoding for which both HTML and XML require support.
- HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a> [[!HTML5]].
- For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>.
- As such, character encoding MAY be left undeclared in XML with the result that UTF-8 is still supported [[!XML10]].
+ HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a>. [[!HTML5]]
+ </p>
+ <p> For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>.
+ As such, character encoding MAY be left undeclared in XML with the result that UTF-8 is still supported [[!XML10]].
</p>
<p>
<a title="polyglot markup">Polyglot markup</a> declares the UTF-8 character encoding in the following ways, which may be used separately or
- in combination (but note that here can only be a <em>single</em> <a title="HTML encoding declaration">HTML encoding declaration</a>):
+ in combination (but note that there can only be a <em>single</em> <a title="HTML encoding declaration">HTML encoding declaration</a>):
</p>
<ul>
<li>Within the document
@@ -316,7 +317,7 @@
<p>
[[!HTML5]] introduces undeclared (native) default namespaces for the root HTML element, <code>html</code>, the root SVG element, <code>svg</code>,
and the root MathML element, <code>math</code>.
- <a title="polyglot markup">Polyglot markup</a> declares the following default namespaces, when the markup languages are included in the document, to maintain XML-compatibility [[!XML10]]:</p>
+ <a title="polyglot markup">Polyglot markup</a> declares the following default namespaces, when the markup languages are included in the document, to maintain XML compatibility [[!XML10]]:</p>
<ul class="inline-list">
<li><code>&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;</code></li>
<li><code>&lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;</code></li>
@@ -354,13 +355,13 @@
</ul>
<p>
Note that there are other prefixed attributes that can be used beyond <code>xlink:href</code> (such as <code>xml:base</code>).
- <a title="polyglot markup">Polyglot markup</a> does not declare these prefixes via xmlns. The prefixes are implicitly declared
+ <a title="polyglot markup">Polyglot markup</a> does not declare these prefixes via <code>xmlns</code>. The prefixes are implicitly declared
in XML and are automatically applied to the appropriate attributes in HTML.
</p>
<p>
The namespaced attributes, such as <code>xml:lang=""</code> and <code>xmlns=""</code>, are "namespaced" within XHTML, SVG and MathML.
- Thus, the rules for how they can be sued as CSS selectors is governed by CSS namespaces. [[!CSS3NAMESPACE]]
- For more on the issues related to attribute selectors and namespaces, with and without prefix, see the section on <a
+ Thus, the rules for how they can be used as CSS selectors are governed by CSS namespaces. [[!CSS3NAMESPACE]]
+ For more about the issues related to attribute selectors and namespaces, with and without prefixes, see the section on <a
href="#scripting-and-styling-polyglot-markup">Scripting and styling polyglot markup</a>.
<p>
@@ -374,19 +375,21 @@
<section id="required-elements" class="section">
<h6>Required elements and tags</h6>
- <p> HTML5’s concept of <dfn>optional tags</dfn> – start tags and/or end tags – covers <a
- href="http://www.w3.org/TR/html5/syntax.html#optional-tags">elements that the
- HTML parser itself automatically adds to the DOM</a> if the code doesn’t contain the tags for
- them. However, since XML does not have a feature whereby elements with one or both tags that have been
- omitted from the code (such as when start and end tags of <code>html</code> are omitted) are added to the DOM,
- omitting a tag in <a>polyglot markup</a> is equivalent of producing a not well-formed document or,
- if both tags are omotted, equivalent of not adding the element at all. Therefore, <a>polyglot markup</a> does not
- operate with <a>optional tags</a>.</p>
-
- <p>That <a>polyglot markup</a> doesn’t operate with optional tags, may create surprises e.g. for someone not used
- to adding e.g. the <code>tbody</code> tags in their code or to someone accustomed to omitting the end tag of the
- <code>p</code> element. However, the requirement to be complete with regard to tags, is a key feature of <a>polyglot
- markup</a> that makes the code <a title="robustness">robust</a> against subpar parsers and authoring surprises.</p>
+ <p> <a title="polyglot markup">Polyglot markup</a> does not employ <a>optional tags</a>.
+ HTML5’s concept of <dfn>optional tags</dfn> – missing start tags and/or end tags – covers
+ <a href="http://www.w3.org/TR/html5/syntax.html#optional-tags">
+ elements that the HTML parser itself automatically adds to the DOM</a>
+ if the code doesn’t contain the tags for them.
+ Because XML has no such feature for adding missing start and/or end tags to the DOM,
+ omitting a tag in <a>polyglot markup</a> is equivalent to producing a document that is not well-formed or,
+ if both tags are omitted, equivalent to not adding the element at all. </p>
+
+ <p>That <a>polyglot markup</a> doesn’t operate with optional tags may create surprises for an author not used
+ to adding the <code>tbody</code> tags in their code, for example,
+ or for someone accustomed to omitting the end tag of the <code>p</code> element.
+ However, the requirement to be well-formed with regard to tags is a key feature of <a>polyglot markup</a>
+ that makes the code <a title="robustness">robust</a> against subpar parsers and authoring surprises.
+ </p>
<section id="minimal-polyglot-html-document">
<h4>A minimal HTML document</h4>
<p>
@@ -645,7 +648,7 @@
<a>polyglot markup</a> uses both the <code>lang</code> and the <code>xml:lang</code> attributes
(see <a href="#language-attributes">Language attributes</a>); however,
the <a href="http://www.w3.org/TR/css3-selectors/#lang-pseudo">CSS3 Selectors specification</a> stipulates that
- language attributes, including <code>xml:lang</code>, are matched in a case-insensitive way. [[!SELECT]]
+ language attributes, including <code>xml:lang</code>, are matched in a case insensitive way. [[!SELECT]]
</p>
<!--End section: Attribute values-->
</section>
@@ -704,15 +707,17 @@
</figure>
<p>
- In the HTML syntax, the contents of raw text elements is raw text, by which it is referred to the fact
- that the HTML parser will not treat contained code that look like tags (element tags and comment tags), character references,
- CDATA etc as tags, character references, CDATA etc, but as raw text. (See HTML5 for the exact rules.)
+ In HTML syntax, the content of raw text elements is raw text.
+ In other words, the HTML parser does not treat contained code that looks like tags (element tags and comment tags),
+ character references, CDATA, etc. as tags, character references, CDATA, etc., but as raw text.
+ (See HTML5 for the exact rules.)
In the XHTML syntax, however, the same constructs <em>will</em> be treated as tags, character references, CDATA etc.
</p>
- <p>As result, in HTML, it is simpler than it is in XHTML, for authors to comply with the requirement of the default MIME
- types of the raw text elements. On the other side, by the use of <code class="CDATA">CDATA</code>, the raw text contents
- parsed as XHTML, can be made ven less semantic than the raw text data of HTML, leading to potential harms if the document
- is parsed as HTML
+ <p>As a result, it is simpler for authors to comply with the requirement of the default MIME
+ types of the raw text elements in HTML than it is in XHTML.
+ On the other hand, with <code class="CDATA">CDATA</code>, the raw text contents
+ parsed as XHTML can be made even less semantic than the raw text data of HTML,
+ leading to potential harms if the document is parsed as HTML.
</p>
<figure id="ambiguous-table">
@@ -740,9 +745,9 @@
<tr><td><code>cdata content</code></td><td>the content of CDATA sections</td><td></td><td>uninterpreted</td><td>—</td></tr>
<tr><td><code>&lt;/script</code> </td><td>if occurring inside <code>script</code> element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F)</td><td>terminates parent</td><td>uninterpreted</td><td>interpreted</td></tr>
<tr><td><code>&lt;/style</code></td><td>if occurring inside <code>style</code> element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F)</td><td>terminates parent</td><td>uninterpreted</td><td>interpreted</td></tr>
- <tr><td><code>&lt;foo&gt;&lt;/bar&gt;</code></td><td>all other tags, wellformed or not</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr>
+ <tr><td><code>&lt;foo&gt;&lt;/bar&gt;</code></td><td>all other tags, well-formed or not</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr>
<tr><td><code>&amp;#foo;</code></td><td>character references</td><td>uninterpreted</td><td>uninterpreted</td><td>interpreted <small>subject to normal parsing rules</small></td></tr> </tbody> <tbody>
- <tr><th><code>none of the above strings</code></th><td>Any other string</td><td>uninterpreted</td><td>uninterpreted</td><td>uninterpreted</td></tr>
+ <tr><td><code>none of the above strings</code></td><td>Any other string</td><td>uninterpreted</td><td>uninterpreted</td><td>uninterpreted</td></tr>
</tbody>
</table>
</figure>
@@ -750,7 +755,7 @@
<p>Syntactically, the polyglot subset is found by</p>
<ul><li><em>either</em> <strong>limiting the content to <dfn>safe content</dfn></strong>, that
- is: text that gets interpreted the same way in HTML and in XML.</li>
+ is, text that gets interpreted the same way in HTML and in XML.</li>
<li><em>or</em> trying to <strong>even out the constraints differences</strong> by
wrapping the contents in a <code>CDATA</code> section. The <code>CDATA</code> code is then seen as text
by the HTML parser (and can thus interfere with the scripting or styling language!), while the XML parser sees the
Received on Wednesday, 8 January 2014 00:44:59 UTC
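
Illustrative note on the edited text (not part of the commit): the two behaviours the revised paragraphs describe, that omitting an optional tag yields a document that is not well-formed XML, and that the content of a raw text element is text to an HTML parser but markup to an XML parser, can be sketched with the Python standard library. This is a rough sketch under stated assumptions: the sample markup strings and the helper names (is_well_formed_xml, html_view, TagAndTextCollector) are made up for illustration, and html.parser only tokenizes, so it stands in for an HTML parser without building a DOM or adding implied elements.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagAndTextCollector(HTMLParser):
    """Record the start tags and text that the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def html_view(markup: str):
    """Return (start tags, concatenated text) as seen by html.parser."""
    collector = TagAndTextCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags, "".join(collector.text)


# Sample inputs, assumed for illustration only.
complete = '<div xmlns="http://www.w3.org/1999/xhtml"><p>one</p><p>two</p></div>'
omitted = '<div xmlns="http://www.w3.org/1999/xhtml"><p>one<p>two</div>'
script = "<script>var s = '<b>x</b>';</script>"

if __name__ == "__main__":
    # Omitting the </p> end tags is tolerated by HTML tooling but rejected as XML.
    print(is_well_formed_xml(complete), is_well_formed_xml(omitted))
    print(html_view(omitted))
    # Inside script, the HTML tokenizer reports <b> as text; the XML parser sees an element.
    print(html_view(script))
    print([child.tag for child in ET.fromstring(script)])
```

The script sample is the case the "safe content" option avoids: the same bytes give one text node in HTML and a child element in XML, so only content that contains no such constructs is read identically by both parsers.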