- From: Michael Smith via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 08 Jul 2009 10:31:05 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/markup/src
In directory hutz:/tmp/cvs-serv16128/src
Modified Files:
datatypes.html syntax.html
Log Message:
reworked the definitions of different types of "character data" and what element "contents" are, to try to make things more clear; removed "Authors should not" admonitions about particular encodings; restate text about doctype vs. doctype.legacy in terms of document conformance (instead of authoring conformance); streamlined the definition of what a comment is; refined CSS stylesheet to make Notes more clearly identifiable
Index: syntax.html
===================================================================
RCS file: /sources/public/html5/markup/src/syntax.html,v
retrieving revision 1.53
retrieving revision 1.54
diff -u -d -r1.53 -r1.54
--- syntax.html 29 Jun 2009 09:18:50 -0000 1.53
+++ syntax.html 8 Jul 2009 10:31:03 -0000 1.54
@@ -7,11 +7,14 @@
<p>A <dfn id="doctype" title="syntax-doctype">DOCTYPE</dfn> is
a special instruction which, for legacy reasons that have to
do with processing modes in browsers, is a required part of
- any <a href="#syntax-document-html">document in the HTML
- syntax</a>.</p>
- <p>Except in documents output from certain tools, the DOCTYPE
- must match the regular expression in the following pattern
- definition.</p>
+ any
+ <a href="#syntax-document-html">document in the HTML syntax</a>.</p>
+ <p>The DOCTYPE must match either the
+ <a href="#doctype.pattern">doctype</a>
+ or
+ <a href="#doctype.legacy">doctype.legacy</a>
+ patterns defined in this specification.</p>
+ <p>The <code>doctype</code> pattern is defined as follows:</p>
<dl class="pattern-def">
<dt><a id="doctype.pattern"
href="#doctype.pattern">doctype</a> =</dt>
@@ -28,9 +31,7 @@
<pre><!doctype html></pre>
<pre><!DOCTYPE HTML></pre>
</div>
- <p>In documents output from tools that are incapable of
- generating a DOCTYPE in the form above, the DOCTYPE must match
- the regular expression in the following pattern definition.</p>
+ <p>The <code>doctype.legacy</code> pattern is defined as follows:</p>
<dl class="pattern-def">
<dt><a id="doctype.legacy"
href="#doctype.legacy">doctype.legacy</a> =</dt>
@@ -49,10 +50,11 @@
<pre><!doctype html public 'about:legacy-compat'></pre>
<pre><!DOCTYPE HTML PUBLIC "about:legacy-compat"></pre>
</div>
- <p>A document must not use a DOCTYPE matching the
- <a href="#doctype.legacy">doctype.legacy</a>
- pattern unless the document is output from a tool that is
- incapable of generating a DOCTYPE matching the
+ <p>A tool that produces documents that conform to this
+ specification should not produce documents with a DOCTYPE
+ matching the <a href="#doctype.legacy">doctype.legacy</a>
+ pattern unless the tool is incapable of generating a DOCTYPE
+ matching the
<a href="#doctype.pattern">doctype</a> pattern.</p>
</section>
<section id="character-encoding">
@@ -66,7 +68,6 @@
<ul>
<li>The character encoding name given must be the name of
the character encoding used to serialize the file.</li>
-
<li>The value must be a valid character encoding name, and
must be the preferred name for that encoding.
<a href="#refsIANACHARSET">[IANACHARSET]</a></li>
@@ -104,14 +105,17 @@
ANSI_X3.4-1968) for bytes in the set 0x09, 0x0A, 0x0C, 0x0D,
0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F, 0x41 - 0x5A, and 0x61 -
0x7A.</p>
- <p>Authors should not use JIS_X0212-1990, x-JIS0208, and
- encodings based on EBCDIC. Authors should not use UTF-32.
- Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
+ <p>
+ <!-- * Documents should not use JIS_X0212-1990, x-JIS0208, -->
+ <!-- * and encodings based on EBCDIC. -->
+ <!-- * Documents should not use UTF-32. -->
+ Documents must not use the CESU-8, UTF-7, BOCU-1 and SCSU
encodings.
<a href="#refsCESU8">[CESU8]</a>
<a href="#refsUTF7">[UTF7]</a>
<a href="#refsBOCU1">[BOCU1]</a>
- <a href="#refsSCSU">[SCSU]</a></p>
+ <a href="#refsSCSU">[SCSU]</a>
+ </p>
<p>In a
<a href="#syntax-document-xml">document in the XML syntax</a>,
the XML declaration should be used to provide
@@ -127,20 +131,22 @@
specification defines the content models for all elements.
An element must not contain <a href="#contents">contents</a>
or attributes that are not part of its content model.</p>
- <p>The <dfn id="contents" title="contents">contents</dfn> of
- an element are any elements, <a href="#syntax-text">text</a>,
- <a href="#syntax-charref">character references</a>,
- <a href="#syntax-cdata-sections">CDATA sections</a>,
- or
+ <p>The
+ <dfn id="contents" title="contents">contents</dfn>
+ of an element are any
+ <a href="#syntax-elements">elements</a>,
+ <a href="#character-data">character data</a>,
+ and
<a href="#syntax-comments">comments</a>
that it contains.
Attributes and their values are not considered to be the
“contents” of an element.</p>
<p>An element whose <a href="#content-model">content model</a>
- does not allow it to have <a href="contents"
- title="contents">contents</a> is said to be a <dfn
- id="void-element" title="void-element">void
- element</dfn>. Void elements can have attributes.</p>
+ does not allow it to have
+ <a href="#contents">contents</a>
+ is said to be a
+ <dfn id="void-element" title="void-element">void element</dfn>.
+ Void elements can have attributes.</p>
<p>The following is a complete list of the void elements in
HTML.</p>
<dl>
@@ -195,16 +201,13 @@
<ol>
<li>The first character of a start tag must be a U+003C
LESS-THAN SIGN (<code><</code>).</li>
-
<li>The next
few characters of a start tag must be the element's
<a href="#tag-name" title="syntax-tag-name">tag name</a>.</li>
-
<li>If there are to be any attributes in the next step,
there must first be one or more
<a href="#space"
title="space character">space characters</a>.</li>
-
<li>Then, the start tag may have
a number of attributes, the
<a href="#attribute"
@@ -213,7 +216,6 @@
other by one or more
<a href="#space"
title="space character">space characters</a>.</li>
-
<li>After the attributes, the start tag may have one or more
<a href="#space"
title="space character">space characters</a>. (Some
@@ -221,7 +223,6 @@
<a href="#attribute"
title="syntax-attributes">attributes section</a>
below.)</li>
-
<li>Start tags must be closed by a U+003E GREATER-THAN
SIGN (<code>></code>) character.</li>
</ol>
@@ -269,41 +270,6 @@
(which again,
<a href="#omitted" title="syntax-tag-omission">might be
implied in certain cases</a>).</li>
- <li>The <a href="#style">style</a> and <a
- href="#script">script</a> elements can have
- <a href="#syntax-text" >text</a>, though it has
- <a href="#text-restrictions">restrictions</a> described in a
- later section.</li>
- <li>The
- <a href="#title">title</a>
- and
- <a href="#textarea">textarea</a>
- elements can have
- <a href="#syntax-text">text</a>
- and
- <a href="#syntax-charref">character references</a>,
- but the text must not contain an
- <a href="#syntax-ambiguous-ampersand">ambiguous ampersand</a>.
- There are also
- <a href="#text-restrictions">further restrictions</a>
- described in a later section.</li>
- <li>Non-<a href="#void-element">void</a> elements other
- than the
- <a href="#style">style</a>,
- <a href="#script">script</a>,
- <a href="#title">title</a>,
- and
- <a href="#textarea">textarea</a>
- elements can contain
- <a href="#syntax-text">text</a>,
- <a href="#syntax-charref">character references</a>,
- other
- <a href="#syntax-elements">elements</a>,
- and
- <a href="#syntax-comments">comments</a>.
- But the text must not contain the character U+003C LESS-THAN SIGN
- (<code><</code>) or an
- <a href="#syntax-ambiguous-ampersand">ambiguous ampersand</a>.</li>
</ul>
</section>
<section id="syntax-attributes">
@@ -350,10 +316,7 @@
<li>
<dfn id="syntax-attribute-value">Attribute values</dfn>, in
general, are
- <a href="#character-data">character data</a>,
- with the additional restriction that they must not
- contain any
- <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a>.</li>
+ <a href="#normal-character-data">normal character data</a>.</li>
</ul>
<p>In <a href="#html-syntax">the HTML syntax</a>,
attributes can be specified in four different ways:</p>
@@ -493,68 +456,163 @@
</section>
<section id="text-syntax">
<h2>Text and character data</h2>
- <p><dfn id="syntax-text" title="syntax-text">Text</dfn> in
- <a href="#contents">element contents</a>,
- <a href="#syntax-attribute-value">attribute values</a>,
- <a href="#syntax-comments">comments</a>,
+ <p><dfn id="syntax-text" title="syntax-text">Text</dfn>
+ in
+ <a href="#contents">element contents</a>
+ (including in
+ <a href="#syntax-comments">comments</a>)
and
- <a href="#syntax-escape">escaping text spans</a>
- must consist of Unicode characters and must not contain any of
- the following:</p>
+ <a href="#syntax-attribute-value">attribute values</a>
+ must consist of Unicode characters, with the following
+ restrictions:</p>
<ul>
- <li>U+0000 characters</li>
- <li>permanently undefined Unicode characters</li>
- <li>control characters other than
+ <li>must not contain U+0000 characters</li>
+ <li>must not contain permanently undefined Unicode characters</li>
+ <li>must not contain control characters other than
<a href="#space">space characters</a></li>
</ul>
- <p><a href="#syntax-text">Text</a> can be combined with
- <a href="#syntax-charref">character references</a>,
- <a href="#syntax-escape">escaping text spans</a>
- in three different ways:</p>
- <ul>
- <li><dfn id="character-data">character data</dfn>
- can contain
- <a href="#syntax-text">text</a>
- and
- <a href="#syntax-charref">character references</a>,
- but must not contain
- <a href="#syntax-escape">escaping text spans</a></li>
- <li><dfn
- id="replaceable-character-data"
- >replaceable character data</dfn>
- can contain
- <a href="#syntax-text">text</a>,
- <a href="#syntax-charref">character references</a>,
- and
- <a href="#syntax-escape">escaping text spans</a>,
- but must not contain any occurrences of the string
- "<code></</code>" (U+003C LESS-THAN SIGN, U+002F
- SOLIDUS) followed by characters that case-insensitively
- match the tag name of the element followed by one of
- U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
- U+000C FORM FEED (FF), U+0020 SPACE, U+003E GREATER-THAN
- SIGN (>), or U+002F SOLIDUS (/), unless that string is
- part of an
- <a href="#syntax-escape">escaping text span</a>.</li>
- <li><dfn
- id="non-replaceable-character-data"
- >non-replaceable character data</dfn>
- can contain
- <a href="#syntax-text">text</a>,
- and
- <a href="#syntax-escape">escaping text spans</a>
- but must not contain
- <a href="#syntax-charref">character references</a>,
- and must not contain any occurrences of the string
- "<code></</code>" (U+003C LESS-THAN SIGN, U+002F
- SOLIDUS) followed by characters that case-insensitively
- match the tag name of the element followed by one of
- U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
- U+000C FORM FEED (FF), U+0020 SPACE, U+003E GREATER-THAN
- SIGN (>), or U+002F SOLIDUS (/), unless that string is
- part of an
- <a href="#syntax-escape">escaping text span</a>.</li>
- </ul>
+ <p class="note">There is a special type of
+ <a href="#syntax-text">text</a>,
+ known as an
+ <a href="#syntax-escape">escaping text span</a>,
+ that can occur within certain elements.</p>
+ <p><dfn
+ id="character-data"
+ title="character-data"
+ >Character data</dfn> contains
+ <a href="#syntax-text">text</a>, in some cases in combination with
+ <a href="#syntax-charref">character references</a>,
+ along with certain additional restrictions. There are three
+ types of character data that can occur in documents:</p>
+ <ol>
+ <li><a href="#normal-character-data">normal character data</a></li>
+ <li><a href="#replaceable-character-data">replaceable character data</a></li>
+ <li><a href="#non-replaceable-character-data">non-replaceable character data</a></li>
+ </ol>
+ <dl id="character-data-types-list">
+ <dt><dfn
+ id="normal-character-data"
+ title="normal-character-data">Normal character data</dfn></dt>
+ <dd>
+ <p>Certain elements and strings in the values of
+ particular attributes contain normal character data.
+ Normal character data can contain the following:</p>
+ <ul>
+ <li><a href="#syntax-text">text</a></li>
+ <li><a href="#syntax-charref">character references</a></li>
+ </ul>
+ <p>Normal character data has the following restrictions:</p>
+ <ul>
+ <li>must not contain any
+ "<code title="U+003C LESS-THAN SIGN"><</code>"
+ characters</li>
+ <li>must not contain any
+ <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a></li>
+ <li>must not contain any
+ <a href="#syntax-escape">escaping text spans</a></li>
+ </ul>
+ </dd>
+ <dt><dfn
+ id="replaceable-character-data"
+ title="replaceable-character-data"
+ >Replaceable character data</dfn></dt>
+ <dd>
+ <p>In
+ <a href="#syntax-document-html">documents in the HTML syntax</a>,
+ the
+ <a href="#title" class="element">title</a>
+ and
+ <a href="#textarea" class="element">textarea</a>
+ elements can contain replaceable character data.
+ Replaceable character data can contain the following:</p>
+ <ul>
+ <li><a href="#syntax-text">text</a>,
+ optionally including
+ "<code title="U+003C LESS-THAN SIGN"><</code>"
+ characters and
+ <a href="#syntax-escape">escaping text spans</a></li>
+ <li><a href="#syntax-charref">character references</a></li>
+ </ul>
+ <p>Replaceable character data has the following restrictions:</p>
+ <ul>
+ <li>must not contain any
+ <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a></li>
+ <li>must not contain any occurrences of the string
+ "<code></</code>" (U+003C LESS-THAN SIGN,
+ U+002F SOLIDUS) followed by characters that
+ case-insensitively match the tag name of the
+ element containing the replaceable character data
+ (for example, "<code></title</code>" or
+ "<code></textarea</code>"),
+ followed by one of U+0009 CHARACTER TABULATION,
+ U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+ U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or
+ U+002F SOLIDUS (/), unless that string is part of
+ an
+ <a href="#syntax-escape">escaping text span</a>.</li>
+ </ul>
+ <p class="note">Replaceable character data,
+ as defined in this specification, is a feature of
+ <a href="#html-syntax">the HTML syntax</a>
+ that is not available in
+ <a href="#xml-syntax">the XML syntax</a>.
+ <a href="#syntax-document-xml">Documents in the XML
+ syntax</a> must not contain replaceable character data
+ as defined in this specification; instead they must
+ conform to all syntax constraints defined in the XML
+ specification <a href="#refsXML">[XML]</a>.</p>
+ </dd>
+ <dt><dfn
+ id="non-replaceable-character-data"
+ title="non-replaceable-character-data"
+ >Non-replaceable character data</dfn></dt>
+ <dd>
+ <p>In
+ <a href="#syntax-document-html">documents in the HTML syntax</a>,
+ the
+ <a href="#script" class="element">script</a>
+ and
+ <a href="#style" class="element">style</a>
+ elements can contain non-replaceable character data.
+ Non-replaceable character data can contain the
+ following:</p>
+ <ul>
+ <li><a href="#syntax-text">text</a>,
+ optionally including
+ "<code title="U+003C LESS-THAN SIGN"><</code>"
+ characters and
+ <a href="#syntax-escape">escaping text spans</a></li>
+ <li><a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a></li>
+ </ul>
+ <p>Non-replaceable character data has the following restrictions:</p>
+ <ul>
+ <li>must not contain <a href="#syntax-charref">character references</a></li>
+ <li>must not contain any occurrences of the string
+ "<code></</code>" (U+003C LESS-THAN SIGN,
+ U+002F SOLIDUS) followed by characters that
+ case-insensitively match the tag name of the
+ element containing the non-replaceable character data
+ (for example, "<code></script</code>" or
+ "<code></style</code>"),
+ followed by one of U+0009 CHARACTER TABULATION,
+ U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+ U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or
+ U+002F SOLIDUS (/), unless that string is part of
+ an
+ <a href="#syntax-escape">escaping text span</a>.</li>
+ </ul>
+ <p class="note">Non-replaceable character data,
+ as defined in this specification, is a feature of
+ <a href="#html-syntax">the HTML syntax</a>
+ that is not available in
+ <a href="#xml-syntax">the XML syntax</a>.
+ <a href="#syntax-document-xml">Documents in the XML
+ syntax</a> must not contain non-replaceable character
+ data as defined in this specification; instead they must
+ conform to all syntax constraints defined in the XML
+ specification <a href="#refsXML">[XML]</a>.</p>
+ </dd>
+ </dl>
</section>
<section id="character-references">
<h2>Character references</h2>
@@ -571,7 +629,7 @@
href="#refsEntities">[Entities]</a>, using the same
case, terminated by a U+003B SEMICOLON (<code
title="">;</code>) character.</dd>
- <dt><dfn id="dec-charref">Decimal numeric character reference.</dfn></dt>
+ <dt><dfn id="dec-charref">Decimal numeric character reference</dfn></dt>
<dd>The ampersand must be followed by a U+0023 NUMBER SIGN
(<code>#</code>) character, followed by one or more digits in
the range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE, representing
@@ -608,21 +666,41 @@
<h2>Comments</h2>
<p>
<dfn id="syntax-comments" title="syntax-comments">Comments</dfn>
- must start with the four character sequence U+003C LESS-THAN
- SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D
- HYPHEN-MINUS (<code title=""><!--</code>). Following that
- sequence, the comment may have
- <a href="#syntax-text" title="syntax-text">text</a>, with the
- additional restriction that the text must not start with a
- single U+003E GREATER-THAN SIGN ('>') character, nor start
- with a U+002D HYPHEN-MINUS (<code title="">-</code>) character
- followed by a U+003E GREATER-THAN SIGN ('>') character, nor
- contain two consecutive U+002D HYPHEN-MINUS
- (<code title="">-</code>) characters, nor end with a U+002D
- HYPHEN-MINUS (<code title="">-</code>) character. Finally, the
- comment must be ended by the three character sequence U+002D
- HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN
- (<code title="">--></code>).</p>
+ consist of the following three parts, in exactly the following
+ order:</p>
+ <ol>
+ <li>the string
+ "<code
+ title="U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS"
+ ><!--</code>"
+ </li>
+ <li><a href="#syntax-text">text</a></li>
+ <li>the string
+ "<code
+ title="U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN"
+ >--></code>"
+ </li>
+ </ol>
+ <p>The <a href="#syntax-text">text</a>
+ part of comments has the following restrictions:</p>
+ <ul>
+ <li>must not start with a
+ "<code
+ title="U+003E GREATER-THAN SIGN"
+ >></code>" character</li>
+ <li>must not start with the string
+ "<code
+ title="U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN"
+ >-></code>"</li>
+ <li>must not contain the string
+ "<code
+ title="U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS"
+ >--</code>"</li>
+ <li>must not end with a
+ "<code
+ title="U+002D HYPHEN-MINUS"
+ >-</code>" character</li>
+ </ul>
<div class="example">
<p>The following is an example of a comment.</p>
<pre><!-- main content starts here --></pre>
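Editorial illustration (not part of the commit): the restructured comment rules in the hunk above lend themselves to a mechanical check. The following Python sketch is one possible reading of those rules. The function name `comment_problems` is hypothetical, and the sketch covers only the four comment-specific restrictions listed in the diff, not the general restrictions on text (U+0000, undefined characters, control characters).

```python
from typing import List


def comment_problems(comment: str) -> List[str]:
    """Return the comment rules from the diff that `comment` violates.

    An empty list means no violation was found by this sketch.
    """
    opener, closer = "<!--", "-->"
    # The comment must consist of "<!--", then text, then "-->", in that order.
    if (len(comment) < len(opener) + len(closer)
            or not comment.startswith(opener)
            or not comment.endswith(closer)):
        return ["must start with '<!--' and end with '-->'"]

    text = comment[len(opener):-len(closer)]
    problems = []
    if text.startswith(">"):
        problems.append("text must not start with '>'")
    if text.startswith("->"):
        problems.append("text must not start with '->'")
    if "--" in text:
        problems.append("text must not contain '--'")
    if text.endswith("-"):
        problems.append("text must not end with '-'")
    return problems
```

Applied to the example from the diff, `comment_problems("<!-- main content starts here -->")` returns an empty list, while a comment whose text contains two consecutive hyphens is reported.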
Index: datatypes.html
===================================================================
RCS file: /sources/public/html5/markup/src/datatypes.html,v
retrieving revision 1.26
retrieving revision 1.27
diff -u -d -r1.26 -r1.27
--- datatypes.html 29 Jun 2009 09:18:49 -0000 1.26
+++ datatypes.html 8 Jul 2009 10:31:03 -0000 1.27
@@ -5,7 +5,8 @@
<p>For any pattern in this document that references the <a
href="#data-string">string</a> datatype, a
<dfn id="data-string" title="string">string</dfn>
- is defined as <a href="#character-data">character data</a>
+ is defined as
+ <a href="#normal-character-data">normal character data</a>
that does not contain any
<a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a>.</p>
<p>The <a href="#syntax-attributes">Attributes</a> section of
Received on Wednesday, 8 July 2009 10:31:17 UTC
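
Editorial illustration (not part of the commit): the diff states that replaceable and non-replaceable character data must not contain "</" followed by the containing element's tag name (matched case-insensitively) and then one of tab, line feed, form feed, space, ">" or "/". A minimal Python sketch of that check follows. The function name `contains_forbidden_close_tag` is hypothetical, and the sketch deliberately ignores the exemption for strings inside an escaping text span, whose definition does not appear in this diff.

```python
import re


def contains_forbidden_close_tag(data: str, tag_name: str) -> bool:
    """Report whether `data` contains "</" + tag_name + a terminating character.

    Terminators, as listed in the diff: U+0009, U+000A, U+000C, U+0020,
    ">" and "/". Assumption: escaping text spans are NOT exempted here.
    """
    pattern = re.compile(
        r"</" + re.escape(tag_name) + r"[\t\n\f />]",
        re.IGNORECASE,  # tag name is matched case-insensitively
    )
    return pattern.search(data) is not None
```

For instance, script content containing the string `</SCRIPT>` is flagged for the `script` element, whereas `</scripts>` is not, because the character after the tag name is not one of the listed terminators.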