- From: Michael Smith via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 08 Jul 2009 10:31:05 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/markup/src
In directory hutz:/tmp/cvs-serv16128/src

Modified Files:
datatypes.html syntax.html
Log Message:
reworked the definitions of different types of "character data" and what element "contents" are, to try to make things more clear; removed "Authors should not" admonitions about particular encodings; restate text about doctype vs. doctype.legacy in terms of document conformance (instead of authoring conformance); streamlined the definition of what a comment is; refined CSS stylesheet to make Notes more clearly identifiable

Index: syntax.html
===================================================================
RCS file: /sources/public/html5/markup/src/syntax.html,v
retrieving revision 1.53
retrieving revision 1.54
diff -u -d -r1.53 -r1.54
--- syntax.html 29 Jun 2009 09:18:50 -0000 1.53
+++ syntax.html 8 Jul 2009 10:31:03 -0000 1.54
@@ -7,11 +7,14 @@ <p>A <dfn id="doctype" title="syntax-doctype">DOCTYPE</dfn> is an special instruction which, for legacy reasons that have to do with processing modes in browsers, is a required part of - any <a href="#syntax-document-html">document in the HTML - syntax</a>.</p> - <p>Except in documents output from certain tools, the DOCTYPE - must match the regular expression in the following pattern - definition.</p> + any + <a href="#syntax-document-html">document in the HTML syntax</a>.</p> + <p>The DOCTYPE must match either the + <a href="#doctype.pattern">doctype</a> + or + <a href="#doctype.legacy">doctype.legacy</a> + patterns defined this specification.</p> + <p>The <code>doctype</code> pattern is defined as follows:</p> <dl class="pattern-def"> <dt><a id="doctype.pattern" href="#doctype.pattern">doctype</a> =</dt>
@@ -28,9 +31,7 @@ <pre><!doctype html></pre> <pre><!DOCTYPE HTML></pre> </div> - <p>In documents output from tools that are incapable of - generating a DOCTYPE in the form above, the DOCTYPE must match - the regular expression in the following pattern definition.</p> + <p>The
<code>doctype.legacy</code> pattern is defined as follows:</p> <dl class="pattern-def"> <dt><a id="doctype.legacy" href="#doctype.legacy">doctype.legacy</a> =</dt>
@@ -49,10 +50,11 @@ <pre><!doctype html public 'about:legacy-compat'></pre> <pre><!DOCTYPE HTML PUBLIC "about:legacy-compat"></pre> </div> - <p>A document must not use a DOCTYPE matching the - <a href="#doctype.legacy">doctype.legacy</a> - pattern unless the document is output from a tool that is - incapable of generating a DOCTYPE matching the + <p>A tool that produces documents that conform to this + specification should not produce documents with a DOCTYPE + matching the <a href="#doctype.legacy">doctype.legacy</a> + pattern unless the tool is incapable of generating a DOCTYPE + matching the <a href="#doctype.pattern">doctype</a> pattern.</p> </section> <section id="character-encoding">
@@ -66,7 +68,6 @@ <ul> <li>The character encoding name given must be the name of the character encoding used to serialize the file.</li> - <li>The value must be a valid character encoding name, and must be the preferred name for that encoding. <a href="#refsIANACHARSET">[IANACHARSET]</a></li>
@@ -104,14 +105,17 @@ ANSI_X3.4-1968) for bytes in the set 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A.</p> - <p>Authors should not use JIS_X0212-1990, x-JIS0208, and - encodings based on EBCDIC. Authors should not use UTF-32. - Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU + <p> + <!-- * Documents should not use JIS_X0212-1990, x-JIS0208, --> + <!-- * and encodings based on EBCDIC. --> + <!-- * Documents should not use UTF-32. --> + Documents must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings.
<a href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> - <a href="#refsSCSU">[SCSU]</a></p> + <a href="#refsSCSU">[SCSU]</a> + </p> <p>In a <a href="#syntax-document-xml">document the XML syntax</a>, the XML declaration should be used to provide
@@ -127,20 +131,22 @@ specification defines the content models for all elements. An element must not contain <a href="#contents">contents</a> or attributes that are not part of its content model.</p> - <p>The <dfn id="contents" title="contents">contents</dfn> of - an element are any elements, <a href="#syntax-text">text</a>, - <a href="#syntax-charref">character references</a>, - <a href="#syntax-cdata-sections">CDATA sections</a>, - or + <p>The + <dfn id="contents" title="contents">contents</dfn> + of an element are any + <a href="#syntax-elements">elements</a>, + <a href="#character-data">character data</a>, + and <a href="#syntax-comments">comments</a> that it contains. Attributes and their values are not considered to be the “contents” of an element.</p> <p>An element whose <a href="#content-model">content model</a> - does not allow it to have <a href="contents" - title="contents">contents</a> is said to be a <dfn - id="void-element" title="void-element">void - element</dfn>. Void elements can have attributes.</p> + does not allow it to have + <a href="#contents">contents</a> + is said to be a + <dfn id="void-element" title="void-element">void element</dfn>.
+ Void elements can have attributes.</p> <p>The following is a complete list of the void elements in HTML.</p> <dl>
@@ -195,16 +201,13 @@ <ol> <li>The first character of a start tag must be a U+003C LESS-THAN SIGN (<code><</code>).</li> - <li>The next few characters of a start tag must be the element's <a href="#tag-name" title="syntax-tag-name">tag name</a>.</li> - <li>If there are to be any attributes in the next step, there must first be one or more <a href="#space" title="space character">space characters</a>.</li> - <li>Then, the start tag may have a number of attributes, the <a href="#attribute"
@@ -213,7 +216,6 @@ other by one or more <a href="#space" title="space character">space characters</a>.</li> - <li>After the attributes, the start tag may have one or more <a href="#space" title="space character">space characters</a>. (Some
@@ -221,7 +223,6 @@ <a href="#attribute" title="syntax-attributes">attributes section</a> below.)</li> - <li>Start tags must be closed by a U+003E GREATER-THAN SIGN (<code>></code>) character.</li> </ol>
@@ -269,41 +270,6 @@ (which again, <a href="#omitted" title="syntax-tag-omission">might be implied in certain cases</a>).</li> - <li>The <a href="#style">style</a> and <a - href="#script">script</a> elements can have - <a href="#syntax-text" >text</a>, though it has - <a href="#text-restrictions">restrictions</a> described in a - later section.</li> - <li>The - <a href="#title">title</a> - and - <a href="#textarea">textarea</a> - elements can have - <a href="#syntax-text">text</a> - and - <a href="#syntax-charref">character references</a>, - but the text must not contain an - <a href="#syntax-ambiguous-ampersand">ambiguous ampersand</a>.
- There are also - <a href="#text-restrictions">further restrictions</a> - described in a later section.</li> - <li>Non-<a href="#void-element">void</a> elements other - than the - <a href="#style">style</a>, - <a href="#script">script</a>, - <a href="#title">title</a>, - and - <a href="#textarea">textarea</a> - elements can contain - <a href="#syntax-text">text</a>, - <a href="#syntax-charref">character references</a>, - other - <a href="#syntax-elements">elements</a>, - and - <a href="#syntax-comments">comments</a>. - But the text must not contain the character U+003C LESS-THAN SIGN - (<code><</code>) or an - <a href="#syntax-ambiguous-ampersand">ambiguous ampersand</a>.</li> </ul> </section> <section id="syntax-attributes">
@@ -350,10 +316,7 @@ <li> <dfn id="syntax-attribute-value">Attribute values</dfn>, in general, are - <a href="#character-data">character data</a>, - with the additional restriction that they must not - contain any - <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a>.</li> + <a href="#normal-character-data">normal character data</a>.</li> </ul> <p>In the <a href="#html-syntax">the HTML syntax</a>, attributes can be specified in four different ways:</p>
@@ -493,68 +456,163 @@ </section> <section id="text-syntax"> <h2>Text and character data</h2> - <p><dfn id="syntax-text" title="syntax-text">Text</dfn> in - <a href="#contents">element contents</a>, - <a href="#syntax-attribute-value">attribute values</a>, - <a href="#syntax-comments">comments</a>, + <p><dfn id="syntax-text" title="syntax-text">Text</dfn> + in + <a href="#contents">element contents</a> + (including in + <a href="#syntax-comments">comments</a>) and - <a href="#syntax-escape">escaping text spans</a> - must consist of Unicode characters and must not contain any of - the following:</p> + <a href="#syntax-attribute-value">attribute values</a> + must consist of Unicode characters, with the following + restrictions:</p> <ul> - <li>U+0000 characters</li> - <li>permanently
undefined Unicode characters</li> - <li>control characters other than + <li>must not contain U+0000 characters</li> + <li>must not contain permanently undefined Unicode characters</li> + <li>must not contain control characters other than <a href="#space">space characters</a></li> </ul> - <p><a href="#syntax-text">Text</a> can be combined with - <a href="#syntax-charref">character references</a>, - <a href="#syntax-escape">escaping text spans</a> - in three different ways:</p> - <ul> - <li><dfn id="character-data">character data</dfn> - can contain - <a href="#syntax-text">text</a> - and - <a href="#syntax-charref">character references</a>, - but must not contain - <a href="#syntax-escape">escaping text spans</a></li> - <li><dfn - id="replaceable-character-data" - >replaceable character data</dfn> - can contain - <a href="#syntax-text">text</a>, - <a href="#syntax-charref">character references</a>, - and - <a href="#syntax-escape">escaping text spans</a>, - but must not contain any occurrences of the string - "<code></</code>" (U+003C LESS-THAN SIGN, U+002F - SOLIDUS) followed by characters that case-insensitively - match the tag name of the element followed by one of - U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), - U+000C FORM FEED (FF), U+0020 SPACE, U+003E GREATER-THAN - SIGN (>), or U+002F SOLIDUS (/), unless that string is - part of an - <a href="#syntax-escape">escaping text span</a>.</li> - <li><dfn - id="non-replaceable-character-data" - >non-replaceable character data</dfn> - can contain - <a href="#syntax-text">text</a>, - and - <a href="#syntax-escape">escaping text spans</a> - but must not contain - <a href="#syntax-charref">character references</a>, - and must not contain any occurrences of the string - "<code></</code>" (U+003C LESS-THAN SIGN, U+002F - SOLIDUS) followed by characters that case-insensitively - match the tag name of the element followed by one of - U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), - U+000C FORM FEED (FF), U+0020 
SPACE, U+003E GREATER-THAN - SIGN (>), or U+002F SOLIDUS (/), unless that string is - part of an - <a href="#syntax-escape">escaping text span</a>.</li> - </ul> + <p class="note">There is a special type of + <a href="#syntax-text">text</a>, + known as an + <a href="#syntax-escape">escaping text span</a>, + that can occur within certain elements.</p> + <p><dfn + id="character-data" + title="character-data" + >Character data</dfn> contains + <a href="syntax-text">text</a>, in some cases in combination with + <a href="#syntax-charref">character references</a>), + along with certain additional restrictions. There are three + types of character data that can occur in documents:</p> + <ol> + <li><a href="#normal-character-data">normal character data</a></li> + <li><a href="#replaceable-character-data">replaceable character data</a></li> + <li><a href="#non-replaceable-character-data">non-replaceable character data</a></li> + </ol> + <dl id="character-data-types-list"> + <dt><dfn + id="normal-character-data" + title="normal-character-data">Normal character data</dfn></dt> + <dd> + <p>Certain elements and strings in the values of + particular attributes contain normal character data. 
+ Normal character data can contain the following:</p> + <ul> + <li><a href="#syntax-text">text</a></li> + <li><a href="#syntax-charref">character references</a></li> + </ul> + <p>Normal character data has the following restrictions:</p> + <ul> + <li>must not contain any + "<code title="U+003C LESS-THAN SIGN"><</code>" + characters</li> + <li>must not contain any + <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a></li> + <li>must not contain any + <a href="#syntax-escape">escaping text spans</a></li> + </ul> + </dd> + <dt><dfn + id="replaceable-character-data" + title="replaceable-character-data" + >Replaceable character data</dfn></dt> + <dd> + <p>In + <a href="#syntax-document-html">documents in the HTML syntax</a>, + the + <a href="#title" class="element">title</a> + and + <a href="#textarea" class="element">textarea</a> + elements can contain replaceable character data. + Replaceable character data can contain the following:</p> + <ul> + <li><a href="#syntax-text">text</a>, + optionally including + "<code title="U+003C LESS-THAN SIGN"><</code>" + characters and + <a href="#syntax-escape">escaping text spans</a></li> + <li><a href="#syntax-charref">character references</a></li> + </ul> + <p>Replaceable character data has the following restrictions:</p> + <ul> + <li>must not contain any + <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a></li> + <li>must not contain any occurrences of the string + "<code></</code>" (U+003C LESS-THAN SIGN, + U+002F SOLIDUS) followed by characters that + case-insensitively match the tag name of the + element containing the replaceable character data + (for example, "<code></title</code>" or + "<code></textarea</code>"), + followed by one of U+0009 CHARACTER TABULATION, + U+000A LINE FEED (LF), U+000C FORM FEED (FF), + U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or + U+002F SOLIDUS (/), unless that string is part of + an + <a href="#syntax-escape">escaping text span</a>.</li> + </ul> + <p 
class="note">Replaceable character data, + as defined in this specification, is a feature of + <a href="#html-syntax">the HTML syntax</a> + that is not available in + <a href="#xml-syntax">the XML syntax</a>. + <a href="#syntax-document-xml">Documents in the XML + syntax</a> must not contain replaceable character data + as defined in this specification; instead they must + conform to all syntax constraints defined in the XML + specification <a href="#refsXML">[XML]</a>.</p> + </dd> + <dt><dfn + id="non-replaceable-character-data" + title="non-replaceable-character-data" + >Non-replaceable character data</dfn></dt> + <dd> + <p>In + <a href="#syntax-document-html">documents in the HTML syntax</a>, + the + <a href="#script" class="element">script</a> + and + <a href="#style" class="element">style</a> + elements can contain non-replaceable character data. + Non-replaceable character data can contain the + following:</p> + <ul> + <li><a href="#syntax-text">text</a>, + optionally including + "<code title="U+003C LESS-THAN SIGN"><</code>" + characters and + <a href="#syntax-escape">escaping text spans</a></li> + <li><a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a></li> + </ul> + <p>Non-replaceable character data has the following restrictions:</p> + <ul> + <li>must not contain <a href="#syntax-charref">character references</a></li> + <li>must not contain any occurrences of the string + "<code></</code>" (U+003C LESS-THAN SIGN, + U+002F SOLIDUS) followed by characters that + case-insensitively match the tag name of the + element containing the replaceable character data + (for example, "<code></script</code>" or + "<code></style</code>"), + followed by one of U+0009 CHARACTER TABULATION, + U+000A LINE FEED (LF), U+000C FORM FEED (FF), + U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or + U+002F SOLIDUS (/), unless that string is part of + an + <a href="#syntax-escape">escaping text span</a>.</li> + </ul> + <p class="note">Non-replaceable character data, + as 
defined in this specification, is a feature of + <a href="#html-syntax">the HTML syntax</a> + that is not available in + <a href="#xml-syntax">the XML syntax</a>. + <a href="#syntax-document-xml">Documents in the XML + syntax</a> must not contain non-replaceable character + data as defined in this specification; instead they must + conform to all syntax constraints defined in the XML + specification <a href="#refsXML">[XML]</a>.</p> + </dd> + </dl> </section> <section id="character-references"> <h2>Character references</h2>
@@ -571,7 +629,7 @@ href="#refsEntities">[Entities]</a>, using the same case, terminated by a U+003B SEMICOLON (<code title="">;</code>) character.</dd> - <dt><dfn id="dec-charref">Decimal numeric character reference.</dfn></dt> + <dt><dfn id="dec-charref">Decimal numeric character reference</dfn></dt> <dd>The ampersand must be followed by a U+0023 NUMBER SIGN (<code>#</code>) character, followed by one or more digits in the range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE, representing
@@ -608,21 +666,41 @@ <h2>Comments</h2> <p> <dfn id="syntax-comments" title="syntax-comments">Comments</dfn> - must start with the four character sequence U+003C LESS-THAN - SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D - HYPHEN-MINUS (<code title=""><!--</code>). Following that - sequence, the comment may have - <a href="#syntax-text" title="syntax-text">text</a>, with the - additional restriction that the text must not start with a - single U+003E GREATER-THAN SIGN ('>') character, nor start - with a U+002D HYPHEN-MINUS (<code title="">-</code>) character - followed by a U+003E GREATER-THAN SIGN ('>') character, nor - contain two consecutive U+002D HYPHEN-MINUS - (<code title="">-</code>) characters, nor end with a U+002D - HYPHEN-MINUS (<code title="">-</code>) character.
Finally, the - comment must be ended by the three character sequence U+002D - HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN - (<code title="">--></code>).</p> + consist of the following three parts, in exactly the following + order:</p> + <ol> + <li>the string + "<code + title="U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS" + ><!--</code>" + </li> + <li><a href="#syntax-text">text</a></li> + <li>the string + "<code + title="U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN" + >--></code>" + </li> + </ol> + <p>The <a href="#syntax-text">text</a> + part of comments has the following restrictions:</p> + <ul> + <li>must not start with a + "<code + title="U+003E GREATER-THAN SIGN" + >></code>" character</li> + <li>must not start with the string + "<code + title="U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN" + >-></code>"</li> + <li>must not contain the string + "<code + title="U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS" + >--</code>"</li> + <li>must not end with a + "<code + title="U+002D HYPHEN-MINUS" + >-</code>" character</li> + </ul> <div class="example"> <p>The following is an example of a comment.</p> <pre><!-- main content starts here --></pre>

Index: datatypes.html
===================================================================
RCS file: /sources/public/html5/markup/src/datatypes.html,v
retrieving revision 1.26
retrieving revision 1.27
diff -u -d -r1.26 -r1.27
--- datatypes.html 29 Jun 2009 09:18:49 -0000 1.26
+++ datatypes.html 8 Jul 2009 10:31:03 -0000 1.27
@@ -5,7 +5,8 @@ <p>For any pattern in this document that references the <a href="#data-string">string</a> datatype, a <dfn id="data-string" title="string">string</dfn> - is defined as <a href="#character-data">character data</a> + is defined as + <a href="#normal-character-data">normal character data</a> that does not contain any <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a>.</p> <p>The <a
href="#syntax-attributes">Attributes</a> section of
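The revised definitions of replaceable and non-replaceable character data share one restriction. The data must not contain "</", followed by the tag name of the containing element (matched case-insensitively), followed by TAB, LF, FF, SPACE, ">" or "/". Below is a minimal Python sketch of that single check. It is not taken from the commit, and the function name is hypothetical. It assumes ASCII case-insensitive matching of tag names. It also does not exempt escaping text spans, so it is stricter than the rule as written.

```python
import re


def has_forbidden_end_tag_sequence(data: str, tag_name: str) -> bool:
    """Return True if `data` contains "</" + tag_name followed by one of
    TAB, LF, FF, SPACE, ">" or "/".

    Hypothetical helper for illustration only. re.ASCII restricts
    IGNORECASE to ASCII letters. Escaping text spans are NOT exempted.
    """
    pattern = re.compile(
        "</" + re.escape(tag_name) + r"[\t\n\f />]",
        re.IGNORECASE | re.ASCII,
    )
    return pattern.search(data) is not None


print(has_forbidden_end_tag_sequence("if (a < b) x = '</SCRIPT>';", "script"))  # True
print(has_forbidden_end_tag_sequence("see </titles> list", "title"))           # False
```

All six terminator characters sit in one character class. Data that simply ends with "</script" is not reported, because the rule as written requires one of the listed characters to follow the tag name.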
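The revised Comments definition reads as a three-part structure ("<!--", text, "-->") plus four restrictions on the text part. The minimal Python sketch below checks a complete comment string against exactly those rules. It is an illustration rather than code from the commit, and the function name is hypothetical. It does not apply the general Text restrictions (U+0000, permanently undefined characters, control characters).

```python
def is_conforming_comment(comment: str) -> bool:
    """Check a whole comment against the three-part structure and the
    four text restrictions. Hypothetical helper; the general Text
    restrictions (U+0000, permanently undefined characters, control
    characters) are not checked."""
    opener, closer = "<!--", "-->"
    # Parts 1 and 3 must not overlap, which rules out "<!-->" and "<!--->".
    if len(comment) < len(opener) + len(closer):
        return False
    if not (comment.startswith(opener) and comment.endswith(closer)):
        return False
    text = comment[len(opener):-len(closer)]  # part 2
    return not (
        text.startswith(">")      # must not start with ">"
        or text.startswith("->")  # must not start with "->"
        or "--" in text           # must not contain "--"
        or text.endswith("-")     # must not end with "-"
    )


print(is_conforming_comment("<!-- main content starts here -->"))  # True
print(is_conforming_comment("<!-- a -- b -->"))                    # False
```

The length check is needed because the five-character string "<!-->" satisfies both the startswith and endswith tests when the opening and closing strings overlap.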
Received on Wednesday, 8 July 2009 10:31:17 UTC