html5/markup/src datatypes.html,1.26,1.27 syntax.html,1.53,1.54

From: Michael Smith via cvs-syncmail <cvsmail@w3.org>
Date: Wed, 08 Jul 2009 10:31:05 +0000
To: public-html-commits@w3.org
Message-Id: <E1MOUQb-0004O9-Iu@lionel-hutz.w3.org>
Update of /sources/public/html5/markup/src
In directory hutz:/tmp/cvs-serv16128/src

Modified Files:
	datatypes.html syntax.html 
Log Message:
reworked the definitions of different types of "character data" and what element "contents" are, to try to make things more clear; removed "Authors should not" admonitions about particular encodings; restate text about doctype vs. doctype.legacy in terms of document conformance (instead of authoring conformance); streamlined the definition of what a comment is; refined CSS stylesheet to make Notes more clearly identifiable


Index: syntax.html
===================================================================
RCS file: /sources/public/html5/markup/src/syntax.html,v
retrieving revision 1.53
retrieving revision 1.54
diff -u -d -r1.53 -r1.54
--- syntax.html	29 Jun 2009 09:18:50 -0000	1.53
+++ syntax.html	8 Jul 2009 10:31:03 -0000	1.54
@@ -7,11 +7,14 @@
     <p>A <dfn id="doctype" title="syntax-doctype">DOCTYPE</dfn> is
     a special instruction which, for legacy reasons that have to
     do with processing modes in browsers, is a required part of
-    any <a href="#syntax-document-html">document in the HTML
-      syntax</a>.</p>
-    <p>Except in documents output from certain tools, the DOCTYPE
-    must match the regular expression in the following pattern
-    definition.</p>
+    any
+    <a href="#syntax-document-html">document in the HTML syntax</a>.</p>
+    <p>The DOCTYPE must match either the
+    <a href="#doctype.pattern">doctype</a>
+    or
+    <a href="#doctype.legacy">doctype.legacy</a>
+    patterns defined in this specification.</p>
+    <p>The <code>doctype</code> pattern is defined as follows:</p>
     <dl class="pattern-def">
       <dt><a id="doctype.pattern"
         href="#doctype.pattern">doctype</a> =</dt>
@@ -28,9 +31,7 @@
     <pre>&lt;!doctype html></pre>
     <pre>&lt;!DOCTYPE HTML></pre>
     </div>
-    <p>In documents output from tools that are incapable of
-    generating a DOCTYPE in the form above, the DOCTYPE must match
-    the regular expression in the following pattern definition.</p>
+    <p>The <code>doctype.legacy</code> pattern is defined as follows:</p>
     <dl class="pattern-def">
       <dt><a id="doctype.legacy"
         href="#doctype.legacy">doctype.legacy</a> =</dt>
@@ -49,10 +50,11 @@
     <pre>&lt;!doctype html public 'about:legacy-compat'></pre>
     <pre>&lt;!DOCTYPE HTML PUBLIC "about:legacy-compat"></pre>
     </div>
-    <p>A document must not use a DOCTYPE matching the
-    <a href="#doctype.legacy">doctype.legacy</a>
-    pattern unless the document is output from a tool that is
-    incapable of generating a DOCTYPE matching the
+    <p>A tool that produces documents that conform to this
+    specification should not produce documents with a DOCTYPE
+    matching the <a href="#doctype.legacy">doctype.legacy</a>
+    pattern unless the tool is incapable of generating a DOCTYPE
+    matching the
     <a href="#doctype.pattern">doctype</a> pattern.</p>
   </section>
   <section id="character-encoding">
@@ -66,7 +68,6 @@
     <ul>
       <li>The character encoding name given must be the name of
       the character encoding used to serialize the file.</li>
-
       <li>The value must be a valid character encoding name, and
       must be the preferred name for that encoding.
       <a href="#refsIANACHARSET">[IANACHARSET]</a></li>
@@ -104,14 +105,17 @@
     ANSI_X3.4-1968) for bytes in the set 0x09, 0x0A, 0x0C, 0x0D,
     0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F, 0x41 - 0x5A, and 0x61 -
     0x7A.</p>
-    <p>Authors should not use JIS_X0212-1990, x-JIS0208, and
-    encodings based on EBCDIC. Authors should not use UTF-32.
-    Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
+    <p>
+    <!-- * Documents should not use JIS_X0212-1990, x-JIS0208, -->
+    <!-- * and encodings based on EBCDIC. -->
+    <!-- * Documents should not use UTF-32. -->
+    Documents must not use the CESU-8, UTF-7, BOCU-1 and SCSU
     encodings.
     <a href="#refsCESU8">[CESU8]</a>
     <a href="#refsUTF7">[UTF7]</a>
     <a href="#refsBOCU1">[BOCU1]</a>
-    <a href="#refsSCSU">[SCSU]</a></p>
+    <a href="#refsSCSU">[SCSU]</a>
+    </p>
     <p>In a
     <a href="#syntax-document-xml">document in the XML syntax</a>,
     the XML declaration should be used to provide
@@ -127,20 +131,22 @@
     specification defines the content models for all elements.
     An element must not contain <a href="#contents">contents</a>
     or attributes that are not part of its content model.</p>
-    <p>The <dfn id="contents" title="contents">contents</dfn> of
-    an element are any elements, <a href="#syntax-text">text</a>,
-    <a href="#syntax-charref">character references</a>,
-    <a href="#syntax-cdata-sections">CDATA sections</a>,
-    or
+    <p>The
+    <dfn id="contents" title="contents">contents</dfn>
+    of an element are any
+    <a href="#syntax-elements">elements</a>,
+    <a href="#character-data">character data</a>,
+    and
     <a href="#syntax-comments">comments</a>
     that it contains.
     Attributes and their values are not considered to be the
     “contents” of an element.</p>
     <p>An element whose <a href="#content-model">content model</a>
-    does not allow it to have <a href="contents"
-      title="contents">contents</a> is said to be a <dfn
-      id="void-element" title="void-element">void
-      element</dfn>. Void elements can have attributes.</p>
+    does not allow it to have
+    <a href="#contents">contents</a>
+    is said to be a
+    <dfn id="void-element" title="void-element">void element</dfn>.
+    Void elements can have attributes.</p>
     <p>The following is a complete list of the void elements in
     HTML.</p>
     <dl>
@@ -195,16 +201,13 @@
       <ol>
         <li>The first character of a start tag must be a U+003C
         LESS-THAN SIGN (<code>&lt;</code>).</li>
-
         <li>The next
         few characters of a start tag must be the element's
         <a href="#tag-name" title="syntax-tag-name">tag name</a>.</li>
-
         <li>If there are to be any attributes in the next step,
         there must first be one or more
         <a href="#space"
           title="space character">space characters</a>.</li>
-
         <li>Then, the start tag may have
         a number of attributes, the
         <a href="#attribute"
@@ -213,7 +216,6 @@
         other by one or more
         <a href="#space"
           title="space character">space characters</a>.</li>
-
         <li>After the attributes, the start tag may have one or more
         <a href="#space"
           title="space character">space characters</a>. (Some
@@ -221,7 +223,6 @@
         <a href="#attribute"
           title="syntax-attributes">attributes section</a>
         below.)</li>
-
         <li>Start tags must be closed by a U+003E GREATER-THAN
         SIGN (<code>&gt;</code>) character.</li>
       </ol>
@@ -269,41 +270,6 @@
     (which again,
     <a href="#omitted" title="syntax-tag-omission">might be
       implied in certain cases</a>).</li>
-    <li>The <a href="#style">style</a> and <a
-        href="#script">script</a> elements can have
-      <a href="#syntax-text" >text</a>, though it has
-      <a href="#text-restrictions">restrictions</a> described in a
-      later section.</li>
-    <li>The 
-      <a href="#title">title</a>
-      and
-      <a href="#textarea">textarea</a>
-      elements can have
-      <a href="#syntax-text">text</a>
-      and
-      <a href="#syntax-charref">character references</a>,
-      but the text must not contain an
-      <a href="#syntax-ambiguous-ampersand">ambiguous ampersand</a>.
-    There are also
-    <a href="#text-restrictions">further restrictions</a>
-    described in a later section.</li>
-  <li>Non-<a href="#void-element">void</a> elements other
-    than the
-    <a href="#style">style</a>,
-    <a href="#script">script</a>,
-    <a href="#title">title</a>,
-    and
-    <a href="#textarea">textarea</a>
-    elements can contain
-    <a href="#syntax-text">text</a>,
-    <a href="#syntax-charref">character references</a>,
-    other
-    <a href="#syntax-elements">elements</a>,
-    and
-    <a href="#syntax-comments">comments</a>.
-    But the text must not contain the character U+003C LESS-THAN SIGN
-    (<code>&lt;</code>) or an
-    <a href="#syntax-ambiguous-ampersand">ambiguous ampersand</a>.</li>
     </ul>
   </section>
     <section id="syntax-attributes">
@@ -350,10 +316,7 @@
       <li>
       <dfn id="syntax-attribute-value">Attribute values</dfn>, in
       general, are
-      <a href="#character-data">character data</a>,
-      with the additional restriction that they must not
-      contain any
-      <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a>.</li>
+      <a href="#normal-character-data">normal character data</a>.</li>
     </ul>
     <p>In <a href="#html-syntax">the HTML syntax</a>,
     attributes can be specified in four different ways:</p>
@@ -493,68 +456,163 @@
     </section>
   <section id="text-syntax">
     <h2>Text and character data</h2>
-    <p><dfn id="syntax-text" title="syntax-text">Text</dfn> in
-      <a href="#contents">element contents</a>,
-      <a href="#syntax-attribute-value">attribute values</a>,
-      <a href="#syntax-comments">comments</a>,
+    <p><dfn id="syntax-text" title="syntax-text">Text</dfn>
+    in
+      <a href="#contents">element contents</a>
+      (including in
+      <a href="#syntax-comments">comments</a>)
       and
-      <a href="#syntax-escape">escaping text spans</a>
-    must consist of Unicode characters and must not contain any of
-    the following:</p>
+      <a href="#syntax-attribute-value">attribute values</a>
+      must consist of Unicode characters, with the following
+      restrictions:</p>
     <ul>
-      <li>U+0000 characters</li>
-      <li>permanently undefined Unicode characters</li>
-      <li>control characters other than
+      <li>must not contain U+0000 characters</li>
+      <li>must not contain permanently undefined Unicode characters</li>
+      <li>must not contain control characters other than
         <a href="#space">space characters</a></li>
     </ul>
-    <p><a href="#syntax-text">Text</a> can be combined with
-      <a href="#syntax-charref">character references</a>,
-      <a href="#syntax-escape">escaping text spans</a>
-      in three different ways:</p>
-    <ul>
-      <li><dfn id="character-data">character data</dfn>
-      can contain
-      <a href="#syntax-text">text</a>
-      and
-      <a href="#syntax-charref">character references</a>,
-      but must not contain
-      <a href="#syntax-escape">escaping text spans</a></li>
-      <li><dfn
-        id="replaceable-character-data"
-        >replaceable character data</dfn>
-      can contain
-      <a href="#syntax-text">text</a>,
-      <a href="#syntax-charref">character references</a>,
-      and 
-      <a href="#syntax-escape">escaping text spans</a>,
-      but must not contain any occurrences of the string
-      "<code>&lt;/</code>" (U+003C LESS-THAN SIGN, U+002F
-      SOLIDUS) followed by characters that case-insensitively
-      match the tag name of the element followed by one of
-      U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
-      U+000C FORM FEED (FF), U+0020 SPACE, U+003E GREATER-THAN
-      SIGN (&gt;), or U+002F SOLIDUS (/), unless that string is
-      part of an
-      <a href="#syntax-escape">escaping text span</a>.</li>
-      <li><dfn
-        id="non-replaceable-character-data"
-        >non-replaceable character data</dfn>
-      can contain
-      <a href="#syntax-text">text</a>,
-      and 
-      <a href="#syntax-escape">escaping text spans</a>
-      but must not contain
-      <a href="#syntax-charref">character references</a>,
-      and must not contain any occurrences of the string
-      "<code>&lt;/</code>" (U+003C LESS-THAN SIGN, U+002F
-      SOLIDUS) followed by characters that case-insensitively
-      match the tag name of the element followed by one of
-      U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
-      U+000C FORM FEED (FF), U+0020 SPACE, U+003E GREATER-THAN
-      SIGN (&gt;), or U+002F SOLIDUS (/), unless that string is
-      part of an
-      <a href="#syntax-escape">escaping text span</a>.</li>
-    </ul>
+    <p class="note">There is a special type of
+    <a href="#syntax-text">text</a>,
+    known as an
+    <a href="#syntax-escape">escaping text span</a>,
+    that can occur within certain elements.</p>
+    <p><dfn
+      id="character-data"
+      title="character-data"
+      >Character data</dfn> contains
+    <a href="#syntax-text">text</a>, in some cases in combination with
+    <a href="#syntax-charref">character references</a>,
+    along with certain additional restrictions. There are three
+    types of character data that can occur in documents:</p>
+    <ol>
+      <li><a href="#normal-character-data">normal character data</a></li>
+      <li><a href="#replaceable-character-data">replaceable character data</a></li>
+      <li><a href="#non-replaceable-character-data">non-replaceable character data</a></li>
+    </ol>
+    <dl id="character-data-types-list">
+      <dt><dfn
+          id="normal-character-data"
+          title="normal-character-data">Normal character data</dfn></dt>
+      <dd>
+        <p>Certain elements and strings in the values of
+        particular attributes contain normal character data.
+        Normal character data can contain the following:</p>
+        <ul>
+          <li><a href="#syntax-text">text</a></li>
+          <li><a href="#syntax-charref">character references</a></li>
+        </ul>
+        <p>Normal character data has the following restrictions:</p>
+        <ul>
+          <li>must not contain any
+          "<code title="U+003C LESS-THAN SIGN">&lt;</code>"
+          characters</li>
+          <li>must not contain any
+          <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a></li>
+          <li>must not contain any
+          <a href="#syntax-escape">escaping text spans</a></li>
+        </ul>
+      </dd>
+      <dt><dfn
+          id="replaceable-character-data"
+          title="replaceable-character-data"
+          >Replaceable character data</dfn></dt>
+      <dd>
+        <p>In
+        <a href="#syntax-document-html">documents in the HTML syntax</a>,
+        the
+        <a href="#title" class="element">title</a>
+        and 
+        <a href="#textarea" class="element">textarea</a>
+        elements can contain replaceable character data.
+        Replaceable character data can contain the following:</p>
+        <ul>
+          <li><a href="#syntax-text">text</a>,
+          optionally including
+          "<code title="U+003C LESS-THAN SIGN">&lt;</code>"
+          characters and
+          <a href="#syntax-escape">escaping text spans</a></li>
+          <li><a href="#syntax-charref">character references</a></li>
+        </ul>
+        <p>Replaceable character data has the following restrictions:</p>
+        <ul>
+          <li>must not contain any
+          <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a></li>
+          <li>must not contain any occurrences of the string
+          "<code>&lt;/</code>" (U+003C LESS-THAN SIGN,
+          U+002F SOLIDUS) followed by characters that
+          case-insensitively match the tag name of the
+          element containing the replaceable character data
+          (for example, "<code>&lt;/title</code>" or
+          "<code>&lt;/textarea</code>"),
+          followed by one of U+0009 CHARACTER TABULATION,
+          U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+          U+0020 SPACE, U+003E GREATER-THAN SIGN (&gt;), or
+          U+002F SOLIDUS (/), unless that string is part of
+          an
+          <a href="#syntax-escape">escaping text span</a>.</li>
+        </ul>
+        <p class="note">Replaceable character data,
+        as defined in this specification, is a feature of
+        <a href="#html-syntax">the HTML syntax</a>
+        that is not available in
+        <a href="#xml-syntax">the XML syntax</a>.
+        <a href="#syntax-document-xml">Documents in the XML
+          syntax</a> must not contain replaceable character data
+        as defined in this specification; instead they must
+        conform to all syntax constraints defined in the XML
+        specification <a href="#refsXML">[XML]</a>.</p>
+      </dd>
+      <dt><dfn
+          id="non-replaceable-character-data"
+          title="non-replaceable-character-data"
+          >Non-replaceable character data</dfn></dt>
+      <dd>
+        <p>In
+        <a href="#syntax-document-html">documents in the HTML syntax</a>,
+        the
+        <a href="#script" class="element">script</a>
+        and 
+        <a href="#style" class="element">style</a>
+        elements can contain non-replaceable character data.
+        Non-replaceable character data can contain the
+        following:</p>
+        <ul>
+          <li><a href="#syntax-text">text</a>,
+          optionally including
+          "<code title="U+003C LESS-THAN SIGN">&lt;</code>"
+          characters and
+          <a href="#syntax-escape">escaping text spans</a></li>
+          <li><a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a></li>
+        </ul>
+        <p>Non-replaceable character data has the following restrictions:</p>
+        <ul>
+          <li>must not contain <a href="#syntax-charref">character references</a></li>
+          <li>must not contain any occurrences of the string
+          "<code>&lt;/</code>" (U+003C LESS-THAN SIGN,
+          U+002F SOLIDUS) followed by characters that
+          case-insensitively match the tag name of the
+          element containing the non-replaceable character data
+          (for example, "<code>&lt;/script</code>" or
+          "<code>&lt;/style</code>"),
+          followed by one of U+0009 CHARACTER TABULATION,
+          U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+          U+0020 SPACE, U+003E GREATER-THAN SIGN (&gt;), or
+          U+002F SOLIDUS (/), unless that string is part of
+          an
+          <a href="#syntax-escape">escaping text span</a>.</li>
+        </ul>
+        <p class="note">Non-replaceable character data,
+        as defined in this specification, is a feature of
+        <a href="#html-syntax">the HTML syntax</a>
+        that is not available in
+        <a href="#xml-syntax">the XML syntax</a>.
+        <a href="#syntax-document-xml">Documents in the XML
+          syntax</a> must not contain non-replaceable character
+        data as defined in this specification; instead they must
+        conform to all syntax constraints defined in the XML
+        specification <a href="#refsXML">[XML]</a>.</p>
+      </dd>
+    </dl>
   </section>
   <section id="character-references">
     <h2>Character references</h2>
@@ -571,7 +629,7 @@
           href="#refsEntities">[Entities]</a>, using the same
         case, terminated by a U+003B SEMICOLON (<code
           title="">;</code>) character.</dd>
-      <dt><dfn id="dec-charref">Decimal numeric character reference.</dfn></dt>
+      <dt><dfn id="dec-charref">Decimal numeric character reference</dfn></dt>
       <dd>The ampersand must be followed by a U+0023 NUMBER SIGN
       (<code>#</code>) character, followed by one or more digits in
       the range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE, representing
@@ -608,21 +666,41 @@
     <h2>Comments</h2>
     <p>
     <dfn id="syntax-comments" title="syntax-comments">Comments</dfn>
-    must start with the four character sequence U+003C LESS-THAN
-    SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D
-    HYPHEN-MINUS (<code title="">&lt;!--</code>). Following that
-    sequence, the comment may have
-    <a href="#syntax-text" title="syntax-text">text</a>, with the
-    additional restriction that the text must not start with a
-    single U+003E GREATER-THAN SIGN ('&gt;') character, nor start
-    with a U+002D HYPHEN-MINUS (<code title="">-</code>) character
-    followed by a U+003E GREATER-THAN SIGN ('&gt;') character, nor
-    contain two consecutive U+002D HYPHEN-MINUS
-    (<code title="">-</code>) characters, nor end with a U+002D
-    HYPHEN-MINUS (<code title="">-</code>) character. Finally, the
-    comment must be ended by the three character sequence U+002D
-    HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN
-    (<code title="">--&gt;</code>).</p>
+    consist of the following three parts, in exactly the following
+    order:</p>
+    <ol>
+      <li>the string
+      "<code
+        title="U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS"
+        >&lt;!--</code>"
+      </li>
+      <li><a href="#syntax-text">text</a></li>
+      <li>the string
+      "<code
+        title="U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN"
+        >--&gt;</code>"
+      </li>
+    </ol>
+    <p>The <a href="#syntax-text">text</a>
+    part of comments has the following restrictions:</p>
+    <ul>
+      <li>must not start with a
+      "<code
+        title="U+003E GREATER-THAN SIGN"
+        >&gt;</code>" character</li>
+      <li>must not start with the string
+      "<code
+        title="U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN"
+        >-&gt;</code>"</li>
+      <li>must not contain the string
+      "<code
+        title="U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS"
+        >--</code>"</li>
+      <li>must not end with a
+      "<code
+        title="U+002D HYPHEN-MINUS"
+        >-</code>" character</li>
+    </ul>
     <div class="example">
     <p>The following is an example of a comment.</p>
     <pre>&lt;!-- main content starts here --></pre>
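
The reworded comment definition in the hunk above (three parts in order, plus four restrictions on the text part) can be sketched as a small checker. This is only an illustration, not part of the commit: the function names are hypothetical, and the general restrictions on text (U+0000, undefined characters, control characters) are deliberately left out.

```python
def comment_text_violations(text: str) -> list[str]:
    """Return the comment-specific restrictions that `text` violates."""
    problems = []
    if text.startswith(">"):
        problems.append('starts with ">"')
    if text.startswith("->"):
        problems.append('starts with "->"')
    if "--" in text:
        problems.append('contains "--"')
    if text.endswith("-"):
        problems.append('ends with "-"')
    return problems


def is_valid_comment(markup: str) -> bool:
    """Check the three-part shape: "<!--", then text, then "-->"."""
    # The opening and closing strings must not overlap, so at least
    # 4 + 3 = 7 characters are needed even when the text part is empty.
    if len(markup) < 7:
        return False
    if not (markup.startswith("<!--") and markup.endswith("-->")):
        return False
    return not comment_text_violations(markup[4:-3])
```

With these assumptions, the example comment from the spec text passes, while strings such as "<!-->" or "<!-- a -- b -->" are rejected.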

Index: datatypes.html
===================================================================
RCS file: /sources/public/html5/markup/src/datatypes.html,v
retrieving revision 1.26
retrieving revision 1.27
diff -u -d -r1.26 -r1.27
--- datatypes.html	29 Jun 2009 09:18:49 -0000	1.26
+++ datatypes.html	8 Jul 2009 10:31:03 -0000	1.27
@@ -5,7 +5,8 @@
     <p>For any pattern in this document that references the <a
       href="#data-string">string</a> datatype, a
     <dfn id="data-string" title="string">string</dfn>
-    is defined as <a href="#character-data">character data</a>
+    is defined as
+    <a href="#normal-character-data">normal character data</a>
     that does not contain any
     <a href="#syntax-ambiguous-ampersand">ambiguous ampersands</a>.</p>
     <p>The <a href="#syntax-attributes">Attributes</a> section of
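
The end-tag restriction that the syntax.html changes above state for both replaceable and non-replaceable character data can be illustrated with a short regular-expression sketch. It rests on assumptions that are not in the commit: the helper name is hypothetical, "case-insensitively" is read as ASCII-only case folding, and the exception for escaping text spans is not modelled, so text inside such a span would be flagged even though the spec text permits it.

```python
import re


def contains_end_tag_like(data: str, tag_name: str) -> bool:
    """Report whether `data` contains "</" + tag_name (any ASCII case)
    followed by TAB, LF, FF, SPACE, ">" or "/".

    Assumption: escaping text spans are not recognised here.
    """
    pattern = re.compile(
        "</" + re.escape(tag_name) + "[\t\n\f >/]",
        re.IGNORECASE | re.ASCII,  # re.ASCII keeps case folding ASCII-only
    )
    return pattern.search(data) is not None
```

For example, the text "a </TITLE> b" is flagged for a title element, while "a </titles> b" is not, because the character after the tag name is not one of the six listed terminators.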
Received on Wednesday, 8 July 2009 10:31:17 GMT