hixie: Provide rationale for authoring conformance criteria. (whatwg r4876) from poot on 2010-03-27 (public-html-diffs@w3.org from March 2010)

From: poot <cvsmail@w3.org>
Date: Sat, 27 Mar 2010 13:44:53 +0900 (JST)
To: public-html-diffs@w3.org
Message-Id: <20100327044453.9404D2BBF1@toro.w3.mag.keio.ac.jp>
hixie: Provide rationale for authoring conformance criteria. (whatwg
r4876)

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.3902&r2=1.3903&f=h
http://html5.org/tools/web-apps-tracker?from=4875&to=4876

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.3902
retrieving revision 1.3903
diff -u -d -r1.3902 -r1.3903
--- Overview.html 27 Mar 2010 03:53:24 -0000 1.3902
+++ Overview.html 27 Mar 2010 04:44:37 -0000 1.3903
@@ -422,7 +422,12 @@
      <li><a href="#how-to-read-this-specification"><span class="secno">1.7.1 </span>How to read this specification</a></li>
      <li><a href="#typographic-conventions"><span class="secno">1.7.2 </span>Typographic conventions</a></ol></li>
    <li><a href="#a-quick-introduction-to-html"><span class="secno">1.8 </span>A quick introduction to HTML</a></li>
-   <li><a href="#recommended-reading"><span class="secno">1.9 </span>Recommended reading</a></ol></li>
+   <li><a href="#conformance-requirements-for-authors"><span class="secno">1.9 </span>Conformance requirements for authors</a>
+    <ol>
+     <li><a href="#presentational-markup"><span class="secno">1.9.1 </span>Presentational markup</a></li>
+     <li><a href="#syntax-errors"><span class="secno">1.9.2 </span>Syntax errors</a></li>
+     <li><a href="#restrictions-on-the-content-model-and-on-attribute-values"><span class="secno">1.9.3 </span>Restrictions on the content model and on attribute values</a></ol></li>
+   <li><a href="#recommended-reading"><span class="secno">1.10 </span>Recommended reading</a></ol></li>
  <li><a href="#infrastructure"><span class="secno">2 </span>Common infrastructure</a>
   <ol>
    <li><a href="#terminology"><span class="secno">2.1 </span>Terminology</a>
@@ -1560,7 +1565,489 @@
   specification might also be of use, but the novice author is
   cautioned that this specification, by necessity, defines the
   language with a level of detail that might be difficult to
-  understand at first.<h3 id="recommended-reading"><span class="secno">1.9 </span>Recommended reading</h3><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p><i>This section is non-normative.</i><p>The following documents might be of interest to readers of this
+  understand at first.<h3 id="conformance-requirements-for-authors"><span class="secno">1.9 </span>Conformance requirements for authors</h3><p><i>This section is non-normative.</i><p>Unlike previous versions of the HTML specification, this
+  specification defines in some detail the required processing for
+  invalid documents as well as valid documents.</p><!-- This has led
+  to some questioning the purpose of conformance criteria: if there is
+  no ambiguity in how something will be processed, why disallow it? --><p>However, even though the processing of invalid content is in most
+  cases well-defined, conformance requirements for documents are still
+  important: in practice, interoperability (the situation in which all
+  implementations process particular content in a reliable and
+  identical or equivalent way) is not the only goal of document
+  conformance requirements. This section details some of the more
+  common reasons for still distinguishing between a conforming
+  document and one with errors.<h4 id="presentational-markup"><span class="secno">1.9.1 </span>Presentational markup</h4><p><i>This section is non-normative.</i><p>The majority of presentational features from previous versions of
+  HTML are no longer allowed. Presentational markup in general has
+  been found to have a number of problems:<dl><dt>The use of presentational elements leads to poorer accessibility</dt>
+
+   <dd>
+
+    <p>While it is possible to use presentational markup in a way that
+    provides users of assistive technologies (ATs) with an acceptable
+    experience (e.g. using ARIA), doing so is significantly more
+    difficult than doing so when using semantically-appropriate
+    markup. Furthermore, even using such techniques doesn't help make
+    pages accessible for non-AT non-graphical users, such as users of
+    text-mode browsers.</p>
+
+    <p>Using media-independent markup, on the other hand, provides an
+    easy way for documents to be authored in such a way that they work
+    for more users (e.g. text browsers).</p>
+
+   </dd>
+
+
+   <dt>Higher cost of maintenance</dt>
+
+   <dd>
+
+    <p>It is significantly easier to maintain a site written in such a
+    way that the markup is style-independent. For example, changing
+    the colour of a site that uses
+    <code>&lt;font&nbsp;color=""&gt;</code> throughout requires changes
+    across the entire site, whereas a similar change to a site based
+    on CSS can be done by changing a single file.</p>
+
+   </dd>
+
+
+   <dt>Higher document sizes</dt>
+
+   <dd>
+
+    <p>Presentational markup tends to be much more redundant, and thus
+    results in larger document sizes.</p>
+
+   </dd>
+
+  </dl><p>For those reasons, presentational markup has been removed from
+  HTML in this version. This change should not come as a surprise;
+  HTML4 deprecated presentational markup many years ago and provided a
+  mode (HTML4 Transitional) to help authors move away from
+  presentational markup; later, XHTML 1.1 went further and obsoleted
+  those features altogether.<p>The only remaining presentational markup features in HTML are the
+  <code title="attr-style"><a href="#the-style-attribute">style</a></code> attribute and the
+  <code><a href="#the-style-element">style</a></code> element. Use of the <code title="attr-style"><a href="#the-style-attribute">style</a></code> attribute is somewhat discouraged in
+  production environments, but it can be useful for rapid prototyping
+  (where its rules can be directly moved into a separate style sheet
+  later) and for providing specific styles in unusual cases where a
+  separate style sheet would be inconvenient. Similarly, the
+  <code><a href="#the-style-element">style</a></code> element can be useful in syndication or for
+  page-specific styles, but in general an external style sheet is
+  likely to be more convenient when the styles apply to multiple
+  pages.<p>It is also worth noting that four elements that were previously
+  presentational have been redefined in this specification to be
+  media-independent: <code><a href="#the-b-element">b</a></code>, <code><a href="#the-i-element">i</a></code>, <code><a href="#the-hr-element">hr</a></code>,
+  and <code><a href="#the-small-element">small</a></code>.<h4 id="syntax-errors"><span class="secno">1.9.2 </span>Syntax errors</h4><p><i>This section is non-normative.</i><p>The syntax of HTML is constrained to avoid a wide variety of
+  problems.<dl><dt>Unintuitive error-handling behavior</dt>
+
+   <dd>
+
+    <p>Certain invalid syntax constructs, when parsed, result in DOM
+    trees that are highly unintuitive.</p>
+
+    <div class="example">
+
+     <p>For example, the following markup fragment results in a DOM
+     with an <code><a href="#the-hr-element">hr</a></code> element that is an <em>earlier</em>
+     sibling of the corresponding <code><a href="#the-table-element">table</a></code> element:</p>
+
+     <pre class="bad">&lt;table&gt;&lt;hr&gt;...</pre>
+
+    </div>
+
+   </dd>
+
+
+   <dt>Errors with optional error recovery</dt>
+
+   <dd>
+
+    <p>To allow user agents to be used in controlled environments
+    without having to implement the more bizarre and convoluted error
+    handling rules, user agents are permitted to fail whenever
+    encountering a <a href="#parse-error">parse error</a>.</p>
+
+   </dd>
+
+
+   <dt>Errors where the error-handling behavior is not compatible with streaming user agents</dt>
+
+   <dd>
+
+    <p>Some error-handling behavior, such as the behavior for the
+    <code title="">&lt;table&gt;&lt;hr&gt;...</code> example mentioned
+    above, is incompatible with streaming user agents. To avoid
+    interoperability problems with such user agents, any syntax
+    resulting in such behavior is considered invalid.</p>
+
+   </dd>
+
+
+   <dt>Errors that can result in infoset coercion</dt>
+
+   <dd>
+
+    <p>When a user agent based on XML is connected to an HTML parser,
+    it is possible that certain invariants that XML enforces, such as
+    comments never containing two consecutive hyphens, will be
+    violated by an HTML file. Handling this can require that the
+    parser coerce the HTML DOM into an XML-compatible infoset. Most
+    syntax constructs that require such handling are considered
+    invalid.</p>
+
+   </dd>
+
+
+   <dt>Errors that result in disproportionally poor performance</dt>
+
+   <dd>
+
+    <p>Certain syntax constructs can result in disproportionally poor
+    performance. To discourage the use of such constructs, they are
+    typically made non-conforming.</p>
+
+    <div class="example">
+
+     <p>For example, the following markup results in poor performance
+     when hitting the highlighted end tag, since all the open elements
+     are examined first to see if they match the close tag:</p>
+
+     <pre class="bad">&lt;p&gt;&lt;em&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;...&lt;span&gt;&lt;span&gt;&lt;span&gt;<strong>&lt;/em&gt;</strong></pre>
+
+    </div>
+
+   </dd>
+
+
+   <dt>Errors that help authors avoid fragile syntax constructs</dt>
+
+   <dd>
+
+    <p>There are syntax constructs that, for historical reasons, are
+    relatively fragile. To help reduce the number of users who
+    accidentally run into such problems, they are made
+    non-conforming.</p>
+
+    <div class="example">
+
+     <p>For example, the parsing of certain named character references
+     in attributes happens even with the closing semicolon being
+     omitted. It is safe to include an ampersand followed by letters
+     that do not form a named character reference, but if the letters
+     are changed to a string that <em>does</em> form a named character
+     reference, they will be interpreted as that character instead.</p>
+
+     <p>In this fragment, the attribute's value is "<code title="">?hello=1&amp;world=2</code>":</p>
+
+     <pre class="bad">&lt;a href="?hello=1&amp;world=2"&gt;Demo&lt;/a&gt;</pre>
+
+     <p>In the following fragment, however, the attribute's value is
+     actually "<code title="">?original=1&copy;=2</code>",
+     <em>not</em> the intended "<code title="">?original=1&amp;copy=2</code>":</p>
+
+     <pre class="bad">&lt;a href="?original=1&amp;copy=2"&gt;Compare&lt;/a&gt;</pre>
+
+     <p>To avoid this problem, all named character references are
+     required to end with a semicolon, and any ampersands followed by
+     letters are required to be escaped.</p>
+
+     <p>Thus, the correct way to express the above cases is as
+     follows:</p>
+
+     <pre>&lt;a href="?hello=1&amp;amp;world=2"&gt;Demo&lt;/a&gt;</pre>
+     <pre>&lt;a href="?original=1&amp;amp;copy=2"&gt;Compare&lt;/a&gt;</pre>
+
+    </div>
+
+   </dd>
+
+
+   <dt>Errors that flag known interoperability problems in legacy user agents</dt>
+
+   <dd>
+
+    <p>Certain syntax constructs are known to cause especially subtle
+    or serious problems in legacy user agents, and are therefore
+    marked as non-conforming to help authors avoid them.</p>
+
+    <div class="example">
+
+     <p>For example, this is why the U+0060 GRAVE ACCENT character (`)
+     is not allowed in unquoted attributes. In certain legacy user
+     agents, <!-- namely IE --> it is sometimes treated as a quote
+     character.</p>
+
+    </div>
+
+    <div class="example">
+
+     <p>Another example of this is the DOCTYPE, which is required to
+     trigger <a href="#no-quirks-mode">no-quirks mode</a>, because the behavior of
+     legacy user agents in <a href="#quirks-mode">quirks mode</a> is often largely
+     undocumented.</p>
+
+    </div>
+
+   </dd>
+
+
+   <dt>Errors that protect authors from security attacks</dt>
+
+   <dd>
+
+    <p>Certain restrictions exist purely to avoid known security
+    problems.</p>
+
+    <div class="example">
+
+     <p>For example, the restriction on using UTF-7 exists purely to
+     avoid authors falling prey to a known cross-site-scripting attack
+     using UTF-7.</p>
+
+    </div>
+
+   </dd>
+
+
+   <dt>Cases where the author's intent is unclear</dt>
+
+   <dd>
+
+    <p>Some errors merely flag cases where the author's intent is most
+    unclear. Correcting these errors early makes later maintenance easier.</p>
+
+    <div class="example">
+
+     <p>For example, it is unclear whether the author intended the
+     following to be an <code><a href="#the-h1-h2-h3-h4-h5-and-h6-elements">h1</a></code> heading or an <code><a href="#the-h1-h2-h3-h4-h5-and-h6-elements">h2</a></code>
+     heading:</p>
+
+     <pre class="bad">&lt;h1&gt;Contact details&lt;/h2&gt;</pre>
+
+    </div>
+
+   </dd>
+
+
+   <dt>Cases that are likely to be typos</dt>
+
+   <dd>
+
+    <p>When an author makes a simple typo, it is helpful if the error can
+    be caught early, as this can save the author a lot of debugging
+    time. This specification therefore usually considers it an error
+    to use element names, attribute names, and so forth, that do not
+    match the names defined in this specification.</p>
+
+    <div class="example">
+
+     <p>For example, if the author typed <code>&lt;capton&gt;</code>
+     instead of <code>&lt;caption&gt;</code>, this would be flagged as an
+     error and the author could correct the typo immediately.</p>
+
+    </div>
+
+   </dd>
+
+
+   <dt>Errors that allow for new syntax in future</dt>
+
+   <dd>
+
+    <p>In order to allow us to extend the language syntax in the
+    future, certain otherwise harmless features are disallowed.</p>
+
+    <div class="example">
+
+     <p>For example, "attributes" in end tags are ignored currently,
+     but they are invalid, so that a future change to the language
+     can make use of that syntax feature without conflicting with
+     already-deployed (and valid!) content.</p>
+
+    </div>
+
+   </dd>
+
+
+  </dl><p>Some authors find it helpful to be in the practice of always
+  quoting all attributes and always including all optional tags,
+  preferring the consistency derived from such custom over the minor
+  benefits of terseness afforded by making use of the flexibility of
+  the HTML syntax. To aid such authors, conformance checkers can
+  provide modes of operation wherein such conventions are
+  enforced.<h4 id="restrictions-on-the-content-model-and-on-attribute-values"><span class="secno">1.9.3 </span>Restrictions on the content model and on attribute values</h4><p><i>This section is non-normative.</i><p>Beyond the syntax of the language, this specification also places
+  restrictions on how elements and attributes can be specified. These
+  restrictions are present for similar reasons:<dl><dt>Errors that flag content with dubious semantics</dt>
+
+   <dd>
+
+    <p>To avoid misuse of elements with defined meanings, content
+    models are defined that restrict how elements can be nested when
+    such nestings would be of dubious value.</p>
+
+    <p class="example">For example, this specification disallows
+    nesting a <code><a href="#the-section-element">section</a></code> element inside a <code><a href="#the-kbd-element">kbd</a></code>
+    element, since it is highly unlikely for an author to indicate
+    that an entire section should be keyed in.</p>
+
+   </dd>
+
+
+   <dt>Errors that indicate a conflict in expressed semantics</dt>
+
+   <dd>
+
+    <p>Similarly, to draw the author's attention to mistakes in the
+    use of elements, clear contradictions in the semantics expressed
+    are also considered conformance errors.</p>
+
+    <div class="example">
+
+     <p>In the fragments below, for example, the semantics are
+     nonsensical: a row cannot simultaneously be a cell, nor can a
+     radio button be a progress bar.</p>
+
+     <pre class="bad">&lt;tr role="cell"&gt;</pre>
+     <pre class="bad">&lt;input type=radio role=progressbar&gt;</pre>
+
+    </div>
+
+   </dd>
+
+
+   <dt>Errors that encourage a correct understanding of the spec</dt>
+
+   <dd>
+
+    <p>Sometimes, something is disallowed because allowing it would
+    likely cause author confusion.</p>
+
+    <p class="example">For example, setting the <code title="attr-fe-disabled"><a href="#attr-fe-disabled">disabled</a></code> attribute to the value
+    "<code title="">false</code>" is disallowed, because despite the
+    appearance of meaning that the element is enabled, it in fact
+    means that the element is <em>disabled</em> (what matters for
+    implementations is the presence of the attribute, not its
+    value).</p>
+
+   </dd>
+
+
+   <dt>Errors that are intended merely to simplify the language</dt>
+
+   <dd>
+
+    <p>Some conformance errors simplify the language that authors need
+    to learn.</p>
+
+    <p class="example">For example, the <code><a href="#the-area-element">area</a></code> element's
+    <code title="attr-area-shape"><a href="#attr-area-shape">shape</a></code> attribute, despite
+    accepting both <code title="attr-area-shape-keyword-circ"><a href="#attr-area-shape-keyword-circ">circ</a></code> and <code title="attr-area-shape-keyword-circle"><a href="#attr-area-shape-keyword-circle">circle</a></code> values in
+    practice as synonyms, disallows the use of the <code title="attr-area-shape-keyword-circ"><a href="#attr-area-shape-keyword-circ">circ</a></code> value, so as to
+    simplify tutorials and other learning aids. There would be no
+    benefit to allowing both, but it would cause extra confusion when
+    teaching the language.</p>
+
+   </dd>
+
+
+   <dt>Errors that would likely result in scripts failing in hard-to-debug ways</dt>
+
+   <dd>
+
+    <p>Some errors are intended to help prevent script problems that
+    would be hard to debug.</p>
+
+    <p class="example">This is why, for instance, it is non-conforming
+    to have two <code title="attr-id"><a href="#the-id-attribute">id</a></code> attributes with the
+    same value. Duplicate IDs lead to the wrong element being
+    selected, with sometimes disastrous effects whose cause is hard to
+    determine.</p>
+
+   </dd>
+
+
+   <dt>Errors that are intended to save the author time</dt>
+
+   <dd>
+
+    <p>Some constructs are disallowed because historically they have
+    been the cause of a lot of wasted authoring time.</p>
+
+    <p class="example">For example, a <code><a href="#script">script</a></code> element's
+    <code title="attr-script-src"><a href="#attr-script-src">src</a></code> attribute causes the
+    element's contents to be ignored. However, this isn't obvious,
+    especially if the element's contents appear to be executable
+    script &mdash; which can lead to authors spending a lot of time
+    trying to debug the inline script without realising that it is not
+    executing. To reduce this problem, this specification makes it
+    non-conforming to have executable script in a <code><a href="#script">script</a></code>
+    element when the <code title="attr-script-src"><a href="#attr-script-src">src</a></code>
+    attribute is present. This means that authors who are validating
+    their documents are less likely to waste time with this kind of
+    mistake.</p>
+
+   </dd>
+
+
+   <dt>Errors that are intended to help authors of polyglot documents</dt>
+
+   <dd>
+
+    <p>Some authors like to write files that can be interpreted as
+    both XML and HTML with similar results. These are known as
+    polyglot documents. Though this practice is discouraged in general
+    due to the myriad of subtle complications involved (especially
+    when involving scripting, styling, or any kind of automated
+    serialization), this specification has a few restrictions intended
+    to at least somewhat mitigate the difficulties.</p>
+
+    <p class="example">For example, there are somewhat complicated
+    rules surrounding the <code title="attr-lang"><a href="#attr-lang">lang</a></code> and
+    <code title="attr-xml-lang"><a href="#attr-xml-lang">xml:lang</a></code> attributes intended
+    to keep the two synchronized.</p>
+
+    <p class="example">Another example would be the restrictions on
+    the values of <code title="">xmlns</code> attributes in the HTML
+    serialization, which are intended to ensure that elements in
+    conforming polyglot documents end up in the same namespaces
+    whether processed as HTML or XML.</p>
+
+   </dd>
+
+
+   <dt>Errors that reserve space for future expansion</dt>
+
+   <dd>
+
+    <p>As with the restrictions on the syntax intended to allow for
+    new syntax in future revisions of the language, some restrictions
+    on the content models of elements and values of attributes are
+    intended to allow for future expansion of the HTML vocabulary.</p>
+
+    <p class="example">For example, limiting the values of the <code title="attr-hyperlink-target"><a href="#attr-hyperlink-target">target</a></code> attribute that start
+    with a U+005F LOW LINE character (_) to only specific predefined
+    values allows new predefined values to be introduced at a future
+    time without conflicting with author-defined values.</p>
+ 
+   </dd>
+
+
+   <dt>Errors that indicate a misuse of other specifications</dt>
+
+   <dd>
+
+    <p>Certain restrictions are intended to support the restrictions
+    made by other specifications.</p>
+
+    <p class="example">For example, requiring that attributes that
+    take media queries use only <em>valid</em> media queries
+    reinforces the importance of following the conformance rules of
+    that specification.</p>
+
+   </dd>
+
+  </dl><h3 id="recommended-reading"><span class="secno">1.10 </span>Recommended reading</h3><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p><i>This section is non-normative.</i><p>The following documents might be of interest to readers of this
   specification.<dl><dt><cite>Character Model for the World Wide Web 1.0: Fundamentals</cite> <a href="#refsCHARMOD">[CHARMOD]</a></dt>
 
    <dd><blockquote><p>This Architectural Specification provides
Received on Saturday, 27 March 2010 04:45:24 UTC