- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Sat, 27 Mar 2010 04:44:41 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv21356
Modified Files:
Overview.html
Log Message:
Provide rationale for authoring conformance criteria. (whatwg r4876)
Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.3902
retrieving revision 1.3903
diff -u -d -r1.3902 -r1.3903
--- Overview.html 27 Mar 2010 03:53:24 -0000 1.3902
+++ Overview.html 27 Mar 2010 04:44:37 -0000 1.3903
@@ -422,7 +422,12 @@
<li><a href="#how-to-read-this-specification"><span class="secno">1.7.1 </span>How to read this specification</a></li>
<li><a href="#typographic-conventions"><span class="secno">1.7.2 </span>Typographic conventions</a></ol></li>
<li><a href="#a-quick-introduction-to-html"><span class="secno">1.8 </span>A quick introduction to HTML</a></li>
- <li><a href="#recommended-reading"><span class="secno">1.9 </span>Recommended reading</a></ol></li>
+ <li><a href="#conformance-requirements-for-authors"><span class="secno">1.9 </span>Conformance requirements for authors</a>
+ <ol>
+ <li><a href="#presentational-markup"><span class="secno">1.9.1 </span>Presentational markup</a></li>
+ <li><a href="#syntax-errors"><span class="secno">1.9.2 </span>Syntax errors</a></li>
+ <li><a href="#restrictions-on-the-content-model-and-on-attribute-values"><span class="secno">1.9.3 </span>Restrictions on the content model and on attribute values</a></ol></li>
+ <li><a href="#recommended-reading"><span class="secno">1.10 </span>Recommended reading</a></ol></li>
<li><a href="#infrastructure"><span class="secno">2 </span>Common infrastructure</a>
<ol>
<li><a href="#terminology"><span class="secno">2.1 </span>Terminology</a>
@@ -1560,7 +1565,489 @@
specification might also be of use, but the novice author is
cautioned that this specification, by necessity, defines the
language with a level of detail that might be difficult to
- understand at first.<h3 id="recommended-reading"><span class="secno">1.9 </span>Recommended reading</h3><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p><i>This section is non-normative.</i><p>The following documents might be of interest to readers of this
+ understand at first.<h3 id="conformance-requirements-for-authors"><span class="secno">1.9 </span>Conformance requirements for authors</h3><p><i>This section is non-normative.</i><p>Unlike previous versions of the HTML specification, this
+ specification defines in some detail the required processing for
+ invalid documents as well as valid documents.</p><!-- This has led
+ to some questioning the purpose of conformance criteria: if there is
+ no ambiguity in how something will be processed, why disallow it? --><p>However, even though the processing of invalid content is in most
+ cases well-defined, conformance requirements for documents are still
+ important: in practice, interoperability (the situation in which all
+ implementations process particular content in a reliable and
+ identical or equivalent way) is not the only goal of document
+ conformance requirements. This section details some of the more
+ common reasons for still distinguishing between a conforming
+ document and one with errors.<h4 id="presentational-markup"><span class="secno">1.9.1 </span>Presentational markup</h4><p><i>This section is non-normative.</i><p>The majority of presentational features from previous versions of
+ HTML are no longer allowed. Presentational markup in general has
+ been found to have a number of problems:<dl><dt>The use of presentational elements leads to poorer accessibility</dt>
+
+ <dd>
+
+ <p>While it is possible to use presentational markup in a way that
+ provides users of assistive technologies (ATs) with an acceptable
+ experience (e.g. using ARIA), doing so is significantly more
+ difficult than using semantically-appropriate markup. Furthermore,
+ even using such techniques doesn't help make
+ pages accessible for non-AT non-graphical users, such as users of
+ text-mode browsers.</p>
+
+ <p>Using media-independent markup, on the other hand, makes it
+ easy to author documents that work for more users (e.g. users of
+ text browsers).</p>
+
+ </dd>
+
+
+ <dt>Higher cost of maintenance</dt>
+
+ <dd>
+
+ <p>It is significantly easier to maintain a site written in such a
+ way that the markup is style-independent. For example, changing
+ the colour of a site that uses
+ <code><font color=""></code> throughout requires changes
+ across the entire site, whereas a similar change to a site based
+ on CSS can be done by changing a single file.</p>
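+
+ <div class="example">
+
+ <p>For instance, with the colour and heading text chosen
+ arbitrarily, recolouring the heading in the first fragment means
+ editing every page on which such markup appears, whereas the second
+ fragment leaves the colour to a single rule in a shared style
+ sheet:</p>
+
+ <pre class="bad"><h2><font color="navy">News</font></h2></pre>
+ <pre><h2>News</h2></pre>
+ <pre>h2 { color: navy; } /* in the shared style sheet */</pre>
+
+ </div>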
+
+ </dd>
+
+
+ <dt>Higher document sizes</dt>
+
+ <dd>
+
+ <p>Presentational markup tends to be much more redundant, and thus
+ results in larger document sizes.</p>
+
+ </dd>
+
+ </dl><p>For those reasons, presentational markup has been removed from
+ HTML in this version. This change should not come as a surprise;
+ HTML4 deprecated presentational markup many years ago and provided a
+ mode (HTML4 Transitional) to help authors move away from
+ presentational markup; later, XHTML 1.1 went further and obsoleted
+ those features altogether.<p>The only remaining presentational markup features in HTML are the
+ <code title="attr-style"><a href="#the-style-attribute">style</a></code> attribute and the
+ <code><a href="#the-style-element">style</a></code> element. Use of the <code title="attr-style"><a href="#the-style-attribute">style</a></code> attribute is somewhat discouraged in
+ production environments, but it can be useful for rapid prototyping
+ (where its rules can be directly moved into a separate style sheet
+ later) and for providing specific styles in unusual cases where a
+ separate style sheet would be inconvenient. Similarly, the
+ <code><a href="#the-style-element">style</a></code> element can be useful in syndication or for
+ page-specific styles, but in general an external style sheet is
+ likely to be more convenient when the styles apply to multiple
+ pages.<p>It is also worth noting that four elements that were previously
+ presentational have been redefined in this specification to be
+ media-independent: <code><a href="#the-b-element">b</a></code>, <code><a href="#the-i-element">i</a></code>, <code><a href="#the-hr-element">hr</a></code>,
+ and <code><a href="#the-small-element">small</a></code>.<h4 id="syntax-errors"><span class="secno">1.9.2 </span>Syntax errors</h4><p><i>This section is non-normative.</i><p>The syntax of HTML is constrained to avoid a wide variety of
+ problems.<dl><dt>Unintuitive error-handling behavior</dt>
+
+ <dd>
+
+ <p>Certain invalid syntax constructs, when parsed, result in DOM
+ trees that are highly unintuitive.</p>
+
+ <div class="example">
+
+ <p>For example, the following markup fragment results in a DOM
+ with an <code><a href="#the-hr-element">hr</a></code> element that is an <em>earlier</em>
+ sibling of the corresponding <code><a href="#the-table-element">table</a></code> element:</p>
+
+ <pre class="bad"><table><hr>...</pre>
+
+ </div>
+
+ </dd>
+
+
+ <dt>Errors with optional error recovery</dt>
+
+ <dd>
+
+ <p>To allow user agents to be used in controlled environments
+ without having to implement the more bizarre and convoluted error
+ handling rules, user agents are permitted to fail whenever
+ encountering a <a href="#parse-error">parse error</a>.</p>
+
+ </dd>
+
+
+ <dt>Errors where the error-handling behavior is not compatible with streaming user agents</dt>
+
+ <dd>
+
+ <p>Some error-handling behavior, such as the behavior for the
+ <code title=""><table><hr>...</code> example mentioned
+ above, is incompatible with streaming user agents. To avoid
+ interoperability problems with such user agents, any syntax
+ resulting in such behavior is considered invalid.</p>
+
+ </dd>
+
+
+ <dt>Errors that can result in infoset coercion</dt>
+
+ <dd>
+
+ <p>When a user agent based on XML is connected to an HTML parser,
+ it is possible that certain invariants that XML enforces, such as
+ comments never containing two consecutive hyphens, will be
+ violated by an HTML file. Handling this can require that the
+ parser coerce the HTML DOM into an XML-compatible infoset. Most
+ syntax constructs that require such handling are considered
+ invalid.</p>
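+
+ <div class="example">
+
+ <p>For example, an HTML parser can process the following comment
+ (the comment text is arbitrary), but the two consecutive hyphens in
+ its middle cannot be represented as-is in an XML comment:</p>
+
+ <pre class="bad"><!-- TODO -- revisit this section --></pre>
+
+ </div>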
+
+ </dd>
+
+
+ <dt>Errors that result in disproportionately poor performance</dt>
+
+ <dd>
+
+ <p>Certain syntax constructs can result in disproportionately poor
+ performance. To discourage the use of such constructs, they are
+ typically made non-conforming.</p>
+
+ <div class="example">
+
+ <p>For example, the following markup results in poor performance
+ when hitting the highlighted end tag, since all the open elements
+ are examined first to see if they match the close tag:</p>
+
+ <pre class="bad"><p><em><span><span><span>...<span><span><span><strong></em></strong></pre>
+
+ </div>
+
+ </dd>
+
+
+ <dt>Errors that help authors avoid fragile syntax constructs</dt>
+
+ <dd>
+
+ <p>There are syntax constructs that, for historical reasons, are
+ relatively fragile. To help reduce the number of authors who
+ accidentally run into such problems, they are made
+ non-conforming.</p>
+
+ <div class="example">
+
+ <p>For example, the parsing of certain named character references
+ in attributes happens even with the closing semicolon being
+ omitted. It is safe to include an ampersand followed by letters
+ that do not form a named character reference, but if the letters
+ are changed to a string that <em>does</em> form a named character
+ reference, they will be interpreted as that character instead.</p>
+
+ <p>In this fragment, the attribute's value is "<code title="">?hello=1&world=2</code>":</p>
+
+ <pre class="bad"><a href="?hello=1&world=2">Demo</a></pre>
+
+ <p>In the following fragment, however, the attribute's value is
+ actually "<code title="">?original=1©</code>",
+ <em>not</em> the intended "<code title="">?original=1&copy</code>",
+ because even without the final semicolon, "<code title="">&copy</code>"
+ is handled the same as "<code title="">&copy;</code>" and thus gets
+ interpreted as "©":</p>
+
+ <pre class="bad"><a href="?original=1&copy">Compare</a></pre>
+
+ <p>To avoid this problem, all named character references are
+ required to end with a semicolon, and any ampersands followed by
+ letters are required to be escaped.</p>
+
+ <p>Thus, the correct way to express the above cases is as
+ follows:</p>
+
+ <pre><a href="?hello=1&amp;world=2">Demo</a></pre>
+ <pre><a href="?original=1&amp;copy">Compare</a></pre>
+
+ </div>
+
+ </dd>
+
+
+ <dt>Errors that flag known interoperability problems in legacy user agents</dt>
+
+ <dd>
+
+ <p>Certain syntax constructs are known to cause especially subtle
+ or serious problems in legacy user agents, and are therefore
+ marked as non-conforming to help authors avoid them.</p>
+
+ <div class="example">
+
+ <p>For example, this is why the U+0060 GRAVE ACCENT character (`)
+ is not allowed in unquoted attributes. In certain legacy user
+ agents, <!-- namely IE --> it is sometimes treated as a quote
+ character.</p>
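+
+ <p>A fragment such as the following (the attribute value is
+ arbitrary) is therefore flagged as an error:</p>
+
+ <pre class="bad"><input value=`yes`></pre>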
+
+ </div>
+
+ <div class="example">
+
+ <p>Another example of this is the DOCTYPE, which is required to
+ trigger <a href="#no-quirks-mode">no-quirks mode</a>, because the behavior of
+ legacy user agents in <a href="#quirks-mode">quirks mode</a> is often largely
+ undocumented.</p>
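+
+ <p>In its simplest form, a DOCTYPE that avoids quirks mode is:</p>
+
+ <pre><!DOCTYPE html></pre>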
+
+ </div>
+
+ </dd>
+
+
+ <dt>Errors that protect authors from security attacks</dt>
+
+ <dd>
+
+ <p>Certain restrictions exist purely to avoid known security
+ problems.</p>
+
+ <div class="example">
+
+ <p>For example, the restriction on using UTF-7 exists purely to
+ avoid authors falling prey to a known cross-site-scripting attack
+ using UTF-7.</p>
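+
+ <p>A character encoding declaration such as the following is
+ therefore non-conforming:</p>
+
+ <pre class="bad"><meta charset="UTF-7"></pre>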
+
+ </div>
+
+ </dd>
+
+
+ <dt>Cases where the author's intent is unclear</dt>
+
+ <dd>
+
+ <p>Some errors merely flag cases where the author's intent is
+ unclear. Correcting these errors early makes later maintenance easier.</p>
+
+ <div class="example">
+
+ <p>For example, it is unclear whether the author intended the
+ following to be an <code><a href="#the-h1-h2-h3-h4-h5-and-h6-elements">h1</a></code> heading or an <code><a href="#the-h1-h2-h3-h4-h5-and-h6-elements">h2</a></code>
+ heading:</p>
+
+ <pre class="bad"><h1>Contact details</h2></pre>
+
+ </div>
+
+ </dd>
+
+
+ <dt>Cases that are likely to be typos</dt>
+
+ <dd>
+
+ <p>When an author makes a simple typo, it is helpful if the error can
+ be caught early, as this can save the author a lot of debugging
+ time. This specification therefore usually considers it an error
+ to use element names, attribute names, and so forth, that do not
+ match the names defined in this specification.</p>
+
+ <div class="example">
+
+ <p>For example, if the author typed <code><capton></code>
+ instead of <code><caption></code>, this would be flagged as an
+ error and the author could correct the typo immediately.</p>
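+
+ <p>A conformance checker would thus report the misspelled element
+ name in markup such as the following (the caption text is
+ arbitrary):</p>
+
+ <pre class="bad"><table><capton>Opening hours</capton>...</table></pre>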
+
+ </div>
+
+ </dd>
+
+
+ <dt>Errors that allow for new syntax in future</dt>
+
+ <dd>
+
+ <p>In order to allow us to extend the language syntax in the
+ future, certain otherwise harmless features are disallowed.</p>
+
+ <div class="example">
+
+ <p>For example, "attributes" in end tags are ignored currently,
+ but they are invalid, in case a future change to the language
+ makes use of that syntax feature without conflicting with
+ already-deployed (and valid!) content.</p>
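+
+ <p>The <code title="">class</code> "attribute" on the end tag
+ below (an arbitrary illustration) is thus currently ignored, but
+ non-conforming:</p>
+
+ <pre class="bad"><p>Text</p class="note"></pre>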
+
+ </div>
+
+ </dd>
+
+
+ </dl><p>Some authors find it helpful to make a habit of always
+ quoting all attributes and always including all optional tags,
+ preferring the consistency of such a convention over the minor
+ benefits of terseness afforded by making use of the flexibility of
+ the HTML syntax. To aid such authors, conformance checkers can
+ provide modes of operation wherein such conventions are
+ enforced.<h4 id="restrictions-on-the-content-model-and-on-attribute-values"><span class="secno">1.9.3 </span>Restrictions on the content model and on attribute values</h4><p><i>This section is non-normative.</i><p>Beyond the syntax of the language, this specification also places
+ restrictions on how elements and attributes can be specified. These
+ restrictions are present for similar reasons:<dl><dt>Errors that flag content with dubious semantics</dt>
+
+ <dd>
+
+ <p>To avoid misuse of elements with defined meanings, content
+ models are defined that restrict how elements can be nested when
+ such nestings would be of dubious value.</p>
+
+ <p class="example">For example, this specification disallows
+ nesting a <code><a href="#the-section-element">section</a></code> element inside a <code><a href="#the-kbd-element">kbd</a></code>
+ element, since it is highly unlikely for an author to indicate
+ that an entire section should be keyed in.</p>
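+
+ <div class="example">
+
+ <p>A conformance checker would therefore report nesting such as the
+ following (the section's content is elided):</p>
+
+ <pre class="bad"><kbd><section>...</section></kbd></pre>
+
+ </div>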
+
+ </dd>
+
+
+ <dt>Errors that indicate a conflict in expressed semantics</dt>
+
+ <dd>
+
+ <p>Similarly, to draw the author's attention to mistakes in the
+ use of elements, clear contradictions in the semantics expressed
+ are also considered conformance errors.</p>
+
+ <div class="example">
+
+ <p>In the fragments below, for example, the semantics are
+ nonsensical: a row cannot simultaneously be a cell, nor can a
+ radio button be a progress bar.</p>
+
+ <pre class="bad"><tr role="cell"></pre>
+ <pre class="bad"><input type=radio role=progressbar></pre>
+
+ </div>
+
+ </dd>
+
+
+ <dt>Errors that encourage a correct understanding of the spec</dt>
+
+ <dd>
+
+ <p>Sometimes, something is disallowed because allowing it would
+ likely cause author confusion.</p>
+
+ <p class="example">For example, setting the <code title="attr-fe-disabled"><a href="#attr-fe-disabled">disabled</a></code> attribute to the value
+ "<code title="">false</code>" is disallowed, because despite the
+ appearance of meaning that the element is enabled, it in fact
+ means that the element is <em>disabled</em> (what matters for
+ implementations is the presence of the attribute, not its
+ value).</p>
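+
+ <div class="example">
+
+ <p>Thus the text field in the first fragment below is disabled, in
+ spite of the attribute's value; to leave the control enabled, the
+ attribute is simply omitted, as in the second fragment:</p>
+
+ <pre class="bad"><input type=text disabled="false"></pre>
+ <pre><input type=text></pre>
+
+ </div>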
+
+ </dd>
+
+
+ <dt>Errors that are intended merely to simplify the language</dt>
+
+ <dd>
+
+ <p>Some conformance errors simplify the language that authors need
+ to learn.</p>
+
+ <p class="example">For example, the <code><a href="#the-area-element">area</a></code> element's
+ <code title="attr-area-shape"><a href="#attr-area-shape">shape</a></code> attribute, despite
+ accepting both <code title="attr-area-shape-keyword-circ"><a href="#attr-area-shape-keyword-circ">circ</a></code> and <code title="attr-area-shape-keyword-circle"><a href="#attr-area-shape-keyword-circle">circle</a></code> values in
+ practice as synonyms, disallows the use of the <code title="attr-area-shape-keyword-circ"><a href="#attr-area-shape-keyword-circ">circ</a></code> value, so as to
+ simplify tutorials and other learning aids. There would be no
+ benefit to allowing both, but it would cause extra confusion when
+ teaching the language.</p>
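+
+ <div class="example">
+
+ <p>So, of the following two otherwise identical elements (the
+ coordinates, link, and text are arbitrary), only the second is
+ conforming:</p>
+
+ <pre class="bad"><area shape="circ" coords="50,50,25" href="#top" alt="Back to top"></pre>
+ <pre><area shape="circle" coords="50,50,25" href="#top" alt="Back to top"></pre>
+
+ </div>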
+
+ </dd>
+
+
+ <dt>Errors that would likely result in scripts failing in hard-to-debug ways</dt>
+
+ <dd>
+
+ <p>Some errors are intended to help prevent script problems that
+ would be hard to debug.</p>
+
+ <p class="example">This is why, for instance, it is non-conforming
+ to have two <code title="attr-id"><a href="#the-id-attribute">id</a></code> attributes with the
+ same value. Duplicate IDs lead to the wrong element being
+ selected, with sometimes disastrous effects whose cause is hard to
+ determine.</p>
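+
+ <div class="example">
+
+ <p>In the following fragment (the ID is arbitrary), a script calling
+ <code title="">document.getElementById('total')</code> always gets
+ the first element, even if the author meant to update the
+ second:</p>
+
+ <pre class="bad"><span id="total">0</span> ... <span id="total">0</span></pre>
+
+ </div>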
+
+ </dd>
+
+
+ <dt>Errors that are intended to save the author time</dt>
+
+ <dd>
+
+ <p>Some constructs are disallowed because historically they have
+ been the cause of a lot of wasted authoring time.</p>
+
+ <p class="example">For example, a <code><a href="#script">script</a></code> element's
+ <code title="attr-script-src"><a href="#attr-script-src">src</a></code> attribute causes the
+ element's contents to be ignored. However, this isn't obvious,
+ especially if the element's contents appear to be executable
+ script — which can lead to authors spending a lot of time
+ trying to debug the inline script without realising that it is not
+ executing. To reduce this problem, this specification makes it
+ non-conforming to have executable script in a <code><a href="#script">script</a></code>
+ element when the <code title="attr-script-src"><a href="#attr-script-src">src</a></code>
+ attribute is present. This means that authors who are validating
+ their documents are less likely to waste time with this kind of
+ mistake.</p>
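+
+ <div class="example">
+
+ <p>For instance, the inline statement in the following fragment is
+ never executed; only the external script is fetched and run (the
+ file name is a placeholder):</p>
+
+ <pre class="bad"><script src="app.js">alert("Loaded");</script> <!-- "app.js" is a placeholder --></pre>
+
+ </div>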
+
+ </dd>
+
+
+ <dt>Errors that are intended to help authors of polyglot documents</dt>
+
+ <dd>
+
+ <p>Some authors like to write files that can be interpreted as
+ both XML and HTML with similar results. These are known as
+ polyglot documents. Though this practice is discouraged in general
+ due to the myriad of subtle complications involved (especially
+ when involving scripting, styling, or any kind of automated
+ serialization), this specification has a few restrictions intended
+ to at least somewhat mitigate the difficulties.</p>
+
+ <p class="example">For example, there are somewhat complicated
+ rules surrounding the <code title="attr-lang"><a href="#attr-lang">lang</a></code> and
+ <code title="attr-xml-lang"><a href="#attr-xml-lang">xml:lang</a></code> attributes intended
+ to keep the two synchronized.</p>
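+
+ <div class="example">
+
+ <p>For example (the language codes are arbitrary), the first element
+ below is conforming in the HTML syntax because both attributes
+ agree, whereas the second is not:</p>
+
+ <pre><p lang="en" xml:lang="en">...</p></pre>
+ <pre class="bad"><p lang="en" xml:lang="fr">...</p></pre>
+
+ </div>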
+
+ <p class="example">Another example would be the restrictions on
+ the values of <code title="">xmlns</code> attributes in the HTML
+ serialization, which are intended to ensure that elements in
+ conforming polyglot documents end up in the same namespaces
+ whether processed as HTML or XML.</p>
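+
+ <div class="example">
+
+ <p>For instance, in the HTML serialization an
+ <code title="">xmlns</code> attribute on an HTML element is only
+ allowed with the XHTML namespace as its value; the second value
+ below is an arbitrary placeholder:</p>
+
+ <pre><html xmlns="http://www.w3.org/1999/xhtml"></pre>
+ <pre class="bad"><html xmlns="http://example.com/ns"> <!-- placeholder namespace --></pre>
+
+ </div>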
+
+ </dd>
+
+
+ <dt>Errors that reserve space for future expansion</dt>
+
+ <dd>
+
+ <p>As with the restrictions on the syntax intended to allow for
+ new syntax in future revisions of the language, some restrictions
+ on the content models of elements and values of attributes are
+ intended to allow for future expansion of the HTML vocabulary.</p>
+
+ <p class="example">For example, limiting the values of the <code title="attr-hyperlink-target"><a href="#attr-hyperlink-target">target</a></code> attribute that start
+ with a U+005F LOW LINE character (_) to only specific predefined
+ values allows new predefined values to be introduced at a future
+ time without conflicting with author-defined values.</p>
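+
+ <div class="example">
+
+ <p>For example (the link and the second keyword are made up for
+ illustration), <code title="">_blank</code> is one of the
+ predefined values, while <code title="">_popup</code> is not, and so
+ is non-conforming:</p>
+
+ <pre><a href="help.html" target="_blank">Help</a></pre>
+ <pre class="bad"><a href="help.html" target="_popup">Help</a> <!-- "_popup" is a made-up keyword --></pre>
+
+ </div>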
+
+ </dd>
+
+
+ <dt>Errors that indicate a misuse of other specifications</dt>
+
+ <dd>
+
+ <p>Certain restrictions are intended to support the restrictions
+ made by other specifications.</p>
+
+ <p class="example">For example, requiring that attributes that
+ take media queries use only <em>valid</em> media queries
+ reinforces the importance of following the conformance rules of
+ that specification.</p>
+
+ </dd>
+
+ </dl><h3 id="recommended-reading"><span class="secno">1.10 </span>Recommended reading</h3><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p><i>This section is non-normative.</i><p>The following documents might be of interest to readers of this
specification.<dl><dt><cite>Character Model for the World Wide Web 1.0: Fundamentals</cite> <a href="#refsCHARMOD">[CHARMOD]</a></dt>
<dd><blockquote><p>This Architectural Specification provides
Received on Saturday, 27 March 2010 04:44:43 UTC