- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Sat, 27 Mar 2010 04:44:41 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec In directory hutz:/tmp/cvs-serv21356 Modified Files: Overview.html Log Message: Provide rationale for authoring conformance criteria. (whatwg r4876) Index: Overview.html =================================================================== RCS file: /sources/public/html5/spec/Overview.html,v retrieving revision 1.3902 retrieving revision 1.3903 diff -u -d -r1.3902 -r1.3903 --- Overview.html 27 Mar 2010 03:53:24 -0000 1.3902 +++ Overview.html 27 Mar 2010 04:44:37 -0000 1.3903 @@ -422,7 +422,12 @@ <li><a href="#how-to-read-this-specification"><span class="secno">1.7.1 </span>How to read this specification</a></li> <li><a href="#typographic-conventions"><span class="secno">1.7.2 </span>Typographic conventions</a></ol></li> <li><a href="#a-quick-introduction-to-html"><span class="secno">1.8 </span>A quick introduction to HTML</a></li> - <li><a href="#recommended-reading"><span class="secno">1.9 </span>Recommended reading</a></ol></li> + <li><a href="#conformance-requirements-for-authors"><span class="secno">1.9 </span>Conformance requirements for authors</a> + <ol> + <li><a href="#presentational-markup"><span class="secno">1.9.1 </span>Presentational markup</a></li> + <li><a href="#syntax-errors"><span class="secno">1.9.2 </span>Syntax errors</a></li> + <li><a href="#restrictions-on-the-content-model-and-on-attribute-values"><span class="secno">1.9.3 </span>Restrictions on the content model and on attribute values</a></ol></li> + <li><a href="#recommended-reading"><span class="secno">1.10 </span>Recommended reading</a></ol></li> <li><a href="#infrastructure"><span class="secno">2 </span>Common infrastructure</a> <ol> <li><a href="#terminology"><span class="secno">2.1 </span>Terminology</a> @@ -1560,7 +1565,489 @@ specification might also be of use, but the novice author is cautioned that this specification, by necessity, defines the language with a level of detail that might be difficult to - understand at first.<h3 
id="recommended-reading"><span class="secno">1.9 </span>Recommended reading</h3><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p><i>This section is non-normative.</i><p>The following documents might be of interest to readers of this + understand at first.<h3 id="conformance-requirements-for-authors"><span class="secno">1.9 </span>Conformance requirements for authors</h3><p><i>This section is non-normative.</i><p>Unlike previous versions of the HTML specification, this + specification defines in some detail the required processing for + invalid documents as well as valid documents.</p><!-- This has led + to some questioning the purpose of conformance criteria: if there is + no ambiguity in how something will be processed, why disallow it? --><p>However, even though the processing of invalid content is in most + cases well-defined, conformance requirements for documents are still + important: in practice, interoperability (the situation in which all + implementations process particular content in a reliable and + identical or equivalent way) is not the only goal of document + conformance requirements. This section details some of the more + common reasons for still distinguishing between a conforming + document and one with errors.<h4 id="presentational-markup"><span class="secno">1.9.1 </span>Presentational markup</h4><p><i>This section is non-normative.</i><p>The majority of presentational features from previous versions of + HTML are no longer allowed. Presentational markup in general has + been found to have a number of problems:<dl><dt>The use of presentational elements leads to poorer accessibility</dt> + + <dd> + + <p>While it is possible to use presentational markup in a way that + provides users of assistive technologies (ATs) with an acceptable + experience (e.g. using ARIA), doing so is significantly more + difficult than doing so when using semantically-appropriate + markup. 
Furthermore, even using such techniques doesn't help make + pages accessible for non-AT non-graphical users, such as users of + text-mode browsers.</p> + + <p>Using media-independent markup, on the other hand, provides an + easy way for documents to be authored in such a way that they work + for more users (e.g. text browsers).</p> + + </dd> + + + <dt>Higher cost of maintenance</dt> + + <dd> + + <p>It is significantly easier to maintain a site written in such a + way that the markup is style-independent. For example, changing + the colour of a site that uses + <code><font color=""></code> throughout requires changes + across the entire site, whereas a similar change to a site based + on CSS can be done by changing a single file.</p> + + </dd> + + + <dt>Higher document sizes</dt> + + <dd> + + <p>Presentational markup tends to be much more redundant, and thus + results in larger document sizes.</p> + + </dd> + + </dl><p>For those reasons, presentational markup has been removed from + HTML in this version. This change should not come as a surprise; + HTML4 deprecated presentational markup many years ago and provided a + mode (HTML4 Transitional) to help authors move away from + presentational markup; later, XHTML 1.1 went further and obsoleted + those features altogether.<p>The only remaining presentational markup features in HTML are the + <code title="attr-style"><a href="#the-style-attribute">style</a></code> attribute and the + <code><a href="#the-style-element">style</a></code> element. Use of the <code title="attr-style"><a href="#the-style-attribute">style</a></code> attribute is somewhat discouraged in + production environments, but it can be useful for rapid prototyping + (where its rules can be directly moved into a separate style sheet + later) and for providing specific styles in unusual cases where a + separate style sheet would be inconvenient. 
Similarly, the + <code><a href="#the-style-element">style</a></code> element can be useful in syndication or for + page-specific styles, but in general an external style sheet is + likely to be more convenient when the styles apply to multiple + pages.<p>It is also worth noting that four elements that were previously + presentational have been redefined in this specification to be + media-independent: <code><a href="#the-b-element">b</a></code>, <code><a href="#the-i-element">i</a></code>, <code><a href="#the-hr-element">hr</a></code>, + and <code><a href="#the-small-element">small</a></code>.<h4 id="syntax-errors"><span class="secno">1.9.2 </span>Syntax errors</h4><p><i>This section is non-normative.</i><p>The syntax of HTML is constrained to avoid a wide variety of + problems.<dl><dt>Unintuitive error-handling behavior</dt> + + <dd> + + <p>Certain invalid syntax constructs, when parsed, result in DOM + trees that are highly unintuitive.</p> + + <div class="example"> + + <p>For example, the following markup fragment results in a DOM + with an <code><a href="#the-hr-element">hr</a></code> element that is an <em>earlier</em> + sibling of the corresponding <code><a href="#the-table-element">table</a></code> element:</p> + + <pre class="bad"><table><hr>...</pre> + + </div> + + </dd> + + + <dt>Errors with optional error recovery</dt> + + <dd> + + <p>To allow user agents to be used in controlled environments + without having to implement the more bizarre and convoluted error + handling rules, user agents are permitted to fail whenever + encountering a <a href="#parse-error">parse error</a>.</p> + + </dd> + + + <dt>Errors where the error-handling behavior is not compatible with streaming user agents</dt> + + <dd> + + <p>Some error-handling behavior, such as the behavior for the + <code title=""><table><hr>...</code> example mentioned + above, is incompatible with streaming user agents. 
To avoid + interoperability problems with such user agents, any syntax + resulting in such behavior is considered invalid.</p> + + </dd> + + + <dt>Errors that can result in infoset coercion</dt> + + <dd> + + <p>When a user agent based on XML is connected to an HTML parser, + it is possible that certain invariants that XML enforces, such as + comments never containing two consecutive hyphens, will be + violated by an HTML file. Handling this can require that the + parser coerce the HTML DOM into an XML-compatible infoset. Most + syntax constructs that require such handling are considered + invalid.</p> + + </dd> + + + <dt>Errors that result in disproportionally poor performance</dt> + + <dd> + + <p>Certain syntax constructs can result in disproportionally poor + performance. To discourage the use of such constructs, they are + typically made non-conforming.</p> + + <div class="example"> + + <p>For example, the following markup results in poor performance + when hitting the highlighted end tag, since all the open elements + are examined first to see if they match the close tag:</p> + + <pre class="bad"><p><em><span><span><span>...<span><span><span><strong></em></strong></pre> + + </div> + + </dd> + + + <dt>Errors that help authors avoid fragile syntax constructs</dt> + + <dd> + + <p>There are syntax constructs that, for historical reasons, are + relatively fragile. To help reduce the number of users who + accidentally run into such problems, they are made + non-conforming.</p> + + <div class="example"> + + <p>For example, the parsing of certain named character references + in attributes happens even with the closing semicolon being + omitted. 
It is safe to include an ampersand followed by letters + that do not form a named character reference, but if the letters + are changed to a string that <em>does</em> form a named character + reference, they will be interpreted as that character instead.</p> + + <p>In this fragment, the attribute's value is "<code title="">?hello=1&world=2</code>":</p> + + <pre class="bad"><a href="?hello=1&world=2">Demo</a></pre> + + <p>In the following fragment, however, the attribute's value is + actually "<code title="">?original=1©=2</code>", + <em>not</em> the intended "<code title="">?original=1&copy=2</code>":</p> + + <pre class="bad"><a href="?original=1&copy=2">Compare</a></pre> + + <p>To avoid this problem, all named character references are + required to end with a semicolon, and any ampersands followed by + letters are required to be escaped.</p> + + <p>Thus, the correct way to express the above cases is as + follows:</p> + + <pre><a href="?hello=1&amp;world=2">Demo</a></pre> + <pre><a href="?original=1&amp;copy=2">Compare</a></pre> + + </div> + + </dd> + + + <dt>Errors that flag known interoperability problems in legacy user agents</dt> + + <dd> + + <p>Certain syntax constructs are known to cause especially subtle + or serious problems in legacy user agents, and are therefore + marked as non-conforming to help authors avoid them.</p> + + <div class="example"> + + <p>For example, this is why the U+0060 GRAVE ACCENT character (`) + is not allowed in unquoted attributes. 
In certain legacy user + agents, <!-- namely IE --> it is sometimes treated as a quote + character.</p> + + </div> + + <div class="example"> + + <p>Another example of this is the DOCTYPE, which is required to + trigger <a href="#no-quirks-mode">no-quirks mode</a>, because the behavior of + legacy user agents in <a href="#quirks-mode">quirks mode</a> is often largely + undocumented.</p> + + </div> + + </dd> + + + <dt>Errors that protect authors from security attacks</dt> + + <dd> + + <p>Certain restrictions exist purely to avoid known security + problems.</p> + + <div class="example"> + + <p>For example, the restriction on using UTF-7 exists purely to + avoid authors falling prey to a known cross-site-scripting attack + using UTF-7.</p> + + </div> + + </dd> + + + <dt>Cases where the author's intent is unclear</dt> + + <dd> + + <p>Some errors merely flag cases where the author's intent is most + unclear. Correcting these errors early makes later maintenance easier.</p> + + <div class="example"> + + <p>For example, it is unclear whether the author intended the + following to be an <code><a href="#the-h1-h2-h3-h4-h5-and-h6-elements">h1</a></code> heading or an <code><a href="#the-h1-h2-h3-h4-h5-and-h6-elements">h2</a></code> + heading:</p> + + <pre class="bad"><h1>Contact details</h2></pre> + + </div> + + </dd> + + + <dt>Cases that are likely to be typos</dt> + + <dd> + + <p>When a user makes a simple typo, it is helpful if the error can + be caught early, as this can save the author a lot of debugging + time. 
This specification therefore usually considers it an error + to use element names, attribute names, and so forth, that do not + match the names defined in this specification.</p> + + <div class="example"> + + <p>For example, if the author typed <code><capton></code> + instead of <code><caption></code>, this would be flagged as an + error and the author could correct the typo immediately.</p> + + </div> + + </dd> + + + <dt>Errors that allow for new syntax in future</dt> + + <dd> + + <p>In order to allow us to extend the language syntax in the + future, certain otherwise harmless features are disallowed.</p> + + <div class="example"> + + <p>For example, "attributes" in end tags are ignored currently, + but they are invalid, in case a future change to the language + makes use of that syntax feature without conflicting with + already-deployed (and valid!) content.</p> + + </div> + + </dd> + + + </dl><p>Some authors find it helpful to be in the practice of always + quoting all attributes and always including all optional tags, + preferring the consistency derived from such custom over the minor + benefits of terseness afforded by making use of the flexibility of + the HTML syntax. To aid such authors, conformance checkers can + provide modes of operation wherein such conventions are + enforced.<h4 id="restrictions-on-the-content-model-and-on-attribute-values"><span class="secno">1.9.3 </span>Restrictions on the content model and on attribute values</h4><p><i>This section is non-normative.</i><p>Beyond the syntax of the language, this specification also places + restrictions on how elements and attributes can be specified. 
These + restrictions are present for similar reasons:<dl><dt>Errors that flag content with dubious semantics</dt> + + <dd> + + <p>To avoid misuse of elements with defined meanings, content + models are defined that restrict how elements can be nested when + such nestings would be of dubious value.</p> + + <p class="example">For example, this specification disallows + nesting a <code><a href="#the-section-element">section</a></code> element inside a <code><a href="#the-kbd-element">kbd</a></code> + element, since it is highly unlikely for an author to indicate + that an entire section should be keyed in.</p> + + </dd> + + + <dt>Errors that indicate a conflict in expressed semantics</dt> + + <dd> + + <p>Similarly, to draw the author's attention to mistakes in the + use of elements, clear contradictions in the semantics expressed + are also considered conformance errors.</p> + + <div class="example"> + + <p>In the fragments below, for example, the semantics are + nonsensical: a row cannot simultaneously be a cell, nor can a + radio button be a progress bar.</p> + + <pre class="bad"><tr role="cell"></pre> + <pre class="bad"><input type=radio role=progressbar></pre> + + </div> + + </dd> + + + <dt>Errors that encourage a correct understanding of the spec</dt> + + <dd> + + <p>Sometimes, something is disallowed because allowing it would + likely cause author confusion.</p> + + <p class="example">For example, setting the <code title="attr-fe-disabled"><a href="#attr-fe-disabled">disabled</a></code> attribute to the value + "<code title="">false</code>" is disallowed, because despite the + appearance of meaning that the element is enabled, it in fact + means that the element is <em>disabled</em> (what matters for + implementations is the presence of the attribute, not its + value).</p> + + </dd> + + + <dt>Errors that are intended merely to simplify the language</dt> + + <dd> + + <p>Some conformance errors simplify the language that authors need + to learn.</p> + + <p 
class="example">For example, the <code><a href="#the-area-element">area</a></code> element's + <code title="attr-area-shape"><a href="#attr-area-shape">shape</a></code> attribute, despite + accepting both <code title="attr-area-shape-keyword-circ"><a href="#attr-area-shape-keyword-circ">circ</a></code> and <code title="attr-area-shape-keyword-circle"><a href="#attr-area-shape-keyword-circle">circle</a></code> values in + practice as synonyms, disallows the use of the <code title="attr-area-shape-keyword-circ"><a href="#attr-area-shape-keyword-circ">circ</a></code> value, so as to + simplify tutorials and other learning aids. There would be no + benefit to allowing both, but it would cause extra confusion when + teaching the language.</p> + + </dd> + + + <dt>Errors that would likely result in scripts failing in hard-to-debug ways</dt> + + <dd> + + <p>Some errors are intended to help prevent script problems that + would be hard to debug.</p> + + <p class="example">This is why, for instance, it is non-conforming + to have two <code title="attr-id"><a href="#the-id-attribute">id</a></code> attributes with the + same value. Duplicate IDs lead to the wrong element being + selected, with sometimes disastrous effects whose cause is hard to + determine.</p> + + </dd> + + + <dt>Errors that are intended to save the author time</dt> + + <dd> + + <p>Some constructs are disallowed because historically they have + been the cause of a lot of wasted authoring time.</p> + + <p class="example">For example, a <code><a href="#script">script</a></code> element's + <code title="attr-script-src"><a href="#attr-script-src">src</a></code> attribute causes the + element's contents to be ignored. However, this isn't obvious, + especially if the element's contents appear to be executable + script, which can lead to authors spending a lot of time + trying to debug the inline script without realising that it is not + executing. 
To reduce this problem, this specification makes it + non-conforming to have executable script in a <code><a href="#script">script</a></code> + element when the <code title="attr-script-src"><a href="#attr-script-src">src</a></code> + attribute is present. This means that authors who are validating + their documents are less likely to waste time with this kind of + mistake.</p> + + </dd> + + + <dt>Errors that are intended to help authors of polyglot documents</dt> + + <dd> + + <p>Some authors like to write files that can be interpreted as + both XML and HTML with similar results. These are known as + polyglot documents. Though this practice is discouraged in general + due to the myriad of subtle complications involved (especially + when involving scripting, styling, or any kind of automated + serialization), this specification has a few restrictions intended + to at least somewhat mitigate the difficulties.</p> + + <p class="example">For example, there are somewhat complicated + rules surrounding the <code title="attr-lang"><a href="#attr-lang">lang</a></code> and + <code title="attr-xml-lang"><a href="#attr-xml-lang">xml:lang</a></code> attributes intended + to keep the two synchronized.</p> + + <p class="example">Another example would be the restrictions on + the values of <code title="">xmlns</code> attributes in the HTML + serialization, which are intended to ensure that elements in + conforming polyglot documents end up in the same namespaces + whether processed as HTML or XML.</p> + + </dd> + + + <dt>Errors that reserve space for future expansion</dt> + + <dd> + + <p>As with the restrictions on the syntax intended to allow for + new syntax in future revisions of the language, some restrictions + on the content models of elements and values of attributes are + intended to allow for future expansion of the HTML vocabulary.</p> + + <p class="example">For example, limiting the values of the <code title="attr-hyperlink-target"><a 
href="#attr-hyperlink-target">target</a></code> attribute that start + with a U+005F LOW LINE character (_) to only specific predefined + values allows new predefined values to be introduced at a future + time without conflicting with author-defined values.</p> + + </dd> + + + <dt>Errors that indicate a mis-use of other specifications</dt> + + <dd> + + <p>Certain restrictions are intended to support the restrictions + made by other specifications.</p> + + <p class="example">For example, requiring that attributes that + take media queries use only <em>valid</em> media queries + reinforces the importance of following the conformance rules of + that specification.</p> + + </dd> + + </dl><h3 id="recommended-reading"><span class="secno">1.10 </span>Recommended reading</h3><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p><i>This section is non-normative.</i><p>The following documents might be of interest to readers of this specification.<dl><dt><cite>Character Model for the World Wide Web 1.0: Fundamentals</cite> <a href="#refsCHARMOD">[CHARMOD]</a></dt> <dd><blockquote><p>This Architectural Specification provides
Received on Saturday, 27 March 2010 04:44:43 UTC