Fwd: note: HTML 4.01 analysis for conformance

From: Karl Dubost <karl@la-grange.net>
Date: Tue, 21 Jan 2014 19:26:43 +0900
Message-Id: <A17DD4E2-07BA-46BB-9CCC-76FF355D7FB8@la-grange.net>
Cc: Robin Berjon <robin@w3.org>
To: www-archive Archive <www-archive@w3.org>
Forwarded to www-archive to entertain Robin Berjon.

Begin forwarded message:
> From: Karl Dubost <karl@****>
> Subject: note: HTML 4.01 analysis for conformance
> Date: 2003-03-22
> Message-Id: <a05200f00baa1144d196f@[192.168.2.1]>
> 
> Hi,
> 
> I have re-read the full HTML 4.01 Specification and the errata.
> http://www.w3.org/TR/1999/REC-html401-19991224/
> 
> 
> * [4 Conformance: requirements and recommendations|http://www.w3.org/TR/html4.01/conform.html]
> 
> """The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.
> 
> At times, the authors of this specification recommend good practice for authors and user agents. These recommendations are not normative and conformance with this specification does not depend on their realization. These recommendations contain the expression "We recommend ...", "This specification recommends ...", or some similar wording."""
> 
> * MUST, MUST NOT, REQUIRED
> 
> 4.01 User agents must not render SGML processing instructions (e.g., <?full volume>) or comments.
> 
> 5.01 User agents must also know the specific character encoding that was used to transform the document character stream into a byte stream.
> 
> 5.02 This specification does not mandate which character encodings a user agent must support.
> 
> 5.03 Conforming user agents must correctly map to ISO 10646 all characters in any character encodings that they recognize (or they must behave as if they did).
> 
> 5.04 User agents must not assume any default value for the "charset" parameter.
> 
> 5.05 The META declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters (at least until the META element is parsed).
> 
> 5.06 To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
> 
>    1. An HTTP "charset" parameter in a "Content-Type" field.
>    2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
>    3. The charset attribute set on an element that designates an external resource.
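
A minimal sketch of the priority order in 5.06, assuming the three sources have already been collected as strings. The function and parameter names are hypothetical and do not come from the specification.

```python
import re
from typing import Optional

# Matches a charset parameter such as: text/html; charset=EUC-JP
_CHARSET_RE = re.compile(r'charset\s*=\s*"?([^\s;"]+)"?', re.IGNORECASE)


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if present."""
    if not value:
        return None
    match = _CHARSET_RE.search(value)
    return match.group(1) if match else None


def determine_encoding(
    http_content_type: Optional[str] = None,  # 1. HTTP Content-Type field
    meta_content_type: Optional[str] = None,  # 2. META http-equiv content value
    link_charset: Optional[str] = None,       # 3. charset attribute on the linking element
) -> Optional[str]:
    """Return the first charset found, from highest priority to lowest."""
    candidates = (
        charset_from_content_type(http_content_type),
        charset_from_content_type(meta_content_type),
        link_charset,
    )
    for candidate in candidates:
        if candidate:
            return candidate
    # Per 5.04, no default value is assumed when nothing is declared.
    return None
```

Returning None rather than a fallback encoding mirrors 5.04; what a real user agent does next (heuristics, user settings) is outside this sketch.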
> 
> 6.01 Although the STYLE and SCRIPT elements use CDATA for their data model, for these elements, CDATA must be handled differently by user agents. Markup and entities must be treated as raw text and passed to the application as is.
> 
> 6.02 ID and NAME tokens must begin with a letter ([A-Za-z])
> 
> 6.03 NUMBER tokens must contain at least one digit ([0-9]).
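
The token rules in 6.02 and 6.03 can be sketched as regular expressions. The set of characters allowed after the first letter (letters, digits, hyphens, underscores, colons, periods) is taken from section 6.2 of the specification, not from the excerpt above; the function names are hypothetical.

```python
import re

# ID and NAME: a letter, then any number of letters, digits, "-", "_", ":", "."
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.\-]*")
# NUMBER: at least one digit
NUMBER_TOKEN = re.compile(r"[0-9]+")


def is_name_token(value: str) -> bool:
    """True if value is a well-formed ID or NAME token."""
    return NAME_TOKEN.fullmatch(value) is not None


def is_number_token(value: str) -> bool:
    """True if value is a well-formed NUMBER token."""
    return NUMBER_TOKEN.fullmatch(value) is not None
```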
> 
> 6.04 The "charset" attributes (%Charset in the DTD) refer to a character encoding as described in the section on character encodings. Values must be strings (e.g., "euc-jp") from the IANA registry (see [CHARSETS] for a complete list).
> 
> 6.05 User agents must follow the steps set out in the section on specifying character encodings in order to determine the character encoding of an external resource. (Cf. 5.06)
> 
> 6.06 (Dates and times YYYY-MM-DDThh:mm:ssTZD) Z indicates UTC (Coordinated Universal Time). The "Z" must be uppercase.
> 
> 6.07 (Dates and times) Exactly the components shown here must be present, with exactly this punctuation. Note that the "T" appears literally in the string (it must be uppercase), to indicate the beginning of the time element, as specified in [ISO8601]
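
A rough check of the shape required by 6.06 and 6.07. The "+hh:mm" and "-hh:mm" forms of TZD come from the specification's definition of the time zone designator rather than from the excerpt above. The sketch validates punctuation and letter case only, not value ranges (month 13 would pass), and the function name is hypothetical.

```python
import re

# YYYY-MM-DDThh:mm:ssTZD with literal uppercase "T" and "Z"
DATETIME_RE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}"      # date
    r"T"                               # literal uppercase T
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}"      # time
    r"(?:Z|[+-][0-9]{2}:[0-9]{2})"     # time zone designator
)


def is_html4_datetime(value: str) -> bool:
    """True if value has exactly the components and punctuation shown."""
    return DATETIME_RE.fullmatch(value) is not None
```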
> 
> 6.08 (Media Descriptors) To facilitate the introduction of these extensions, conforming user agents must be able to parse the media attribute value as follows:
>      1. The value is a comma-separated list of entries.
>      2. Each entry is truncated just before the first character that isn't a US ASCII letter [a-zA-Z] (ISO 10646 hex 41-5a, 61-7a), digit [0-9] (hex 30-39), or hyphen (hex 2d).
>      3. A case-sensitive match is then made with the set of media types defined above. User agents may ignore entries that don't match.
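
The three parsing steps in 6.08 can be sketched as follows. The set of media types is the one listed in section 6.13 of the specification. Stripping the white space around each entry before truncating is an assumption of this sketch, and the function name is hypothetical.

```python
import re

# Media types defined in HTML 4.01, section 6.13
KNOWN_MEDIA = frozenset(
    {"screen", "tty", "tv", "projection", "handheld",
     "print", "braille", "aural", "all"}
)

# Anything that is not a US ASCII letter, a digit, or a hyphen
_NON_TOKEN = re.compile(r"[^A-Za-z0-9-]")


def parse_media(value: str) -> list:
    """Return the recognized media types named in a media attribute value."""
    recognized = []
    for entry in value.split(","):                 # step 1: comma-separated list
        entry = entry.strip()                      # assumption: trim white space
        truncated = _NON_TOKEN.split(entry, maxsplit=1)[0]  # step 2: truncate
        if truncated in KNOWN_MEDIA:               # step 3: case-sensitive match
            recognized.append(truncated)
        # Entries that do not match are ignored, as user agents may do.
    return recognized
```

For the value "screen, 3d-glasses, print and resolution > 90dpi", the entries truncate to "screen", "3d-glasses", and "print"; the sketch keeps the two it recognizes.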
> 
> 6.09 (Scripts) User agents must not evaluate script data as HTML markup but instead must pass it on as data to a script engine.
> 
> 6.10 (Stylesheet) User agents must not evaluate style data as HTML markup.
> 
> 6.11 (Frames) Except for the reserved names listed below, frame target names (%FrameTarget; in the DTD) must begin with an alphabetic character (a-zA-Z). The reserved names are _blank, _self, _parent, and _top.
> 
> 7.01 HTML 4.01 specifies three DTDs, so authors must include one of the following document type declarations in their documents.
> 
> 7.02 Every HTML document must have a TITLE element in the HEAD section.
> 
> 7.03 For reasons of accessibility, user agents must always make the content of the TITLE element available to users (including TITLE elements that occur in frames).
> 
> 7.04 (about id="name") This name must be unique in a document.
> 
> 7.05 (about class="cdata-list") Multiple class names must be separated by white space characters.
> 
> 7.06 Authors may also choose to use a system identifier that refers to a specific (dated) version of an HTML 4 DTD when validation to that particular DTD is required.
> 
> 7.07 Exactly one title is required per document.
> 
> 7.08 User agents are not required to support meta data mechanisms. (META element) 
>       [Karl: Contradicts the charset definition; cf. 5.06]
> 
> 8.01 User agents must make a best attempt to render all characters, regardless of the value specified by lang. [8.1]
> 
> 8.02 User agents must make a best attempt to render [gamma character] even though it is not an English character. [8.1]
>    [Karl: Use of a must in an example]
> 
> 8.03 [RFC1766] defines and explains the language codes that must be used in HTML documents. [8.1.1]
> 
> 8.04 If a document contains right-to-left characters, and if the user agent displays these characters, the user agent must use the bidirectional algorithm. [8.2]
> 
> 8.05 User agents must not use the lang attribute to determine text directionality. [8.2]
> 
> 8.06 To achieve additional levels of embedded direction changes, you must make use of the dir attribute on an inline element. [8.2.3]
> 
> 8.07 To achieve two embedded direction changes, we must supply additional information, which we do by delimiting the second embedding explicitly. [8.2.3]
> 
> 8.08 Because HTML uses the Unicode bidirectionality algorithm, conforming documents encoded using ISO 8859-8 must be labeled as "ISO-8859-8-i". [8.2.4]
> 
> 8.09 However, because the bidirectional algorithm relies on the inline/block-level distinction, special care must be taken during the transformation. [8.2.6]
> 
> 8.10 The BDO element should be used in scenarios where absolute control over sequence order is required (e.g., multi-language part numbers). [8.2.4]
> 
> 8.11 If a document does not contain a displayable right-to-left character, a conforming user agent is not required to apply the [UNICODE] bidirectional algorithm. [8.2]
> 
> 9.01 Visual user agents must ensure that the content of the Q element is rendered with delimiting quotation marks. [9.2.2]
> 
> 9.02 A number of issues, both stylistic and technical, must be addressed:
> 
>     * Treatment of white space
>     * Line breaking and word wrapping
>     * Justification
>     * Hyphenation
>     * Written language conventions and text directionality
>     * Formatting of paragraphs with respect to surrounding content [9.3]
> 
> 9.03 Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. [9.3.3]
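
The soft-hyphen semantics in 9.03 can be illustrated with a small sketch. How a user agent chooses where to break a line is out of scope; the caller passes the index of the soft hyphen at which the break falls. The function name and its parameters are hypothetical.

```python
from typing import List, Optional

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN (&shy;)


def render_with_soft_hyphens(text: str, break_index: Optional[int] = None) -> List[str]:
    """Render text as lines, optionally breaking at one soft hyphen.

    break_index is the position in text of the soft hyphen where the line
    is broken, or None if the text fits on a single line.
    """
    if break_index is None:
        # Not broken at a soft hyphen: no hyphen character is displayed.
        return [text.replace(SOFT_HYPHEN, "")]
    if text[break_index] != SOFT_HYPHEN:
        raise ValueError("break_index must point at a soft hyphen")
    # Broken at a soft hyphen: a hyphen ends the first line.
    first = text[:break_index].replace(SOFT_HYPHEN, "") + "-"
    second = text[break_index + 1:].replace(SOFT_HYPHEN, "")
    return [first, second]
```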
> 
> 9.04 When handling preformatted text, visual user agents: Must not disable bidirectional processing. [9.3.4]
> 
> 9.05 The INS and DEL elements must not contain block-level content when these elements behave as inline elements. [9.4]
> 
> 9.06 Non-visual user agents are not required to respect extra white space in the content of a PRE element. [9.3.4]
> 
> 10.01 All lists must contain one or more list elements. [10.1]
> 
> 11.01 User agents must know where to render the header and footer. [11.2.1]
> 
> 11.02 In order for a user agent to format a table in one pass, authors must tell the user agent:
> 
>     * The number of columns in the table. Please consult the section on calculating the number of columns in a table for details on how to supply this information.
>     * The widths of these columns. Please consult the section on calculating the width of columns for details on how to supply this information. [11.2.1]
> 
> 11.03 If any of the columns are specified in relative or percentage terms (see the section on calculating the width of columns), authors must also specify the width of the table itself. [11.2.1]
> 
> 11.04 Each row group must contain at least one row, defined by the TR element. [11.2.3]
> 
> 11.05 TFOOT must appear before TBODY within a TABLE definition so that user agents can render the foot before receiving all of the (potentially numerous) rows of data. [11.2.3]
> 
> 11.06 The following summarizes which tags are required and [...]:
> 
>     * The TBODY start tag is always required except when the table contains only one table body and no table head or foot sections. [...]
>     * The start tags for THEAD and TFOOT are required when the table head and foot sections are present respectively, [...]
> 
> Conforming user agent parsers must obey these rules for reasons of backward compatibility. [11.2.3]
> 
> 11.07 The THEAD, TFOOT, and TBODY sections must contain the same number of columns. [11.2.3]
> 
> 11.08 (about span="number" in the colgroup element) This attribute, which must be an integer > 0, specifies the number of columns in a column group.  [11.2.4]
> 
> 11.09 (about span="number" in the colgroup element) User agents must ignore this attribute if the COLGROUP element contains one or more COL elements. [11.2.4]
> 
> 11.10 (about width="multi-length" in the colgroup element) This implies that a column's entire contents must be known before its width may be correctly computed.
> 
> 11.11 When it is necessary to single out a column (e.g., for style information, to specify width information, etc.) within a group, authors must identify that column with a COL element. [11.2.4]
> 
> 11.12 (about span="number" in the col element) This attribute, whose value must be an integer > 0, specifies the number of columns "spanned" by the COL element; the COL element shares its attributes with all the columns it spans.   [11.2.4]
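
Items 11.08, 11.09, and 11.12 together determine how many columns a column group contributes. A minimal sketch, assuming a COLGROUP is modelled as its own span value plus the span values of its COL children; the ColGroup class is hypothetical, and the default span of 1 follows the specification's attribute definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColGroup:
    """Hypothetical model of a COLGROUP element."""
    span: int = 1                                        # span attribute of COLGROUP
    col_spans: List[int] = field(default_factory=list)   # span of each COL child


def columns_in_group(group: ColGroup) -> int:
    """Number of columns defined by a column group."""
    for value in [group.span, *group.col_spans]:
        if value <= 0:
            raise ValueError("span must be an integer > 0")
    if group.col_spans:
        # 11.09: the COLGROUP span is ignored when COL elements are present.
        return sum(group.col_spans)
    return group.span
```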
> 
> 11.13 However, if the table does not have a fixed width, user agents must receive all table data before they can determine the horizontal space required by the table. [11.2.4]
> 
> 11.14 (about the attribute headers="idrefs" in th and td elements) The value of this attribute is a space-separated list of cell names; those cells must be named by setting their id attribute. [11.2.6]
> 
> 11.15 (about the attribute scope="scope-name" in th and td elements) When specified, this attribute must have one of the following values:
> 
>     * row: The current cell provides header information for the rest of the row that contains it (see also the section on table directionality).
>     * col: The current cell provides header information for the rest of the column that contains it.
>     * rowgroup: The header cell provides header information for the rest of the row group that contains it.
>     * colgroup: The header cell provides header information for the rest of the column group that contains it. [11.2.6]
> 
> 11.16 User agents must render either the contents of the cell or the value of the abbr attribute. [11.2.6]
> 
> 11.17 For a given data cell, the headers attribute lists which cells provide pertinent header information. For this purpose, each header cell must be named using the id attribute. [11.4.1]
> 
> 11.18 (about char="character" attribute about horizontal alignment) User agents are not required to support this attribute. [11.3.2]
> 
> 11.19 (about charoff="length" attribute about horizontal alignment) User agents are not required to support this attribute. [11.3.2] 
> 
> 11.20 If a table or given column has a fixed width, cellspacing and cellpadding may demand more space than assigned. User agents may give these attributes precedence over the width attribute when a conflict occurs, but are not required to. [11.3.3]
> 
> 12.01 The destination anchor must be given an anchor name and any URI addressing this anchor must include the name as its fragment identifier. [12.1.1]
> 
> 12.02 (about name="cdata" in A element) The value of this attribute must be a unique anchor name. [12.2]
> 
> 12.03 Anchor names must observe the following rules:
> 
>     * Uniqueness: Anchor names must be unique within a document. Anchor names that differ only in case may not appear in the same document.
>     * String matching: Comparisons between fragment identifiers and anchor names must be done by exact (case-sensitive) match. [12.2.1]
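
The two anchor-name rules in 12.03 pull in different directions: uniqueness is checked ignoring case, while matching is case-sensitive. A small sketch, using str.lower() as an approximation of case folding; both function names are hypothetical.

```python
from typing import List, Tuple


def conflicting_anchor_names(names: List[str]) -> List[Tuple[str, str]]:
    """Return pairs of anchor names that are equal or differ only in case."""
    seen = {}
    conflicts = []
    for name in names:
        key = name.lower()  # approximation of case-insensitive comparison
        if key in seen:
            conflicts.append((seen[key], name))
        else:
            seen[key] = name
    return conflicts


def fragment_matches(fragment: str, anchor_name: str) -> bool:
    """Fragment identifiers match anchor names by exact, case-sensitive comparison."""
    return fragment == anchor_name
```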
> 
> 12.04 Links and anchors defined by the A element must not be nested; an A element must not contain any other A elements. [12.2.2]
> 
> 12.05 (about id and name attributes) When both attributes are used on a single element, their values must be identical. [12.2.3]
> 
> 12.06 When present, the BASE element must appear in the HEAD section of an HTML document, before any element that refers to an external source.  [12.4]
> 
> 12.07 User agents must calculate the base URI for resolving relative URIs according to [RFC1808], section 3. [12.4.1]
> 
> 12.08 User agents must calculate the base URI according to the following precedences (highest priority to lowest):
> 
>    1. The base URI is set by the BASE element.
>    2. The base URI is given by meta data discovered during a protocol interaction, such as an HTTP header (see [RFC2616]).
>    3. By default, the base URI is that of the current document. Not all HTML documents have a base URI (e.g., a valid HTML document may appear in an email and may not be designated by a URI). Such HTML documents are considered erroneous if they contain relative URIs and rely on a default base URI. [12.4.1]
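
The precedence in 12.08 can be sketched as below. Note that urllib.parse.urljoin follows the later RFC 3986 resolution rules rather than RFC 1808 itself, so it stands in for the algorithm cited in 12.07 only approximately. The function and parameter names are hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin


def base_uri(
    base_element_href: Optional[str] = None,  # 1. BASE element
    protocol_base: Optional[str] = None,      # 2. meta data from the protocol, e.g. an HTTP header
    document_uri: Optional[str] = None,       # 3. URI of the current document
) -> Optional[str]:
    """Return the base URI, from highest priority to lowest."""
    for candidate in (base_element_href, protocol_base, document_uri):
        if candidate:
            return candidate
    return None


def resolve(relative: str, **sources: Optional[str]) -> str:
    """Resolve a relative URI against the base URI chosen by base_uri()."""
    base = base_uri(**sources)
    if base is None:
        # A document with relative URIs and no base URI is considered erroneous.
        raise ValueError("relative URI used in a document without a base URI")
    return urljoin(base, relative)
```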
> 
> 13.01 (about longdesc="uri" in img element) Since an IMG element may be within the content of an A element, the user agent's mechanism in the user interface for accessing the "longdesc" resource of the former must be different than the mechanism for accessing the href resource of the latter. [13.2]
> 
> 13.02 User agents must render alternate text when they cannot support images, they cannot support a certain image type or when they are configured not to display images. [13.2]
> 
> 13.03 (about object element) declare [CI]
>     When present, this boolean attribute makes the current OBJECT definition a declaration only. The object must be instantiated by a subsequent OBJECT definition referring to this declaration.  [13.3]
> 
> 13.04 (about object element) In the most general case, an author may need to specify three types of information:
> 
>     * The implementation of the included object. For instance, if the included object is a clock applet, the author must indicate the location of the applet's executable code.
>     * The data to be rendered. For instance, if the included object is a program that renders font data, the author must indicate the location of that data. [13.3]
> 
> 13.05 A user agent must interpret an OBJECT element according to the following precedence rules:
> 
>    1. The user agent must first try to render the object. It should not render the element's contents, but it must examine them in case the element contains any direct children that are PARAM elements (see object initialization) or MAP elements (see client-side image maps).
>    2. If the user agent is not able to render the object for whatever reason (configured not to, lack of resources, wrong architecture, etc.), it must try to render its contents. [13.3.1]
> 
> 13.06 (about attribute valuetype=data|ref|object in param element)
> 
>     * ref: The value specified by value is a URI that designates a resource where run-time values are stored. This allows support tools to identify URIs given as parameters. The URI must be passed to the object as is, i.e., unresolved.
>     * object: The value specified by value is an identifier that refers to an OBJECT declaration in the same document. The identifier must be the value of the id attribute set for the declared OBJECT element. [13.3.2]
> 
> 13.07 Any number of PARAM elements may appear in the content of an OBJECT or APPLET element, in any order, but must be placed at the start of the content of the enclosing OBJECT or APPLET element. [13.3.2]
> 
> 13.08 When an OBJECT element is rendered, user agents must search the content for only those PARAM elements that are direct children and "feed" them to the OBJECT. [13.3.2]
> 
> 13.09 To declare an object so that it is not executed when read by the user agent, set the boolean declare attribute in the OBJECT element. At the same time, authors must identify the declaration by setting the id attribute in the OBJECT element to a unique value. Later instantiations of the object will refer to this identifier. [13.3.4]
> 
> 13.10 A declared OBJECT must appear in a document before the first instance of that OBJECT. [13.3.4]
> 
> 13.11 User agents that don't support the declare attribute must render the contents of the OBJECT declaration. [13.3.4]
> 
> 13.12 (about attributes code or object in APPLET element) Either code or object must be present. If both code and object are given, it is an error if they provide different class names. [13.4]
> 
> 13.13 The content of the APPLET acts as alternate information for user agents that don't support this element or are currently configured not to support applets. User agents must ignore the content otherwise. [13.4]
> 
> 13.14 Recall that the contents of OBJECT must only be rendered if the file specified by the data attribute cannot be loaded. [13.5]
> 
> 13.15 (about attribute usemap) The value of usemap must match the value of the name attribute of the associated MAP element. [13.6.1]
> 
> 13.16 Therefore, authors must provide alternate text for each AREA with the alt attribute (see below for information on how to specify alternate text). [13.6.1]
> 
> 13.17 When a MAP element contains mixed content (both AREA elements and block-level content), user agents must ignore the AREA elements. [13.6.1]
> 
> 13.18 It is only possible to define a server-side image map for the IMG and INPUT elements. In the case of IMG, the IMG must be inside an A element and the boolean attribute ismap ([CI]) must be set. In the case of INPUT, the INPUT must be of type "image". [13.6.2]
> 
> 13.19 The alt attribute must be specified for the IMG and AREA elements. [13.8]
> 
> 13.20 While alternate text may be very helpful, it must be handled with care.  [13.8]
> 
> 14.01 Authors must specify the style sheet language of style information associated with an HTML document. [14.2.1]
> 
> 14.02 (about type="content-type" attribute in style element) Authors must supply a value for this attribute; there is no default value for this attribute. [14.2.3]
> 
> 14.03 User agents that don't support style sheets, or don't support the specific style sheet language used by a STYLE element, must hide the contents of the STYLE element. [14.2.3]
> 
> 14.04 When a user selects a named style, the user agent must apply all style sheets with that name. [14.3.1]
> 
> 14.05 User agents must not apply alternate style sheets with a different style name. [14.3.1]
> 
> 14.06 Authors may also specify persistent style sheets that user agents must apply in addition to any alternate style sheet. [14.3.1]
> 
> 14.07 User agents must respect media descriptors when applying any style sheet. [14.3.1]
> 
> 14.08 User agents should also allow users to disable the author's style sheets entirely, in which case the user agent must not apply any persistent or alternate style sheets. [14.3.1]
> 
> 15.01 Font style elements must be properly nested. [15.2.1]
> 
> 16.01 Elements that might normally be placed in the BODY element must not appear before the first FRAMESET element or the FRAMESET will be ignored.  [16.2]
> 
> 16.02 (about noresize attribute in frame element)  When present, this boolean attribute tells the user agent that the frame window must not be resizeable. [16.2.2]
> 
> 16.03 (about marginwidth="pixels" attribute in frame element)  The value must be an integer greater than or equal to zero. (pixels). [16.2.2]
> 
> 16.04 (about marginheight="pixels" attribute in frame element)  The value must be an integer greater than or equal to zero. (pixels). [16.2.2]
> 
> 16.05 The contents of a frame must not be in the same document as the frame's definition. [16.2.2]
> 
> 16.06 User agents that support frames must only display the contents of a NOFRAMES declaration when configured not to display frames. [16.4.1]
> 
> 16.07 User agents that do not support frames must display the contents of NOFRAMES in any case. [16.4.1]
> 
> 17.01 (about accept-charset="charset list" in form element) The value is a space- and/or comma-delimited list of charset values. The client must interpret this list as an exclusive-or list, i.e., the server is able to accept any single character encoding per entity received. [17.3]
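
One way to read 17.01: the attribute value is split on spaces and commas, and the client then picks exactly one encoding from the list. A sketch under that reading; the selection policy (first accepted charset the client also supports, compared case-insensitively) is an assumption, as are the function names.

```python
import re
from typing import List, Optional


def parse_accept_charset(value: str) -> List[str]:
    """Split a space- and/or comma-delimited list of charset values."""
    return [token for token in re.split(r"[\s,]+", value.strip()) if token]


def choose_charset(value: str, supported: List[str]) -> Optional[str]:
    """Pick a single charset from accept-charset (an exclusive-or list)."""
    supported_lower = {name.lower() for name in supported}
    for charset in parse_accept_charset(value):
        if charset.lower() in supported_lower:
            return charset
    return None
```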
> 
> 17.02 (The FORM element acts as a container for controls.) The receiving program must be able to parse name/value pairs in order to make use of them. [17.3]
> 
> 17.03 (The FORM element acts as a container for controls.) A character encoding that must be accepted by the server in order to handle this form (the accept-charset attribute). [17.3]
> 
> 17.04 Please consult the section on form submission for information about how user agents must prepare form data for servers [17.3]
> 
> 17.05 (about checked attribute in input element) User agents must ignore this attribute for other control types. [17.4]
> 
> 17.06 Recall that authors must provide alternate text for an IMG element. [17.5]
> 
> 17.07 A SELECT element must contain at least one OPTION element. [17.6]
> 
> 17.08 The OPTGROUP element allows authors to group choices logically. This is particularly helpful when the user must choose from a long list of options; groups of related choices are easier to grasp and remember than a single long list of options. In HTML 4, all OPTGROUP elements must be specified directly within a SELECT element (i.e., groups may not be nested). [17.6]
> 
> 17.09 (about for="idref" attribute in label element) When present, the value of this attribute must be the same as the value of the id attribute of some other control in the same document. [17.9.1]
> 
> 17.10 The for attribute associates a label with another control explicitly: the value of the for attribute must be the same as the value of the id attribute of the associated control element. [17.9.1]
> 
> 17.11 To associate a label with another control implicitly, the control element must be within the contents of the LABEL element. [17.9.1]
> 
> 17.12 In an HTML document, an element must receive focus from the user in order to become active and perform its tasks. For example, users must activate a link specified by the A element in order to follow the specified link. Similarly, users must give a TEXTAREA focus in order to enter text into it. [17.11]
>         [Karl: Are these examples, or mandatory requirements?]
> 
> 17.13 (about tabindex="number" attribute) This value must be a number between 0 and 32767. [17.11.1]
> 
> 17.14 (about tabindex="number" attribute) Values need not be sequential nor must they begin with any particular value. [17.11.1]
> 
> 17.15 Similarly, an author may want to include a piece of read-only text that must be submitted as a value along with the form. [17.12]
> 
> 17.16 A successful control must be defined within a FORM element and must have a control name. [17.13.2]
> 
> 17.17 HTML 4 user agents must support the established conventions in the following cases:
> 
>     * If the method is "get" and the action is an HTTP URI, the user agent takes the value of action, appends a `?' to it, then appends the form data set, encoded using the "application/x-www-form-urlencoded" content type. The user agent then traverses the link to this URI. In this scenario, form data are restricted to ASCII codes.
>     * If the method is "post" and the action is an HTTP URI, the user agent conducts an HTTP "post" transaction using the value of the action attribute and a message created according to the content type specified by the enctype attribute. [17.13.3]
> 
> 17.18 User agents must support the content types listed below. (application/x-www-form-urlencoded, multipart/form-data) [17.13.4]
> 
> 17.19 Forms submitted with this content type (application/x-www-form-urlencoded) must be encoded as follows:
> 
>    1. Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
>    2. The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'. [17.13.4]
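
The encoding steps in 17.19, and the "get" convention from 17.17, can be sketched with the standard library. urllib.parse.quote_plus is used as an approximation: it leaves a few non-alphanumeric characters ("_", ".", "-", "~") unescaped and encodes non-ASCII text as UTF-8, neither of which the text above prescribes. The function names are hypothetical.

```python
import re
from typing import Iterable, Tuple
from urllib.parse import quote_plus


def _normalize_newlines(value: str) -> str:
    """Represent every line break as a CR LF pair."""
    return re.sub(r"\r\n|\r|\n", "\r\n", value)


def encode_form(pairs: Iterable[Tuple[str, str]]) -> str:
    """Encode name/value pairs as application/x-www-form-urlencoded.

    Pairs are emitted in the order given, which should be document order.
    """
    return "&".join(
        quote_plus(_normalize_newlines(name))
        + "="
        + quote_plus(_normalize_newlines(value))
        for name, value in pairs
    )


def get_uri(action: str, pairs: Iterable[Tuple[str, str]]) -> str:
    """Build the URI for method "get": action, then "?", then the form data set."""
    return action + "?" + encode_form(pairs)
```

A list of pairs is used instead of a dict so that repeated control names and document order are preserved.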
> 
> 17.20 (about INPUT element) The name attribute is required for all control types except submit and reset. [17.4]
>  
> 17.21 Visual user agents are not required to present a SELECT element as a list box; they may use any other mechanism, such as a drop-down menu. [17.6]
> 
> 17.22 If a control doesn't have a current value when the form is submitted, user agents are not required to treat it as a successful control. [17.13.2]
> 
> 18.01 (about type="content-type" in script element) Authors must supply a value for this attribute. [18.2.1]
> 
> 18.02 If the src attribute is not set, user agents must interpret the contents of the element as the script. [18.2.1]
> 
> 18.03 If the src has a URI value, user agents must ignore the element's contents and retrieve the script via the URI. [18.2.1]
> 
> 18.04 Scripts are evaluated by script engines that must be known to a user agent. [18.2.1]
> 
> 18.05 As HTML does not rely on a specific scripting language, document authors must explicitly tell user agents the language of each script. [18.2.2]
> 
> 18.06 The type attribute must be specified for each SCRIPT element instance in a document. [18.2.2]
> 
> 18.07 (about NOSCRIPT element) User agents that do not support client-side scripts must render this element's contents. [18.3.1]
> 
> 18.08 User agents may still attempt to interpret incorrectly specified scripts but are not required to. [18.2.2]
> 
> 

-- 
Karl Dubost 🐄
http://www.la-grange.net/karl/
Received on Tuesday, 21 January 2014 10:26:50 UTC
