HTML 4.0 comments

    Text indented 4 spaces is mine.  Text indented 8 spaces is quoted
    from the spec.  Unindented section headings provide context for the
    subsequent comments.

    Many of the comments point out typos.  Some point out confusing,
    misleading, or imprecise parts of the spec, and suggest
    clarifications or additions (unless I was baffled).

    Sorry I didn't look at the spec when it was still a Proposed
    Recommendation, but the semester just ended.

    AMC

2.1.3 Relative URIs

        Relative URIsare resolved to full URIs using a base URI.
                 ^^^^^^^

    Should be "URIs are".

3.3.3 Element declarations

        A few HTML element types use an additional SGML feature to
        exclude elements from content model.
                         ^^^^^^^^^^^^^^^^^^

    Should be "from a content model" or "from content models".

3.3.4 Attribute declarations

        In HTML, boolean attributes may be appear in minimized form --
                                        ^^

    Remove "be".

6.3 Text strings

        For introductory about attributes,

    Reword (perhaps "For introductory information about attributes").

7.4.4 Meta data

        The meaning of a property and the set of legal values for
        that property should be defined in a reference lexicon
        called profile.
        ^^^^^^^^^^^^^^

    Should be "called a profile", right?

8.1 Specifying the language of content: the lang attribute

        <P><Q lang="en">"Her super-powers were the result of
                        ^

    Remove the quotation mark.

8.2.4 Overriding the bidirectional algorithm: the BDO element

        One reason for this may be that the MIME standard ([RFC2045],
        [RFC1556]) favors visual order, i.e., that right-to-left
        character sequences are inserted right-to-left in the byte
        stream.

    I don't think this means what was intended.  My best-effort
    interpretation of "right-to-left character sequences are inserted
    right-to-left in the byte stream" is that the rightmost character
    appears first in the byte stream.  But that is the opposite of RFC
    1556 visual directionality, which requires the leftmost character
    to appear first in the byte stream.  I strongly recommend using
    phrases like "leftmost character first" and avoiding phrases like
    "right-to-left in the byte stream", because byte streams do not have
    a left and right, only an earlier and later.
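
    To make the "leftmost character first" wording concrete, here is a
    toy sketch in Python.  It assumes that the uppercase letters A, B, C
    stand in for three right-to-left characters read in that order; it
    does no real bidirectional processing, and the variable names are my
    own.

```python
# Toy illustration only: "ABC" stands in for three right-to-left
# characters, read in the order A, then B, then C.
logical = "ABC"

# Displayed right-to-left, the first-read character ends up rightmost,
# so the glyphs seen from left to right are C, B, A.
displayed = logical[::-1]

# Visual order (as I read RFC 1556): the leftmost displayed character
# comes earliest in the stream.
visual_stream = list(displayed)

# Logical order: the first-read character comes earliest in the stream.
logical_stream = list(logical)

print("visual: ", visual_stream)
print("logical:", logical_stream)
```

    In the visual stream the rightmost character (A) comes last, not
    first, which is why "right-to-left in the byte stream" is so easy to
    misread.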

8.2.5 Character references for directionality and joining control

        Mirrored character glyphs. In general, the bidirectional
        algorithm does not mirror character glyphs but leaves them
        unaffected. An exception are characters such as parentheses (see
        [UNICODE], table 4-7).

    Although the Unicode character names and example glyphs are
    available online, the text of the Unicode standard is not, so I wish
    the HTML spec would elaborate a bit on the mirroring of parentheses.
    If the characters ( and ) were called "open parenthesis" and
    "close parenthesis", I could understand why their appearance would
    depend on the directionality of the text.  But they're called "left
    parenthesis" and "right parenthesis", so I don't see why they would
    ever be mirrored.  In right-to-left text, you would obviously begin
    a parenthetical with a right parenthesis, and end it with a left
    parenthesis, correct?

9.1 White space

        authors should not rely on user agents to render white space
        immediately after a start tag or immediately before an end tag.

    What about the converse?  Should authors also not rely on user
    agents *not* to render whitespace immediately after a start tag?
    For example, may authors assume that these will be rendered the
    same:

    <li>foo
    <li> foo

    or should authors always use the first form?

9.2.1 Phrase elements: EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE,
      ABBR, and ACRONYM

    The HTML 2.0 spec contains more description and examples for these
    elements.  I think they should have been retained.

        <ABBR lang="es" title="Do&ntilde;a">Do&ntilde;a</ABBR>

    The title is identical to the content.

9.3.4 Preformatted text: The PRE element

        width = number [CN] 
            This attribute provides a hint to visual user agents about
            the desired width of the formatted block.

    By definition, preformatted text already has a width, which can be
    determined by scanning it and noticing the length of the longest
    line.  Maybe you mean that this attribute provides a hint, not about
    the width of the text, but about the width of the window for which
    the text was formatted.
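
    The "scan for the longest line" idea can be sketched in a few lines
    of Python.  This is only an illustration: the function name
    natural_width and the 8-column tab stops are my assumptions, not
    anything the spec defines.

```python
def natural_width(pre_text: str, tab_size: int = 8) -> int:
    """Return the length of the longest line, in character cells.

    Tabs are expanded to tab_size-column stops first (an assumption;
    the spec leaves tab handling to the user agent).
    """
    lines = pre_text.expandtabs(tab_size).splitlines()
    return max((len(line) for line in lines), default=0)


sample = "Name    Qty\nWidget    3\nSprocket 12\n"
print(natural_width(sample))
```

    A width attribute that merely repeated this number would tell the
    user agent nothing it could not compute for itself.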

        When handling preformatted text, visual user agents:
            May leave white space intact. 
            May render text with a fixed-pitch font. 
            May disable automatic word wrap. 

    Shouldn't each "may" be "should"?  Authors usually depend on these
    for vertical alignment.

11.2.4 Column groups: the COLGROUP and COL elements 

        The table in this example contains six columns. The first one
        does not belong to an explicit column group.

    But later:

        <TABLE>
        <COLGROUP>
           <COL width="30">
        <COLGROUP>
           <COL width="30">
           <COL width="0*">
           <COL width="2*">
        <COLGROUP align="center">
           <COL width="1*">
           <COL width="3*" align="char" char=":">
        <THEAD>
        <TR><TD> ...
        ...rows...
        </TABLE>

    And then:

        We have set the value of the align attribute in the second
        column group to "center".

    It looks like the text and the example do not agree.
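
    Counting the COL elements per COLGROUP makes the mismatch concrete.
    This is a rough sketch using Python's standard html.parser module;
    the class name ColGroupCounter is my own, and the span attribute is
    ignored because the example does not use it.

```python
from html.parser import HTMLParser

# The column-group part of the spec's example, copied from above.
EXAMPLE = """
<TABLE>
<COLGROUP>
   <COL width="30">
<COLGROUP>
   <COL width="30">
   <COL width="0*">
   <COL width="2*">
<COLGROUP align="center">
   <COL width="1*">
   <COL width="3*" align="char" char=":">
<THEAD>
"""


class ColGroupCounter(HTMLParser):
    """Record the align value and the number of COLs of each COLGROUP."""

    def __init__(self):
        super().__init__()
        self.groups = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "colgroup":
            self.groups.append({"align": dict(attrs).get("align"), "cols": 0})
        elif tag == "col" and self.groups:
            self.groups[-1]["cols"] += 1


counter = ColGroupCounter()
counter.feed(EXAMPLE)
counter.close()
groups = counter.groups

for number, group in enumerate(groups, start=1):
    print(number, group["cols"], group["align"])
```

    Under those assumptions the three groups hold 1, 3, and 2 columns.
    So the first column does belong to an explicit column group, and
    align="center" is set on the third group rather than the second.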

11.4.2 Categorizing cells

        In order to determine, for example, the costs of meals on 25
        August, the user agent must know which table cells refer to
        "Meals" (all of them)
        ^^^^^^^^^^^^^^^^^^^^^

    No, only cells in the Meals column refer to meals.  Maybe you meant
    "which table cells refer to "Expenses" (specifically, Meals)".

12.1.1 Visiting a linked resource

        Note that the hrefattribute in each source anchor
                      ^^^^^^^^^^^^^

    Insert a space.

12.1.2 Other link relationships

        Links that express other types of relationships have one or more
        link type specified in their source anchor.
                ^^                               ^^

    These nouns should be plural.

13.2 Including an image: the IMG element

        User agents must render alternate next when they cannot support
                                          ^^^^

    Should be "text".

13.3.2 Object initialization: the PARAM element

        Any number of PARAM elements may appear in the content of an
        OBJECT or APPLETelement,
                       ^^

    Insert a space.

13.3.4 Object declarations and instantiations

        <P><OBJECT declare id="tribune" ...
        <PARAM name="font" valuetype="object" value="#tribune">

    Is the pound sign supposed to be there?  Section 13.3.2 said:

        object: The value specified by value is an identifier that
        refers to an OBJECT declaration in the same document. The
        identifier must be the value of the id attribute set for the
        declared OBJECT element.

    That suggests to me that the PARAM element should have
    value="tribune", with no pound sign.

13.6.1 Client-side image maps: the MAP and AREA elements

        usemap = uri [CT] 
            This attribute associates an image map with an element. The
            image map is defined by a MAP element. The value of usemap
            must match the value of the name attribute of the associated
            MAP element.

    Since the value of the usemap attribute is a URI, it should be
    permissible to refer to a MAP element from another document.  None
    of the examples do this.  Is it allowed?

    By the way, the idea of allowing the shape and coords attributes in
    A elements is brilliant!

13.7 Visual presentation of images, objects, and applets

        All IMG and OBJECT attributes that concern visual alignment and
        presentation have been deprecated in favor of style sheets.

    This is imprecise.  Some of the attributes mentioned in 13.7 are not
    deprecated (width, height), some are deprecated but are not marked
    as such (vspace, hspace, align), and some are explicitly marked as
    deprecated (border).  I suggest removing the above sentence and
    inserting explicit "deprecated" indications wherever appropriate.

14.2.3 Header style information: the STYLE element

    The title attribute appears in the DTD but is not mentioned in the
    text.  Later, in section 14.4, there is an example of the title
    attribute of a LINK element, but not of a STYLE element.  This
    leaves the reader unsure how the title attribute is meant to be
    used with the STYLE element.

14.3.2 Specifying external style sheets

        For example, to set the preferred style sheet to "compact" (see
        the preceding example),

    Actually, the previous example used "Compact", and the title
    attribute is case sensitive.  Since the subsequent examples use
    "compact", perhaps the first one should be changed to match.

17.3 The FORM element

        The value is a space- and/or comma-delimited list of charset
        values.

        This attribute specifies a comma-separated list of content types

    Throughout the spec, some attribute values are space-separated, some
    are comma-separated, and some are space- and/or comma-separated.  Is
    there a simple rule that one can memorize, rather than consulting
    the spec every time?  If so, this rule should be stated somewhere.

17.4 The INPUT element

        readonly    (readonly)     #IMPLIED  -- for text and passwd --
                                                             ^^^^^^

    Should be "password" (in the actual DTD too).

17.10 Adding structure to forms: the FIELDSET and LEGEND elements

        /samp

    This must be a typo at the very end of the section.

17.11.2 Access keys

        accesskey = character [CN]

    How is this case neutral?  Doesn't it have to be either case
    sensitive or case insensitive?  Am I allowed to have one control
    with an accesskey of "C" and another with an accesskey of "c"?  (I
    vote no.)

    By the way, shouldn't the spec say that no two controls in the same
    document should have the same accesskey?

        We recommend that authors include the access key in label text
        or wherever the access key is to apply. User agents should
        render the value of an access key in such a way as to emphasize
        its role and to distinguish it from other characters (e.g., by
        underlining it).

    I think this should be more precise.  Maybe you mean:

    We recommend that authors include the access key in the contents
    of the A, AREA, BUTTON, LABEL, or LEGEND element, or in the value
    attribute of the INPUT element of type submit, reset, or button.
    User agents should render the first occurrence of the access key
    (using case-insensitive matching) in such a way as to emphasize
    its role and to distinguish it from other characters (e.g., by
    underlining it).
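
    The matching rule I have in mind could be sketched like this in
    Python.  The function name access_key_index is hypothetical, and the
    rule itself is only my suggestion, not something the spec says.

```python
def access_key_index(label: str, key: str) -> int:
    """Return the index of the first character of label that matches
    key case-insensitively, or -1 if there is no match."""
    wanted = key.casefold()
    for index, char in enumerate(label):
        if char.casefold() == wanted:
            return index
    return -1


label = "Save Changes"
index = access_key_index(label, "c")
if index >= 0:
    # Mark the emphasized character with brackets in place of underlining.
    print(label[:index] + "[" + label[index] + "]" + label[index + 1:])
```

    Comparing one character at a time keeps the returned index valid
    for the original label even where case folding would change the
    length of the string.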

17.12.2 Read-only controls

        The following elements support the readonly attribute: INPUT,
        TEXT, PASSWORD, and TEXTAREA.

    There are no such elements as TEXT and PASSWORD.  You probably mean
    INPUT elements of type text and password.  I don't know whether you
    mean to include all other types of INPUT as well.

17.13.4 Form content types

        1. Control names and values are escaped. Space characters are
           replaced by `+', and then reserved characters are escaped
           as described in [RFC1738], section 2.2: Non-alphanumeric
           characters are replaced by `%HH', a percent sign and two
           hexadecimal digits representing the ASCII code of the
           character. Line breaks are represented as "CR LF" pairs
           (i.e., `%0D%0A').

    This was lifted almost verbatim from the HTML 2.0 spec, but changing
    "escaped: space" to "escaped. Space" adds confusion (by making the
    first sentence seem like a separate step), as does removing the
    "that is," before "non-alphanumeric" (making that sentence seem like
    a separate step).
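
    Read as a single step, the escaping looks roughly like the Python
    sketch below.  It leans on the standard urllib.parse.quote_plus
    function, which is close to the rule quoted above but not identical:
    it leaves a few punctuation characters such as "-", "_", and "."
    unescaped, and it encodes non-ASCII text as UTF-8.  The helper name
    encode_pair is my own.

```python
from urllib.parse import quote_plus


def encode_pair(name: str, value: str) -> str:
    """Escape one control name and value as a single name=value pair."""
    # Normalize every line break to CR LF so that it escapes to %0D%0A.
    value = value.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
    # quote_plus turns spaces into "+" and escapes reserved characters
    # (including a literal "+") as %HH, all in one pass.
    return quote_plus(name) + "=" + quote_plus(value)


print(encode_pair("full name", "A&B\nC"))
```

    Because the "+" substitution and the %HH escaping happen together,
    a literal plus sign in the data comes out as %2B and cannot be
    mistaken for an escaped space.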

        The file name may be specified with the "filename" parameter of
        the 'Content-Disposition: form-data' header, or, in the case of
        multiple files, in a 'Content-Disposition: file' header of the
        subpart.

    The examples use 'Content-Disposition: attachment' in the subparts,
    rather than 'Content-Disposition: file'.  Are both correct?  Is one
    preferred?

18.2.2 Specifying the scripting language

        It is also possible to specify the scripting language in each
        SCRIPT element via the type attribute.  In the absence of a
        default scripting language specification, this attribute must be
        set on each SCRIPT element.

    This makes it sound like the type attribute is optional on SCRIPT
    elements, but the DTD says it's required.

        a name attribute takes precedence over a id if both are set.
                                               ^^^^

    Should be "an id".

24.2.1 The list of characters

        <!ENTITY not    CDATA "&#172;" -- not sign = discretionary hyphen,
                                                  ^^^^^^^^^^^^^^^^^^^^^^^

    I suspect that's not supposed to be there.  It should be removed in
    HTMLlat1.ent too.

24.3.1 The list of characters

    It would be very nice if the comment for each entity included the
    Adobe standard glyph name, since this list of entities was taken
    directly from the Adobe Symbol font.  Each glyph name begins with a
    slash.  I think the mapping is given here:

    http://www.ams.org/html-math/tr9573-symbols.html

    But that page doesn't state explicitly that the slash-names are the
    Adobe standard glyph names.  The Adobe PostScript reference manual
    would be the authoritative source.

        <!ENTITY weierp   CDATA "&#8472;" -- script capital P = power set
                                        = Weierstrass p, U+2118 ISOamso -->

    Is that considered a good mapping, or a compromise?  I once looked
    for a Unicode character matching this Symbol font glyph, and was not
    satisfied with anything I found.  If this is a compromise, there
    should be a disclaimer to that effect.

24.4 Character entity references for markup-significant and
     internationalization characters

        Entities have also been added for the remaining characters
        occurring in CP-1252 which do not occur in the HTMLlat1 or
        HTMLsymbol entity sets. These all occur in the 128 to 159 range
        within the cp-1252 charset.

    What is CP-1252?  It doesn't seem to be defined or referenced
    anywhere.  Also, either capitalize the second occurrence or
    lowercase the first.

Appendix A: Changes between HTML 3.2 and HTML 4.0

    This appendix neglects to mention that the HTML 3.2 DTD allowed
    %text in the content of BODY, but the HTML 4.0 DTD does not allow
    %inline in the content of BODY.  I think that's a noteworthy change.

A.3 Changes for accessibility

        (see the longdesc attribute).

    For some reason, "longdesc" is not a link in the hypertext spec, but
    should be.

A.4 Changes for meta data

        Authors may now specify profiles that provide explanations about
        meta specified with the META or LINK elements.
        ^^^^

    Should be "meta data".

A.9 Changes for forms

        The readonly, allows authors to prohibit changes
            ^^^^^^^^^

    Should be "readonly attribute".

Appendix B: Performance, Implementation, and Design Notes

        Despite the appearance of words such as "must" and "should",
        all requirements in this section appear elsewhere in the
        specification.

    Is that true of the requirement that "a line break immediately
    following a start tag must be ignored, as must a line break
    immediately before an end tag" (B.3.1 Line breaks)?

B.3.2 Specifying non-HTML data

        Authors should therefore escape sequences "</" sequence within
        the content.

    Reword (perhaps "escape "</" sequences within the content").

B.4 Notes on helping search engines index your Web site

        You may help search engines by using the LINK element with
        rel="begin" along with a TITLE, as in:

    The section on link types recommended using rel=Start for this
    purpose.  Should authors use one, or the other, or both?  Also, I
    think you meant "title" (the attribute), not TITLE (the element).

        The list of terms in the content is ALL, INDEX, NOFOLLOW,
        NOINDEX. The name and the content attribute values are
        case-insensitive.

    This description is very incomplete, and leaves the reader with a
    lot of uncertainty.  Brief but complete documentation can be found
    here:

    http://info.webcrawler.com/mak/projects/robots/meta-user.html

    By the way, both that page and a more complete and precise
    specification of the robots.txt file are linked from:

    http://info.webcrawler.com/mak/projects/robots/exclusion.html

    You might want to have a reference to that page.

B.5.1 Design rationale

        This can be altered by setting the width-TABLE attribute of the
        TABLE element.                     ^^^^^^^^^^^

    Should be "width".

B.5.2 Recommended Layout Algorithms

        Rules for handling objects too large for column apply when the
        explicit or implied alignment results in a situation where the
        data exceeds the assigned width of the column.

    "for column" should be "for a column".  Which rules are being
    referred to here?

        The values for theframe attribute have been chosen to avoid
        clashes with the rules, align and valign-COLGROUP attributes.

    "theframe" should be "the frame", and "valign-COLGROUP" should be
    "valign".

Received on Sunday, 28 December 1997 02:24:35 UTC