comments on HTML 4.0 draft

    Throughout this document, URLs with zero indentation indicate the
    context of subsequent comments.  Text indented 8 spaces is quoted
    material from those pages, and text indented 4 spaces is commentary
    on the *preceding* quoted material.

    The topics of the comments range all the way from typos to design
    decisions.

http://www.w3.org/TR/WD-html40/struct/dirlang.html

        Thus, if the lang attribute value of "en-US" is set for the
        HTML element, a user agent should prefer style information that
        matches "en-US" first, then the more general value "US".

    Instead of "US", I think you mean "en".
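
    To make the intended fallback concrete, here is a minimal sketch
    in Python.  It assumes that "more general" simply means dropping
    trailing subtags one at a time; the function name is my own and
    is not taken from the draft.

```python
def lang_fallback_chain(tag):
    """Return the tag followed by progressively more general prefixes.

    Assumption: generalizing a language tag means removing its last
    hyphen-separated subtag, repeatedly, until one subtag remains.
    """
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


chain = lang_fallback_chain("en-US")
print(chain)  # ['en-US', 'en']
```

    Under that assumption "US" never appears in the chain, because it
    is a country subtag and not a language.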

        For artificial languages such as Elfish or Klingon, it would
        make sense to use the lang attribute to indicate the change from
        the language of the enclosing context. Until the successor to
        [RFC1766] defines a standard way to do this, one possibility is
        to use the x- prefix convention, e.g. x-elfish.

    Instead of "Elfish", I think you mean "Elvish".  Instead of
    "x-elfish", I'm sure people would use "x-sindarin" or "x-quenya".
    (I know of no languages ascribed to elfs, but Tolkien defined
    two languages spoken by his Elves.  To avoid confusing readers
    unfamiliar with Tolkien, it might be safer to say "x-klingon".)

        If a document does not contain a displayable right-to-left,
        a conforming user agent is not required to apply the
        [UNICODE]bidirectional algorithm.

    Insert the word "character" before the comma.

        For example, the MIME standard ([RFC2045]) requires
        right-to-left character sequences in email to be ordered
        right-to-left in the byte stream. This conflicts with the
        [UNICODE] birectional algorithm, which expects Hebrew characters
        to be ordered left-to-right.

    I find this terminology so confusing that I don't know what it
    means.  I suggest "leftmost character first" and "rightmost
    character first".

http://www.w3.org/TR/WD-html40/struct/text.html

    Thank you for the white space section!  I've been wondering about
    how white space is treated in HTML for a long time.

        A line break occurring immediately following a start tag should
        be discarded, as should a line break occurring immediately
        before an end tag. This applies to all HTML elements without
        exceptions. In addition, for all elements except PRE, a sequence
        of contiguous white space characters such as spaces, horizontal
        tabs, form feeds and line breaks, should be replaced by a single
        word space.

    This is somewhat ambiguous.  If a start tag is immediately followed
    by a line break and then some white space, should all the white
    space be discarded with the line break?  Or should only the line
    break be discarded, and the remaining white space collapsed to a
    single word space?  My first guess based on the above paragraph was
    that only the line break gets discarded, but the examples suggest
    otherwise (which would be preferable, I think).
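
    The two readings can be sketched as follows.  This is only my own
    illustration in Python, with hypothetical function names; it
    treats space, tab, form feed, CR and LF as white space and looks
    only at the text right after a start tag.

```python
import re

WS = r"[ \t\f\r\n]"          # white space characters, per my assumption
LINE_BREAK = r"(?:\r\n|\r|\n)"


def collapse(text):
    """Replace each run of white space with a single word space."""
    return re.sub(WS + "+", " ", text)


def reading_a(content):
    """Discard only the leading line break, then collapse the rest."""
    return collapse(re.sub("^" + LINE_BREAK, "", content))


def reading_b(content):
    """Discard the leading line break and the white space after it."""
    return collapse(re.sub("^" + LINE_BREAK + WS + "*", "", content))


content = "\n    Hello   world"   # text following a start tag
print(repr(reading_a(content)))   # ' Hello world'
print(repr(reading_b(content)))   # 'Hello world'
```

    The only difference is the leading word space that survives under
    the first reading.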

    Regarding the datetime attribute:  Shouldn't HTML use the same
    date/time format as HTTP?

http://www.w3.org/TR/WD-html40/struct/tables.html

        A table must contain at least one row group. Each row group is
        divided into three sections: head, body, and foot. The head
        and foot sections are optional. The THEAD element defines the
        head, the TFOOT element defines the foot, and the TBODY element
        defines the body.

        When present, each THEAD, TFOOT, and TBODY instance must contain
        one or more rows (see TR).

        This example illustrates the order and structure of table heads,
        feet, and bodies.

    This wording makes it sound like a table can contain multiple
    distinct heads and feet, but from the DTD and examples it looks like
    it can contain at most one head and one foot, which possibly get
    replicated in the rendering.

http://www.w3.org/TR/WD-html40/struct/links.html

        String matching Characters with several possible representations
        in [ISO10646] (e.g., both precomposed and bsae+diacritic forms)
        match in two strings only if they have the same representation,
        except for case differences, in both strings.

    Typo:  "bsae" should be "base".

    I've read the whole section, and I don't understand the purpose of
    the distinction between the "source" and "destination" of a link.
    In all cases, one end is "here", and the other end is "there"; what
    use is the imaginary arrow?  Are there examples of link types that
    could be used with both rel and rev attributes?  If not, isn't that
    extra bit of information redundant?

http://www.w3.org/TR/WD-html40/struct/includes.html

        By setting the codetype attribute, a user agent can decide
        whether to retrieve the Java application based on its ability to
        do so.

    This needs to be reworded.  It seems to say that the user agent sets
    the codetype attribute, but that's not what was intended.

        To declare an rendering mechanism so that it is not executed
        when read by the user agent, set the boolean declare attribute
        in the OBJECT element.

    Typo:  "an rendering" should be "a rendering".

        It is only possible to define a server-side image map with the
        IMG element. To do so, set the boolean attribute ismap in the
        IMG definition. The associated map of regions must be specified
        with the usemap attribute.

    Usemap *must* be supplied?  Is that right?  Only the IMG element?
    The example uses an OBJECT with one client-side area and one
    server-side area.

http://www.w3.org/TR/WD-html40/present/frames.html

        Framesets may make navigation forward and backward through your
        user agent's history more difficult for users.

    Either change "your" to "the", or remove "for users".

        If we insert "table_of_contents.html" and "main.html" directly
        in the BODY, we solve the problem of associating the two
        documents, but we may cause user agents that support frames
        to retrieve the same data twice: one copy associated with the
        frameset and one copy inserted in the BODY.

    I don't understand this.  Earlier, it was said that the BODY
    following a FRAMESET was alternate content for user agents not
    supporting frames.

        Click <A href="main.html">here</A> for a non-frames version.

    HTML 4.0 appears to be doing an admirable job of being suitable
    for a wide range of browsers.  Surely the spec should not include
    "click here" in its examples, since that presupposes a particular
    interface.  (And it's bad hypertext style anyway.)

http://www.w3.org/TR/WD-html40/interact/forms.html

    One advantage of the GET method over the POST method for forms
    is that with the GET method you can make hyperlinks to the pages
    that result from the submission of particular form data.  People
    do this a lot; for example, they point to maps generated by remote
    sites based on street addresses in form fields.  Was this practice
    considered when the decision was made to deprecate the GET method?
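
    As an illustration of the practice, here is a small Python sketch
    that builds the kind of URL a GET submission produces.  The host,
    path, and field names are invented for the example.

```python
from urllib.parse import urlencode


def get_submission_url(action, fields):
    """Build the URL a GET form submission would request.

    `action` is the form's action URL and `fields` maps control names
    to values; both are hypothetical here.
    """
    return action + "?" + urlencode(fields)


url = get_submission_url(
    "http://maps.example.com/lookup",
    {"street": "123 Main St", "city": "Springfield"},
)
print(url)
# http://maps.example.com/lookup?street=123+Main+St&city=Springfield
```

    Because the form data travels in the URL itself, anyone can put
    that URL in an ordinary hyperlink.  A POST submission carries the
    data in the request body, so no such link can be made.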

        Attributes defined elsewhere 
        * id, class <<<<<<< forms.src (document-wide identifiers)

    There must be some sort of typo there; the "<<<<<<< forms.src"
    looks like a leftover merge-conflict marker.

        <LABEL for="email"email: </LABEL>

    The greater-than sign that closes the LABEL start tag was left out.

        This attribute assigns an access key to an element. An access
        key is a single character from the user agent's current
        character encoding.

    The author doesn't know the user agent's character encoding.
    Also, this section seems to pretend that characters and keys are
    the same thing, which they certainly are not.  The mapping issue
    ought to be acknowledged, at least.  Perhaps there should be a
    recommendation that the value of accesskey be a single character
    having an "obvious" correspondence to a single "ordinary" key which
    is likely to appear on the keyboard of most users viewing the page.
    (What a mess.  I wonder if there's a cleaner way to do this?)

        We recommend that authors include the access key in label text
        or wherever the access key is to apply. User agents should
        render the value of an access key in such a way as to emphasize
        its role and to distinguish it from other characters (e.g., by
        underlining it).

    How can the user agent pick the access key out of the label text
    in order to render it differently?  Am I not understanding this
    paragraph?

http://www.w3.org/TR/WD-html40/interact/scripts.html

        onmouseover = script
            The onmouseover event occurs when the pointing device is
            moved over an element. This attribute may be used with most
            elements.
        onmousemove = script 
            The onmousemove event occurs when the pointing device is
            moved over an element. This attribute may be used with most
            elements.

    These two events have identical descriptions.

        1. All SCRIPT elements are evaluated in order as the document is
           loaded.
        2. All script constructs within a given SCRIPT element that
           generate SGML CDATA are evaluted. Their combined generated
           text is inserted in the document in place of the SCRIPT
           element.
        3. The generated CDATA is re-evaluated.

    This is unclear.  Does it mean that the CDATA generated by the
    first SCRIPT is evaluated after all the SCRIPT elements are run?
    Or is the CDATA generated by the first SCRIPT evaluated before
    anything after the first SCRIPT is looked at?  The latter is
    streamable, and is probably what should be done.

http://www.w3.org/TR/WD-html40/sgml/entities.html

        arkup-significant and internationalization characters (e.g., for
        bidirectional text).

    The `m' got lost.

    It would be extremely helpful if the Adobe standard glyph names were
    given along with the entity names and character names, especially
    for the characters included precisely because of their appearance in
    Adobe fonts.  It would also be helpful if the spec included an image
    containing typical renderings of all the printable characters.

http://www.w3.org/TR/WD-html40/index/elements.html

    It would be useful for the table to have one more column: Deprecated
    (Y/N).

http://www.w3.org/TR/WD-html40/index/attribs.html

    The Deprecated column would be useful for this table also.

http://www.w3.org/TR/WD-html40/appendix/changes.html

        The latest draft makes the align attribute attribute compatible
        with the latest versions of the most popular browsers. Some
        clarifications have been made to the role of the dir attribute
        attribute and recommended behavior when absolute and relative
        column widths are mixed.

    "attribute attribute" appears twice.

        A new set of attributes, including onchange-INPUT, in
        association with support for scripting languages, allows form
        providers to verify user-entered data.

    Is "onchange-INPUT" a notation, or a typo?

http://www.w3.org/TR/WD-html40/appendix/notes.html

        This can be altered by setting the width-TABLE attribute of the
        TABLE element.

    That same notation again.  It definitely looks fishy here.

        For each column, let d be the difference between maximum and
        minimum width of that column. Now set the column's width to the
        minimum width plus d times W over D. This makes columns with
        large differences between minimum and maximum widths wider than
        columns with smaller differences.

    The last sentence is inaccurate: a column with a small difference
    but a large minimum width can still end up wider.  An accurate
    statement would be "More extra space is allocated to columns with
    larger differences".

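    A worked example of the formula, as a Python sketch.  The quoted
    passage does not define W and D; I am assuming W is the available
    width minus the sum of the minimum widths, and D is the sum of the
    maximum widths minus the sum of the minimum widths.  The column
    figures are invented.

```python
def distribute(columns, available):
    """Apply width = min + d * W / D to each (min, max) column.

    Assumptions (not from the quoted text): W is the space left over
    after every column gets its minimum, D is the total of all the
    per-column differences, and D is not zero.
    """
    min_total = sum(lo for lo, hi in columns)
    max_total = sum(hi for lo, hi in columns)
    W = available - min_total
    D = max_total - min_total
    return [lo + (hi - lo) * W / D for lo, hi in columns]


# Column 1: min 200, max 220 (d = 20).  Column 2: min 10, max 100 (d = 90).
widths = distribute([(200, 220), (10, 100)], available=300)
extra = [widths[0] - 200, widths[1] - 10]
print(widths)  # column 1 is wider overall
print(extra)   # column 2 receives more of the extra space
```

    Here the column with the larger difference gets more extra space
    but is still the narrower of the two.
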
        The values for theframe attribute have been chosen to avoid
        clashes with the rules, align and valign-COLGROUP attributes.

    There's that notation again.

Received on Tuesday, 29 July 1997 21:57:02 UTC