Re: HTML 4.0 comments

Adam M. Costello wrote:
> 
>     Many of the comments point out typos.  Some point out confusing,
>     misleading, or imprecise parts of the spec, and suggest
>     clarifications or additions (unless I was baffled).
> 
>     Sorry I didn't look at the spec when it was still a Proposed
>     Recommendation, but the semester just ended.

Adam,

Thank you for your careful reading of the spec - where were you during
the final editing process?

I don't believe the W3C process will allow us to release a revision
of the HTML 4.0 Recommendation to correct typos and small errors.
Therefore, we won't be able to take your comments into account
unless we prepare a more substantial revision of the language.
Until then, we will be maintaining the errata sheet, which
I have updated based on your comments.

I'm forwarding your more substantial remarks (and my comments)
to the Working Group for perusal.

Thanks again for your good work!

 -- Ian


> 8.2.4 Overriding the bidirectional algorithm: the BDO element
> 
>         One reason for this may be that the MIME standard ([RFC2045],
>         [RFC1556]) favors visual order, i.e., that right-to-left
>         character sequences are inserted right-to-left in the byte
>         stream.
> 
>     I don't think this means what was intended.  My best-effort
>     interpretation of "right-to-left character sequences are inserted
>     right-to-left in the byte stream" is that the rightmost character
>     appears first in the byte stream.  But that is the opposite of RFC
>     1556 visual directionality, which requires the leftmost character
>     to appear first in the byte stream.  I strongly recommend using
>     phrases like "leftmost character first" and avoiding phrases like
>     "right-to-left in the byte stream", because byte streams do not have
>     a left and right, only an earlier and later.

I will look into this issue.
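
As a rough illustration of the wording Adam recommends, the Python
sketch below contrasts logical order with a leftmost-character-first
visual order for a single right-to-left run. It is not taken from the
spec or from RFC 1556: the function name and the sample text are made
up, and it assumes a pure Hebrew run with no embedded left-to-right
text, in which case visual order is simply the reverse of logical order.

```python
# Three Hebrew letters in logical order (the order in which they are read):
# alef, bet, gimel. When rendered, alef is the rightmost glyph.
logical = "\u05d0\u05d1\u05d2"


def to_visual_leftmost_first(rtl_run: str) -> str:
    """Reorder a pure right-to-left run so the leftmost glyph comes first.

    Assumption: the run contains no embedded left-to-right text, so a
    plain reversal is enough. Mixed text needs the full bidirectional
    algorithm, which this sketch does not attempt.
    """
    return rtl_run[::-1]


visual = to_visual_leftmost_first(logical)

# A byte stream has no left or right, only earlier and later bytes.
logical_bytes = logical.encode("iso-8859-8")  # alef is the earliest byte
visual_bytes = visual.encode("iso-8859-8")    # gimel is the earliest byte
```

Describing the two streams as "alef first" and "gimel first" avoids the
ambiguity of "right-to-left in the byte stream".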

> 8.2.5 Character references for directionality and joining control
> 
>         Mirrored character glyphs. In general, the bidirectional
>         algorithm does not mirror character glyphs but leaves them
>         unaffected. An exception are characters such as parentheses (see
>         [UNICODE], table 4-7).
> 
>     Although the Unicode character names and example glyphs are
>     available online, the text of the spec is not, so I wish the HTML
>     spec would elaborate a bit on the mirroring of parentheses.  If
>     the characters ( and ) were called "open parenthesis" and
>     "close parenthesis", I could understand why their appearance would
>     depend on the directionality of the text.  But they're called "left
>     parenthesis" and "right parenthesis", so I don't see why they would
>     ever be mirrored.  In right-to-left text, you would obviously begin
>     a parenthetical with a right parenthesis, and end it with a left
>     parenthesis, correct?

I will also look into this.

> 9.1 White space
> 
>         authors should not rely on user agents to render white space
>         immediately after a start tag or immediately before an end tag.
> 
>     What about the converse?  Should authors also not rely on user
>     agents *not* to render whitespace immediately after a start tag?
>     For example, may authors assume that these will be rendered the
>     same:
> 
>     <li>foo
>     <li> foo
> 
>     or should authors always use the first form?

It is my opinion that the sentence in the spec subsumes the converse.

> 
> 9.3.4 Preformatted text: The PRE element
> 
>         width = number [CN]
>             This attribute provides a hint to visual user agents about
>             the desired width of the formatted block.
> 
>     By definition, preformatted text already has a width, which can be
>     determined by scanning it and noticing the length of the longest
>     line.  Maybe you mean that this attribute provides a hint, not about
>     the width of the text, but about the width of the window for which
>     the text was formatted.

I believe the hint is for user agents so that they may reserve
space before formatting.
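
For reference, the scan Adam describes could look like the Python
sketch below. The function name and sample text are hypothetical, and it
counts characters only (a tab counts as one character), which is an
assumption rather than anything the spec defines. Note that the scan
needs the whole block before it can report a width, which is where a
hint supplied up front might help.

```python
def longest_line_width(pre_text: str) -> int:
    """Return the length, in characters, of the longest line in the text.

    Assumption: width is measured in characters, with no tab expansion.
    """
    return max((len(line) for line in pre_text.splitlines()), default=0)


# Hypothetical preformatted content with three lines of equal length.
sample = "name   qty\napples   3\nkiwis   12"
width = longest_line_width(sample)
```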

> 
>         When handling preformatted text, visual user agents:
>             May leave white space intact.
>             May render text with a fixed-pitch font.
>             May disable automatic word wrap.
> 
>     Shouldn't each "may" be "should"?  Authors usually depend on these
>     for vertical alignment.

Since these are formatting issues, and user agents vary significantly,
we adopted less restrictive wording.

> 11.4.2 Categorizing cells
> 
>         In order to determine, for example, the costs of meals on 25
>         August, the user agent must know which table cells refer to
>         "Meals" (all of them)
>         ^^^^^^^^^^^^^^^^^^^^^
> 
>     No, only cells in the Meals column refer to meals.  Maybe you meant
>     "which table cells refer to "Expenses" (specifically, Meals)".

No, I believe "Meals" (all of them) is correct, although perhaps
somewhat vague.

> 13.3.4 Object declarations and instantiations
> 
>         <P><OBJECT declare id="tribune" ...
>         <PARAM name="font" valuetype="object" value="#tribune">
> 
>     Is the pound sign supposed to be there?  Section 13.3.2 said:
> 
>         object: The value specified by value is an identifier that
>         refers to an OBJECT declaration in the same document. The
>         identifier must be the value of the id attribute set for the
>         declared OBJECT element.
> 
>     That suggests to me that the PARAM element should have
>     value="tribune", with no pound sign.

I believe you're correct, but I have not yet added this to the
errata sheet.

> 13.6.1 Client-side image maps: the MAP and AREA elements
> 
>         usemap = uri [CT]
>             This attribute associates an image map with an element. The
>             image map is defined by a MAP element. The value of usemap
>             must match the value of the name attribute of the associated
>             MAP element.
> 
>     Since the value of the usemap attribute is a URI, it should be
>     permissible to refer to a MAP element from another document.  None
>     of the examples do this.  Is it allowed?

If the specification doesn't explicitly forbid it, it is possible. I
believe the Working Group chose to underplay this feature of the
language.
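
To show what a usemap value that points into another document would
involve, here is a small Python sketch that splits a URI into its
document part and its fragment. The helper name and the sample values
("#map1", "maps.html#nav") are invented for illustration. Whether a user
agent actually fetches a MAP from another document is a separate
question that this sketch does not answer.

```python
from urllib.parse import urldefrag


def split_usemap(value: str) -> tuple[str, str]:
    """Split a usemap URI into (document, fragment).

    An empty document part means the fragment refers to the current
    document.
    """
    document, fragment = urldefrag(value)
    return document, fragment


same_document = split_usemap("#map1")
other_document = split_usemap("maps.html#nav")
```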

> 17.3 The FORM element
> 
>         The value is a space- and/or comma-delimited list of charset
>         values.
> 
>         This attribute specifies a comma-separated list of content types
> 
>     Throughout the spec, some attribute values are space-separated, some
>     are comma-separated, and some are space- and/or comma-separated.  Is
>     there a simple rule that one can memorize, rather than consulting
>     the spec every time?  If so, this rule should be stated somewhere.

Unfortunately, there is no simple rule. For historical reasons, some
lists are space-separated while others are comma-separated.
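
In the absence of a single rule, a parser ends up with one routine per
list style. The Python sketch below shows the three styles Adam lists.
The function names and sample values are hypothetical, and the handling
of empty items and surrounding white space is my assumption, not
something the spec spells out.

```python
import re


def split_space_list(value: str) -> list[str]:
    """Split a space-separated list, ignoring repeated white space."""
    return value.split()


def split_comma_list(value: str) -> list[str]:
    """Split a comma-separated list, trimming white space around items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def split_space_or_comma_list(value: str) -> list[str]:
    """Split a list delimited by spaces and/or commas."""
    return [item for item in re.split(r"[,\s]+", value) if item]


charsets = split_space_or_comma_list("ISO-8859-1, UTF-8 SHIFT_JIS")
content_types = split_comma_list("text/html, image/png")
```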

> 17.11.2 Access keys
> 
>         accesskey = character [CN]
> 
>     How is this case neutral?  Doesn't it have to be either case
>     sensitive or case insensitive?  Am I allowed to have one control
>     with an accesskey of "C" and another with an access key of "c"?  (I
>     vote no.)

A "character" refers to a single character of the document character
set. Since "c" and "C" differ in the document character set, there is
no case involved.

>     By the way, shouldn't the spec say that no two controls in the same
>     document should have the same accesskey?

It probably should.
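
A check for repeated access keys under that reading (exact character
comparison, so "C" and "c" do not collide) might look like the Python
sketch below. The function name is hypothetical and the input is assumed
to be the accesskey values already collected from one document.

```python
from collections import Counter


def duplicate_accesskeys(keys: list[str]) -> list[str]:
    """Return the access keys that appear more than once.

    Comparison is by exact character, with no case folding.
    """
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


distinct = duplicate_accesskeys(["C", "c", "s"])
repeated = duplicate_accesskeys(["c", "s", "c"])
```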

> On forms:
>     The examples use 'Content-Disposition: attachment' in the subparts,
>     rather than 'Content-Disposition: file'.  Are both correct?  Is one
>     preferred?

This material has been taken from RFC 1867. If the RFC doesn't specify
this, the HTML 4.0 Recommendation won't either.

> 
> 24.4 Character entity references for markup-significant and
>      internationalization characters
> 
>         Entities have also been added for the remaining characters
>         occurring in CP-1252 which do not occur in the HTMLlat1 or
>         HTMLsymbol entity sets. These all occur in the 128 to 159 range
>         within the cp-1252 charset.
> 
>     What is CP-1252?  It doesn't seem to be defined or referenced
>     anywhere.  Also, either capitalize the second occurrence or
>     decapitalize the first.

Good question. I hadn't noticed this before.

Received on Sunday, 28 December 1997 17:13:29 UTC