
Re: FW: I-D ACTION:draft-connolly-text-html-00.txt

From: <Jukka.Korpela@hut.fi>
Date: Fri, 24 Sep 1999 05:45:13 -0400 (EDT)
To: www-html@w3.org
Message-ID: <Pine.OSF.4.10.9909241151120.19467-100000@beta.hut.fi>
On Thu, 23 Sep 1999, Larry Masinter wrote:

>   ftp://ftp.parc.xerox.com/pub/masinter/draft-connolly-text-html-02.txt
- -
> It is intended to obsolete the previous IETF
> documents defining HTML, including RFC 1866, RFC 1867, RFC 1980,
> RFC 1942 and RFC 2070.

In principle, it would be a good idea to clarify the situation
by codification so that there would be just one HTML specification,
or a well-defined set of HTML specifications.

But simply obsoleting those RFCs would be going backwards, unless
essential information from them is carefully incorporated into
the HTML specification by the W3C. There are many issues that
those RFCs address but that HTML 4.0 (or HTML 4.01) either does not
address at all or formulates only vaguely.

To take a few examples of particular issues I'm familiar
(and partly frustrated) with:

- RFC 1866 contains definitions for things which are left undefined
  in HTML 4.0. At present we can say that when there are semantic
  gaps in HTML 4.0 we can often mentally fill them with RFC 1866.
  Would that be appropriate if RFC 1866 were obsoleted? For example,
  - RFC 1866 makes the useful requirement that EM and STRONG must
    be rendered distinctly from each other and from normal text.
  - RFC 1866 makes some vague notes on nested text-level elements,
    whereas HTML 3.2 suggests that "user agents should do their best to
    respect nested emphasis" - and HTML 4.0 ignores the whole issue,
    unless I'm missing something;
    this is _not_ just a presentational question: the basic problem
    is whether, e.g., in <strong>...<em>...</em>...</strong>
    the text inside the EM element is merely emphasized compared with
    normal text or _relatively_ emphasized with respect to the content
    of its parent element
  - RFC 1866 describes, under "6. Characters, Words, and Paragraphs",
    TEXTAREA as preformatted text, which corresponds to actual
    implementations; the corresponding discussion in the HTML 4.0 spec
    explicitly says that PRE is the only exception to the collapsing
    of white space

- RFC 2070 makes, with some handwaving, serious attempts at
  formulating requirements and recommendations on having
  character encoding ("charset") information handled properly;
  as an example, it makes the following important point (in 1.2.2):
      To ensure interoperability and proper support for at least ISO-
      8859-1 in an environment where character encoding schemes other
      than ISO-8859-1 are present, user agents MUST correctly interpret
      the charset parameter accompanying an HTML document received from
      the network.
  But HTML 4.0 does not seem to make any such requirement, or even
  a suggestion. In fact, it explicitly says: "This specification does not
  mandate which character encodings a user agent must support."
  ( http://www.w3.org/TR/REC-html40/charset.html#h-5.2 )
  (Note: not even US-ASCII is required to be supported!)

- in menus created with a set of radio buttons or with a select element,
  there is great confusion between different specs (incl. RFC 1866),
  and HTML 4.0 doesn't clear it up - au contraire, it increases
  the vagueness; see http://www.hut.fi/u/jkorpela/forms/choices.html#app

- for file input, RFC 1867 is the only extensive description, and
  HTML 4.0 refers to it in a vague manner, somewhere between
  informative and normative, it seems; in any case, it is obvious from the note
  at http://www.w3.org/TR/REC-html40/interact/forms.html#h-
  that the HTML 4.0 specification was not written to be a standalone
  description of all aspects of file input;
  nasty question of the day: what does the maxlength attribute mean
  in an <input type="file">
  a) by RFC 1867
  b) by HTML 4.0
  c) in current implementations?
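
The TEXTAREA point in the first item can be made concrete with a minimal Python sketch of white-space collapsing that exempts PRE and, following RFC 1866 and actual implementations, TEXTAREA as well. The function name render_text is hypothetical; only the ASCII white space characters HTML 4.0 lists (space, tab, form feed, CR, LF) are handled, the zero-width space is left out for simplicity, and trimming at element boundaries is ignored.

```python
import re

# ASCII white space characters defined by HTML 4.0: space, tab, form feed,
# CR and LF. Zero-width space is left out here for simplicity; no-break
# space (U+00A0) is not white space in HTML, so it is never collapsed.
HTML_WHITESPACE = re.compile("[ \t\f\r\n]+")

# HTML 4.0 names only PRE as an exception; TEXTAREA is added here to match
# RFC 1866 and actual implementations (an assumption of this sketch).
WHITESPACE_PRESERVING = {"pre", "textarea"}


def render_text(element: str, text: str) -> str:
    """Hypothetical helper: white-space handling for text inside `element`."""
    if element.lower() in WHITESPACE_PRESERVING:
        return text  # preformatted: keep every space and line break
    # Collapse each run of white space into a single inter-word space.
    return HTML_WHITESPACE.sub(" ", text)
```

With this rule, render_text("p", "one  \n two") yields "one two", while render_text("textarea", "one  \n two") returns the text unchanged. Under a literal reading of HTML 4.0, the TEXTAREA case would be collapsed too.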
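
The RFC 2070 requirement quoted in the second item (user agents MUST interpret the charset parameter) can be illustrated with a small Python sketch. It reads the charset parameter from a Content-Type header value using the standard library's email.message.Message and decodes the body with it. The helper name decode_html_body is hypothetical, and the ISO-8859-1 fallback for a missing parameter is an assumption borrowed from the HTTP/1.1 default for text types, not something the RFCs above prescribe for this code.

```python
from email.message import Message

# Assumed fallback when no charset parameter is present: HTTP/1.1 defines
# ISO-8859-1 as the default charset for text/* media types.
DEFAULT_CHARSET = "iso-8859-1"


def decode_html_body(content_type: str, body: bytes) -> str:
    """Hypothetical helper: decode an HTML body using the charset
    parameter of its Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the charset parameter lower-cased, or the fallback if absent.
    charset = msg.get_content_charset(failobj=DEFAULT_CHARSET)
    try:
        return body.decode(charset)
    except LookupError:
        # Charset label unknown to Python: report it instead of guessing.
        raise ValueError(f"unsupported charset: {charset}") from None
```

For example, decode_html_body('text/html; charset=ISO-8859-1', b'caf\xe9') returns 'café', but the same byte 0xE9 decodes to a Cyrillic letter when the parameter says KOI8-R, so a user agent that ignores the parameter silently garbles the text.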

I suppose this contains more examples than would really be needed.

Wouldn't it be better, as an interim solution that would still make
it possible to redefine the text/html media type in a reasonable time,
to specify that the HTML-related RFCs are _not_ obsoleted? On the
contrary, they should be listed as sources of additional information,
to be used on issues where they relate to the meanings of constructs
defined in HTML 4.0(1).

Yucca, http://www.hut.fi/u/jkorpela/ or http://yucca.hut.fi/yucca.html
Received on Friday, 24 September 1999 06:01:05 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:05:51 UTC