Re: ACTION re: HTML 3: Too many tags!

Joe English (joe@trystero.art.com)
Thu, 27 Jul 1995 14:13:46 PDT


Message-Id: <9507272113.AA20702@trystero.art.com>
To: www-html@w3.org, html-wg@oclc.org
Subject: Re: ACTION re: HTML 3: Too many tags!
In-Reply-To: <v02110101ac3d96e79dba@[192.188.119.193]>
Date: Thu, 27 Jul 1995 14:13:46 PDT
From: Joe English <joe@trystero.art.com>



murray.altheim@nttc.edu (Murray Altheim) wrote:

[ Excellent rationale for the continued existence of
  formatting elements. ]

> If I'm taking your statement correctly, Joe, then this new element would be
> something akin to a generic physical style element, with the attribute
> containing the physical style information.

Attributes plural, but yes.

> I would turn this entirely on its head, given the current discussion. If
> most source text comes from legacy documents, then physical markup can
> continue to be created by conversion routines.
>
> I would prefer instead that ALL logical/semantic/informational (depending
> on your language) markup be a single element, with attributes providing the
> semantic information (eg., VAR,DFN,EM,STRONG,ABBREV, etc.).

Instead, or in addition to?  If you mean in addition to, that
also sounds reasonable.  One general-purpose element for logical
markup (with the semantic role identified by CLASS), and one
general-purpose element for formatting markup (with any number
of attributes for style properties) would be sufficient.

I think <EM> would be a good choice for the general-purpose
semantic element, since STRONG, CODE, SAMP, KBD, VAR, and CITE
are *all* just different forms of emphasis.  Or perhaps a new
element called <EMPH> or <HP> (for "Highlighted Phrase") would
be better.

<FONT> may not be the best name for the formatting element
(it could also be used to specify, for example, foreground
and background colors), but it has a precedent in Netscape.
<C>, <CLF>, <FORMAT>, and a bunch of others have been proposed
as well [1].

I like the idea of a single general-purpose formatting
element more than multiple special-purpose ones for
several reasons, one of which is that using a single
element preserves more of the original structure:

    <color fgcolor=red bgcolor=blue><font size=large><b>
    blah</b></font></color>

says to the parser that "blah" is a chunk of bold text
inside a chunk of differently-sized text inside a
chunk of colored text, whereas

    <font fgcolor=red bgcolor=blue fontsize=large fontstyle=bold>
    blah</font>

identifies it as a single element, which is probably more
appropriate.
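
To make the structural difference concrete, here is a rough sketch
in Python using the standard html.parser module.  The class and
function names (DepthTracker, text_depths) are made up for
illustration, and counting open elements is only a stand-in for
what a real SGML parser would build:

```python
from html.parser import HTMLParser


class DepthTracker(HTMLParser):
    """Records how many elements are open around each run of text."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.found = []  # list of (text, nesting depth) pairs

    def handle_starttag(self, tag, attrs):
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.found.append((text, self.depth))


def text_depths(markup):
    """Return (text, depth) pairs for every non-blank text run."""
    tracker = DepthTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.found


# The two forms from the examples above.
nested = ('<color fgcolor=red bgcolor=blue><font size=large><b>'
          'blah</b></font></color>')
flat = ('<font fgcolor=red bgcolor=blue fontsize=large fontstyle=bold>'
        'blah</font>')

print(text_depths(nested))  # "blah" sits three elements deep
print(text_depths(flat))    # "blah" sits one element deep
```

The nested form reports "blah" three levels down; the single-element
form reports it one level down, with all the style information
carried in attributes.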

There's no reason why the <EM> (or <HP> or <EMPH>) element
shouldn't include the same formatting attributes; however, I
don't think the two elements should be folded together
entirely.  The semantic element would be used to indicate that
the text has some known significance, whereas the formatting
element should be used where it does not.  This is a useful
distinction for search engines:

    <font fontsize=large>W</font>elcome to my homepage

should index the word "Welcome", but

    <em fontsize=large>W</em>elcome to my homepage

should index "W" and "elcome" separately, for example.
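
One way an indexer might apply that rule, sketched in Python with
the standard html.parser module.  This is only an illustration under
my own assumptions: the names FORMATTING_TAGS, IndexTokenizer, and
index_terms are hypothetical, and the only rule implemented is
"formatting tags are invisible to the indexer, every other tag
breaks a word":

```python
from html.parser import HTMLParser

# Assumption: <font> is the only purely presentational element.
FORMATTING_TAGS = {"font"}


class IndexTokenizer(HTMLParser):
    """Collects text, inserting a word break at every tag that
    is not a pure formatting tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def _boundary(self, tag):
        # Semantic tags split words; formatting tags are transparent.
        if tag not in FORMATTING_TAGS:
            self.chunks.append(" ")

    def handle_starttag(self, tag, attrs):
        self._boundary(tag)

    def handle_endtag(self, tag):
        self._boundary(tag)

    def handle_data(self, data):
        self.chunks.append(data)


def index_terms(markup):
    """Return the list of words an indexer would see."""
    tokenizer = IndexTokenizer()
    tokenizer.feed(markup)
    tokenizer.close()
    return "".join(tokenizer.chunks).split()


print(index_terms("<font fontsize=large>W</font>elcome to my homepage"))
print(index_terms("<em fontsize=large>W</em>elcome to my homepage"))
```

With the formatting element the first term comes out as "Welcome";
with the semantic element it comes out as "W" followed by "elcome".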


Lastly, I don't think that the existing phrase elements should
be removed; there is too much legacy content for that, and
some have definite utility.  (I would be reluctant to give up
<SAMP> and <VAR>, for example, since they are so useful in
technical documentation; and as Murray points out <B> and <I>
are valuable for converting existing word-processed documents,
where boldness and italicness are often the only structural
information available.)  But a single *general-purpose*
formatting element is also needed.



--Joe English

  joe@art.com

[1] <URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q2.messages/1472.html>
    "TEXT, C, NOBR, and WORD tags",
    <URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q2.messages/1430.html>
    "Suggestion for HTML Text Format Elements",
    and other related threads.