Re: Goals of HTML (and XML)? (was Re: Foreign Words and Phrases..)

Jukka Korpela
Thu, 25 Sep 1997 14:34:43 +0300 (EET DST)

On Thu, 25 Sep 1997, Markku Savela wrote:

> This discussion seems to be shooting into directions where HTML is not
> intended to go, when it starts to talk about elements with application
> specific semantic (such as taxonomic names etc.).

Perhaps I'm the only one who thinks that HTML _should_ go into such
directions if it is going to be useful in its basic purpose (as a
device-independent hypertext language for the World Wide Web).
Please let me know if this is so (and I take silence as affirmative),
and I promise to keep silent about it in the future.

In a sense, things like taxonomic names, mathematical expressions,
bibliographic references, and "journalistic" constructs (abstracts,
summaries, headlines etc) are "application-specific". And I'll refrain
from asking how far one should go, since scripts, forms, and even
images and tables and multilingualism might look application-specific to
some people.

Suppose I wish to compose a document which incidentally mentions an animal
using its scientific name, includes a mathematical formula, contains
a bibliography, and provides a summary. Should I use an "application-
specific" notational language for each of them? How am I expected to
combine them together and with the HTML markup which I use?

> Such things are best
> left to other tagging systems (for example, XML based) or already
> existing SGML applications (TEI etc).

In my opinion, the best approach is to introduce HTML markup for the
most common structures needed in documents plus well-defined mechanisms
which one could use for special needs when the basic tools are not
sufficient. For instance, I'd like to be able to use simple mathematical
markup when I casually need to speak some math. Real mathematicians
would undoubtedly need more powerful facilities - but I think they
should be useable within HTML documents in some special math mode.
Perhaps somehow via the OBJECT element, so that mathematically
challenged browsers wouldn't get mad when they see higher math.
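
Something like the following is what I have in mind. It is only a
sketch: the file name and the media type are made up for illustration,
and I am assuming that a browser which cannot handle the object falls
back to rendering the content of the OBJECT element, as the HTML 4.0
draft describes.

```html
<P>The area of a circle of radius r is
<OBJECT DATA="circle-area.tex" TYPE="application/x-tex">
  <!-- Fallback for browsers that cannot render the object -->
  pi r<SUP>2</SUP>
</OBJECT>.</P>
```

A browser with a suitable math renderer would show the formula
properly; others would show the simple fallback instead of choking
on notation they do not know.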

> HTML should stick to "logical presentation elements", for example <h1>
> is just heading, <p> is paragraph, <ul> is just list. None of these
> elements attempt to define what the information content is.

(Well, a _heading_ shouldn't really be a logical presentation element
by itself, only an element used in the context of an element used
for dividing the document into sections.)

I regard myself as an extreme Structuralist, or Purist, in HTML matters,
but I do not regard semantics as so different from formal structures
like section, paragraph, or list. For instance, is emphasis structural
or not? I think that if I have a sequence of paragraphs, one of which
is for some reason more important than others, this is a structural
feature just as the division into paragraphs is. (It might even be
_more_ structural.) Now, as the discussion here has clearly shown
(to me at least), emphasis is a many-faced thing, or rather just
a name which hides the true diversity; so that EM and STRONG are
really just the nobility's way of uttering I and B. In practice, this
means that browsers must use a single presentational method to cover
a wide variety of meanings.

> In this light, <address> is an example of an element that should not
> exist in HTML.

In that light, definitely. _My_ objection to <address> is that it is
far too simplistic and unstructured. Moreover, address information
should be basically _metainformation_.

> Reading HTML 4.0 drafts, I can see that it mostly follows the above
> idea.

Perhaps in the sense that HTML 4.0 contains some logical markup, too.
Not much more than HTML 3.2, actually. Or HTML 2.0 or the earliest
HTML drafts for that matter. :-) 

> The introduction of the styles confuses the issue. Why need all the
> elements when almost *everything* could be done with just few
> elements, for example, <span> and <div>.

Just one. Combine <span> and <div> into one element. Division into
paragraphs is handled using style sheets anyway. :-) Call that
element <html>, allowing it to be nestable, and, voila, we have
a really simple language.

> "<span class=taxon> ... </span>" is very
> close to "<taxon> ... </taxon>".

Too close but not sufficiently close. People would (or will) use
class as if they were really able to introduce new (sub)elements.
The classified tower of Babel will be erected, every author
writing his own HTML language.
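
To illustrate with a made-up example (the class names below are
invented, not taken from any real document), three authors marking up
the same scientific name might write:

```html
<!-- Author A -->
<SPAN CLASS="taxon">Felis catus</SPAN>

<!-- Author B -->
<SPAN CLASS="species">Felis catus</SPAN>

<!-- Author C -->
<SPAN CLASS="latin-name">Felis catus</SPAN>
```

Each looks fine under its author's own style sheet, but a program
trying to find scientific names across documents has no common
element to look for.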