Re: <insert> and external entity references

Murray Altheim (murray@spyglass.com)
Wed, 20 Mar 1996 01:03:17 -0400


Message-Id: <v02110101ad752fda994b@[140.186.34.50]>
Date: Wed, 20 Mar 1996 01:03:17 -0400
To: Abigail <abigail@tungsten.gn.iaf.nl>
From: murray@spyglass.com (Murray Altheim)
Subject: Re: <insert> and external entity references
Cc: www-html@w3.org

>Murray Altheim wrote:
>++ This would really be the big change: not using HTML as the base language of
>++ the Web. We'd use SGML (MIME type "text/sgml; level=1|2|3|4"), allowing the
>++ DOCTYPE of the document to determine the DTD, just as in SGML. That DOCTYPE
>++ could simply specify a dialect of HTML for the current majority of web
>++ documents.
>
>I have heard this many times, yet I see problems no one has given
>me an answer to. HTML certainly is more than just a grammar.
>Search engines can index a document properly _because_ there
>is an implicit meaning to <TITLE>, that <H1> is more important
>than <H6>, that <STRONG> is used for something else than <B>, etc.

You may be making a sizeable assumption about the intelligence of search
engines. Beyond TITLE and HEAD information, body content is pretty much
indexed as full text. OpenText (like Yahoo) advertises this as a
feature. Few engines are geared toward keywords in META elements, and I
seriously doubt that there's much index differentiation between content
found in formal META elements and content found in the body. But I don't
disagree with the point of element utility.
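
To illustrate the point about full-text indexing, here is a minimal sketch
in Python. It is an assumption about how a simple indexer might behave, not
a description of OpenText, Yahoo, or any real engine; the function name
index_document is hypothetical.

```python
import re


def index_document(html):
    """Index TITLE separately; treat everything in the body as flat text."""
    title_match = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
    title = title_match.group(1).strip() if title_match else ""

    # Drop the HEAD, then strip every remaining tag without looking at it.
    body = re.sub(r"<head>.*?</head>", " ", html, flags=re.I | re.S)
    text = re.sub(r"<[^>]+>", " ", body)

    # No weighting: a word in H1 counts the same as a word in H6 or B.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {"title": title, "words": sorted(set(words))}


doc = ("<HEAD><TITLE>Test Page</TITLE></HEAD>"
       "<BODY><H1>Big</H1><H6>small</H6> <STRONG>loud</STRONG> <B>bold</B></BODY>")
entry = index_document(doc)
```

In this sketch the words from H1, H6, STRONG and B all land in the same
unweighted list, which is the behaviour described above.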

>But in the DTD, <H1> and <H6> have interchangeable roles;
><STRONG> and <B> have the same context and the same content;
><TITLE> is just something which appears in the <HEAD>.
>
><A>, <IMG>, <INPUT> have side effects which aren't set in the DTD.

As you correctly note, the full specification of HTML (or any SGML
application) as a language or of a conforming user agent goes beyond what
is contained in a DTD. The HTML DTD is simply the formal definition of the
HTML syntax.

An SGML application is not specified only by a DTD. Combine the formal
specification of the abstract syntax, character set, quantities, etc. found
in the SGML declaration, the application-specific syntax of the language
specification (DTD), the specified application conventions ("H1 is bigger
than H6"), and the element formatting information (whether hardwired into a
browser or handled externally via a stylesheet) and you have an SGML
application. The application conventions for the core of HTML are widely
known and deployed.
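
The layering described above can be sketched as a small Python data
structure. This is only an illustration under stated assumptions: the class
and field names are hypothetical, and the content-model strings are
placeholders, not text quoted from the HTML DTD.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    content_model: str      # syntax: what a DTD can express
    convention: str = ""    # meaning: what only the application conventions say


@dataclass
class SGMLApplication:
    declaration: str                                  # character set, quantities, etc.
    elements: dict = field(default_factory=dict)      # name -> Element
    formatting: dict = field(default_factory=dict)    # name -> presentation hints

    def same_syntax(self, a, b):
        return self.elements[a].content_model == self.elements[b].content_model

    def same_meaning(self, a, b):
        return self.elements[a].convention == self.elements[b].convention


html = SGMLApplication(
    declaration="(placeholder for the SGML declaration)",
    elements={
        "H1": Element("(text)+", "most important heading"),
        "H6": Element("(text)+", "least important heading"),
    },
    formatting={"H1": {"font-size": "largest"}, "H6": {"font-size": "smallest"}},
)
```

Here html.same_syntax("H1", "H6") is true while html.same_meaning("H1", "H6")
is false: the two elements are interchangeable as far as the DTD layer goes,
and differ only in the conventions and formatting layers.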

>If each document comes with its own DTD, then what? A user agent
>knows how to parse it, but how should it be displayed? Of course,
>authors could be required to deliver a style sheet as well, but
>they have to include everything, as there cannot be user agent
>defaults to fall back on. And what about user preferences? How
>is a user supposed to set preferences, if each document can have
>unknown elements?

I fail to see where this is really a problem. If a document author wished
to specify a presentation style, they'd simply specify it in a stylesheet.
If the element was designed for markup that didn't need presentational
differentiation from body text, no style specification would be needed. We
are currently limited by the "hardwired stylesheets" of popular browsers.
Had a simple default stylesheet been the chosen method for specifying
element presentation in the first version of Mosaic (rather than hardwiring
it in code), things might look very different today.

Expanding upon what is already conventional within HTML using a stylesheet
is quite simple. For example, there is no AUTHOR element in HTML 2.0. If a
browser came upon

    "Gee, Frank, that <AUTHOR>Bill Gates</AUTHOR> sure is a swell writer!"

the user agent/browser would simply ignore the AUTHOR tags and render their
content as ordinary body text. If the document author wished to
differentiate the element, style information could be included, such as
(using a stylesheet syntax):

    AUTHOR
    {
        font-style: italic;
        color: purple
    }
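
The fallback behaviour described above can be sketched in Python. This is a
minimal illustration under assumptions, not any browser's actual
implementation: DEFAULT_STYLES, parse_stylesheet and style_for are
hypothetical names, and the parser accepts only the tiny "NAME { property:
value }" form shown in the example.

```python
import re

# Hypothetical built-in defaults a browser might ship with.
DEFAULT_STYLES = {
    "STRONG": {"font-weight": "bold"},
    "EM": {"font-style": "italic"},
}

RULE = re.compile(r"(\w+)\s*\{([^}]*)\}")


def parse_stylesheet(text):
    """Parse 'NAME { prop: value ... }' rules into {NAME: {prop: value}}."""
    rules = {}
    for name, body in RULE.findall(text):
        props = {}
        # Declarations may be separated by semicolons or by line breaks.
        for decl in re.split(r"[;\n]", body):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                props[prop.strip()] = value.strip()
        rules[name.upper()] = props
    return rules


def style_for(element, document_rules, defaults=DEFAULT_STYLES):
    """Document rules override browser defaults; unknown elements get {}."""
    style = dict(defaults.get(element.upper(), {}))
    style.update(document_rules.get(element.upper(), {}))
    return style  # an empty dict means: render as ordinary body text


rules = parse_stylesheet("AUTHOR\n{\n    font-style: italic\n    color : purple\n}")
```

With no document stylesheet, style_for("AUTHOR", {}) returns an empty style,
so the content falls through as body text. With the rules above it returns
the italic, purple style the author asked for.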

Concerns over TITLE, ISINDEX, etc. are met with the application conventions
of whatever SGML application we're dealing with. I'm not of the mind that
there will be a dozen different SGML applications floating out on the web,
unrelated to HTML. There will probably be very few, with a multitude of
variants upon a central referent. That referent application will be some
evolutionary descendant of current-day HTML, and will inherit many of the
application conventions of HTML.

If a particular community (say, the chemical industry) breaks off and
starts using an entirely different SGML application unrelated to HTML, they
will probably be using browsers designed to deal with the application
conventions and needs of that community, possibly using a combination of
custom applications, generic applications with stylesheets, and/or plug-ins.

Murray

```````````````````````````````````````````````````````````````````````````````
     Murray Altheim, Program Manager
     Spyglass, Inc., Cambridge, Massachusetts
     email: <mailto:murray@spyglass.com>
     http:  <http://www.stonehand.com/murray/murray.htm>