Re: www-html-d Digest V96 #250

Thomas Breuel (tmb@best.com)
Sat, 17 Aug 1996 16:37:54 -0700


Date: Sat, 17 Aug 1996 16:37:54 -0700
From: Thomas Breuel <tmb@best.com>
Message-Id: <199608172337.QAA23617@shellx.best.com>
To: www-html@w3.org
Subject: Re:  www-html-d Digest V96 #250

|Why is it, do you think, that HTML is -- from my vantage at least --
|so perennially susceptible to this sort of subjective interpretation,
|even amongst those most expert in its history and application?
|Is it because of HTML's often rather non-prescriptive nature? 
|Because of the confusingly different manners in which various
|browsers choose to render it?

I think it's largely because of the SGML heritage.  SGML had some nifty
ideas for structural markup, but their realization is less than
perfect.  One big problem is that SGML markup isn't all that convenient
to write by hand.  To counteract that, all sorts of rules for
abbreviating constructs (omitted end tags, for example) were
introduced, but those make automatic processing and error
detection/recovery harder, and they cause confusion like "do you need
a </P> or not".  SGML tries to be a kind of programming language, but
sadly it lacks many of the syntactic features that make programming
languages easy to use and robust.
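
To make the "do you need a </P> or not" point concrete, here is a
minimal sketch using Python's standard html.parser module.  The names
TagLogger and tag_events are made up for illustration.  The tokenizer
reports only the tags that are literally present, so with the
abbreviated form any consumer has to work out for itself where each
paragraph ends.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start and end tags exactly as they appear in the input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Return the list of (kind, tag) events found in markup."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


abbreviated = "<p>first<p>second"          # end tags omitted
explicit = "<p>first</p><p>second</p>"     # every element closed

# The abbreviated form yields two "start" events and no "end" events;
# the explicit form yields a matching "end" for every "start".
print(tag_events(abbreviated))
print(tag_events(explicit))
```

Both inputs describe the same two paragraphs, but only the second one
says so in the markup itself.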

Furthermore, HTML isn't SGML; it is one application of it.  SGML was
designed to let people define structural models for the specific kinds
of content that occur over and over in some organization.  HTML was one
such model.  But a single document model can't serve the needs of all
Web publishers.  On the other hand, providing the full generality of
SGML is not a solution either, in my opinion: user-defined structures
that share only a syntactic framework are almost as useless for Web
purposes as no standard at all (and that kind of generality was not the
intent of the original SGML work anyway).

My long-range concern is that if HTML's evolution gets too far
out of touch with what content providers want, HTML may simply become
irrelevant.  More and more content is being made available as images,
Shockwave, and Microsoft Office documents, mainly because authors
like the control over layout and presentation they get and because
they don't like mucking around with HTML tags.  And it's not clear
to me that proposals like CSS address these issues well enough.

|Any thoughts on the matter would be much appreciated.  Anything to 
|help me get a better grip on this slippery beast called HTML :)

SGML documentation is quite relevant; there are some good
books on it out there.

Here is where I hope HTML will be heading in the future.  Of course,
some of those ideas are already under development/consideration in some
guise.  Others depend on how facilities that are being developed will
end up being used; for example, allowing style sheets and DTDs may turn
out to be a blessing or a curse, depending on whether communities
get together and define a few common standards or whether everybody
just uses them to get the "look" and syntax they want.

 -- There should probably be several distinct HTML subtypes for
    different classes of content, like memos, navigational pages,
    product descriptions, collections of document summaries, search
    results, etc.  Of course, the point is not to introduce deliberate
    incompatibilities between different document types, but to provide
    authors with frameworks and facilities for representing the
    specific types of content they have as conveniently and in as
    standard a way as possible.

 -- Some HTML subtypes should provide very explicit control over
    layout and presentation (e.g., navigational pages), while 
    others should be primarily structural (e.g., memos, search
    results, etc.) to allow automatic processing.

 -- There should be standard tags supporting document indexing
    and identifying large-scale document structure (in fact,
    this is kind of what "core-HTML" and META are trying to do).
    Indexing information needs to be document-type specific, however.

 -- Many of the SGML conventions for "simplifying markup" should
    be dropped and a simple, canonical syntax should be adopted
    and enforced.

 -- There should perhaps be mechanisms that discourage the definition
    and use of style sheets and DTDs by individuals and encourage
    the creation of a few common standards.  (Complexity is one such
    mechanism--how about DSSSL? :-)
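
As a rough illustration of what "canonical syntax, enforced" could
mean in practice, here is a small Python sketch.  It assumes a
hypothetical rule that every element must be closed explicitly and in
order; the names StrictChecker and check_canonical are invented for
this example and are not part of any proposal.

```python
from html.parser import HTMLParser


class StrictChecker(HTMLParser):
    """Track open elements and note end tags that do not match."""

    def __init__(self):
        super().__init__()
        self.stack = []    # elements opened but not yet closed
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Assumed rule: an end tag must close the innermost open element.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check_canonical(markup):
    """Return a list of violations; an empty list means the input passes."""
    checker = StrictChecker()
    checker.feed(markup)
    checker.close()
    errors = list(checker.errors)
    # Anything still open at the end was never closed explicitly.
    errors.extend(f"missing </{tag}>" for tag in reversed(checker.stack))
    return errors


print(check_canonical("<p>first</p><p>second</p>"))
print(check_canonical("<p>first<p>second"))
```

Under this assumed rule the first input passes and the second is
rejected outright, so there is nothing left for a processor to guess.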

Cheers,
Thomas.