Re: DTD Modularity [Was: Glue RFC]

>In message <v02110102ac7c982aa839@[]>, Murray Altheim writes:
>>>In message <>, lilley writes:
>>>>Dan Connolly <> said:
>I've done much cogitating on the idea of specifying HTML as a straight
>context free grammar in s-expressions:   [further description]


I'm aware of your thoughts on this going back quite a way. In fact, I was
thinking of regurgitating one of your earlier messages (from last January!)
to see where we ALL have ended up as regards those ideas. I still agree
with what you had written, and I think it is still valuable and warrants
the space here. With the current dialogue on incremental and fragmentary
DTDs, it is even more so.


---------------------- complete message follows -------------------------
Message 15/63  From Daniel W. Connolly              Jan 26, 95 07:20:31 pm +0100

Date: Thu, 26 Jan 1995 19:20:31 +0100
Precedence: bulk
To: Multiple recipients of list <>
Subject: Evolution of HTML and other specs [Was: Browser support of HTML 2.0 ]

In message <>, Brandon Plewe writes:
>While I salivate for HTML 3, and watch everyone clamor for this or that to
>be supported, I am reminded that currently, most browsers don't even support
>all of HTML 2 as it was designed (or as the spec says it should be handled).

The key there is "should," and not "must."

Your suggestions are perfectly reasonable "enhancement requests" of
current browsers, but they can't reasonably be called "defect

>I don't mean to sound argumentative, or negative about the current browsers.
>I think they're wonderful--I'm just wondering if there are plans for these
>"already existing" features; and if not, why are they in the RFC?

Already existing in what way?

I agree that "suggested rendering" is a tricky part of the RFC.

I'm a minimalist. My original proposal for an HTML specification was
to specify _only_ the syntax; i.e., enough information to decide
whether a sequence of characters constitutes a valid HTML document or
not, and if it does, enough info to make some sort of parse tree out
of it. (It was about 10 pages long, way back in Jan 1993.) So the
first step would be to get all the various implementations to agree
how to parse comments and attribute values, for example.

The question of what you _do_ with the resulting parse tree would
be outside the scope of that specification.

But folks weren't happy with that. They wanted something that says
"The H1 tag should be in a bigger font and have space above and
below." Blech. Now you're talking about a browser specification, not a
specification of the HTML language. And until you have some formalism
like DSSSL, it's very difficult to specify this sort of behaviour with
any level of rigor.

And they wanted a specification of ISMAP behaviour, and how you encode
forms data in application/x-www-form-urlencoded format, and ...
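
For readers unfamiliar with the forms encoding mentioned here, a small
sketch using Python's standard urllib.parse module shows what the
encoded data looks like. The field names and values are made up for
illustration.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields; spaces and reserved characters get escaped.
fields = {"name": "Dan Connolly", "q": "a&b=c"}

encoded = urlencode(fields)
print(encoded)

# Decoding recovers the original values (each as a list of strings).
decoded = parse_qs(encoded)
print(decoded)
```

Spaces become "+", and characters such as "&" and "=" that would
otherwise be read as separators are percent-escaped.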

In the future, I'm going to push very hard to split the existing HTML
specification into a set of smaller, more concise RFCs:

        "HTML Syntax"
                -- how to decide whether a sequence of characters is
                a valid HTML document, and if so, how to create
                a parse tree.

        "Interpretation of HTML Idioms"
                -- an informal description of the meaning and suggested
                rendering of an HTML parse tree.

        "The text/html Internet Media Type"
                -- registration of HTML as a MIME type. Charset issues.
                Newline Issues. Appendices specifically addressing
                SMTP transport and HTTP transport issues. Security

        "World-Wide Web User Agents"
                Specific techniques: basic HREF links, ISINDEX, FORMS, ISMAP,
                .mailcap, $WWW_HOME, mailto:, proxies, security issues.
                Suggestions for documentation, default configuration, etc.

        "World-Wide Web Hypermedia Architecture"
                -- formal discussion of the WWW hypertext model: documents,
                anchors, links, searching.
                Formal discussion of common abstractions from ftp, http,
                gopher, WAIS, etc. Definition of correct caching/proxy

(All these in addition to the URL and HTTP RFCs. The Common Gateway
Interface (CGI) needs an official maintainer somewhere too.)

The job of revising the HTML 2.0 document to accommodate the proposed
HTML 3.0 features looks completely overwhelming. But as we revise the
2.0 spec, I suggest we split it up as above. The job of revising
any one of the above documents w.r.t. HTML 3.0 is a manageable task.

Here are the outstanding/upcoming issues I see in each area:

        "HTML Syntax"
                -- DTD changes for new elements.
                ISO character entities, and how they show up in
                the parse tree.
                Perhaps we allow <!entity > declarations,
                marked sections, and a few other SGML syntactic idioms.
                RAST representation of parse tree for conformance

        "Interpretation of HTML Idioms"
                Table rendering. Figures. Super/subscript. DSSSL-Lite.
                Toolbars (next/previous/up). Vendor- and
                application-specific extensions.

        "The text/html Internet Media Type"
                Character sets, versions, levels, format negotiation
                issues. Vendor- and application-specific extensions.

        "World-Wide Web User Agents"
                File upload. Embedded presentation. Mandatory display
                of copyrights. Display of security information.
                Desktop message bus (CCI/OLE/Tooltalk/AppleEvents).
                Distributed editing, annotation,
                and other forms of collaboration. Perhaps advances
                in resource discovery technology (e.g. harvest,
                verity) will have user interface implications.

        "World-Wide Web Hypermedia Architecture"
                Link relationships, embedding, compound
                document architecture. The web as a knowledge base.
                Isomorphisms with HyTime. Publishing model (URNs/URCs,
                copyright, payment, replication, authentication,
                access control).

                Security. Variations on Proxy: no-cache.
                Session management, and application-level
                packets. Transactions.
                Desktop message-bus, UDP version of the protocol.

Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   (512) 834-9962 x5010

Received on Wednesday, 13 September 1995 13:02:38 UTC