Lexical details of HTML [was: DIV/CLASS ]

Daniel W. Connolly (connolly@beach.w3.org)
Thu, 16 May 1996 12:15:56 -0400


Message-Id: <m0uK5iW-0002UZC@beach.w3.org>
To: Lee Daniel Crocker <lee@piclab.com>
Cc: www-html@w3.org
Subject: Lexical details of HTML [was: DIV/CLASS ]
In-Reply-To: Your message of "Wed, 15 May 1996 14:21:47 PDT."
             <199605152121.OAA01998@web1.calweb.com> 
Date: Thu, 16 May 1996 12:15:56 -0400
From: "Daniel W. Connolly" <connolly@beach.w3.org>

In message <199605152121.OAA01998@web1.calweb.com>, "Lee Daniel Crocker" writes:
>
>SHORTTAG implies those?  Damn.  I guess that means we have to keep
>it, because it's too late to back away from <DL COMPACT> now.  I
>only wanted <B/Bold/ and <> and <!> expressly forbidden.
><P CENTER> obviously should be as well.  It appears, then, that
>the SGML DTD is inadequate for validation, and validators have
>to have a lot of application conventions built in to forbid things
>that nobody supports but that are SGML legal.

Interestingly enough, it seems that SP (http://www.jclark.com/sp.html)
has support to warn about several of the lexical idioms of SGML that
aren't supported in HTML, like <b/bold/ etc. (See my earlier message
about -wmintag.)
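
To make the idioms in question concrete, here is a rough Python sketch that flags them in a document: the NET-enabling start-tag (<b/bold/), the empty start-tag (<>) and the empty end-tag (</>). This is not how SP does it; the function name find_minimized_markup and the patterns are assumptions made up for illustration, and a plain regular-expression scan like this will also fire inside comments and attribute values.

```python
import re

# Hypothetical helper, not SP's implementation: a rough regex scan for
# SHORTTAG idioms that are legal SGML but that HTML user agents do not support.
_IDIOMS = [
    ("empty start-tag", re.compile(r"<>")),
    ("empty end-tag", re.compile(r"</>")),
    # A generic identifier followed directly by "/" (the null end-tag
    # delimiter), as in <b/bold/.
    ("NET-enabling start-tag", re.compile(r"<[A-Za-z][A-Za-z0-9.-]*/")),
]


def find_minimized_markup(text):
    """Return sorted (offset, label, matched_text) tuples for each idiom found."""
    hits = []
    for label, pattern in _IDIOMS:
        for m in pattern.finditer(text):
            hits.append((m.start(), label, m.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    for offset, label, matched in find_minimized_markup("<b/bold/ and <> and </>"):
        print(offset, label, matched)
```

Ordinary markup such as <B>bold</B> or <DL COMPACT> produces no hits, since neither contains an empty tag or a "/" directly after the element name in a start-tag.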

>  That's unfortunate.
>While we'd all love to see browsers become SGML-based, that isn't
>going to happen; not now, not in the future.

It has already happened. Viola used to use sgmls in its
implementation.  The HTML parser in grail subclasses from an SGML
parser (not a validating parser, and probably not a conforming parser
but...). And panorama, and the stonehand/spyglass stuff, and ...

What's your point?

>We need to specify HTML down to every byte without pointing to
>SGML if we expect a useful standard.  Of course the fact that it
>is in fact a subset of SGML will always be useful.

Toward that end, please see:

	A Lexical Analyzer for HTML and Basic SGML 
	W3C Working Draft
	Dan Connolly connolly@w3.org
	http://www.w3.org/pub/WWW/TR/WD-sgml-lex
	$Date: 1996/02/08 16:27:45 $


>While we're at it, can we solidify the hopelessly ambiguous and
>ill-specified comment syntax?

The comment syntax hasn't changed since 1986, when SGML was published.
We're just waiting for implementations to catch up :-)

It's documented in the SGML standard, and again in RFC1866, and
again in the above tech report.
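
As a reminder of what that syntax is, RFC1866 puts it this way: a comment declaration is "<!" followed by zero or more comments followed by ">"; each comment starts with "--" and runs up to and including the next "--"; white space is allowed after each comment, but not before the first. A minimal Python sketch of that rule follows. The name match_comment_declaration is made up for this example, and the regular expression is my own transcription of the rule, not code taken from the tech report.

```python
import re

# One comment: "--", any text not containing "--", a closing "--",
# then optional white space.
_COMMENT = r"--(?:(?!--).)*--\s*"

# A comment declaration: "<!", zero or more comments, ">".
_COMMENT_DECL = re.compile(r"<!(?:" + _COMMENT + r")*>", re.DOTALL)


def match_comment_declaration(text, pos=0):
    """Return the end offset of a comment declaration starting at pos, or None."""
    m = _COMMENT_DECL.match(text, pos)
    return m.end() if m else None


if __name__ == "__main__":
    samples = [
        "<!-- another -- -- comment -->",  # two comments in one declaration
        "<!>",                             # zero comments
        "<!-- a -- b -->",                 # stray text between comments
        "<! -- x -->",                     # white space before the first comment
    ]
    for s in samples:
        print(repr(s), match_comment_declaration(s) is not None)
```

Under this rule the first two samples are comment declarations and the last two are not, which is where many implementations diverge: they treat everything from "<!--" to the first "-->" (or the first ">") as a comment.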


Dan