Re: Doctypes, Declarations, and HTML Versions

On Tue, 12 Oct 1999, Russell Steven Shawn O'Connor wrote:
> On Tue, 12 Oct 1999, Frank Boumphrey wrote:
> 
> > > The difficulty of making an SGML parser is probably overblown.
> > 
> > having written parsers for both i can assure you that it is not!!

I agree, but only up to a point.  A "non-validating SGML subset" parser is
not overly difficult to write.  It has been done, more than once.  The
problems with such parsers are twofold: none of them meets the conformance
requirements, and their coverage varies.  Even there, OMITTAG by itself is
not the hydra-headed monster it's made out to be - the real problem is
ISO 8879's goofy approach to tag inference (so that writing a
conforming/validating parser becomes a distinctly *different* exercise!)

OTOH, the optionality of declarative information *is* overblown.  A parser
for DOCTYPE, ELEMENT, NOTATION, ATTLIST and ENTITY declarations (all that
one really needs in 90%+ of applications) is straightforward - especially
if you break SGML compliance and treat PEs as just text macros.  Separating
declaration and use is a powerful representation technique: it's the
theory behind macros, and for that matter, what database wonks call
normalization.  To argue that we don't *need* declarations - all in the
name of avoiding an "ugly syntax" by reputation - is silly.
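
To make the "PEs as text macros" shortcut concrete, here is a minimal
sketch in Python.  The names (expand_pes, parse_declarations) are
hypothetical, and it assumes double-quoted PE values, references always
written as %name;, and no ">" inside literals - exactly the kind of
corner-cutting a conforming parser could not get away with.

```python
import re

# Assumption: PE declarations look like  <!ENTITY % name "value">
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
# Assumption: PE references are always terminated with a semicolon.
PE_REF = re.compile(r'%([\w.-]+);')
# Assumption: no '>' occurs inside a declaration's literals.
DECL = re.compile(r'<!(DOCTYPE|ELEMENT|ATTLIST|NOTATION|ENTITY)\s+([^>]*)>')


def expand_pes(dtd, max_passes=10):
    """Treat parameter entities as plain text macros and expand them."""
    macros = {}
    for name, value in PE_DECL.findall(dtd):
        macros.setdefault(name, value)  # the first declaration wins
    body = PE_DECL.sub("", dtd)
    # Re-substitute until nothing changes, so nested PEs are resolved;
    # max_passes bounds the loop if a macro refers to itself.
    for _ in range(max_passes):
        expanded = PE_REF.sub(lambda m: macros.get(m.group(1), m.group(0)), body)
        if expanded == body:
            break
        body = expanded
    return body


def parse_declarations(dtd):
    """Return (keyword, name, rest) for each declaration after PE expansion."""
    decls = []
    for keyword, rest in DECL.findall(expand_pes(dtd)):
        parts = rest.split(None, 1)
        if not parts:
            continue  # skip a declaration with no name
        tail = " ".join(parts[1].split()) if len(parts) > 1 else ""
        decls.append((keyword, parts[0], tail))
    return decls


SAMPLE = '''
<!ENTITY % inline "em | strong">
<!ENTITY % text "#PCDATA | %inline;">
<!ELEMENT p - O (%text;)*>
<!ATTLIST p id ID #IMPLIED>
'''

for decl in parse_declarations(SAMPLE):
    print(decl)
```

The point of the sketch is only that, once PEs are flattened textually,
what remains is a flat list of declarations that is easy to tokenize.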

> No doubt writing an XML parser is much much much easier than writing an
> SGML parser.  But still.  How many lines does it take to incorporate
> nsgmls into your code?

Wrong candidate: nsgmls is a validating parser.  Its error recovery
heuristics leave a lot to be desired.  Also, it's sensitive to "errors by
fiat" - errors only because ISO 8879 says so - which a non-validating
parser will likely tolerate, precisely because tolerating them simplifies
the implementation.

Until the XML initiative came along, there was no effort to define (or
even organize) the various "rational subsets" of SGML floating around.
However, XML isn't the only answer.


Arjun

Received on Tuesday, 12 October 1999 18:45:01 UTC