- From: Arjun Ray <aray@q2.net>
- Date: Tue, 12 Oct 1999 18:45:05 -0400 (EDT)
- To: W3C HTML <www-html@w3.org>
On Tue, 12 Oct 1999, Russell Steven Shawn O'Connor wrote:
> On Tue, 12 Oct 1999, Frank Boumphrey wrote:
>
> > > The difficulty of making an SGML parser is probably overblown.
> >
> > having written parsers for both i can assure you that it is not!!

I agree, but only up to a point. A "non-validating SGML subset" parser
is not overly difficult. It has been done, more than once. The problems
with them are two-fold: none of them meet conformance requirements, and
their coverage varies. Even there, OMITTAG by itself is not the
hydra-headed monster it's made out to be - the real problem is ISO
8879's goofy approach to tag inference (so that writing a
conforming/validating parser becomes a distinctly *different* exercise!)

OTOH, the optionality of declarative information *is* overblown. A
parser for DOCTYPE, ELEMENT, NOTATION, ATTLIST and ENTITY declarations
(all that one really needs in 90%+ of applications) is straightforward -
especially if you break SGML-compliance and treat PEs as just text
macros.

Separating declaration and use is a powerful representation technique:
it's the theory behind macros, and for that matter, what database wonks
call normalization. To argue that we don't *need* declarations - all in
the name of avoiding an "ugly syntax" by reputation - is silly.

> No doubt writing an XML parser is much much much easier than writing
> an SGML parser. But still. How many lines does it take to incorporate
> nsgmls into your code?

Wrong candidate: nsgmls is a validating parser. Its error recovery
heuristics leave a lot to be desired. Also, it's sensitive to "errors by
fiat" - errors only because ISO 8879 says so, which any non-validating
parser is not only likely to tolerate but will tolerate precisely in
order to simplify the implementation.

Until the XML initiative came along, there was no effort to define (or
even organize) the various "rational subsets" of SGML floating around.
However, XML isn't the only answer.

Arjun
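
To make the "PEs as just text macros" remark concrete, here is a rough
Python sketch. It is an illustration only, not code from any parser
mentioned in this thread, and it is deliberately non-conforming: the
function names (expand_pes, parse_declarations) and the sample DTD are
made up for the example. It skips DOCTYPE, comments and marked
sections, and assumes no ">" appears inside a literal.

```python
import re

# Hypothetical helpers for illustration; not ISO 8879 conformant.
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+(["\'])(.*?)\2\s*>', re.S)
PE_REF = re.compile(r'%([\w.-]+);')
DECL = re.compile(r'<!(ELEMENT|ATTLIST|NOTATION|ENTITY)\s+(.*?)>', re.S)


def expand_pes(dtd, max_passes=20):
    """Strip parameter entity declarations and expand %name; as plain text."""
    macros = {}

    def record(match):
        # Keep the first declaration of a name, as XML does.
        macros.setdefault(match.group(1), match.group(3))
        return ''

    body = PE_DECL.sub(record, dtd)
    # Repeat so that PEs referring to other PEs are fully expanded;
    # max_passes guards against self-referential definitions.
    for _ in range(max_passes):
        expanded = PE_REF.sub(
            lambda m: macros.get(m.group(1), m.group(0)), body)
        if expanded == body:
            break
        body = expanded
    return body


def parse_declarations(dtd):
    """Return (keyword, whitespace-normalized text) for each declaration."""
    return [(kind, ' '.join(rest.split()))
            for kind, rest in DECL.findall(expand_pes(dtd))]


SAMPLE = '''
<!ENTITY % inline "em | strong">
<!ENTITY % text "#PCDATA | %inline;">
<!ELEMENT p - O (%text;)*>
<!ATTLIST p id ID #IMPLIED>
<!ENTITY copy "(c)">
'''

for kind, text in parse_declarations(SAMPLE):
    print(kind, text)
```

Collecting every PE first and then substituting ignores declaration
order, which a conforming parser would not do; that shortcut is exactly
the kind of simplification a non-validating subset parser buys.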
Received on Tuesday, 12 October 1999 18:45:01 UTC