- From: Carl Morris <msftrncs@htcnet.com>
- Date: Fri, 20 Sep 1996 17:08:38 -0500
- To: "Murray Altheim" <murray@spyglass.com>
- Cc: "WWW HTML List" <www-html@w3.org>
| I've written partial parsers in both C and HyperTalk (I produce an HTML
| editor), so I know all about parsing HTML. Yes, parsing is complex, but no
| more so in SGML than any other programming or markup language. You surely
| haven't looked much at RTF or PostScript.

Yes, I have actually STARTED an RTF decoder. It's simple, no doubt, but I think RTF is pretty straightforward until you get to its customizability... But I think if I followed all these SGML rules in writing a simple parser, I would end up with a pretty complex program that would in some instances be harder to describe...

Try describing this stuff in English once... "An HTML comment starts with "<!--", contains any amount and kind of data, and ends with the first sequence of "-->"" is how it really should be said... but no, we'll allow people to add space here and there, and what I hate about this is that somehow the -- doesn't have to be linked with the closing ">"...

Why? Is this because

<!-- put comment first --doctype ...>

would be valid? What could possibly come after the -- but before the ">" that would require a parser to treat the first -- as the end of the comment? And what should a parser do with the rest when it has no use for it (the stuff between the -- that ends the comment and the ">")?

Those are questions I have to ask... And second, can my implementation of

<![whitespace?]--[anything up to]--[whitespace]>

be considered wrong when it's only used for parsing HTML comments?

Technically, for a very limited parser, it would probably be best to dump the <! as an unrecognized tag... Would there be any reason why I should instead think about processing it?

BTW, I am sending this back to the list... for others' opinions too... I may get back to that parser some day...
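A rough sketch (Python, not from the original message) of the three ways of finding the end of a comment discussed above. scan_simple is the "first -->" rule; scan_lenient is the <![whitespace?]--[anything up to]--[whitespace]> rule, read here (an assumption) as a non-greedy match up to the first "--" followed by optional whitespace and ">"; scan_sgml follows the SGML comment declaration as I understand ISO 8879 to define it: "<!", then optionally a comment followed by any mix of whitespace and further comments, then ">". All three function names are hypothetical. Each returns the index just past the comment, or None if the text at pos isn't a comment under that rule.

```python
import re


def scan_simple(text, pos=0):
    """'First -->' rule: starts with '<!--', ends at the first '-->'."""
    if not text.startswith("<!--", pos):
        return None
    end = text.find("-->", pos + 4)
    return None if end < 0 else end + 3


# Lenient rule (assumed reading): optional whitespace after "<!", then "--",
# anything (non-greedy), then "--", optional whitespace, then ">".
_LENIENT = re.compile(r"<!\s*--.*?--\s*>", re.DOTALL)


def scan_lenient(text, pos=0):
    m = _LENIENT.match(text, pos)
    return m.end() if m else None


def scan_sgml(text, pos=0):
    """SGML-style comment declaration: '<!' (comment (s | comment)*)? '>'."""
    if not text.startswith("<!", pos):
        return None
    i = pos + 2
    seen_comment = False
    while i < len(text):
        if text.startswith("--", i):
            close = text.find("--", i + 2)
            if close < 0:
                return None          # unterminated comment
            i = close + 2
            seen_comment = True
        elif text[i] == ">":
            return i + 1             # end of the declaration
        elif text[i].isspace() and seen_comment:
            i += 1                   # whitespace between comments is allowed
        else:
            return None              # anything else, including space right after "<!"
    return None


# Results are (simple, lenient, sgml).
samples = [
    "<!-- plain -->after",                    # 14, 14, 14
    "<!-- a -- > b -->after",                 # 17, 11, 11
    "<!-- a -- b -->after",                   # 15, 15, None
    "<!-- x -- -- y -->after",                # 18, 18, 18
    "<! -- x -->after",                       # None, 11, None
    "<!-- put comment first --doctype ...>",  # None, None, None
]
for s in samples:
    print(repr(s), scan_simple(s), scan_lenient(s), scan_sgml(s))
```

The second and third samples show where the rules part ways: a "--" inside the comment text ends the comment under the SGML reading, so "-- >" closes it early and a stray "b" after a "--" pair makes it invalid, while the "first -->" rule ignores both. None of the three accepts the "--doctype ...>" form.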
Received on Friday, 20 September 1996 18:08:57 UTC