Re: Simple(?) question on obscure comments detail

| I've written partial parsers in both C and HyperTalk (I produce an HTML
| editor), so I know all about parsing HTML. Yes, parsing is complex, but no
| more so in SGML than any other programming or markup language. You surely
| haven't looked much at RTF or PostScript.

Yes, I have actually STARTED an RTF decoder, a simple one no doubt, and I
think RTF is pretty straightforward until you get to its own
customizability...  But I think if I followed all these SGML rules in
writing a simple parser, I would end up with a pretty complex program
that would in some instances be harder to describe...

Try describing this stuff in English once...

Said the way it really should be: an HTML comment starts with "<!--",
contains any amount and kind of data, and ends with the first "-->".
But no, we'll allow people to add space here and here, and, what I hate
about this is that somehow the "--" doesn't have to be directly attached
to the closing ">"...  Why?  Is this because:

<!-- put comment first --doctype ...>

would be valid?  What can possibly come after the "--" but before the
">" that should require a parser to treat the first "--" as the end of
the comment?  What should a parser do with the rest (the stuff between
the "--" that ends the comment and the ">") when it has no use for it?
Those are the questions I have to ask...

and second, can my implementation of

<![whitespace?]--[anything up to]--[whitespace]>

be considered wrong when it's only for use in parsing HTML comments?
Technically, for a very limited parser, it would probably be best to
dump the <! as an unrecognized tag... Would there be any reason why I
should instead think about processing it?
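
For anyone who wants to poke at it, the pattern above could be written
roughly as the regular expression below. This is only a sketch in Python:
the names COMMENT_RE and strip_comments are invented here, and I am
assuming that both whitespace slots are optional.

```python
import re

# "<!", optional whitespace, "--", anything up to a "--" that is followed
# only by optional whitespace and ">". DOTALL lets a comment span lines.
COMMENT_RE = re.compile(r"<!\s*--(.*?)--\s*>", re.DOTALL)


def strip_comments(html):
    """Remove everything that matches the limited comment pattern."""
    return COMMENT_RE.sub("", html)


def comment_bodies(html):
    """Return the text between the opening and closing "--" of each match."""
    return [m.group(1) for m in COMMENT_RE.finditer(html)]
```

Something like "<!-- put comment first --doctype ...>" on its own does not
match and is left untouched. One caveat with this approach: if such a
construct is followed later in the document by an ordinary comment, the
"anything up to" part will run on until that later "-->" and swallow
whatever lies in between.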

BTW, I am sending this back to the list for others' opinions too...
I may get back to that parser some day...

Received on Friday, 20 September 1996 18:08:57 UTC