- From: Kim Man Liu <kliu@us.oracle.com>
- Date: Tue, 30 Jul 1996 16:04:22 -0700
- To: www-lib@w3.org
I understand that the SGML/HTML/HText module is not well designed and is going to be replaced. But if the replacement were going to be ready tomorrow, I wouldn't be asking this question.

I was wondering if anyone could share their experience in modifying the SGML "parser" so that it generates a tree-like structure of tags instead of a stream of tags. More specifically, this concerns the start_element() and end_element() functions in SGML.c. Currently, they try to do some ad-hoc tag matching.

I'm not an SGML expert, but I guess the right way to do it is something like this. When you see a start tag, you check the DTD to see whether this tag is allowed inside the enclosing tag. If it is allowed, you push this tag onto the tag stack and go on. If it is not allowed, the situation is more complex. The tag might be allowed inside the parent of the immediately enclosing tag. In that case you can assume the immediately enclosing tag is closed, pop it off the stack, and push the new tag (this is an error if the immediately enclosing tag requires an explicit end tag). You might have to pop all the way up to the HTML tag if a tag occurs in a very wrong place. Or you can do intelligent error recovery to handle this.

Is this a reasonable way to parse SGML? Has anyone done something like this?

-Kim
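The start-tag handling described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the rules table is a made-up toy content model (not the real HTML DTD and not the tables in SGML.c), and the names ElementRule, TagStack and start_tag() are hypothetical. It covers only the push/pop decision; a tree-building parser would presumably create a node at each push and attach it to the node for the element below it on the stack.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy content model, NOT the real HTML DTD. */
typedef struct {
    const char *name;
    const char *children;     /* space-delimited list of allowed child tags */
    int end_tag_optional;     /* 1 if the end tag may be omitted */
} ElementRule;

static const ElementRule rules[] = {
    { "HTML", " BODY ",   1 },
    { "BODY", " P UL ",   1 },
    { "P",    " B ",      1 },
    { "UL",   " LI ",     0 },
    { "LI",   " P B UL ", 1 },
    { "B",    " ",        0 },
};

static const ElementRule *find_rule(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (strcmp(rules[i].name, name) == 0)
            return &rules[i];
    return NULL;
}

/* Does the toy content model allow `child` directly inside `parent`? */
static int allowed_in(const char *parent, const char *child)
{
    const ElementRule *r = find_rule(parent);
    char key[32];
    if (r == NULL)
        return 0;
    snprintf(key, sizeof key, " %s ", child);
    return strstr(r->children, key) != NULL;
}

static int end_tag_optional(const char *name)
{
    const ElementRule *r = find_rule(name);
    return r != NULL && r->end_tag_optional;
}

#define MAX_DEPTH 64

typedef struct {
    const char *tags[MAX_DEPTH];  /* pointers owned by the caller */
    int depth;
} TagStack;

/*
 * Handle a start tag. Returns the number of open elements that were
 * implicitly closed, or -1 if the tag cannot be placed; on error the
 * stack is left unchanged so the caller can attempt its own recovery.
 */
int start_tag(TagStack *s, const char *tag)
{
    int d, closed;

    if (s->depth == 0) {
        /* Empty stack: this sketch accepts only the root element. */
        if (strcmp(tag, "HTML") != 0)
            return -1;
        s->tags[s->depth++] = tag;
        return 0;
    }

    /* Walk outward from the innermost open element until one accepts
       the tag. Never pop the root, and never pop an element whose end
       tag is required. */
    d = s->depth;
    while (!allowed_in(s->tags[d - 1], tag)) {
        if (d == 1 || !end_tag_optional(s->tags[d - 1]))
            return -1;
        d--;
    }
    if (d >= MAX_DEPTH)
        return -1;

    closed = s->depth - d;
    s->depth = d;                 /* implicit end tags */
    s->tags[s->depth++] = tag;    /* push the new element */
    return closed;
}
```

With this toy table, a P start tag arriving while another P is open closes the first P implicitly, whereas an LI arriving inside an open B is reported as an error because B requires an explicit end tag. Committing the pops only after an accepting ancestor has been found keeps the stack intact for whatever error recovery the caller chooses.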
Received on Tuesday, 30 July 1996 19:05:44 UTC