- From: Markku Savela <msa@msa.tte.vtt.fi>
- Date: Mon, 15 Jan 1996 16:42:10 +0200 (EET)
- To: www-lib@w3.org
- Cc: msa@msa.tte.vtt.fi, mjk@hemuli.tte.vtt.fi
I have started experimenting with the SGML (HTML DTD) parser provided by libwww version 4.0B. I am using my own structured stream, but was hoping to be able to use HTMLPDTD.* from the library.

It seems that for many of the HTML tags I get only a call to "start_element", but not to "end_element", unless the HTML explicitly includes the end tag. Even more, if I have "<P> ... </P>", I get start_element, but </P> is totally ignored. I can see the "why" from

    { "P" , l_attr, HTML_L_ATTRIBUTES, SGML_EMPTY },

but I am wondering: shouldn't <P> already be a "container", that is, SGML_MIXED? A similar question arises for some other tags, such as "LI" (we can have nested lists).

I guess this all comes from the fact that the library does not really have a full SGML parser, and HTMLPDTD does not really define the full "DTD". It seems that a large part of the DTD structuring rules (which tags are allowed within and after which tag) must be implemented in the start_element/end_element calls.

The question: I am wondering if there should be a structured stream very much like what you get with the SGML + HTMLPDTD.c combination, but which would provide the missing rules, so that an application could rely on getting *all* end_element calls, whether the original HTML had them or not.

For example, the coding

    <UL>
    <LI> text
    <LI>
    <UL>
    <LI> <P>text</P><P>text</P>
    </UL>
    </UL>

would, instead of

    begin UL
    begin LI
    begin LI
    begin UL
    begin LI
    begin P
    begin P
    end UL
    end UL

give

    begin UL
    begin LI
    end LI
    begin LI
    begin UL
    begin LI
    begin P
    end P
    begin P
    end P
    end LI
    end UL
    end LI
    end UL

With this, at least everyone would consistently agree on which tag implicitly ends what. Or is there something like this already in the library? (As far as I can see, the HText module goes much further and already interprets more than some might want...)

P.S. I am not on this list (www-lib), so please CC any replies to me.

--
Markku Savela (msa@hemuli.tte.vtt.fi), Technical Research Centre of Finland
Multimedia Systems, P.O.Box 1203, FIN-02044 VTT, http://www.vtt.fi/tte/staff/msa/
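The wrapper stream asked about in this message could be sketched roughly as follows. This is not libwww code: the Balancer type, the balancer_* functions and the closing rules in closes_on_start are hypothetical names and assumptions, chosen only to reproduce the UL/LI/P example. A real implementation would presumably derive its rules from the HTML DTD and sit behind the library's structured stream interface. Tag names are assumed to be upper-case strings that outlive the Balancer.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 64    /* arbitrary nesting limit for this sketch */
#define MAX_OUT   1024  /* arbitrary size of the event log */

/* Hypothetical helper, not part of libwww. */
typedef struct {
    const char *stack[MAX_DEPTH]; /* currently open elements, outermost first */
    int depth;
    char out[MAX_OUT];            /* balanced events, one "begin X"/"end X" per line */
} Balancer;

void balancer_init(Balancer *b) {
    b->depth = 0;
    b->out[0] = '\0';
}

/* Append one event line to the log, truncating silently if the log is full. */
static void emit(Balancer *b, const char *kind, const char *tag) {
    size_t used = strlen(b->out);
    snprintf(b->out + used, sizeof b->out - used, "%s %s\n", kind, tag);
}

/* Close the innermost open element. */
static void pop(Balancer *b) {
    emit(b, "end", b->stack[--b->depth]);
}

/* Assumed rules: does starting `incoming` implicitly end the open element `open`? */
static int closes_on_start(const char *open, const char *incoming) {
    if (strcmp(incoming, "P") == 0)
        return strcmp(open, "P") == 0;
    if (strcmp(incoming, "LI") == 0)
        return strcmp(open, "P") == 0 || strcmp(open, "LI") == 0;
    if (strcmp(incoming, "UL") == 0 || strcmp(incoming, "OL") == 0)
        return strcmp(open, "P") == 0;
    return 0;
}

/* Start tag seen: close whatever it implicitly ends, then open it. */
int balancer_start(Balancer *b, const char *tag) {
    while (b->depth > 0 && closes_on_start(b->stack[b->depth - 1], tag))
        pop(b);
    if (b->depth == MAX_DEPTH)
        return -1;                /* nesting too deep for this sketch */
    b->stack[b->depth++] = tag;
    emit(b, "begin", tag);
    return 0;
}

/* End tag seen: close everything down to and including the matching element. */
void balancer_end(Balancer *b, const char *tag) {
    int i = b->depth - 1;
    while (i >= 0 && strcmp(b->stack[i], tag) != 0)
        i--;
    if (i < 0)
        return;                   /* stray end tag: ignore it */
    while (b->depth > i)
        pop(b);
}

/* End of document: close everything still open. */
void balancer_finish(Balancer *b) {
    while (b->depth > 0)
        pop(b);
}
```

Feeding it the start tags of the example (UL, LI, LI, UL, LI, P, P) followed by the two </UL> end tags leaves the fully balanced begin/end sequence in the out buffer, whether or not the parser delivers the </P> end tags in between.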
Received on Monday, 15 January 1996 09:44:57 UTC