
Alternate SGML.c module in wwwlib 5.0a...?



Hi,

About a month ago I posted a replacement for the wwwlib SGML.c
module with some changes, the most important being:

- simplified into an "HTML" tokenizer, with no error checking at this level

- skips "HTML comments" correctly, those nasty little things like

	<!-- <comment> -- something "<--else-->" --  "comment -->

- passes text upstream in bigger chunks (instead of one character at a time).


I have now found a bug in the implementation of the last point. The
parser may pass the same text chunk upstream twice if there is a
"buffer break" just in front of an entity start (&) or a tag start (<).
I guess this doesn't happen very often, as I noticed it only after
using the module for a month... :-)
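
To illustrate how a stale count can produce such a duplicate, here is
a minimal standalone model. It is only a sketch and not the actual
SGML.c code: the names ctx, put_block, feed and run_split are made up
for the illustration, only '&' entities are handled, and the entity
itself is simply dropped. In this model the duplicate shows up when a
buffer ends after the '&' has already flushed the pending text; the
real module may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a chunked tokenizer. */
enum state { S_TEXT, S_ENTITY };

struct ctx {
    enum state state;   /* survives across buffers */
    char out[256];      /* everything passed upstream, kept for inspection */
    size_t outlen;
};

/* Stand-in for the PUTB macro: pass a block of text upstream. */
static void put_block(struct ctx *c, const char *s, size_t n)
{
    assert(c->outlen + n < sizeof c->out);
    memcpy(c->out + c->outlen, s, n);
    c->outlen += n;
    c->out[c->outlen] = '\0';
}

/* Process one input buffer.  reset_count selects between the behaviour
   without the fix (0) and with the fix (1). */
static void feed(struct ctx *c, const char *buf, size_t len, int reset_count)
{
    const char *text = buf;   /* start of the pending text run */
    size_t count = 0;         /* length of the pending text run */

    for (size_t i = 0; i < len; i++) {
        char ch = buf[i];
        switch (c->state) {
        case S_TEXT:
            if (ch == '&') {
                if (count > 0)
                    put_block(c, text, count);
                if (reset_count)
                    count = 0;        /* the fix: nothing is pending now */
                c->state = S_ENTITY;
            } else {
                count++;
            }
            break;
        case S_ENTITY:
            if (ch == ';') {          /* entity ends, a new text run starts */
                c->state = S_TEXT;
                text = buf + i + 1;
                count = 0;
            }
            break;
        }
    }
    /* End of buffer: flush whatever is still counted as pending. */
    if (count > 0)
        put_block(c, text, count);
}

/* Feed the input as two separate buffers and return what went upstream. */
static const char *run_split(struct ctx *c, const char *a, const char *b,
                             int reset_count)
{
    memset(c, 0, sizeof *c);          /* state = S_TEXT, empty output */
    feed(c, a, strlen(a), reset_count);
    feed(c, b, strlen(b), reset_count);
    return c->out;
}
```

In this model, run_split(&c, "abc&am", "p;def", 0) returns "abcabcdef"
(the chunk "abc" goes upstream twice), while the same call with
reset_count set to 1 returns "abcdef". When the whole input arrives in
one buffer, the result is "abcdef" either way, which fits the problem
showing up only rarely.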

So, just in case anyone else is using this module, the fix is to zero
the 'count' variable in the proper places, right after each PUTB
flush. Here is the diff against the version I posted earlier.


*** SGML.c.orig	Thu Dec 12 12:07:21 1996
--- SGML.c	Thu Dec 12 12:11:29 1996
***************
*** 333,338 ****
--- 333,339 ----
  			    {
  				if (count > 0)
  					PUTB(text, count);
+ 				count = 0;
  				string->size = 0;
  				context->state = S_ero;
  			    }
***************
*** 340,345 ****
--- 341,347 ----
  			    {
  				if (count > 0)
  					PUTB(text, count);
+ 				count = 0;
  				string->size = 0;
  				/* should scrap LITERAL, and use CDATA and
  				   RCDATA -- msa */
***************
*** 359,364 ****
--- 361,367 ----
  			    {
  				if (count > 0)
  					PUTB(text, count);
+ 				count = 0;
  				string->size = 0;
  				context->state =
  					(context->contents == SGML_LITERAL) ?





--
Markku Savela (msa@hemuli.tte.vtt.fi),     Technical Research Centre of Finland
Multimedia Systems, P.O.Box 1203,FIN-02044 VTT,http://www.vtt.fi/tte/staff/msa/

