- From: Markku Savela <msa@msa.tte.vtt.fi>
- Date: Thu, 12 Dec 1996 12:39:41 +0200 (EET)
- To: www-lib@w3.org
- CC: msa@hemuli.tte.vtt.fi
Hi,

About a month ago I posted a replacement module for the wwwlib SGML.c
module with some changes, the most important being:

 - simplified into an "HTML" tokenizer, no error checking at this level

 - skips "HTML comments" correctly, those nasty little things like

   <!-- <comment> -- something "<--else-->" -- "comment -->

 - passes text upstream in bigger chunks (instead of one char at a time).

Now, I found a bug in the implementation of the last point. The parser
may pass the same text chunk upstream twice if there is a "buffer break"
just in front of an entity start (&) or a tag start (<). I guess this
doesn't happen very often, as I noticed it only after using it for a
month... :-)

So, just in case anyone else is using this module, the fix is to zero
the 'count' variable in the proper places. Here is the diff against the
version I posted earlier.

*** SGML.c.orig Thu Dec 12 12:07:21 1996
--- SGML.c Thu Dec 12 12:11:29 1996
***************
*** 333,338 ****
--- 333,339 ----
      {
          if (count > 0)
              PUTB(text, count);
+         count = 0;
          string->size = 0;
          context->state = S_ero;
      }
***************
*** 340,345 ****
--- 341,347 ----
      {
          if (count > 0)
              PUTB(text, count);
+         count = 0;
          string->size = 0;
          /* should scrap LITERAL, and use CDATA and RCDATA -- msa */
***************
*** 359,364 ****
--- 361,367 ----
      {
          if (count > 0)
              PUTB(text, count);
+         count = 0;
          string->size = 0;
          context->state = (context->contents == SGML_LITERAL) ?

--
Markku Savela (msa@hemuli.tte.vtt.fi), Technical Research Centre of Finland
Multimedia Systems, P.O.Box 1203,FIN-02044 VTT,http://www.vtt.fi/tte/staff/msa/
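
For anyone who wants to see the failure mode in isolation, here is a toy
model of it. This is not the SGML.c code: the names put_block, feed and
run_demo, the two-state parser and the end-of-buffer flush are all
assumptions made for illustration, and the toy simply swallows markup
instead of decoding it. It only shows how a pending-text count that is
not zeroed after a flush can be flushed a second time when the input
buffer ends before the parser is back in text state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sink standing in for whatever PUTB writes to upstream. */
static char out[256];
static size_t out_len;

static void put_block(const char *s, size_t n)
{
    assert(out_len + n < sizeof out);
    memcpy(out + out_len, s, n);
    out_len += n;
    out[out_len] = '\0';
}

enum state { S_TEXT, S_MARKUP };

/* Feed one input buffer; parser state survives between calls.
 * With fix == 0 the count is left stale after the flush at '&' or '<'. */
static void feed(enum state *st, const char *buf, size_t len, int fix)
{
    const char *text = buf;   /* start of the pending text run */
    size_t count = 0;         /* length of the pending text run */
    size_t i;

    for (i = 0; i < len; i++) {
        char c = buf[i];
        if (*st == S_TEXT) {
            if (c == '&' || c == '<') {
                if (count > 0)
                    put_block(text, count);
                if (fix)
                    count = 0;        /* the one-line fix */
                *st = S_MARKUP;
            } else {
                if (count == 0)
                    text = buf + i;
                count++;
            }
        } else if (c == ';' || c == '>') {
            /* Markup ended: restart the text run after it. */
            *st = S_TEXT;
            text = buf + i + 1;
            count = 0;
        }
    }
    /* End-of-buffer flush: emits the run again if count is stale. */
    if (count > 0)
        put_block(text, count);
}

/* Feed "abc&" and "amp;def" as two buffers; return what went upstream. */
static const char *run_demo(int fix)
{
    enum state st = S_TEXT;
    out_len = 0;
    out[0] = '\0';
    feed(&st, "abc&", 4, fix);
    feed(&st, "amp;def", 7, fix);
    return out;
}
```

In this model run_demo(0) sends "abc" upstream twice, because the first
buffer ends right after the '&' while count is still 3, and run_demo(1)
sends it once. When the markup ends inside the same buffer the count is
reset anyway, which would be consistent with the duplicate showing up
only rarely.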
Received on Thursday, 12 December 1996 05:39:49 UTC