Re: dt/center processing problem fix

> dt/center processing problem fix
>
> From: Gary Deschaines (gary.deschaines@netmechanic.com)
> Date: Thu, Aug 10 2000
>
> *Next message: Andy Quick: "Re: Bug: Possible dangling pointer in istack.c"
>
>    * Previous message: Sebastian Lange: "RE: tidy4aug00 update"
>    * Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
>    * Other mail archives: [this mailing list] [other W3C mailing lists]
>    * Mail actions: [ respond to this message ] [ mail a new topic ]
>
>   ------------------------------------------------------------------------
>
> Date: Thu, 10 Aug 2000 14:24:04 -0400 (EDT)
> Message-ID: <3992F05A.31F854DA@netmechanic.com>
> From: Gary Deschaines <gary.deschaines@netmechanic.com>
> To: html-tidy@w3.org
> Subject: dt/center processing problem fix
>
> Dave,
>
> The 4 August 2000 and earlier versions of HTML Tidy contain a bug
> which causes a segmentation fault in the InsertNodeAfterElement
> procedure when the specified element does not contain a parent.
> This problem occurs when HTML Tidy attempts to parse an inferred
> definition list which contains a center element as illustrated
> in the following segment of HTML code.
>
>   <BODY>
>    <CENTER><H1>Heading 1</H1></CENTER>
>     <DT><IMG src="redball.gif"><B>Term 1</B></DT>
>     <DT><IMG src="redball.gif"><B>Term 2</B><HR></DT>
>    <CENTER><H1>Heading 2</H1></CENTER>
>
> This problem had been reported by Glenn Carroll as a "dt/center
> processing problem" in an e-mail dated Wed, Apr 19 2000, but I
> have found no record of a reported fix in the html-tidy@w3.org
> mail archive.
>
> By using the HTML source file and HTML Tidy configuration file
> presented in sections 1 and 2 of the text file attached with
> this letter, I traced the problem to the code block labeled
> "/* center in a dt or a dl breaks the dl list in two */" in the
> ParseDefList procedure (lines 1457 to 1475 in parser.c).
>
>   1457         /* center in a dt or a dl breaks the dl list in two */
>   1458         if (node->tag == tag_center)
>   1459         {
>   1460             if (list->content)
>   1461                 InsertNodeAfterElement(list, node);
>   1462             else /* trim empty dl list */
>   1463             {
>   1464                 InsertNodeBeforeElement(list, node);
>   1465                 DiscardElement(list);
>   1466             }
>   1467
>   1468             /* and parse contents of center */
>   1469             ParseTag(lexer, node, mode);
>   1470
>   1471             /* now create a new dl element */
>   1472             list = InferredTag(lexer, "dl");
>   1473             InsertNodeAfterElement(node, list);
>   1474             continue;
>   1475         }
>
> In the code block, ParseTag is called for the <CENTER> node
> following the first set of <DT> elements which are not contained
> in a <DL>...</DL> element.  When the <H1> node immediately after
> the <CENTER> node is encountered by the ParseBlock procedure
> (ParseTag procedure for center tag), the <CENTER> element is
> discarded by the following block of code (lines 765 to 781 of
> parser.c)
>
>    765             else if (node->tag->model & CM_BLOCK)
>    766             {
>    767                 if (lexer->excludeBlocks)
>    768                 {
>    769                     if (!(element->tag->model & CM_OPT))
>    770                         ReportWarning(lexer, element, node,
>                                              MISSING_ENDTAG_BEFORE);
>    771
>    772                     UngetToken(lexer);
>    773
>    774                     if (element->tag->model & CM_OBJECT)
>    775                         lexer->istackbase = istackbase;
>    776
>    777                     TrimSpaces(lexer, element);
>    778                     TrimEmptyElement(lexer, element);
>    779                     return;
>    780                 }
>    781             }
>
> extracted from ParseBlock since the value of lexer->excludeBlocks
> is true.  When processing returns from the ParseBlock (ParseTag)
> procedure, the center element has been discarded and the center
> element "node" passed in the call to InsertNodeAfterElement for
> the inferred dl element "list" does not contain a valid pointer
> to a parent node.
>
> The occurrence of a center element in the definition list results
> in the definition list to be split into two lists around the center
> element.  Consequently, the center element is no longer contained
> in a definition list and block elements are permitted.  Therefore,
> based on my interpretation of HTML Tidy processing in this case, I
> believe the lexer->excludeBlocks flag needs to be set to no before
> the center node is parsed and then set to yes before ParseDefList
> processing continues with a new definition list as illustrated
> below.
>
>   1466             }
>   1467
>   1468             /* and parse contents of center */
> +                  lexer->excludeBlocks = no;
>   1469             ParseTag(lexer, node, mode);
> +                  lexer->excludeBlocks = yes;
>   1470
>   1471             /* now create a new dl element */
>
> The text file "INFO_1.txt" provided as an attachment with this
> letter contains the following sections which present information
> to substantiate my findings.
>
>   1. HTML Source File - coredump2_O.htm
>   2. HTML Tidy Configuration File - coredump2.cfg
>   3. Original HTML Tidy Execution
>   4. Examination of Core Dump with gdb
>   5. HTML Tidy Source Patches
>   6. Patched HTML Tidy Execution
>
> The HTML source file contains a condensed portion of an actual
> web page which caused the segmentation fault and incorporates the
> same HTML coding errors -- missing <DL> and </DL> tags,  needless
> </DT> tags, missing <DD> tags, and incorrect use of UL tags
> instead of DL tags.  I presume the web page author intended to
> use a definition list to create custom bullets for an unordered
> list instead of utilizing CSS to define a list-style-image property
> for unordered list elements.
>
> Respectfully,
> Gary Deschaines
> gary.deschaines@netmechanic.com
>
> FILE:  INFO_1.txt (attachment to MEMO_1.txt)
> DATE:  10 AUG 2000
>
> -------------------------------------
> 1. HTML Source File - coredump2_O.htm
> -------------------------------------
> <HTML>
>  <HEAD>
>   <TITLE>Core Dump Case 2</TITLE>
>  </HEAD>
>  <BODY>
>   <CENTER><H1>Heading 1</H1></CENTER>
>    <DT><IMG src="redball.gif"><B>Term 1</B></DT>
>    <DT><IMG src="redball.gif"><B>Term 2</B><HR></DT>
>   <CENTER><H1>Heading 2</H1></CENTER>
>   <UL>
>    <DT><IMG src="redball.gif"><B>Term 3</B></DT>
>    <DT><IMG src="redball.gif"><B>Term 4</B><HR></DT>
>   </UL>
>  </BODY>
> </HTML>
>
> -----------------------------------------------
> 2. HTML Tidy Configuration File - coredump2.cfg
> -----------------------------------------------
> write-back: no
> tidy-mark: no
> quote-ampersand: no
> show-warnings: yes
> char-encoding: raw
> markup: yes
> show-acc-warnings: no
> hide-endtags: no
> uppercase-tags: no
> uppercase-attributes: no
> wrap-script-literals: no
> numeric-entities: no
> indent: auto
> wrap: 0
> logical-emphasis: no
> clean: no
> drop-font-tags: no
>
> -------------------------------
> 3. Original HTML Tidy Execution
> -------------------------------
> ../orig/tidy -e -config coredump2.cfg coredump2_O.htm
>
> Tidy (vers 4th August 2000) Parsing "coredump2_O.htm"
> line 7 column 4 - Warning: <dt> isn't allowed in <body> elements
> line 7 column 4 - Warning: inserting implicit <dl>
> line 7 column 8 - Warning: <img> lacks "alt" attribute
> line 8 column 8 - Warning: <img> lacks "alt" attribute
> line 8 column 44 - Warning: <hr> isn't allowed in <dt> elements
> line 8 column 48 - Warning: trimming empty <dt>
> line 9 column 11 - Warning: missing </center> before <h1>
> line 9 column 11 - Warning: trimming empty <center>
> Segmentation fault (core dumped)
>
> ------------------------------------
> 4. Examination of Core Dump with gdb
> ------------------------------------
> gdb -nx ../orig/tidy -c core
>
> GNU gdb 19991004
> Copyright 1998 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-redhat-linux"...
> Core was generated by `../orig/tidy -e -config coredump2.cfg coredump2_O.htm'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libc.so.6...done.
> Reading symbols from /lib/ld-linux.so.2...done.
> #0  0x804a6bd in InsertNodeAfterElement (element=0x8071a88, node=0x8071a88) at parser.c:205
> 205         if (parent->last == element)
>
> (gdb) where
>
> #0  0x804a6bd in InsertNodeAfterElement (element=0x8071a88, node=0x8071a88) at parser.c:205
> #1  0x804cc13 in ParseDefList (lexer=0x806f2c0, list=0x8071a88, mode=0) at parser.c:1473
> #2  0x804ac47 in ParseTag (lexer=0x806f2c0, node=0x8071640, mode=0) at parser.c:432
> #3  0x804f4ef in ParseBody (lexer=0x806f2c0, body=0x80714c0, mode=0) at parser.c:2883
> #4  0x804ac47 in ParseTag (lexer=0x806f2c0, node=0x80714c0, mode=0) at parser.c:432
> #5  0x804fe81 in ParseHTML (lexer=0x806f2c0, html=0x8071390, mode=0) at parser.c:3217
> #6  0x804ffb9 in ParseDocument (lexer=0x806f2c0) at parser.c:3264
> #7  0x80604a4 in main (argc=2, argv=0xbffff9f0) at tidy.c:956
>
> (gdb) l 205
>
> 200         Node *parent;
> 201
> 202         parent = element->parent;
> 203         node->parent = parent;
> 204
> 205         if (parent->last == element)
> 206             parent->last = node;
> 207         else
> 208         {
> 209             node->next = element->next;
>
> (gdb) p element->parent
>
> $1 = (struct _node *) 0x0
>
> ---------------------------
> 5. HTML Tidy Source Patches
> ---------------------------
> *** ./orig/parser.c     Fri Aug  4 12:21:05 2000
> --- ./code/parser.c     Thu Aug 10 09:27:27 2000
> ***************
> *** 1466,1472 ****
> --- 1466,1474 ----
>               }
>
>               /* and parse contents of center */
> +             lexer->excludeBlocks = no;
>               ParseTag(lexer, node, mode);
> +             lexer->excludeBlocks = yes;
>
>               /* now create a new dl element */
>               list = InferredTag(lexer, "dl");
>
> ------------------------------
> 6. Patched HTML Tidy Execution
> ------------------------------
> ../code/tidy -e -config coredump2.cfg coredump2_O.htm
>
> Tidy (vers 4th August 2000) Parsing "coredump2_O.htm"
> line 7 column 4 - Warning: <dt> isn't allowed in <body> elements
> line 7 column 4 - Warning: inserting implicit <dl>
> line 7 column 8 - Warning: <img> lacks "alt" attribute
> line 8 column 8 - Warning: <img> lacks "alt" attribute
> line 8 column 44 - Warning: <hr> isn't allowed in <dt> elements
> line 8 column 48 - Warning: trimming empty <dt>
> line 10 column 3 - Warning: trimming empty <dl>
> line 11 column 4 - Warning: missing <li>
> line 11 column 4 - Warning: inserting implicit <dl>
> line 11 column 8 - Warning: <img> lacks "alt" attribute
> line 12 column 8 - Warning: <img> lacks "alt" attribute
> line 12 column 44 - Warning: <hr> isn't allowed in <dt> elements
> line 12 column 48 - Warning: trimming empty <dt>
> line 13 column 3 - Warning: missing </dl> before </ul>
>
> coredump2_O.htm: Document content looks like HTML 3.2
> 14 warnings/errors were found!
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
> <html>
>   <head>
>     <title>Core Dump Case 2</title>
>   </head>
>
>   <body>
>     <center>
>       <h1>Heading 1</h1>
>     </center>
>
>     <dl>
>       <dt><img src="redball.gif"><b>Term 1</b></dt>
>
>       <dt><img src="redball.gif"><b>Term 2</b></dt>
>
>       <dd>
>         <hr>
>       </dd>
>     </dl>
>
>     <center>
>       <h1>Heading 2</h1>
>     </center>
>
>     <div style="margin-left: 2em">
>       <dl>
>         <dt><img src="redball.gif"><b>Term 3</b></dt>
>
>         <dt><img src="redball.gif"><b>Term 4</b></dt>
>
>         <dd>
>           <hr>
>         </dd>
>       </dl>
>     </div>
>   </body>
> </html>
>
> The alt attribute should be used to give a short description
> of an image; longer descriptions should be given with the
> longdesc attribute which takes a URL linked to the description.
> These measures are needed for people using non-graphical browsers.
>
> For further advice on how to make your pages accessible
> see "http://www.w3.org/WAI/GL". You may also want to try
> "http://www.cast.org/bobby/" which is a free Web-based
> service for checking URLs for accessibility.
>
> HTML & CSS specifications are available from http://www.w3.org/
> To learn more about Tidy see http://www.w3.org/People/Raggett/tidy/
> Please send bug reports to Dave Raggett care of <html-tidy@w3.org>
> Lobby your company to join W3C, see http://www.w3.org/Consortium
>
>   ------------------------------------------------------------------------
>
>    * Next message: Andy Quick: "Re: Bug: Possible dangling pointer in istack.c"
>    * Previous message: Sebastian Lange: "RE: tidy4aug00 update"
>    * Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
>    * Other mail archives: [this mailing list] [other W3C mailing lists]
>    * Mail actions: [ respond to this message ] [ mail a new topic ]

BTW.  The code patch presented in Glenn Carroll's e-mail must also be incorporated to account
for empty center element discarded by ParseTag processing.  Specifically, the modifications to

the ParseDefList procedure (lines 1457 to 1475 in parser.c) should be:

  1457         /* center in a dt or a dl breaks the dl list in two */
  1458         if (node->tag == tag_center)
  1459         {
  1460             if (list->content)
  1461                 InsertNodeAfterElement(list, node);
  1462             else /* trim empty dl list */
  1463             {
  1464                 InsertNodeBeforeElement(list, node);
  1465                 DiscardElement(list);
  1466             }
  1467
  +                  /* ParseTag can destroy node, if it finds that
  +                   * this <center> is followed immediately by </center>.
  +                   * It's awkward but necessary to determine if this
  +                   * has happened.
  +                   */
  +                   parent = node->parent;
  +
  1468             /* and parse contents of center */
  +                   lexer->excludeBlocks = no;
  1469             ParseTag(lexer, node, mode);
  +                   lexer->excludeBlocks = yes;
  1470
  1471C           /* now create a new dl element,
  +                    * unless node has been blown away because the
  +                    * center was empty, as above.
  +                    */
  +                   if (parent->last == node)
  +                   {
  1472C                list = InferredTag(lexer, "dl");
  1473C                InsertNodeAfterElement(node, list);
  +                   }
  1474             continue;
  1475         }

Gary

Received on Friday, 11 August 2000 11:37:06 UTC