- From: Gary Deschaines <gary.deschaines@netmechanic.com>
- Date: Fri, 11 Aug 2000 10:24:43 -0500
- To: html-tidy@w3.org
> dt/center processing problem fix
>
> From: Gary Deschaines (gary.deschaines@netmechanic.com)
> Date: Thu, Aug 10 2000
>
> *Next message: Andy Quick: "Re: Bug: Possible dangling pointer in istack.c"
>
> * Previous message: Sebastian Lange: "RE: tidy4aug00 update"
> * Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> * Other mail archives: [this mailing list] [other W3C mailing lists]
> * Mail actions: [ respond to this message ] [ mail a new topic ]
>
> ------------------------------------------------------------------------
>
> Date: Thu, 10 Aug 2000 14:24:04 -0400 (EDT)
> Message-ID: <3992F05A.31F854DA@netmechanic.com>
> From: Gary Deschaines <gary.deschaines@netmechanic.com>
> To: html-tidy@w3.org
> Subject: dt/center processing problem fix
>
> Dave,
>
> The 4 August 2000 and earlier versions of HTML Tidy contain a bug
> which causes a segmentation fault in the InsertNodeAfterElement
> procedure when the specified element does not contain a parent.
> This problem occurs when HTML Tidy attempts to parse an inferred
> definition list which contains a center element as illustrated
> in the following segment of HTML code.
>
> <BODY>
> <CENTER><H1>Heading 1</H1></CENTER>
> <DT><IMG src="redball.gif"><B>Term 1</B></DT>
> <DT><IMG src="redball.gif"><B>Term 2</B><HR></DT>
> <CENTER><H1>Heading 2</H1></CENTER>
>
> This problem had been reported by Glenn Carroll as a "dt/center
> processing problem" in an e-mail dated Wed, Apr 19 2000, but I
> have found no record of a reported fix in the html-tidy@w3.org
> mail archive.
>
> By using the HTML source file and HTML Tidy configuration file
> presented in sections 1 and 2 of the text file attached with
> this letter, I traced the problem to the code block labeled
> "/* center in a dt or a dl breaks the dl list in two */" in the
> ParseDefList procedure (lines 1457 to 1475 in parser.c).
>
> 1457 /* center in a dt or a dl breaks the dl list in two */
> 1458 if (node->tag == tag_center)
> 1459 {
> 1460 if (list->content)
> 1461 InsertNodeAfterElement(list, node);
> 1462 else /* trim empty dl list */
> 1463 {
> 1464 InsertNodeBeforeElement(list, node);
> 1465 DiscardElement(list);
> 1466 }
> 1467
> 1468 /* and parse contents of center */
> 1469 ParseTag(lexer, node, mode);
> 1470
> 1471 /* now create a new dl element */
> 1472 list = InferredTag(lexer, "dl");
> 1473 InsertNodeAfterElement(node, list);
> 1474 continue;
> 1475 }
>
> In the code block, ParseTag is called for the <CENTER> node
> following the first set of <DT> elements which are not contained
> in a <DL>...</DL> element. When the <H1> node immediately after
> the <CENTER> node is encountered by the ParseBlock procedure
> (ParseTag procedure for center tag), the <CENTER> element is
> discarded by the following block of code (lines 765 to 781 of
> parser.c)
>
> 765 else if (node->tag->model & CM_BLOCK)
> 766 {
> 767 if (lexer->excludeBlocks)
> 768 {
> 769 if (!(element->tag->model & CM_OPT))
> 770 ReportWarning(lexer, element, node,
> MISSING_ENDTAG_BEFORE);
> 771
> 772 UngetToken(lexer);
> 773
> 774 if (element->tag->model & CM_OBJECT)
> 775 lexer->istackbase = istackbase;
> 776
> 777 TrimSpaces(lexer, element);
> 778 TrimEmptyElement(lexer, element);
> 779 return;
> 780 }
> 781 }
>
> extracted from ParseBlock since the value of lexer->excludeBlocks
> is true. When processing returns from the ParseBlock (ParseTag)
> procedure, the center element has been discarded and the center
> element "node" passed in the call to InsertNodeAfterElement for
> the inferred dl element "list" does not contain a valid pointer
> to a parent node.
>
> The occurrence of a center element in the definition list results
> in the definition list to be split into two lists around the center
> element. Consequently, the center element is no longer contained
> in a definition list and block elements are permitted. Therefore,
> based on my interpretation of HTML Tidy processing in this case, I
> believe the lexer->excludeBlocks flag needs to be set to no before
> the center node is parsed and then set to yes before ParseDefList
> processing continues with a new definition list as illustrated
> below.
>
> 1466 }
> 1467
> 1468 /* and parse contents of center */
> + lexer->excludeBlocks = no;
> 1469 ParseTag(lexer, node, mode);
> + lexer->excludeBlocks = yes;
> 1470
> 1471 /* now create a new dl element */
>
> The text file "INFO_1.txt" provided as an attachment with this
> letter contains the following sections which present information
> to substantiate my findings.
>
> 1. HTML Source File - coredump2_O.htm
> 2. HTML Tidy Configuration File - coredump2.cfg
> 3. Original HTML Tidy Execution
> 4. Examination of Core Dump with gdb
> 5. HTML Tidy Source Patches
> 6. Patched HTML Tidy Execution
>
> The HTML source file contains a condensed portion of an actual
> web page which caused the segmentation fault and incorporates the
> same HTML coding errors -- missing <DL> and </DL> tags, needless
> </DT> tags, missing <DD> tags, and incorrect use of UL tags
> instead of DL tags. I presume the web page author intended to
> use a definition list to create custom bullets for an unordered
> list instead of utilizing CSS to define a list-style-image property
> for unordered list elements.
>
> Respectfully,
> Gary Deschaines
> gary.deschaines@netmechanic.com
>
> FILE: INFO_1.txt (attachment to MEMO_1.txt)
> DATE: 10 AUG 2000
>
> -------------------------------------
> 1. HTML Source File - coredump2_O.htm
> -------------------------------------
> <HTML>
> <HEAD>
> <TITLE>Core Dump Case 2</TITLE>
> </HEAD>
> <BODY>
> <CENTER><H1>Heading 1</H1></CENTER>
> <DT><IMG src="redball.gif"><B>Term 1</B></DT>
> <DT><IMG src="redball.gif"><B>Term 2</B><HR></DT>
> <CENTER><H1>Heading 2</H1></CENTER>
> <UL>
> <DT><IMG src="redball.gif"><B>Term 3</B></DT>
> <DT><IMG src="redball.gif"><B>Term 4</B><HR></DT>
> </UL>
> </BODY>
> </HTML>
>
> -----------------------------------------------
> 2. HTML Tidy Configuration File - coredump2.cfg
> -----------------------------------------------
> write-back: no
> tidy-mark: no
> quote-ampersand: no
> show-warnings: yes
> char-encoding: raw
> markup: yes
> show-acc-warnings: no
> hide-endtags: no
> uppercase-tags: no
> uppercase-attributes: no
> wrap-script-literals: no
> numeric-entities: no
> indent: auto
> wrap: 0
> logical-emphasis: no
> clean: no
> drop-font-tags: no
>
> -------------------------------
> 3. Original HTML Tidy Execution
> -------------------------------
> ../orig/tidy -e -config coredump2.cfg coredump2_O.htm
>
> Tidy (vers 4th August 2000) Parsing "coredump2_O.htm"
> line 7 column 4 - Warning: <dt> isn't allowed in <body> elements
> line 7 column 4 - Warning: inserting implicit <dl>
> line 7 column 8 - Warning: <img> lacks "alt" attribute
> line 8 column 8 - Warning: <img> lacks "alt" attribute
> line 8 column 44 - Warning: <hr> isn't allowed in <dt> elements
> line 8 column 48 - Warning: trimming empty <dt>
> line 9 column 11 - Warning: missing </center> before <h1>
> line 9 column 11 - Warning: trimming empty <center>
> Segmentation fault (core dumped)
>
> ------------------------------------
> 4. Examination of Core Dump with gdb
> ------------------------------------
> gdb -nx ../orig/tidy -c core
>
> GNU gdb 19991004
> Copyright 1998 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-redhat-linux"...
> Core was generated by `../orig/tidy -e -config coredump2.cfg coredump2_O.htm'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libc.so.6...done.
> Reading symbols from /lib/ld-linux.so.2...done.
> #0 0x804a6bd in InsertNodeAfterElement (element=0x8071a88, node=0x8071a88) at parser.c:205
> 205 if (parent->last == element)
>
> (gdb) where
>
> #0 0x804a6bd in InsertNodeAfterElement (element=0x8071a88, node=0x8071a88) at parser.c:205
> #1 0x804cc13 in ParseDefList (lexer=0x806f2c0, list=0x8071a88, mode=0) at parser.c:1473
> #2 0x804ac47 in ParseTag (lexer=0x806f2c0, node=0x8071640, mode=0) at parser.c:432
> #3 0x804f4ef in ParseBody (lexer=0x806f2c0, body=0x80714c0, mode=0) at parser.c:2883
> #4 0x804ac47 in ParseTag (lexer=0x806f2c0, node=0x80714c0, mode=0) at parser.c:432
> #5 0x804fe81 in ParseHTML (lexer=0x806f2c0, html=0x8071390, mode=0) at parser.c:3217
> #6 0x804ffb9 in ParseDocument (lexer=0x806f2c0) at parser.c:3264
> #7 0x80604a4 in main (argc=2, argv=0xbffff9f0) at tidy.c:956
>
> (gdb) l 205
>
> 200 Node *parent;
> 201
> 202 parent = element->parent;
> 203 node->parent = parent;
> 204
> 205 if (parent->last == element)
> 206 parent->last = node;
> 207 else
> 208 {
> 209 node->next = element->next;
>
> (gdb) p element->parent
>
> $1 = (struct _node *) 0x0
>
> ---------------------------
> 5. HTML Tidy Source Patches
> ---------------------------
> *** ./orig/parser.c Fri Aug 4 12:21:05 2000
> --- ./code/parser.c Thu Aug 10 09:27:27 2000
> ***************
> *** 1466,1472 ****
> --- 1466,1474 ----
> }
>
> /* and parse contents of center */
> + lexer->excludeBlocks = no;
> ParseTag(lexer, node, mode);
> + lexer->excludeBlocks = yes;
>
> /* now create a new dl element */
> list = InferredTag(lexer, "dl");
>
> ------------------------------
> 6. Patched HTML Tidy Execution
> ------------------------------
> ../code/tidy -e -config coredump2.cfg coredump2_O.htm
>
> Tidy (vers 4th August 2000) Parsing "coredump2_O.htm"
> line 7 column 4 - Warning: <dt> isn't allowed in <body> elements
> line 7 column 4 - Warning: inserting implicit <dl>
> line 7 column 8 - Warning: <img> lacks "alt" attribute
> line 8 column 8 - Warning: <img> lacks "alt" attribute
> line 8 column 44 - Warning: <hr> isn't allowed in <dt> elements
> line 8 column 48 - Warning: trimming empty <dt>
> line 10 column 3 - Warning: trimming empty <dl>
> line 11 column 4 - Warning: missing <li>
> line 11 column 4 - Warning: inserting implicit <dl>
> line 11 column 8 - Warning: <img> lacks "alt" attribute
> line 12 column 8 - Warning: <img> lacks "alt" attribute
> line 12 column 44 - Warning: <hr> isn't allowed in <dt> elements
> line 12 column 48 - Warning: trimming empty <dt>
> line 13 column 3 - Warning: missing </dl> before </ul>
>
> coredump2_O.htm: Document content looks like HTML 3.2
> 14 warnings/errors were found!
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
> <html>
> <head>
> <title>Core Dump Case 2</title>
> </head>
>
> <body>
> <center>
> <h1>Heading 1</h1>
> </center>
>
> <dl>
> <dt><img src="redball.gif"><b>Term 1</b></dt>
>
> <dt><img src="redball.gif"><b>Term 2</b></dt>
>
> <dd>
> <hr>
> </dd>
> </dl>
>
> <center>
> <h1>Heading 2</h1>
> </center>
>
> <div style="margin-left: 2em">
> <dl>
> <dt><img src="redball.gif"><b>Term 3</b></dt>
>
> <dt><img src="redball.gif"><b>Term 4</b></dt>
>
> <dd>
> <hr>
> </dd>
> </dl>
> </div>
> </body>
> </html>
>
> The alt attribute should be used to give a short description
> of an image; longer descriptions should be given with the
> longdesc attribute which takes a URL linked to the description.
> These measures are needed for people using non-graphical browsers.
>
> For further advice on how to make your pages accessible
> see "http://www.w3.org/WAI/GL". You may also want to try
> "http://www.cast.org/bobby/" which is a free Web-based
> service for checking URLs for accessibility.
>
> HTML & CSS specifications are available from http://www.w3.org/
> To learn more about Tidy see http://www.w3.org/People/Raggett/tidy/
> Please send bug reports to Dave Raggett care of <html-tidy@w3.org>
> Lobby your company to join W3C, see http://www.w3.org/Consortium
>
> ------------------------------------------------------------------------
>
> * Next message: Andy Quick: "Re: Bug: Possible dangling pointer in istack.c"
> * Previous message: Sebastian Lange: "RE: tidy4aug00 update"
> * Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> * Other mail archives: [this mailing list] [other W3C mailing lists]
> * Mail actions: [ respond to this message ] [ mail a new topic ]
BTW. The code patch presented in Glenn Carroll's e-mail must also be incorporated to account
for empty center element discarded by ParseTag processing. Specifically, the modifications to
the ParseDefList procedure (lines 1457 to 1475 in parser.c) should be:
1457 /* center in a dt or a dl breaks the dl list in two */
1458 if (node->tag == tag_center)
1459 {
1460 if (list->content)
1461 InsertNodeAfterElement(list, node);
1462 else /* trim empty dl list */
1463 {
1464 InsertNodeBeforeElement(list, node);
1465 DiscardElement(list);
1466 }
1467
+ /* ParseTag can destroy node, if it finds that
+ * this <center> is followed immediately by </center>.
+ * It's awkward but necessary to determine if this
+ * has happened.
+ */
+ parent = node->parent;
+
1468 /* and parse contents of center */
+ lexer->excludeBlocks = no;
1469 ParseTag(lexer, node, mode);
+ lexer->excludeBlocks = yes;
1470
1471C /* now create a new dl element,
+ * unless node has been blown away because the
+ * center was empty, as above.
+ */
+ if (parent->last == node)
+ {
1472C list = InferredTag(lexer, "dl");
1473C InsertNodeAfterElement(node, list);
+ }
1474 continue;
1475 }
Gary
Received on Friday, 11 August 2000 11:37:06 UTC