- From: Gary Deschaines <gary.deschaines@netmechanic.com>
- Date: Thu, 10 Aug 2000 14:24:04 -0400 (EDT)
- To: html-tidy@w3.org
- Message-ID: <3992F05A.31F854DA@netmechanic.com>
Dave, The 4 August 2000 and earlier versions of HTML Tidy contain a bug which causes a segmentation fault in the InsertNodeAfterElement procedure when the specified element does not contain a parent. This problem occurs when HTML Tidy attempts to parse an inferred definition list which contains a center element as illustrated in the following segment of HTML code. <BODY> <CENTER><H1>Heading 1</H1></CENTER> <DT><IMG src="redball.gif"><B>Term 1</B></DT> <DT><IMG src="redball.gif"><B>Term 2</B><HR></DT> <CENTER><H1>Heading 2</H1></CENTER> This problem had been reported by Glenn Carroll as a "dt/center processing problem" in an e-mail dated Wed, Apr 19 2000, but I have found no record of a reported fix in the html-tidy@w3.org mail archive. By using the HTML source file and HTML Tidy configuration file presented in sections 1 and 2 of the text file attached with this letter, I traced the problem to the code block labeled "/* center in a dt or a dl breaks the dl list in two */" in the ParseDefList procedure (lines 1457 to 1475 in parser.c). 1457 /* center in a dt or a dl breaks the dl list in two */ 1458 if (node->tag == tag_center) 1459 { 1460 if (list->content) 1461 InsertNodeAfterElement(list, node); 1462 else /* trim empty dl list */ 1463 { 1464 InsertNodeBeforeElement(list, node); 1465 DiscardElement(list); 1466 } 1467 1468 /* and parse contents of center */ 1469 ParseTag(lexer, node, mode); 1470 1471 /* now create a new dl element */ 1472 list = InferredTag(lexer, "dl"); 1473 InsertNodeAfterElement(node, list); 1474 continue; 1475 } In the code block, ParseTag is called for the <CENTER> node following the first set of <DT> elements which are not contained in a <DL>...</DL> element. When the <H1> node immediately after the <CENTER> node is encountered by the ParseBlock procedure (ParseTag procedure for center tag), the <CENTER> element is discarded by the following block of code (lines 765 to 781 of parser.c) 765 else if (node->tag->model & CM_BLOCK) 766 { 767 if (lexer->excludeBlocks) 768 { 769 if (!(element->tag->model & CM_OPT)) 770 ReportWarning(lexer, element, node, MISSING_ENDTAG_BEFORE); 771 772 UngetToken(lexer); 773 774 if (element->tag->model & CM_OBJECT) 775 lexer->istackbase = istackbase; 776 777 TrimSpaces(lexer, element); 778 TrimEmptyElement(lexer, element); 779 return; 780 } 781 } extracted from ParseBlock since the value of lexer->excludeBlocks is true. When processing returns from the ParseBlock (ParseTag) procedure, the center element has been discarded and the center element "node" passed in the call to InsertNodeAfterElement for the inferred dl element "list" does not contain a valid pointer to a parent node. The occurrence of a center element in the definition list results in the definition list to be split into two lists around the center element. Consequently, the center element is no longer contained in a definition list and block elements are permitted. Therefore, based on my interpretation of HTML Tidy processing in this case, I believe the lexer->excludeBlocks flag needs to be set to no before the center node is parsed and then set to yes before ParseDefList processing continues with a new definition list as illustrated below. 1466 } 1467 1468 /* and parse contents of center */ + lexer->excludeBlocks = no; 1469 ParseTag(lexer, node, mode); + lexer->excludeBlocks = yes; 1470 1471 /* now create a new dl element */ The text file "INFO_1.txt" provided as an attachment with this letter contains the following sections which present information to substantiate my findings. 1. HTML Source File - coredump2_O.htm 2. HTML Tidy Configuration File - coredump2.cfg 3. Original HTML Tidy Execution 4. Examination of Core Dump with gdb 5. HTML Tidy Source Patches 6. Patched HTML Tidy Execution The HTML source file contains a condensed portion of an actual web page which caused the segmentation fault and incorporates the same HTML coding errors -- missing <DL> and </DL> tags, needless </DT> tags, missing <DD> tags, and incorrect use of UL tags instead of DL tags. I presume the web page author intended to use a definition list to create custom bullets for an unordered list instead of utilizing CSS to define a list-style-image property for unordered list elements. Respectfully, Gary Deschaines gary.deschaines@netmechanic.com
FILE: INFO_1.txt (attachment to MEMO_1.txt) DATE: 10 AUG 2000 ------------------------------------- 1. HTML Source File - coredump2_O.htm ------------------------------------- <HTML> <HEAD> <TITLE>Core Dump Case 2</TITLE> </HEAD> <BODY> <CENTER><H1>Heading 1</H1></CENTER> <DT><IMG src="redball.gif"><B>Term 1</B></DT> <DT><IMG src="redball.gif"><B>Term 2</B><HR></DT> <CENTER><H1>Heading 2</H1></CENTER> <UL> <DT><IMG src="redball.gif"><B>Term 3</B></DT> <DT><IMG src="redball.gif"><B>Term 4</B><HR></DT> </UL> </BODY> </HTML> ----------------------------------------------- 2. HTML Tidy Configuration File - coredump2.cfg ----------------------------------------------- write-back: no tidy-mark: no quote-ampersand: no show-warnings: yes char-encoding: raw markup: yes show-acc-warnings: no hide-endtags: no uppercase-tags: no uppercase-attributes: no wrap-script-literals: no numeric-entities: no indent: auto wrap: 0 logical-emphasis: no clean: no drop-font-tags: no ------------------------------- 3. Original HTML Tidy Execution ------------------------------- ../orig/tidy -e -config coredump2.cfg coredump2_O.htm Tidy (vers 4th August 2000) Parsing "coredump2_O.htm" line 7 column 4 - Warning: <dt> isn't allowed in <body> elements line 7 column 4 - Warning: inserting implicit <dl> line 7 column 8 - Warning: <img> lacks "alt" attribute line 8 column 8 - Warning: <img> lacks "alt" attribute line 8 column 44 - Warning: <hr> isn't allowed in <dt> elements line 8 column 48 - Warning: trimming empty <dt> line 9 column 11 - Warning: missing </center> before <h1> line 9 column 11 - Warning: trimming empty <center> Segmentation fault (core dumped) ------------------------------------ 4. Examination of Core Dump with gdb ------------------------------------ gdb -nx ../orig/tidy -c core GNU gdb 19991004 Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... Core was generated by `../orig/tidy -e -config coredump2.cfg coredump2_O.htm'. Program terminated with signal 11, Segmentation fault. Reading symbols from /lib/libc.so.6...done. Reading symbols from /lib/ld-linux.so.2...done. #0 0x804a6bd in InsertNodeAfterElement (element=0x8071a88, node=0x8071a88) at parser.c:205 205 if (parent->last == element) (gdb) where #0 0x804a6bd in InsertNodeAfterElement (element=0x8071a88, node=0x8071a88) at parser.c:205 #1 0x804cc13 in ParseDefList (lexer=0x806f2c0, list=0x8071a88, mode=0) at parser.c:1473 #2 0x804ac47 in ParseTag (lexer=0x806f2c0, node=0x8071640, mode=0) at parser.c:432 #3 0x804f4ef in ParseBody (lexer=0x806f2c0, body=0x80714c0, mode=0) at parser.c:2883 #4 0x804ac47 in ParseTag (lexer=0x806f2c0, node=0x80714c0, mode=0) at parser.c:432 #5 0x804fe81 in ParseHTML (lexer=0x806f2c0, html=0x8071390, mode=0) at parser.c:3217 #6 0x804ffb9 in ParseDocument (lexer=0x806f2c0) at parser.c:3264 #7 0x80604a4 in main (argc=2, argv=0xbffff9f0) at tidy.c:956 (gdb) l 205 200 Node *parent; 201 202 parent = element->parent; 203 node->parent = parent; 204 205 if (parent->last == element) 206 parent->last = node; 207 else 208 { 209 node->next = element->next; (gdb) p element->parent $1 = (struct _node *) 0x0 --------------------------- 5. HTML Tidy Source Patches --------------------------- *** ./orig/parser.c Fri Aug 4 12:21:05 2000 --- ./code/parser.c Thu Aug 10 09:27:27 2000 *************** *** 1466,1472 **** --- 1466,1474 ---- } /* and parse contents of center */ + lexer->excludeBlocks = no; ParseTag(lexer, node, mode); + lexer->excludeBlocks = yes; /* now create a new dl element */ list = InferredTag(lexer, "dl"); ------------------------------ 6. Patched HTML Tidy Execution ------------------------------ ../code/tidy -e -config coredump2.cfg coredump2_O.htm Tidy (vers 4th August 2000) Parsing "coredump2_O.htm" line 7 column 4 - Warning: <dt> isn't allowed in <body> elements line 7 column 4 - Warning: inserting implicit <dl> line 7 column 8 - Warning: <img> lacks "alt" attribute line 8 column 8 - Warning: <img> lacks "alt" attribute line 8 column 44 - Warning: <hr> isn't allowed in <dt> elements line 8 column 48 - Warning: trimming empty <dt> line 10 column 3 - Warning: trimming empty <dl> line 11 column 4 - Warning: missing <li> line 11 column 4 - Warning: inserting implicit <dl> line 11 column 8 - Warning: <img> lacks "alt" attribute line 12 column 8 - Warning: <img> lacks "alt" attribute line 12 column 44 - Warning: <hr> isn't allowed in <dt> elements line 12 column 48 - Warning: trimming empty <dt> line 13 column 3 - Warning: missing </dl> before </ul> coredump2_O.htm: Document content looks like HTML 3.2 14 warnings/errors were found! <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN"> <html> <head> <title>Core Dump Case 2</title> </head> <body> <center> <h1>Heading 1</h1> </center> <dl> <dt><img src="redball.gif"><b>Term 1</b></dt> <dt><img src="redball.gif"><b>Term 2</b></dt> <dd> <hr> </dd> </dl> <center> <h1>Heading 2</h1> </center> <div style="margin-left: 2em"> <dl> <dt><img src="redball.gif"><b>Term 3</b></dt> <dt><img src="redball.gif"><b>Term 4</b></dt> <dd> <hr> </dd> </dl> </div> </body> </html> The alt attribute should be used to give a short description of an image; longer descriptions should be given with the longdesc attribute which takes a URL linked to the description. These measures are needed for people using non-graphical browsers. For further advice on how to make your pages accessible see "http://www.w3.org/WAI/GL". You may also want to try "http://www.cast.org/bobby/" which is a free Web-based service for checking URLs for accessibility. HTML & CSS specifications are available from http://www.w3.org/ To learn more about Tidy see http://www.w3.org/People/Raggett/tidy/ Please send bug reports to Dave Raggett care of <html-tidy@w3.org> Lobby your company to join W3C, see http://www.w3.org/Consortium
Received on Thursday, 10 August 2000 14:24:44 UTC