- From: Gary Deschaines <gary.deschaines@netmechanic.com>
- Date: Thu, 10 Aug 2000 14:24:04 -0400 (EDT)
- To: html-tidy@w3.org
- Message-ID: <3992F05A.31F854DA@netmechanic.com>
Dave,
The 4 August 2000 and earlier versions of HTML Tidy contain a bug
that causes a segmentation fault in the InsertNodeAfterElement
procedure when the specified element has no parent.
This problem occurs when HTML Tidy parses an inferred
definition list that contains a center element, as illustrated
in the following segment of HTML code.
<BODY>
<CENTER><H1>Heading 1</H1></CENTER>
<DT><IMG src="redball.gif"><B>Term 1</B></DT>
<DT><IMG src="redball.gif"><B>Term 2</B><HR></DT>
<CENTER><H1>Heading 2</H1></CENTER>
This problem was reported by Glenn Carroll as a "dt/center
processing problem" in an e-mail dated Wed, Apr 19 2000, but I
have found no record of a reported fix in the html-tidy@w3.org
mail archive.
Using the HTML source file and HTML Tidy configuration file
presented in sections 1 and 2 of the text file attached to
this letter, I traced the problem to the code block labeled
"/* center in a dt or a dl breaks the dl list in two */" in the
ParseDefList procedure (lines 1457 to 1475 in parser.c).
1457 /* center in a dt or a dl breaks the dl list in two */
1458 if (node->tag == tag_center)
1459 {
1460 if (list->content)
1461 InsertNodeAfterElement(list, node);
1462 else /* trim empty dl list */
1463 {
1464 InsertNodeBeforeElement(list, node);
1465 DiscardElement(list);
1466 }
1467
1468 /* and parse contents of center */
1469 ParseTag(lexer, node, mode);
1470
1471 /* now create a new dl element */
1472 list = InferredTag(lexer, "dl");
1473 InsertNodeAfterElement(node, list);
1474 continue;
1475 }
In this code block, ParseTag is called for the <CENTER> node
that follows the first set of <DT> elements, which are not contained
in a <DL>...</DL> element. For a center tag, ParseTag invokes the
ParseBlock procedure. When ParseBlock encounters the <H1> node
immediately after the <CENTER> start tag, the <CENTER> element is
discarded by the following block of code (lines 765 to 781 of
parser.c)
765 else if (node->tag->model & CM_BLOCK)
766 {
767 if (lexer->excludeBlocks)
768 {
769 if (!(element->tag->model & CM_OPT))
770 ReportWarning(lexer, element, node, MISSING_ENDTAG_BEFORE);
771
772 UngetToken(lexer);
773
774 if (element->tag->model & CM_OBJECT)
775 lexer->istackbase = istackbase;
776
777 TrimSpaces(lexer, element);
778 TrimEmptyElement(lexer, element);
779 return;
780 }
781 }
extracted from ParseBlock since the value of lexer->excludeBlocks
is true. When processing returns from the ParseBlock (ParseTag)
procedure, the center element has been discarded and the center
element "node" passed in the call to InsertNodeAfterElement for
the inferred dl element "list" does not contain a valid pointer
to a parent node.
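The failure mode described above can be sketched in isolation. The
following is a simplified model, not Tidy's actual code: the Node
struct carries only the link fields, and InsertNodeAfterElementChecked
is a hypothetical guarded variant that refuses to insert after an
element whose parent pointer is NULL. Such a guard would not fix the
underlying cause (the center element being trimmed); it would only
turn the crash into a detectable failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for Tidy's Node; only the link fields are modelled. */
typedef struct Node {
    struct Node *parent;
    struct Node *prev;
    struct Node *next;
    struct Node *content;  /* first child */
    struct Node *last;     /* last child */
} Node;

/* Hypothetical guarded variant (not part of Tidy): returns false instead
   of dereferencing a NULL parent when `element` is detached from the tree. */
static bool InsertNodeAfterElementChecked(Node *element, Node *node)
{
    assert(element != NULL && node != NULL);

    Node *parent = element->parent;
    if (parent == NULL)
        return false;  /* element was discarded or never attached */

    node->parent = parent;
    node->prev = element;
    node->next = element->next;

    if (parent->last == element)
        parent->last = node;           /* node becomes the new last child */
    else if (element->next != NULL)
        element->next->prev = node;    /* splice in before the old successor */

    element->next = node;
    return true;
}
```

A caller that receives false knows the anchor element is no longer in
the tree and can report the condition rather than crash.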
The occurrence of a center element in the definition list causes
the definition list to be split into two lists around the center
element. Consequently, the center element is no longer contained
in a definition list, and block elements are permitted inside it.
Therefore, based on my interpretation of HTML Tidy processing in
this case, I believe the lexer->excludeBlocks flag needs to be set
to no before the center node is parsed and then set back to yes
before ParseDefList processing continues with a new definition
list, as illustrated below.
1466 }
1467
1468 /* and parse contents of center */
+ lexer->excludeBlocks = no;
1469 ParseTag(lexer, node, mode);
+ lexer->excludeBlocks = yes;
1470
1471 /* now create a new dl element */
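A possible variation on the change shown above, offered only as a
sketch: instead of setting the flag to yes unconditionally, save its
previous value and restore it afterwards. The Lexer struct, the Bool
enum, the ParseFn callback type, and the ParseAllowingBlocks helper
below are all hypothetical stand-ins for illustration and are not
taken from parser.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in boolean type; assumed to mirror Tidy's yes/no spelling. */
typedef enum { no, yes } Bool;

/* Stand-in for Tidy's Lexer; only the flag discussed here is modelled. */
typedef struct {
    Bool excludeBlocks;
} Lexer;

/* Hypothetical callback type standing in for a call such as ParseTag. */
typedef void (*ParseFn)(Lexer *lexer, void *node);

/* Hypothetical helper: run `parse` with block elements permitted, then
   put excludeBlocks back to whatever value the caller had set. */
static void ParseAllowingBlocks(Lexer *lexer, void *node, ParseFn parse)
{
    assert(lexer != NULL && parse != NULL);

    Bool saved = lexer->excludeBlocks;
    lexer->excludeBlocks = no;
    parse(lexer, node);
    lexer->excludeBlocks = saved;
}

/* Demo callback: records the flag value the parser would observe. */
static Bool observed_during_parse = yes;

static void RecordFlag(Lexer *lexer, void *node)
{
    (void)node;
    observed_during_parse = lexer->excludeBlocks;
}
```

Calling ParseAllowingBlocks(&lexer, NULL, RecordFlag) with
excludeBlocks set to yes lets the callback see no, and the flag reads
yes again once the call returns. Within ParseDefList the two
approaches should behave the same if the flag is always yes at that
point, which is an assumption not verified here.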
The text file "INFO_1.txt", attached to this letter, contains
the following sections, which present information
to substantiate my findings.
1. HTML Source File - coredump2_O.htm
2. HTML Tidy Configuration File - coredump2.cfg
3. Original HTML Tidy Execution
4. Examination of Core Dump with gdb
5. HTML Tidy Source Patches
6. Patched HTML Tidy Execution
The HTML source file contains a condensed portion of an actual
web page that caused the segmentation fault. It incorporates the
same HTML coding errors: missing <DL> and </DL> tags, needless
</DT> tags, missing <DD> tags, and incorrect use of UL tags
instead of DL tags. I presume the web page author intended to
use a definition list to create custom bullets for an unordered
list instead of using the CSS list-style-image property
for unordered list elements.
Respectfully,
Gary Deschaines
gary.deschaines@netmechanic.com
FILE: INFO_1.txt (attachment to MEMO_1.txt)
DATE: 10 AUG 2000
-------------------------------------
1. HTML Source File - coredump2_O.htm
-------------------------------------
<HTML>
<HEAD>
<TITLE>Core Dump Case 2</TITLE>
</HEAD>
<BODY>
<CENTER><H1>Heading 1</H1></CENTER>
<DT><IMG src="redball.gif"><B>Term 1</B></DT>
<DT><IMG src="redball.gif"><B>Term 2</B><HR></DT>
<CENTER><H1>Heading 2</H1></CENTER>
<UL>
<DT><IMG src="redball.gif"><B>Term 3</B></DT>
<DT><IMG src="redball.gif"><B>Term 4</B><HR></DT>
</UL>
</BODY>
</HTML>
-----------------------------------------------
2. HTML Tidy Configuration File - coredump2.cfg
-----------------------------------------------
write-back: no
tidy-mark: no
quote-ampersand: no
show-warnings: yes
char-encoding: raw
markup: yes
show-acc-warnings: no
hide-endtags: no
uppercase-tags: no
uppercase-attributes: no
wrap-script-literals: no
numeric-entities: no
indent: auto
wrap: 0
logical-emphasis: no
clean: no
drop-font-tags: no
-------------------------------
3. Original HTML Tidy Execution
-------------------------------
../orig/tidy -e -config coredump2.cfg coredump2_O.htm
Tidy (vers 4th August 2000) Parsing "coredump2_O.htm"
line 7 column 4 - Warning: <dt> isn't allowed in <body> elements
line 7 column 4 - Warning: inserting implicit <dl>
line 7 column 8 - Warning: <img> lacks "alt" attribute
line 8 column 8 - Warning: <img> lacks "alt" attribute
line 8 column 44 - Warning: <hr> isn't allowed in <dt> elements
line 8 column 48 - Warning: trimming empty <dt>
line 9 column 11 - Warning: missing </center> before <h1>
line 9 column 11 - Warning: trimming empty <center>
Segmentation fault (core dumped)
------------------------------------
4. Examination of Core Dump with gdb
------------------------------------
gdb -nx ../orig/tidy -c core
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
Core was generated by `../orig/tidy -e -config coredump2.cfg coredump2_O.htm'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
#0 0x804a6bd in InsertNodeAfterElement (element=0x8071a88, node=0x8071a88) at parser.c:205
205 if (parent->last == element)
(gdb) where
#0 0x804a6bd in InsertNodeAfterElement (element=0x8071a88, node=0x8071a88) at parser.c:205
#1 0x804cc13 in ParseDefList (lexer=0x806f2c0, list=0x8071a88, mode=0) at parser.c:1473
#2 0x804ac47 in ParseTag (lexer=0x806f2c0, node=0x8071640, mode=0) at parser.c:432
#3 0x804f4ef in ParseBody (lexer=0x806f2c0, body=0x80714c0, mode=0) at parser.c:2883
#4 0x804ac47 in ParseTag (lexer=0x806f2c0, node=0x80714c0, mode=0) at parser.c:432
#5 0x804fe81 in ParseHTML (lexer=0x806f2c0, html=0x8071390, mode=0) at parser.c:3217
#6 0x804ffb9 in ParseDocument (lexer=0x806f2c0) at parser.c:3264
#7 0x80604a4 in main (argc=2, argv=0xbffff9f0) at tidy.c:956
(gdb) l 205
200 Node *parent;
201
202 parent = element->parent;
203 node->parent = parent;
204
205 if (parent->last == element)
206 parent->last = node;
207 else
208 {
209 node->next = element->next;
(gdb) p element->parent
$1 = (struct _node *) 0x0
---------------------------
5. HTML Tidy Source Patches
---------------------------
*** ./orig/parser.c Fri Aug 4 12:21:05 2000
--- ./code/parser.c Thu Aug 10 09:27:27 2000
***************
*** 1466,1472 ****
--- 1466,1474 ----
}
/* and parse contents of center */
+ lexer->excludeBlocks = no;
ParseTag(lexer, node, mode);
+ lexer->excludeBlocks = yes;
/* now create a new dl element */
list = InferredTag(lexer, "dl");
------------------------------
6. Patched HTML Tidy Execution
------------------------------
../code/tidy -e -config coredump2.cfg coredump2_O.htm
Tidy (vers 4th August 2000) Parsing "coredump2_O.htm"
line 7 column 4 - Warning: <dt> isn't allowed in <body> elements
line 7 column 4 - Warning: inserting implicit <dl>
line 7 column 8 - Warning: <img> lacks "alt" attribute
line 8 column 8 - Warning: <img> lacks "alt" attribute
line 8 column 44 - Warning: <hr> isn't allowed in <dt> elements
line 8 column 48 - Warning: trimming empty <dt>
line 10 column 3 - Warning: trimming empty <dl>
line 11 column 4 - Warning: missing <li>
line 11 column 4 - Warning: inserting implicit <dl>
line 11 column 8 - Warning: <img> lacks "alt" attribute
line 12 column 8 - Warning: <img> lacks "alt" attribute
line 12 column 44 - Warning: <hr> isn't allowed in <dt> elements
line 12 column 48 - Warning: trimming empty <dt>
line 13 column 3 - Warning: missing </dl> before </ul>
coredump2_O.htm: Document content looks like HTML 3.2
14 warnings/errors were found!
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Core Dump Case 2</title>
</head>
<body>
<center>
<h1>Heading 1</h1>
</center>
<dl>
<dt><img src="redball.gif"><b>Term 1</b></dt>
<dt><img src="redball.gif"><b>Term 2</b></dt>
<dd>
<hr>
</dd>
</dl>
<center>
<h1>Heading 2</h1>
</center>
<div style="margin-left: 2em">
<dl>
<dt><img src="redball.gif"><b>Term 3</b></dt>
<dt><img src="redball.gif"><b>Term 4</b></dt>
<dd>
<hr>
</dd>
</dl>
</div>
</body>
</html>
The alt attribute should be used to give a short description
of an image; longer descriptions should be given with the
longdesc attribute which takes a URL linked to the description.
These measures are needed for people using non-graphical browsers.
For further advice on how to make your pages accessible
see "http://www.w3.org/WAI/GL". You may also want to try
"http://www.cast.org/bobby/" which is a free Web-based
service for checking URLs for accessibility.
HTML & CSS specifications are available from http://www.w3.org/
To learn more about Tidy see http://www.w3.org/People/Raggett/tidy/
Please send bug reports to Dave Raggett care of <html-tidy@w3.org>
Lobby your company to join W3C, see http://www.w3.org/Consortium
Received on Thursday, 10 August 2000 14:24:44 UTC