30apr00 nested <CODE>

I just downloaded and compiled the 30apr00 release of HTML Tidy.
I tried it on a file which nsgmls (from sp-1.3) claims is valid
HTML 3.2 (according to the HTML32.dtd file I downloaded from the
W3C this year) except for one unrecognised attribute elsewhere.
When it got to this line:
   <CODE>error(<CODE>limit_exceeded(max_errors, Max)</CODE>, _)</CODE>
HTML Tidy said
   line 310 column 14 - Warning: <code> is probably intended as </code>
   line 310 column 50 - Warning: discarding unexpected </code>
   line 310 column 61 - Warning: discarding unexpected </code>

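The relevant declarations in the HTML 3.2 DTD are
   <!ENTITY % phrase " ... | CODE | ...">
   <!ENTITY % text "... | %phrase; | ...">
   <!ELEMENT (%font|%phrase) - - (%text)*>
so CODE may contain %text, which in turn includes CODE itself.
The fragment is rather odd, but it *is* perfectly legal.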
This seems to be a bug in HTML Tidy.  The version of HTML Tidy I was
using before got this right; I _think_ it was from late 1998.
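
The argument from the three declarations can be sketched in a few lines
of Python.  This is only an illustration under stated assumptions: the
names FONT, PHRASE, TEXT and may_contain are hypothetical, the sets list
only the fifteen %font and %phrase elements exercised in the test file at
the end of this message, and the remaining members of %text are left out.

```python
# Simplified, hypothetical model of the content model quoted above.
# Assumption: only the elements used in the test file are listed; the
# parts of the real DTD elided with "..." are not reproduced.
FONT = {"TT", "I", "B", "U", "STRIKE", "BIG", "SMALL"}
PHRASE = {"EM", "STRONG", "DFN", "CODE", "SAMP", "KBD", "VAR", "CITE"}

# %text includes %font and %phrase (other members omitted here).
TEXT = FONT | PHRASE


def may_contain(parent: str, child: str) -> bool:
    """True if this simplified model lets <child> appear inside <parent>.

    Mirrors <!ELEMENT (%font|%phrase) - - (%text)*>: any %font or %phrase
    element may contain anything drawn from %text.
    """
    return parent.upper() in FONT | PHRASE and child.upper() in TEXT


print(may_contain("CODE", "CODE"))  # True
```

Under this model every %font or %phrase element may contain itself,
which is the case the test file exercises.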

HTML Tidy makes the same mistake with other examples of
<CODE>a<CODE>b</CODE>c</CODE>.  In fact, it makes the same
mistake with _all_ the %font and %phrase elements that I have
tested.  I enclose a test file at the end of this message.
I've tried it with Amaya and Netscape, and more importantly,
I've tried it with nsgmls.  They all love it.  Only HTML Tidy hates it.

It is an excellent thing for HTML Tidy to produce WARNINGS for
instances of a font or phrase tag nested inside another occurrence
of the same tag.  However, it is a BAD thing for HTML Tidy to change
legal HTML without an explicit command to do so.  I do not have any
configuration files for HTML Tidy, so no command to change legal HTML
could have been found in one.

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 3.2//EN">
<!-- File   : foo.html
     Author : Richard A. O'Keefe
     Purpose: Test HTML Tidy.  This is all LEGAL and should not be changed.
-->
<HTML>
<HEAD>
<TITLE>Foo!</TITLE>
</HEAD>
<BODY>
<P>EM: <EM> a <EM> b </EM> c </EM>
<P>STRONG: <STRONG> a <STRONG> b </STRONG> c </STRONG>
<P>I: <I> a <I> b </I> c </I>
<P>B: <B> a <B> b </B> c </B>
<P>TT: <TT> a <TT> b </TT> c </TT>
<P>BIG: <BIG> a <BIG> b </BIG> c </BIG>
<P>SMALL: <SMALL> a <SMALL> b </SMALL> c </SMALL>
<P>CODE: <CODE> a <CODE> b </CODE> c </CODE>
<P>SAMP: <SAMP> a <SAMP> b </SAMP> c </SAMP>
<P>KBD: <KBD> a <KBD> b </KBD> c </KBD>
<P>U: <U> a <U> b </U> c </U>
<P>STRIKE: <STRIKE> a <STRIKE> b </STRIKE> c </STRIKE>
<P>DFN: <DFN> a <DFN> b </DFN> c </DFN>
<P>CITE: <CITE> a <CITE> b </CITE> c </CITE>
<P>VAR: <VAR> a <VAR> b </VAR> c </VAR>
</BODY>
</HTML>

Received on Sunday, 25 June 2000 22:12:15 UTC