W3C home > Mailing lists > Public > html-tidy@w3.org > October to December 2001

(unknown charset) Re: Few Question about HTML Tidy

From: (unknown charset) Dave Raggett <dsr@w3.org>
Date: Mon, 17 Dec 2001 09:08:41 +0000 (GMT Standard Time)
To: (unknown charset) Hiroyuki Okamoto - 岡本 浩之 <hokamoto@mse.mei.co.jp>
cc: (unknown charset) html-tidy <html-tidy@w3.org>
Message-ID: <Pine.WNT.4.10.10112170857010.-771885@hazel.openwave.com>
On Sun, 16 Dec 2001, Hiroyuki Okamoto - [ISO-2022-JP] $B2,K\!!9@G7(B wrote:

> Hello, my name is Hiroyuki Okamoto.
> 
> I have a few questions about HTML Tidy. They are illegal cases,
> however content becomes strange after Tidying, so I report them.

Thanks for the detailed feedback.
 
> (1) When I write the following content,
> 
> <OBJECT>
>   <FONT>
>     TEST
>   </FONT>
> </OBJECT>
> 
> HTML Tidy moves the OBJECT element into the HEAD element like below.
> 
> <HTML>
>   <HEAD>
>     <OBJECT>
>       <FONT>
>         TEST
>       </FONT>
>     </OBJECT>
>     <TITLE></TITLE>
>   </HEAD>
> </HTML>

This is because object is allowed in the document head and hence
Tidy doesn't know that an implicit <body> start tag is required.

 
>   What should I do ?

Place a <body> start tag before the <object>

> (2) When I write the following content,
> 
> <BODY>
>   <TITLE>
>     TEST2
>   </TITLE>
> </BODY>
> 
>    HTML Tidy makes a new TITLE element like below.

This is due to the way Tidy checks for a missing title before
processing the body. In effect, Tidy generates errors during
parsing, and then applies a few more checks afterwards.
A work around would have been to log the error reports found
during parsing and then merge them in line number order with
those found in the subsequent checks. This would be major
refinement to how Tidy works today.
 
> <HTML>
>   <HEAD>
>     <TITLE></TITLE>
>     <TITLE>
>       TEST2
>     </TITLE>
>   </HEAD>
>   <BODY></BODY>
> </HTML>
> 
> Title element should be only one in HEAD element. What should I
> do ?

I am hoping that the next version of Tidy will fix this problem by
checking for an empty title element when moving the title from
the body to the head. Maintenance of Tidy has moved to Source Forge
http://tidy.sourceforge.net/ and a bunch of volunteers have taken
over work on dealing with bug reports and enhancement requests.

> (3) When I write the following content,
> 
> <FONT>
>   <SELECT>
>      TEST3
>   </SELECT>
> </FONT>
> 
> HTML Tidy deletes the text element in the SELECT element like
> below.
> 
> <HTML>
>   <HEAD>
>     <TITLE></TITLE>
>   </HEAD>
>   <BODY>
>     <FONT>
>       <SELECT></SELECT>
>     </FONT>
>   </BODY>
> </HTML>
> 
> I think deleting the text element directly inside the SELECT
> element is correct. However, if no element is inside the SELECT
> element, I think HTML Tidy should delete the SELECT element,
> too. Am I wrong ?

I think Tidy could do better, and that further investigation is
justified.

> (4) When I write the following content,
> 
> <BODY>
>   <BR>
>   </BR>
> </BODY>
> 
>    HTML Tidy makes 2 BR elements like below.
> 
> <HTML>
>   <HEAD><TITLE></TITLE></HEAD>
>   <BODY>
>     <BR/>
>     <BR/>
>   </BODY>
> </HTML>
> 
>    I think HTML Tidy should delete </BR> end tag. 

Yes, that looks like a bug.

> And When will HTML Tidy's new version be released ?

This is in the hands of the folks working on the Source Forge
code base. You can get a prerelease version from the website
and I am hoping that the official release will come shortly.
See http://tidy.sourceforge,net/

I have agreed to change the W3C Tidy page to match when they do.

Regards,

-- Dave Raggett <dsr@w3.org> or <dave.raggett@openwave.com>
W3C Visiting Fellow, see http://www.w3.org/People/Raggett 
tel/fax: +44 122 586-6240 (or 7351) +44 771 213 7629 (mobile)
Received on Monday, 17 December 2001 04:11:08 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:47 GMT