Re: More Questions

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Mon, 03 Sep 2007 19:17:29 +0200
To: <Charles.VILLEPREUX@oecd.org>
Cc: <html-tidy@w3.org>, <Pascale.CISSOKHO-MUTTER@oecd.org>, <Marion.DESMARTIN@oecd.org>, <Joseph.CROWTHER@oecd.org>
Message-ID: <fefod3ldhfl05tbfi0m3dmajcf6ab6lkt1@hive.bjoern.hoehrmann.de>

* <Charles.VILLEPREUX@oecd.org> wrote:
>Using "HTML Tidy", I would like to have explanation concerning:
>
>1°) Why "HTML Tidy" does not manage properly the <sub> element within the
><title> element ?
>
>HTML: 
><title>SourceOECD: Factbook 2007 - Emissions of carbon dioxide
>(CO<sub>2</sub>)</title>
>
>XHTML:
>
><title>SourceOECD: Factbook 2007 - Emissions of carbon dioxide
>(CO</title>
></head>
><body>
><sub>2</sub>)  

You cannot use elements inside the <title> element; its content has to
be plain text. For cases like <sub>2</sub>, Unicode has characters that
can be used instead (subject to the limitations of the browser, window
manager, and fonts); in this case, you might be able to use

  <title>SourceOECD: Factbook 2007 - Emissions
         of carbon dioxide (CO&#x2082;)</title>

Where &#x2082; refers to the character U+2082 SUBSCRIPT TWO.
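
If the titles are generated by a script, the substitution could be
automated. A minimal Python sketch, assuming only digits need to be
subscripted; the helper name subscript_digits is made up for this
illustration:

```python
# Map ASCII digits to U+2080..U+2089 (SUBSCRIPT ZERO..SUBSCRIPT NINE).
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x2080 + d) for d in range(10))
)


def subscript_digits(text: str) -> str:
    """Return text with every ASCII digit replaced by its subscript form."""
    return text.translate(SUBSCRIPT_DIGITS)


# Build the plain-text title; no <sub> element is involved.
title = "Emissions of carbon dioxide (CO" + subscript_digits("2") + ")"
```

Whether the subscript actually renders still depends on the fonts
available where the title is displayed.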

>2°)  Why "HTML Tidy" does not convert &bull; or &#149 character ? 
>
>My configuration for the encoding process: Input-encoding:ascii ;
>Output-encoding:utf8
>When I open the XHTML page, the bullet does not appear.

The character reference &#149; should not be used in HTML documents:
it relies on a Microsoft Windows-specific mapping (in Unicode, code
point 149 is a control character, not a bullet). &bull; would be
correct. If it does not show up properly, you might have a problem
with the character encoding declaration or with fonts. Where do you
open the document, and what is shown in its place?
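
To illustrate the difference, here is a small Python sketch using only
the standard library. It is meant as a demonstration of the mapping,
not as a description of what Tidy does internally:

```python
import html
import unicodedata

# Byte 0x95 (decimal 149) is the bullet only in the Windows-1252 code page.
windows_bullet = bytes([0x95]).decode("cp1252")

# In Unicode, code point 149 (U+0095) is a C1 control character.
code_point_149 = chr(149)
category_149 = unicodedata.category(code_point_149)  # "Cc" = control

# &bull; and &#8226; both refer to U+2022 BULLET.
named_bullet = html.unescape("&bull;")
numeric_bullet = html.unescape("&#8226;")
```

So &bull; or &#8226; identify the bullet unambiguously, while 149 only
means "bullet" when read as a Windows-1252 byte.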

>> 1°) required attribute "alt" not specified for the "img" element
>> 	Q1: Why "HTML Tidy" do not create automatically required attributes ?

The contents of the alt attribute should replace the image if the
user did not download the image, or cannot access it for other
reasons (e.g., because they are blind). Tidy cannot know what an
appropriate alternate text would be, and so does not add it. There
is an alt-text option to fill in a default, but its use is not
recommended.

>> 2°) required attribute "action" not specified for the "form" element
>> 	
>> 			HTML:
>> 				<form name=switchit><input type=hidden
>> value=0 name=switchedselector></form>
>> 			
>> 			XHTML:
>> 				<form name="switchit" id="switchit"><input
>> type="hidden" value="0" name="switchedselector" /></form>
>> 	
>> 	Q2: Why does HTML Tidy create the attribute id="switchit" ?

Because in XHTML the name attribute on the form element might not
be recognized as identifying the form. The XHTML specification
recommends using both the name and id attributes for this reason.

>> 3°) ID X already defined 
>> 
>> 			HTML:
>> 				<br />
>> 				<br><br></p><p></p>
>> 
>> 				</span></td>
>> 				</tr>
>> 
>> 			XHTML:
>> 				<br />
>> 				<br />
>> 				<br />
>> 				<br />
>> 				<br /></span>
>> 				<p><span id="06" style="display:
>> none;"></span></p>
>> 				</td>
>> 				</tr>
>> 
>> 
>> 	Q3: Why does HTML Tidy create a span element with an id (06) which
>> already exists ?

This would depend on the outer structure of the document; generally it
would do so in order to fix some other error in the document.

>> 4) value of attribute "id" invalid: "0" cannot start a name (For example,
>> id and name attributes must begin with a letter, not a digit)
>> 
>> 			HTML
>> 				<img src="plusminusimages/01plus.gif"
>> border="0" id="01_image" onclick="javascript:changePlusMinus('01');"
>> style="cursor:pointer;cursor:hand" name="01_image" />
>> 				...
>> 				<span id="01" style="display: none ...
>> 
>> 	Q4: Do you think it is more a warning than an error ? (It does not
>> seem to provoke any problem when browsing the XHTML file ...)

Browsers might tolerate the error, but the HTML and XHTML specifications
prohibit IDs that start with a digit. Generally, the workaround is to
use a short prefix, as in id="x01_image".
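
If the IDs are produced by a script, the prefix could be added there.
A minimal Python sketch, assuming the only problem is the first
character; the names needs_prefix and fix_id are hypothetical, and the
default prefix "x" is just the one used in the example above:

```python
import re


def needs_prefix(value: str) -> bool:
    """True if the value does not begin with an ASCII letter."""
    return re.match(r"[A-Za-z]", value) is None


def fix_id(value: str, prefix: str = "x") -> str:
    """Prepend the prefix when the ID starts with something other than a letter."""
    return prefix + value if needs_prefix(value) else value


fixed = [fix_id(v) for v in ["01_image", "01", "showhideText"]]
```

Any script that looks the elements up by ID, such as the
changePlusMinus('01') call in your onclick attribute, would have to
use the prefixed values as well.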

>> 5°) there is no attribute "width" for the "div" element
>> 
>> 
>> 			HTML:
>> 				<div id="showhideText" width="100"
>> class="normal">Show all indicators</div>
>> 			XHTML:
>> 				<div id="showhideText" width="100"
>> class="normal">Show all indicators</div>
>> 
>> 
>> 	Q5: Why does "HTML Tidy" keep the "width" attribute ? Why not delete
>> it ?

Tidy does not know how to handle it; in some cases it might make sense
to remove it, in other cases it might be best to keep it. You can
remove such attributes using --drop-proprietary-attributes, if I remember
the name correctly; see the manual on http://tidy.sf.net/ for details.

>> Afterwards I have checked with Internet Explorer browser the difference
>> between the HTML and the XHTML  
>> I do not understand these following transformations done by "HTML Tidy":
>> 
>> I) - A link does not work anymore.
>> 	
>> 			HTML:
>> 				<a
>> href="javascript:openAll('01','02','03','04','05','06','07','08','09','10',
>> '11','12')" class="Text"><div id="showhideText" width="100"
>> class="normal">Show all indicators</div></a>
>> 			
>> 			XHTML:
>> 				<a
>> href="javascript:openAll('01','02','03','04','05','06','07','08','09','10',
>> '11','12')" class="Text"></a>
>> 				<div id="showhideText" width="100"
>> class="normal">Show all indicators</div>
>> 
>> 	Q6: Why does "HTML Tidy" change the imbrication of tags ?

The HTML and XHTML specifications do not allow nesting <div> inside the
<a> element. You have to use some other element (e.g. <span>) and use
CSS if block formatting is required (e.g., <span style='display:block'>
in place of the <div ...>).
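
To find such cases before running Tidy, something like the following
Python sketch could be used. It relies only on the standard library's
html.parser; the class and function names are made up for this
example, and the list of block elements is deliberately incomplete:

```python
from html.parser import HTMLParser

# Some block-level elements that may not appear inside <a>; not exhaustive.
BLOCK_ELEMENTS = {"div", "p", "table", "ul", "ol",
                  "h1", "h2", "h3", "h4", "h5", "h6"}


class BlockInAnchorFinder(HTMLParser):
    """Record block-level start tags that occur inside an open <a>."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0
        self.problems = []  # list of (tag name, line number)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        elif tag in BLOCK_ELEMENTS and self.anchor_depth > 0:
            self.problems.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1


def find_blocks_in_anchors(markup: str):
    finder = BlockInAnchorFinder()
    finder.feed(markup)
    finder.close()
    return finder.problems
```

Calling find_blocks_in_anchors() on the markup of a page returns the
offending tag names with their line numbers; an empty list means none
of the listed elements was found inside a link.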

>> II) - More line breaks.
>> 
>> 			HTML:
>> 				&bull; <a class='Text'
>> href='01-02-02.htm'>Elderly population by region</a><br />
>> 				<br><br></p>
>> 
>> 				</span></td>
>> 				</tr>
>> 
>> 			XHTML:
>> 				&#8226; <a class='Text'
>> href='01-02-02.htm'>Elderly population by
>> 				region</a><br />
>> 				<br />
>> 				<br />
>> 				<br />
>> 				<br /></span></td>
>> 				</tr>
>> 
>> 	Q7: Why does "HTML Tidy" generate additional <br /> elements ?

My guess here is that there is no opening <p> that matches the closing
</p>, so Tidy turns the latter into <p></p> and then replaces it with
two <br> elements.

I hope this helps,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Monday, 3 September 2007 17:17:44 GMT
