W3C home > Mailing lists > Public > html-tidy@w3.org > January to March 2000


From: Jelks Cabaniss <jelks@jelks.nu>
Date: Mon, 14 Feb 2000 12:56:45 -0500
To: <html-tidy@w3.org>
Matthew Brealey has had problems with his mail configuration, and for some
reason his post below -- intended for the Tidy list -- ended up on www-style.
So I am posting this here on his behalf:

-----Original Message-----
From: www-style-request@w3.org [mailto:www-style-request@w3.org]On
Behalf Of Matthew Brealey
Sent: Friday, February 11, 2000 5:39 AM
To: www-style
Subject: Bugs/suggestions

Tidy destroys the following example when clean: yes:
<div class=example>
This has a class that will give this element have a border; older browsers
will at least get the indentation.
<div class="c1">
This has a class that will give this element have a border; older browsers
will at least get the indentation.

When clean: no, this is the (better) result:
<div style="margin-left: 2em">
<div class="example">
This has a class that will give this element have a border; older browsers
will at least get the indentation.

However, even this is unsatisfactory, because the author that tidies their
page in Tidy will be surprised to find their page destroyed in Internet
Explorer 3.0 (though not 3.01, 3.02 or 3.03), as used by approximately 3%
of the WWW. For this browser treats ems as pixels (as indeed do all 3.0?
versions), which is not of itself catastrophic because it will just not
get indentation, but what is a real problem is that it interprets all
margins as relative to the left of the canvas. Thus:

Default margin
-------------|Normal Body will be rendered here
--|Two pixels in - where tidied OL and UL elements will be placed

This problem is made even worse when the author has specified a
larger-than-normal margin on BODY because the effect is made that much
more dramatic. The worst of it is that those who deliberately shun CSS
(and with very good reason - who wouldn't sacrifice the advantages of
maintainability at the altar of pages that actually work - if your pages
crash, are made unreadable or don't work due to the abominable support for
CSS in browsers, you lose business; if you don't take advantage of the
secondary advantages that CSS provides of maintainability, you only lose a
few minutes of your time) and who abuse the OL and UL elements will be the
most affected because they are less likely to have valid <div
style="margin-left: whatever"> (or a class) for indentation.

I would like to see an option to leave the (invalid) OL and UL elements as
they are.

Another thing that need fixing is that Tidy doesn't check that STYLE or
LINK rel="stylesheet" declares the type attribute.

It is also arguable that it should also give a warning if META
http-equiv="Content-Style-Type" content="text/css" is missing when the
style attribute is used; however, one author in a million might actually
have it as a true HTTP header, so for them such a warning would be

Furthermore, something that really annoys me is that when I accidentally
omit a closing tag, Tidy thinks that I didn't know that the element
couldn't span block elements and therefore inserts literally hundreds of
(for example) <b>s and </b>s.

Thus, where I have
Some <b>bold text -  accidentally missed off the '</b>'
it thinks that I wanted the boldness to span the whole of the rest of the
document, and as a result I have to manually remove the hundreds of tags
it adds (on one particularly extreme occasion it converted one of my (very
complex and difficult to recreate) test pages into a whole load of <PRE>
elements, and because of restrictions on the elements that can occur
within <PRE> applied my style declarations on <BR> thereby destroying
several hours of work).

I believe this 'feature' will occur with every block and inline element
except for P, DIV, OL, UL, LI, DL, DD and DT.

I do wish Tidy wouldn't think that its users are idiots and don't know
that inline elements can't span block ones and that pre can't enclose
other block elements, and would simply just add the missing </b>, but not
embolden the rest of the document, or trigger an error and terminate (or
even an option idiot - boolean - idiot: yes treats the user like an idiot,
idiot: no simply adds the closing tag in its correct location).

As a matter of principle I abhor PRE, since it frequently results in nasty
scroll bars, which really annoys viewers of pages. However, it can be
tiresome when I have a code listing to have to put a <br> after every line
and put in non-breaking spaces for indentation.

As a result, what I would like is for Tidy to do the task that bores me
so: the conversion of PRE into a P element with <br> and &nbsp; to replace
the spaces and carriage returns in the PRE. The option to do this should,
IMO, be set via a convert-pre option in Tidy's config file. This should
take the values yes, no or <identifier>, where <identifier> is a HTML
class, as in <p class="identifier">. This option is important, since  I
like my code to have less leading than my P elements, and so therefore the
class is useful for doing things like P.code {line-height: 1.2} to reduce
my normal P {line-height: 1.5}. If the convert-pre option were simply to
be made a boolean variable, then yes should result in <p class="pre">.

Further to this on the subject of <pre>, Tidy destroys the CSS
white-space: pre declaration. This is because it does not recognise that
<P style="white-space: pre">
the white space is meaningful.

To cope with this, I would suggest that if an ID or class begins with
'pre', then it should be treated as preformatted; e.g., <P id="precode">
should be treated as preformatted (and no, one shouldn't use the PRE
attribute for preformatted text, because it gives no indication as to the
content of the element - BLOCKCODE and BLOCKSAMP would be good, PRE =

Action is a required attribute for FORM; Tidy does not point this out.

Tidy breaks the STYLE element - it converts non-ASCII (or non-whatever
encoding the document is expressed as being in) characters such as  to
character references, but they aren't valid there. Correct would be to
express them as a (hexadecimal) CSS escape; e.g., \E1 in this case.
However, since no browser (except the nascent Mozilla 5) supports them,
there is rarely a need to convert them since the CSS nmchar macro allows
{nonascii} where {nonascii} is [\200-\377] (Octal codes; the 377 should be
read as 4177777 (Octal) (because of Flex limitations)).

Similarly ID - it converts characters like  to character references, but
as ID is based on the NAME token, character references are not valid

It would be useful for Tidy to check the validity of class and ID tokens;
for example, many people use invalid classes such as
class="1invalididentifier" or "-invalid" - these will (correctly) be
ignored by many browsers. Similarly for ID - check that it is a valid SGML
NAME token.

Why doesn't clean: yes remove BASEFONT?

Tidy's conversion of FONT size="n" is inappropriate.

For example, <font size=1> is mapped to font-size: 70%. This is
inappropriate because font-size: 70% depends on the font-size of the
ancestor, which Tidy does not know, whereas <font size=1> is the same
wherever it is used in a document. Similarly <font size="+n"> - this is
relative to the current <font size>, and not to the CSS font-size; for
example, <div style="font-size: 100000px"><font size="+2"> means 2 more
than the current <font size> - the font-size declaration has no effect.

Perhaps most seriously, some <p><font size></p>s are mapped to H2, H3, H1.
This is inappropriate because it is not clean layout - it will wreck any
nice heading order (i.e., going from h1 to h2 to h3, etc. according to the
structure of the document) that the author has. Furthermore, H1 has far
more margin than P. In addition, headings vary radically in font size and
formatting; for example, in my style sheets I have H1 {margin-left: -4%;
margin-top: morethanp; line-height: lessthanp; margin-bottom: lessthanp;
font-size: radicallydifferentfromnormal}, so to get the P element mapped
to the probably/possibly structurally inappropriate H? element is

There is a danger that the classes chosen will be redefined. For example:
<font face="Tahoma">
This is text
If that were to be tidied, the result would be:
<p class="c1">
This is text
.c1 {font-family: Tahoma}

If on a later date the file were to be re-tidied with this:
<font face="Verdana">
This is text
, the result would be:
<p class="c1">
This is text
.c1 {font-family: Verdana}

, which would redefine the first example - Tidy does not check to see
whether the classes it uses have been used have been used before - it
starts from c1 each time it tidies.


If hide-endtags: yes and clean: yes, Tidy drops highly meaningful BODY
tags; for example:
<body bgcolor=red>
or <body style="background: red">
with clean: yes is mapped to:
<style type="text/css">
 body.c1 {background: red}
, but without any BODY for the class to refer to!

In this case this is appropriate:
<style type="text/css">
 body {background: red}
because the BODY is unique to the document.

Why doesn't clean clean TABLE backgrounds?

Tidy is destroying my pages - it is adding type="text/javascript" to my
SCRIPT language="jscript1.2" element. Although type is required, if you
add it Internet Explorer will ignore the (yes I know deprecated) language
attribute, which causes serious problems - there should be an option to stop
it doing this.

From Matthew Brealey (http://members.tripod.co.uk/lawnet (for law)or
http://members.tripod.co.uk/lawnet/WEBFRAME.HTM (for CSS))
Do You Yahoo!?
Talk to your friends online with Yahoo! Messenger.
Received on Monday, 14 February 2000 12:59:20 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:38:47 UTC