W3C home > Mailing lists > Public > html-tidy@w3.org > April to June 2000

Re: 30apr00 bugs: spaces in attributs, empty ALT attributes

From: Sebastian Lange <lange@cyperfection.de>
Date: Wed, 03 May 2000 14:37:48 +0200
Message-Id: <4.3.1.2.20000503130044.00b80e08@pop3.cyperfection.de>
To: micke@four04.com
Cc: html-tidy@w3.org
Hello Mikael, thanks for your reply.

At 21:03 02.05.2000 +0200, Mikael hultgren wrote:
>On Tue, 02 May 2000, Sebastian Lange wrote:
>
> > After compilation and testing of the new tidy release (SuSE Linux 6.2), I
> > have noticed following misbehaviours:
>
>The new release, would that be tidy30april?

yes. freshly compiled on a SuSE Linux 6.2 Distribution.


> >
> > The commonnly found
> >       <FONT FACE="Arial, Helvetica, sans-serif">
> > gets tidied to:
> >       <FONT FACE="Arial," SANS-SERIF="">
>
>I tried this and i didnt see this beahaviour, im using tidy30april on linux.

this is my syntax (tidy being called from within a perl script):

$tidiedMessage = `echo "$Message" | tidy30apr00 -f /dev/null -quiet -latin1 
--wrap 0 --wrap-attributes no --tidy-mark no --uppercase-tags yes 
--uppercase-attributes yes --alt-text '' --doctype '-//W3C//DTD HTML 4.0 
Transitional//EN' --quote-marks yes`;

I tried removing all the configuration directives, with no significant 
changes in the result.

with $Message = '<FONT FaCe="Comic Sans MS">test</A>', $tidiedMessage will be:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE></TITLE>
</HEAD>
<BODY>
<FONT FACE="Comic" SANS="" MS="">test</FONT>
</BODY>
</HTML>

The irrelevant lines (doctype to body, /body to /html) are then 
automatically removed by my perl script, which leaves $tidiedMessage to be 
'<FONT FACE="Comic" SANS="" MS="">test</FONT>'.
Having this tidied again, turns $tidiedMessage into '<FONT FACE="Comic" 
SANS="MS=">test</FONT>' which then stays like that on subsequent tidy attempts.


> >
> > Another example that I've seen quite often is
> >       <FONT FACE="Comic Sans MS">
> > which is tidied to:
> >       <FONT FACE="Comic" SANS="" MS="">
>
>if i write
><FONT FACE="Comic Sans MS">
>it gets tidied to
><FONT FACE="Comic Sans MS">
>
>Which is correct,
>
>but if i do
><FONT face=Comic Sans MS>This is a test</FONT
>it gets tidied to
><FONT face="Comic" sans="" ms="">This is a test</FONT>
>
>But then again one should be using "" in the face tag

Yes, but in my case, the correct version (with "'s) gets "tidied" to the 
worse... that's what's bothering me and what I'd like to fix somehow.

> > The other problem is the empty-ALT-attribute-on-IMG-tags issue:
> >       <IMG SRC="logo.gif">
> > still gets tidied to
> >       <IMG ALT="" SRC="logo.gif">
> > which is "tidied" to
> >       <IMG ALT="SRC=logo.gif">
>
>for me
><IMG src="logo.gif"
>get tidied to
><IMG src="logo.gif"
>
>and
>
><IMG alt="" src="logo.gif"
>gets tidied to
><IMG alt="" src="logo.gif"
>
>This is the content of my tidyrc file
>
>markup: yes
>wrap: 88
>indent: yes
>doctype: auto
>char-encoding: ascii
>uppercase-tags: yes
>uppercase-attributes: no
>new-inline-tags: tmpl_var
>new-empty-tags: tmpl_var
>new-pre-tags: tmpl_var
>quote-marks: no
>quote-nbsp: no
>quote-ampersand: no
>
>
>Sorry i cant be of more help, if you want i can send you the complete test
>files and you can try them out.

Well, I am not working on files, I am kind of "abusing" Tidy to work on 
strings that (may) contain HTML codes...
This way, I manage to have my chat page validate as HTML 4.0 Transitional, 
which makes it stand out among all other chat sites that allow the user to 
use HTML in their messages: 
http://validator.w3.org/check/referer?uri=http%3A%2F%2Fwww.sl-chat.de%2FSL-Chat

At least most of the times it validates, whenever it fails, I usually find 
a simple fix to eliminate the error (by parsing the messages before sending 
them to Tidy).

The only idea that I have now is to maybe try out using a config file 
instead of command line options, I'll try and look into that. Still, this 
misbehavior of Tidy needs to get fixed, and I am more than happy to 
contribute - if only I knew where to start.

Many thanks so far,

sebastian.


> >
> >
> > Could someone kindly direct me to where to look for the cause of these
> > problems? I would very much like to contribute to Tidy, but I am kind of
> > confused about where to get started.
> > I am no C guru, but so far I understood all the discussed patches, and I
> > believe I would be able to fix this problem, if someone set me to the 
> right
> > path.
> > Many thanks.
> >
> >
> > Sebastian
> >
> > PS: I am using Tidy to clean up HTML-Input in my chat
> > (http://www.sl-chat.de/SL-Chat), does anybody have something like a
> > "powered by Tidy" button or such?
> >
>
>micke
>--
>When i die, I want to die peacefully, in my sleep. Like my grandfather.
>Not screaming in terror, like the passengers in his car
>
>pub  1024D/D9529B7A 1999-12-14 Mikael hultgren <micke@four04.com>
>      Key fingerprint = A2E6 6A3F 3A48 6EDF 0689  C3DE 4B9A 7EC8 D952 9B7A
>To get public key send message with subject line 'get pgp keys'
Received on Wednesday, 3 May 2000 08:36:03 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:43 GMT