- From: Sebastian Lange <lange@cyperfection.de>
- Date: Wed, 10 May 2000 10:56:28 +0200
- To: HTML Tidy List <html-tidy@w3.org>
The following are features I miss in tidy30apr00 and would very much like to see added in a future version:

- Suppression of unknown attributes. (While the next item is easy to do in Perl, I don't know which attributes are unknown, so I can't delete them myself.)

- Suppression of empty FONT tags: <FONT SIZE=5></FONT> does nothing in HTML, so why not delete it?

- A smarter guessing routine for unquoted attributes. A common mistake that I often see looks like this:

      <FONT FACE=Arial, Helvetica COLOR=navy blue>

  Tidy "cleans" it to

      <FONT FACE="Arial," HELVETICA="" COLOR="navy" BLUE="">

  while I would rather correct it to

      <FONT FACE="Arial, Helvetica" COLOR="navy blue">

  (Even though this would then look brown, I don't care; that's the writer's fault. But at least I am creating valid HTML 4.0 Transitional this way.)

- Maybe a list to correct common misspellings. Brits and Aussies, for example, like to write <FONT COLOUR="blue">, which produces a validation error; changing this to <FONT COLOR="blue"> is a safe guess. I am going to compile such a list over the next few weeks and would happily pass it on if requested.

My Tidy wrapper in Perl should be finished by the end of this week; I will publish it then.

Many thanks to Pete Gelbman, Mikael Hultgren and Mike Depot for their helpful support.

For those who are interested, here is my loop to quote unquoted attributes (only COLOR and FACE for now, as I haven't seen the need for others yet). Any kind of feedback, suggestions, requests, or teaching about how to do this in a better way is very welcome! (I had to add the comments; otherwise I wouldn't understand what I hacked up myself.)

    # Quote unquoted COLOR and FACE attribute values. Each s///g pass quotes
    # at most one such attribute per tag, so repeat until a pass changes
    # nothing (s/// returns the number of substitutions made, 0 when done).
    1 while $FormData{$key} =~ s/
        (<[^>]*?)            # start of tag, e.g. '<FONT '              (group 1)
        (color|face)         # the attribute name that we want quoted   (group 2)
        \s*=\s*              # the equal sign, maybe with spaces around
        ([^>"=\s][^>"=]*?)   # the unquoted value, not starting with a quote (group 3)
        (?=                  # look ahead without consuming:
            \s+[a-z]+\s*=    #   the next attribute (one word before a '=' sign)
          | \s*>             #   or the end of the tag
        )                    # finish looking
    /$1$2="$3"/gix;

-- 
Sebastian Lange
http://www.sl-chat.de/
Maybe the first chat site that validates as HTML 4.0 even though user input may contain HTML code.
Courtesy of Dave Raggett's HTML Tidy: http://www.w3.org/People/Raggett/tidy/
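Until Tidy supports them, the two simpler requests above could be approximated inside a Perl wrapper: removing empty FONT elements and renaming COLOUR to COLOR. What follows is a minimal sketch, not Tidy's behaviour. It assumes regex-level processing of tags with no '>' inside attribute values. The names strip_empty_font_tags, fix_misspelled_attributes, fix_tag and %ATTR_FIX are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical map of misspelled attribute names (lowercase keys) to the
# HTML 4.0 spelling. Only COLOUR comes from the message; extend as needed.
my %ATTR_FIX = (
    colour => 'COLOR',
);

# Remove FONT elements with no content at all, e.g. <FONT SIZE=5></FONT>.
# Repeat until nothing changes, so that an outer FONT left empty by
# removing an inner one is removed as well.
sub strip_empty_font_tags {
    my ($html) = @_;
    1 while $html =~ s{<font\b[^>]*></font\s*>}{}gi;
    return $html;
}

# Rename misspelled attributes inside each tag. Naive: it assumes no '>'
# inside attribute values and would also rewrite a "name=" that happens
# to appear inside a quoted value.
sub fix_misspelled_attributes {
    my ($html) = @_;
    $html =~ s{(<[^>]+>)}{fix_tag($1)}ge;
    return $html;
}

# Within one tag, an attribute name is a word preceded by whitespace and
# followed (possibly after spaces) by '='.
sub fix_tag {
    my ($tag) = @_;
    $tag =~ s{(\s)([a-z]+)(?=\s*=)}{
        $1 . (exists $ATTR_FIX{lc $2} ? $ATTR_FIX{lc $2} : $2)
    }gie;
    return $tag;
}

print strip_empty_font_tags('a<FONT SIZE=5></FONT>b'), "\n";          # prints: ab
print fix_misspelled_attributes('<FONT COLOUR="blue">text</FONT>'), "\n"; # prints: <FONT COLOR="blue">text</FONT>
```

Only strictly empty FONT pairs are removed. Dropping a FONT that contains just whitespace could join two words in the rendered text, so the sketch leaves those alone.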
Received on Wednesday, 10 May 2000 04:54:22 UTC