eleven serious thoughts about tidy8jul00

Here are, in no particular order, eleven thoughts about the current release 
of tidy and possible improvements for future releases. This list is 
certainly not complete, so please feel free to add your comments to it!

1) <TEXTAREA> should be recognized by "indent: auto", otherwise you'll get 
a bunch of spaces in the tidied textarea

2) When a document contains multiple <BASE> tags, Tidy puts them ALL into 
the header instead of keeping just one and discarding the rest (which one 
should be chosen? the first? the last? any other priorities?)

3) XHTML Strict requires <input> to be a child of one of "p", "h1", "h2", 
"h3", "h4", "h5", "h6", "div", "pre", "address", "fieldset", "ins", "del"

4) <form> is required to have an "action" attribute, but tidy doesn't even 
warn about the lack thereof

5) Warnings about missing elements are omitted for: <!DOCTYPE>, <HTML>, 
<HEAD>, <BODY>... is there a reason for this behavior?

6) When <TITLE> is missing, Tidy inserts a blank <TITLE></TITLE> pair. Is 
this desirable? Or should it maybe rather insert something more or less 
meaningful à la <TITLE>Document edited by Tidy</TITLE>?

7) When "uppercase-tags: yes" is set, the doctype should be written as 
<!DOCTYPE HTML PUBLIC>; otherwise it should stay <!DOCTYPE html PUBLIC> 
(what matters is whether the document itself uses <HTML> or <html>... after 
all, that is only the author's preference). Or is there a reason for Tidy 
to always put "html" in lowercase?

8) As with <IMG ALT>, it should be possible to set a configuration 
directive for <TABLE SUMMARY>... maybe with a similar accessibility 
warning. I would like to be able to process files automatically without 
being bothered by warnings... and then, in a second run, search for the 
given string and replace it with something more meaningful.

9) line 113 column 1 - Warning: html doctype doesn't match content
Now this is a seriously interesting message, but it doesn't help me much. 
(Specified doctype was HTML 4.01 Transitional.)
Would it be possible to extend this warning in a way that Tidy gives 
information about WHY the doctype does not match?
If you want to reproduce this, try tidying http://www.roche.de/ with 
"doctype: loose" set, and then RE-RUN the output through tidy again.

10) when '-f "errors.txt"' or '--error-file "errors.txt"' is given, errors 
should indeed be written to errors.txt, and not to STDERR

11) when '--doctype "-//IETF/DTD HTML 2.0//EN"' is given, this doctype 
should indeed be used, instead of '--doctype "auto"' being implied, as 
currently happens
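
To make items 2 and 4 a bit more concrete, here is a rough sketch of the 
kind of check I have in mind. It is NOT Tidy code and says nothing about 
how Tidy works internally; it is a small standalone Python script built on 
the standard html.parser module. The names WishlistChecker and check are 
made up for this illustration, and "report every <base> after the first" is 
only one possible answer to the question raised in item 2.

```python
from html.parser import HTMLParser


class WishlistChecker(HTMLParser):
    """Hypothetical checker for items 2 and 4; not part of Tidy."""

    def __init__(self):
        super().__init__()
        self.base_count = 0
        self.findings = []  # (line, column, message) tuples

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        line, offset = self.getpos()
        column = offset + 1  # getpos() offsets are 0-based
        if tag == "base":
            self.base_count += 1
            # Assumption: the first <base> wins, later ones are reported.
            if self.base_count > 1:
                self.findings.append((line, column, "extra <base> element"))
        elif tag == "form" and "action" not in dict(attrs):
            self.findings.append((line, column, '<form> lacks "action"'))


def check(html_text):
    """Return a list of (line, column, message) findings for html_text."""
    checker = WishlistChecker()
    checker.feed(html_text)
    checker.close()
    return checker.findings


if __name__ == "__main__":
    sample = (
        "<html><head>\n"
        '<base href="a/">\n'
        '<base href="b/">\n'
        "</head><body>\n"
        '<form method="post"></form>\n'
        "</body></html>"
    )
    for line, column, message in check(sample):
        print(f"line {line} column {column} - Warning: {message}")
```

The output format imitates the "line N column M - Warning:" style quoted in 
item 9, purely so the findings are easy to compare with Tidy's own messages.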


I think that's it for now; please feel free to add your observations.

One word to Dave:
Dave, I have great respect for you. Tidy is probably the most exciting tool 
that I have found regarding HTML. If you see a better way to contribute to 
its development than discussing thoughts like the ones above on this 
list, please tell me. Maybe someone with a thorough understanding of tidy's 
source code could post a little introduction to it?


--
Sebastian Lange
http://www.sl-chat.de/
Maybe the first chat site that validates as HTML
4.0 even though user input may contain HTML codes.

Courtesy to Dave Raggett's HTML Tidy:
http://www.w3.org/People/Raggett/tidy/

Received on Thursday, 13 July 2000 12:25:25 UTC