RE: Tidy becomes less forgiving

Hi Dave,

I think we'll just have to agree to disagree and work together to build a
consensus.  

De facto standards are, by definition, based on implementations -
independent of specifications, if any.  Specifications live and die by their
implementations.  The best specs *follow* implementation rather than the
other way around.  For example, dictionaries almost never introduce new
words.  They merely document existing usage, which 99% of the time springs
up informally and is only occasionally dictated by standards bodies (varies
by language).  Have you seen an ISO stack in the wild lately?

I respect and mostly agree with your point of view.  As I keep on saying,
Tidy must straddle the fence between standards compliance and available
implementations.

take it easy,
Charlie



-----Original Message-----
From: J. David Bryan [mailto:jdbryan@acm.org]
Sent: Monday, September 17, 2001 7:21 PM
To: HTML Tidy List
Subject: RE: Tidy becomes less forgiving


On 17 Sep 2001, at 16:54, Reitzel, Charlie wrote:

> But consider those cases where most browsers support a 
> feature not included in any HTML spec (e.g. cols attribute 
> of <table>).  Tidy knows all about this one and doesn't emit 
> a peep. I.e. Tidy respects the de facto standard implemented 
> by major browsers.

Your observation is correct, but your conclusion is suspect, I'm afraid.

Tidy doesn't check to ensure that otherwise valid attributes are placed on
the correct tags.  So Tidy accepts the "cols" attribute (valid for use on
the "textarea" tag) on the "table" tag as you say.

However, Tidy also doesn't complain if the "table" tag contains the 
attributes "type=disc" or "charset=iso-8859-1" or "rel=stylesheet".  To
which de facto standard do these belong?  ;-)


> Nor am I so quick to label as "invalid" large bodies of working code.

If the markup doesn't adhere to the syntactic and semantic requirements of
the HTML specification, then it is invalid HTML by definition.  Whether
such invalid HTML behaves as one wants is an entirely separate issue.

A misspelled English word is misspelled regardless of whether it is 
understood by the reader or whether it conveys the appropriate meaning
(i.e., whether or not it "works").


> If everybody took the rigid view, the web simply wouldn't exist.

Personally, I believe that if everybody had taken the rigid view, we would 
have a much more interoperable and accessible Web.

But this is all really beside the point.  If you believe that Tidy should 
morph into a program that generates syntactically invalid HTML, that's 
fine.  The result won't be HTML Tidy, though, because the function of the 
program will have changed significantly.  Whether that changed function is 
or is not desirable is open to debate, but if so, then I believe that it 
should be implemented separately from a program that has, as its 
fundamental goal, a transformation to valid HTML.

I use Tidy to turn junk markup into valid HTML.  I suspect others might be 
using it for similar purposes.  If the output of Tidy doesn't validate and 
doesn't warn about the use of proprietary constructs, then I would consider 
that to be a Tidy bug (as the mix-and-match attribute problem is a bug).

If it is deemed desirable to have a program that generates invalid HTML, 
then a new program could be created from the existing Tidy code.  I 
suppose, as an alternate to creating a separate program, that adding an 
option (perhaps "--broken-html" :-) to Tidy would be acceptable, although 
at the expense of additional complexity.  But changing Tidy to generate 
invalid HTML, "de facto standards" notwithstanding, will render Tidy 
useless for a segment of the user population.

                                      -- Dave

Received on Tuesday, 18 September 2001 13:28:30 UTC