
Re: Bug + fix for illegal ampersands and character entities

From: Randy Waki <rwaki@flipdog.com>
Date: Tue, 20 Feb 2001 11:30:47 -0500
Message-ID: <002201c09b5a$7cf66ba0$c89356cf@whizbang.com>
To: "Bertilo Wennergren" <bertilow@chello.se>
Cc: <html-tidy@w3.org>
Bertilo Wennergren wrote:
> Randy Waki:
> > 4-Aug-2000 Tidy's handling of illegal ampersands such as "id=1&lang=en"
> > is inconsistent with browsers.
> Which browsers? Please demonstrate code that uses the correct "&amp;"
> and that breaks in a named browser.
> If you can't do that, then just correct the code, as Tidy wants
> you to, and be done with it.

I think you may have misunderstood my intent.  Tidy issues an error
message and generates correct HTML both before and after my proposed fix.
It's just that, with the fix, the correct HTML is more likely to reflect
what the author originally intended.

Tidy is used for many purposes.  A major one is to tell people how to
clean up their HTML.  For this case, you're absolutely right -- Tidy is
doing its job and the HTML should just be fixed.

But another major use for Tidy is to automatically clean up HTML that
cannot be fixed at the source.  Examples include the horrid HTML that is
generated by some HTML authoring software, HTML obtained from third
parties, pre-existing HTML (often in huge quantities), etc.  This is the
case that I was attempting to address.

As Alex points out, the unfortunate reality is that most people assume
that if both (or even just one) of the major browsers displays their HTML
like they want, their HTML must be OK.  So, all else being equal, when
Tidy is automatically cleaning up "bad" HTML, its best bet is to generate
"good" HTML that mimics how IE and/or Netscape interpret the "bad" HTML,
especially when IE and Netscape interpret the "bad" in the same way.
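To make the "id=1&lang=en" case concrete, here is a minimal Python sketch of
one way a cleanup tool could resolve it. This is not Tidy's actual code and
not the fix from the original bug report, which isn't quoted here. The rule
is a simplifying assumption: keep semicolon-terminated references whose
names are known HTML 4 entities (or numeric references), and escape every
other "&" as "&amp;". That way the URL keeps the literal ampersand the author
almost certainly meant, instead of "&lang" being read as a reference to the
"&lang;" entity. The function name escape_bare_ampersands is hypothetical,
and the sketch deliberately ignores the legacy references (such as "&copy"
without a semicolon) that browsers commonly expand anyway.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names -> code points

# "&" optionally followed by a reference body.
# Group 1: entity name, group 2: numeric body ("38" or "x26"), group 3: ";".
_AMP = re.compile(r"&(?:([A-Za-z][A-Za-z0-9]*)|#([0-9]+|[xX][0-9A-Fa-f]+))?(;)?")


def escape_bare_ampersands(value: str) -> str:
    """Hypothetical sketch: keep well-formed references, escape other '&'s.

    Assumption (for illustration only): a reference counts only if it ends
    in ';' and is numeric or names a known HTML 4 entity.
    """
    def fix(m: "re.Match[str]") -> str:
        name, num, semi = m.group(1), m.group(2), m.group(3)
        if semi and (num or (name and name in name2codepoint)):
            return m.group(0)  # e.g. "&amp;", "&copy;", "&#38;" stay as-is
        # Anything else, such as "&lang" in a query string, is a literal "&".
        return "&amp;" + m.group(0)[1:]

    return _AMP.sub(fix, value)
```

For example, "page.cgi?id=1&lang=en" becomes "page.cgi?id=1&amp;lang=en",
while "&amp;", "&copy;" and "&#38;" pass through unchanged.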

Received on Tuesday, 20 February 2001 11:32:12 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:38:49 UTC