
RE: TIDY doesn't handle asp scripts correctly...instead it misinterprets them

From: Randy Waki <rwaki@flipdog.com>
Date: Mon, 21 Aug 2000 20:31:08 -0600
To: "Andreas Eibach" <a.eibach@gmx.net>
Cc: <html-tidy@w3.org>
Message-ID: <000601c00be1$071d4600$51eee13f@rwaki>
Andreas Eibach wrote:
> 
> > Andreas Eibach wrote:
> > >
> > > Anyway, I've encountered a bug in current version:
> > >
> > > Just run Tidy on the small HTML file I enclosed in my mail; that explains
> > > it all.
> >
> > As it turns out, this came up a few weeks ago.  Tidy is actually doing
> > the right thing.  The &'s must be escaped as &amp; according to the HTML
> > spec.  Check the thread in the mailing list archives at
> 
> As it turns out, this was *not* what I meant.
> 
> I don't think it has anything to do with my problem.
> 
> My problem is that a URL like this inside the HTML
>  <a href="http://www.bogus.com/script.asp?p1=1&p2=2&p3=3">
> 
> results in a warning message
> 
> line 1 column xx - Warning: unescaped or unknown entity "&p2"
> line 2 column yy - Warning: unescaped or unknown entity "&p3"
> 
> The thread far below is about something different.

Sorry for not being clearer.

Strictly speaking (which we do when it comes to Tidy's output :)), if
you want an ampersand character in an attribute such as href, you are
supposed to write it as &amp; and not just &.  Writing ampersands this
way is known as "escaping" the ampersand, hence Tidy's warning about an
unescaped &.

(FYI, your Tidy warnings above have a critical typo.  They should say:

   line 1 column xx - Warning: unescaped & or unknown entity "&p2"

Tidy is saying that either a) you have an unescaped ampersand, which is
true in your case, or b) you have an unknown entity named p2, which is
not true in your case.  Tidy doesn't know which possibility is true, so
it lists them both.)
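To make that "either a) or b)" logic concrete, here is a minimal sketch of such a check. It is not Tidy's code: the function name ampersand_warnings is hypothetical, it only looks at "&" followed by a letter, and it treats a name as a real entity only when it appears in Python's table of HTML 4 entity names and ends with ";".

```python
import re
from html.entities import name2codepoint

# "&" followed by a name-like run, e.g. "&p2" or "&amp", plus an optional ";".
ENTITY_LIKE = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")

def ampersand_warnings(attr_value: str) -> list[str]:
    """Hypothetical checker: flag '&name' runs that are not known, terminated entities."""
    warnings = []
    for match in ENTITY_LIKE.finditer(attr_value):
        name, semicolon = match.group(1), match.group(2)
        if semicolon and name in name2codepoint:
            continue  # e.g. "&amp;" is a properly escaped ampersand
        # Either a bare "&" or an entity name we don't recognise. The checker
        # can't tell which, so the message names both possibilities.
        warnings.append(f'unescaped & or unknown entity "&{name}"')
    return warnings

print(ampersand_warnings("http://www.bogus.com/script.asp?p1=1&p2=2&p3=3"))
# ['unescaped & or unknown entity "&p2"', 'unescaped & or unknown entity "&p3"']
```

Note that "&amp;p2=2" produces no warning, because "amp;" is a complete, known entity.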

Browsers, on the other hand, are lax.  They let you write either &amp;
or & to get an ampersand character.  However, there are potentially
confusing cases where a plain & doesn't work as expected: if a parameter
name happens to match an entity name, as in ?a=1&copy=2, a browser may
read &copy as the copyright sign.  So it's just as well that HTML has
outlawed the plain &.
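One way to see that kind of ambiguity is Python's html.unescape, which applies the HTML5 rules for character references in text. The query string below is a made-up example with a parameter named "copy". Browsers' handling inside attribute values has varied (HTML5 has a special exception there), so treat this as an illustration of the risk, not a prediction of what every browser does.

```python
import html

# A made-up query string whose third parameter happens to be named "copy".
raw = "script.asp?p1=1&p2=2&copy=3"
escaped = raw.replace("&", "&amp;")

print(html.unescape(raw))      # script.asp?p1=1&p2=2©=3   ("&copy" became the © sign)
print(html.unescape(escaped))  # script.asp?p1=1&p2=2&copy=3   (the intended text)
```

"&p2" survives unchanged because "p2" is not an entity name, but "&copy" is silently decoded. Escaping every & as &amp; removes the guesswork.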

So Tidy simply converts each unescaped & into &amp;.

Before Tidy (illegal href attribute):

   <a href="http://www.bogus.com/script.asp?p1=1&p2=2&p3=3">

After Tidy (legal href attribute):

   <a href="http://www.bogus.com/script.asp?p1=1&amp;p2=2&amp;p3=3">
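The rewrite above can be approximated with a single regular expression. This is a sketch, not Tidy's implementation: escape_bare_ampersands is a hypothetical helper, and it treats any syntactically complete reference ("&name;", "&#38;", "&#x26;") as already escaped without checking whether the name is a real entity.

```python
import html
import re

# An "&" is left alone only if it starts a complete reference:
# "&name;", "&#123;" or "&#x1F;". Every other "&" becomes "&amp;".
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")

def escape_bare_ampersands(value: str) -> str:
    """Hypothetical helper: approximate the &-to-&amp; rewrite described above."""
    return BARE_AMPERSAND.sub("&amp;", value)

before = "http://www.bogus.com/script.asp?p1=1&p2=2&p3=3"
after = escape_bare_ampersands(before)
print(after)  # http://www.bogus.com/script.asp?p1=1&amp;p2=2&amp;p3=3

# Decoding &amp; back to "&" recovers the original URL, so both hrefs
# point at the same address.
assert html.unescape(after) == before
# Running the rewrite again changes nothing: "&amp;" is already complete.
assert escape_bare_ampersands(after) == after
```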

Browsers will accept either href.  Try it, but be sure to try it in an
actual HTML document: the Address/Location field doesn't decode
entities, so if you type or paste the escaped URL there, the &amp;
may be sent literally.

--Randy
Received on Monday, 21 August 2000 22:32:36 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:44 GMT