RE: Unexpected behaviour

From: Sebastian Lange <lange@cyperfection.de>
Date: Thu, 17 May 2001 20:40:00 +0200
Message-Id: <4.3.2.7.2.20010517202616.01757f10@mail.cyperfection.de>
To: html-tidy@w3.org

http://www.htmlhelp.org/tools/validator/direct.html

... fed with the following two lines in the BODY:
<a href="search.pl?key1=value1&key2=value2">original version</a>
<a href="search.pl?key1=value1&amp;key2=value2">Tidy's version</a>

gives the following error:

>Line 10, character 32:
>...  href="search.pl?key1=value1&key2=value2">original version</ ...
>                                  ^
>Error: unknown entity key2

In other words: the validator complains about the input line, but does not
complain about Tidy's output.

Do you need more proof than that?
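
The same distinction can be reproduced without the validator. What follows is a minimal Python sketch, not the validator's actual logic: the pattern BARE_AMP and the helper bare_ampersands are hypothetical names chosen for illustration, and the check only looks for an ampersand that does not begin a complete character reference. It does not verify that the entity name is a known one.

```python
import re

# An "&" that does NOT start a complete character reference:
# named (&amp;), decimal (&#38;) or hexadecimal (&#x26;), each ending in ";".
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

def bare_ampersands(attr_value):
    """Return the offsets of ampersands that are not part of a reference."""
    return [m.start() for m in BARE_AMP.finditer(attr_value)]

original = "search.pl?key1=value1&key2=value2"
tidied = "search.pl?key1=value1&amp;key2=value2"

print(bare_ampersands(original))  # [21]
print(bare_ampersands(tidied))    # []
```

The offset reported for the original line is the ampersand in front of "key2", the same spot the validator flags.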

I'll give you an example that makes it a bit more evident.

Admittedly, in the given example both versions behave identically in most
browsers. But check the following with your favourite browser; it is
syntactically identical to the given example:

<a href="search.pl?key=1&amp=2&nbsp=3">wrong version</a>
<a href="search.pl?key=1&amp;amp=2&amp;nbsp=3">correct version</a>

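The intended URL is "search.pl?key=1&amp=2&nbsp=3", but for the wrong
version your browser will most certainly resolve it as
"search.pl?key=1&=2 =3", because it decodes "&amp" and "&nbsp" as entities
even without the trailing semicolon.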

That, and the fact that the HTML specifications say so, is why Tidy is
correctly replacing "&" with "&amp;" in your HREF and SRC attributes.
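
The effect can be checked outside a browser as well. The sketch below uses Python's standard html module as a stand-in: html.unescape decodes leniently (it accepts "&amp" and "&nbsp" without a semicolon), which approximates the browser behaviour described above, and html.escape performs the same kind of replacement Tidy makes. This is an assumption-laden illustration, not Tidy's implementation, and the variable names are made up for the example. Note that parsers following the later HTML5 rules leave a semicolon-less reference in an attribute value alone when it is followed by "=", so a current browser may behave differently.

```python
import html

# The URL we actually want the link to point at.
intended = "search.pl?key=1&amp=2&nbsp=3"

# "Wrong version": the URL is pasted into href unescaped, so a lenient
# decoder treats "&amp" and "&nbsp" as entity references.
decoded_wrong = html.unescape(intended)
print(repr(decoded_wrong))  # 'search.pl?key=1&=2\xa0=3'

# "Correct version": every "&" is written as "&amp;" in the attribute.
escaped = html.escape(intended, quote=True)
print(escaped)  # search.pl?key=1&amp;amp=2&amp;nbsp=3

# Decoding the escaped form gives back exactly the intended URL.
print(html.unescape(escaped) == intended)  # True
```

The "\xa0" in the first result is the non-breaking space that "&nbsp" stands for, which is why the broken URL appears to contain a blank.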


cheers,

sebastian


At 14:01 17.05.2001 -0400, Reitzel, Charlie wrote:
>I'm w/ Kris on this one.  Tidy probably shouldn't replace & with &amp;
>within the text of an href or src attribute.
>
>I did a test, however, and it appears that the link is not broken by the
>replacement.  The browser may substitute the entity w/ the actual character
>before submitting the URL.  I just did this quickly, without a functional
>URL.  But the URL displayed on the status line of the browser (IE5) appears
>correct: file:///C:/temp/search.pl?key1=value1&key2=value2
>
>You can turn off & replacement with the option:
>         tidy --quote-ampersand=no ...
>
>take it easy,
>Charlie
>
>-----Original Message-----
>From: Fred Bone [mailto:Fred.Bone@dial.pipex.com]
>Sent: Wednesday, May 16, 2001 11:16 AM
>To: html-tidy@w3.org
>Cc: VAN BRUWAENE Kris
>Subject: Re: Unexpected behaviour
>
>
>On 16 May 2001, at 10:15, VAN BRUWAENE Kris wrote:
>
> > When using Tidy to convert from html to xhtml I find that
> > it replaces & within url's with &amp;  This looks somewhat
> > unexpected to me.  Is it a bug or is there a reason for it?
> > e.g.:
> > <a href="search.pl?key1=value1&key2=value2"> becomes
> > <a href="search.pl?key1=value1&amp;key2=value2">
> > Regards
> > Kris Van Bruwaene
>
>Tidy is correcting your html. This is nothing to do with conversion.

--
Sebastian Lange
http://www.sl-chat.de/
Maybe the first chat site that validates as HTML
4.0 even though user input may contain HTML codes.

Courtesy to Dave Raggett's HTML Tidy:
http://www.w3.org/People/Raggett/tidy/

Tidy your documents ONLINE:
http://www.sl-chat.de/Tidy/
Received on Thursday, 17 May 2001 14:40:11 GMT