W3C home > Mailing lists > Public > html-tidy@w3.org > January to March 2000

Suggested bug fix: Ignore line breaks in URL attributes

From: Randy Waki <rwaki@flipdog.com>
Date: Fri, 24 Mar 2000 11:45:32 -0600
To: <html-tidy@w3.org>
Message-ID: <OFBBC04388.2E74F39A-ON86256865.006FA183@rfdinc.com>

[Foiled by our domain name change! My apologies if anyone sees this twice.]

(This is a restatement of the problem described in
http://lists.w3.org/Archives/Public/html-tidy/1999JulSep/0080.html)

If a quoted URL attribute value (e.g., href in <a> elements) contains a
line break, 13-Jan-2000 Tidy changes the line break to a space while IE
and Netscape discard the line break.  This can result in a broken link
in the tidied document.

I believe the following change fixes the problem.  In lexer.c, insert
the following lines before line 2502:

                            /* discard line breaks in quoted URLs */
                            if (c == '\n' && IsUrl(name))continue;

/* existing line 2502 */    c = ' ';

------------------------ Example HTML document -------------------------
<html>
  <head>
    <title>x</title>  <body>
    <!-- Tidy gets this RIGHT. -->
    <p>
      Unquoted <a href=http://www.w3newline between the 3 and the c;
should link to "http://www.w3"
    </p>

    <!-- Tidy gets this WRONG. -->
    <p>
      <a href="http://www.w3
c.org/">http://www.w3c.org/</a> with a newline between the 3 and the c;
should link to "http://www.w3c.org/"
    </p>

    <!-- Tidy gets this RIGHT. -->
    <p>
      <a href="http://www.w3
 c.org/">http://www.w3c.org/</a> with a newline and a space between the
3 and the c; should link to "http://www.w3 c.org/" (note the space)
    </p>
  </body>
</html>
------------------------------------------------------------------------

Randy
Received on Friday, 24 March 2000 13:14:56 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:43 GMT