W3C home > Mailing lists > Public > html-tidy@w3.org > April to June 2001

RE: Unexpected behaviour

From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
Date: Fri, 18 May 2001 12:08:45 -0400
Message-ID: <B5C79DDBC655D311B6BD0008C7E64D76013C152A@exchange.arrakisplanet.com>
To: "'Randy Waki'" <rwaki@flipdog.com>, "'Sebastian Lange'" <lange@cyperfection.de>
Cc: html-tidy@w3.org
You learn something new every day.  Thanks for the postings.

-----Original Message-----
From: Randy Waki [mailto:rwaki@flipdog.com]
Sent: Thursday, May 17, 2001 8:34 PM
To: html-tidy@w3.org
Subject: RE: Unexpected behaviour


I don't want to turn this into a bigger deal than it needs to be, but
it turns out I spent some time investigating Tidy's replacement of "&"
with "&amp;" a few months ago.  Many of the points have been mentioned
by others in this thread.

  - The HTML, XML, and XHTML specs all require "&" to be written as
    "&amp;".  This is because "&" can be ambiguous while "&amp;" is not.

  - Most (all?) browsers do NOT require "&" to be written as "&amp;".
    This is because they employ heuristics to resolve an ambiguous "&".
    These heuristics usually work but they introduce inconsistencies,
    which is probably why the specs didn't use them.  Most (all?)
    browsers also accept the unambiguous "&amp;".  This is entirely
    about disambiguating the markup syntax.  Users and web servers
    should not see "&amp;" in URLs.

  - Tidy's job is to clean up HTML to conform with the specs while
    preserving what users experience in their browsers.  So Tidy
    replaces the illegal "&" with the legal "&amp;", employing the same
    heuristics as browsers to resolve any ambiguities.  Again, users and
    web servers should not see a difference.  Only people who look at
    the markup can tell.

  - Still, some people who look at the markup, for their own reasons
    good or bad, want to see "&" even if it is illegal and possibly
    ambiguous.  For those people, Tidy has a quote-ampersand=no option
    that disables the "&" replacement.

  - The current 4 August 2000 Tidy has a bug that occasionally causes it
    to botch the "&" replacement.  I submitted a patch a few months ago:

    http://lists.w3.org/Archives/Public/html-tidy/2001JanMar/0196.html

Randy
Received on Friday, 18 May 2001 12:08:40 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:45 GMT