- From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
- Date: Fri, 18 May 2001 12:08:45 -0400
- To: "'Randy Waki'" <rwaki@flipdog.com>, "'Sebastian Lange'" <lange@cyperfection.de>
- Cc: html-tidy@w3.org
You learn something new every day. Thanks for the postings.
-----Original Message-----
From: Randy Waki [mailto:rwaki@flipdog.com]
Sent: Thursday, May 17, 2001 8:34 PM
To: html-tidy@w3.org
Subject: RE: Unexpected behaviour
I don't want to turn this into a bigger deal than it needs to be, but
it turns out I spent some time investigating Tidy's replacement of "&"
with "&" a few months ago. Many of the points have been mentioned
by others in this thread.
- The HTML, XML, and XHTML specs all require "&" to be written as
"&". This is because "&" can be ambiguous while "&" is not.
- Most (all?) browsers do NOT require "&" to be written as "&".
This is because they employ heuristics to resolve an ambiguous "&".
These heuristics usually work but they introduce inconsistencies,
which is probably why the specs didn't use them. Most (all?)
browsers also accept the unambiguous "&". This is entirely
about disambiguating the markup syntax. Users and web servers
should not see "&" in URLs.
- Tidy's job is to clean up HTML to conform with the specs while
preserving what users experience in their browsers. So Tidy
replaces the illegal "&" with the legal "&", employing the same
heuristics as browsers to resolve any ambiguities. Again, users and
web servers should not see a difference. Only people who look at
the markup can tell.
- Still, some people who look at the markup, for their own reasons
good or bad, want to see "&" even if it is illegal and possibly
ambiguous. For those people, Tidy has a quote-ampersand=no option
that disables the "&" replacement.
- The current 4 August 2000 Tidy has a bug that occasionally causes it
to botch the "&" replacement. I submitted a patch a few months ago:
http://lists.w3.org/Archives/Public/html-tidy/2001JanMar/0196.html
Randy
Received on Friday, 18 May 2001 12:08:40 UTC