- From: Randy Waki <rwaki@flipdog.com>
- Date: Thu, 17 May 2001 18:34:05 -0600
- To: <html-tidy@w3.org>
I don't want to turn this into a bigger deal than it needs to be, but it turns out I spent some time a few months ago investigating Tidy's replacement of "&" with "&amp;". Many of these points have already been made by others in this thread.

- The HTML, XML, and XHTML specs all require "&" to be written as "&amp;". This is because a bare "&" can be ambiguous while "&amp;" is not.

- Most (all?) browsers do NOT require "&" to be written as "&amp;", because they use heuristics to resolve an ambiguous "&". These heuristics usually work, but they introduce inconsistencies, which is probably why the specs didn't adopt them. Most (all?) browsers also accept the unambiguous "&amp;". This is entirely about disambiguating the markup syntax: users and web servers should never see "&amp;" in URLs.

- Tidy's job is to clean up HTML so that it conforms to the specs while preserving what users experience in their browsers. So Tidy replaces the illegal "&" with the legal "&amp;", using the same heuristics as browsers to resolve any ambiguities. Again, users and web servers should not see a difference; only people who look at the markup can tell.

- Still, some people who look at the markup want, for their own reasons good or bad, to see "&" even where it is illegal and possibly ambiguous. For them, Tidy has a quote-ampersand=no option that disables the "&" replacement.

- The current 4 August 2000 release of Tidy has a bug that occasionally causes it to botch the "&" replacement. I submitted a patch a few months ago:
  http://lists.w3.org/Archives/Public/html-tidy/2001JanMar/0196.html

Randy
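The escaping rule discussed above can be illustrated with a small Python sketch. It is not Tidy's code: Tidy is written in C and its heuristics are more involved. As a simplifying assumption, the sketch treats an "&" as already starting a reference only when it is followed by a name or numeric code and a terminating ";". Every other "&" is rewritten as "&amp;". The function name quote_ampersands is hypothetical, chosen only to echo Tidy's quote-ampersand option.

```python
import html
import re

# A bare "&" is any "&" NOT followed by what looks like a complete reference:
# a named reference ("&amp;"), a decimal one ("&#38;"), or a hex one ("&#x26;").
# Simplification: the name is not checked against a list of known entities.
_BARE_AMP = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)")


def quote_ampersands(text: str) -> str:
    """Replace each bare "&" with "&amp;", leaving existing references intact."""
    return _BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    href = "search?q=tidy&lang=en&amp;page=2"
    quoted = quote_ampersands(href)
    print(quoted)                 # markup form, with every "&" made explicit
    print(html.unescape(quoted))  # what a browser would send to the server
```

Because already-escaped references are skipped, running the function twice gives the same result as running it once. Decoding the output with html.unescape recovers the original URL with plain "&" separators, which matches the point that servers never see "&amp;". The ambiguity itself is easy to reproduce: under the HTML5 rules that html.unescape implements, the bare "&copy" in "?a=1&copy=2" is decoded as "©".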
Received on Thursday, 17 May 2001 20:34:43 UTC