- From: Richard A. O'Keefe <ok@cs.otago.ac.nz>
- Date: Tue, 5 Mar 2002 10:53:28 +1300 (NZDT)
- To: html-tidy@w3.org, rick.parsons@eds.com
"Fred" said Why not let Tidy allow me to code the "at" sign as @... and Rick Parsons replied I would support a request for feature to that effect. But (a) we don't need it, and (b) there isn't the slightest reason to suspect that it would accomplish its intended goal of defeating spammers. Let's take (b) first. The aim is to write name6host.subdomain.domain and thereby defeat spammers looking for name@host... But all a spammer has to do is pick up any of the free programs out there that handle HTML, such as tidy, and have it spit out attribute values and plain text, and search the result. I'm aware of HTML parsers in C, Erlang, Mercury, Smalltalk, and a few other languages, carefully written to hack the kind of garbage that's out there. (One of them uses no fewer than 16 stacks, because lots of pages don't nest their tags properly.) Indeed, if the spammer just cleans the page up using Tidy (even if it DOES use this kluge), then any of the free SGML parsers around could be used. It'd take me about 30 minutes to whip up a program to work around this kind of kluge AND test it enough. (a) We DO NOT NEED ANY CHANGE TO Tidy to accomplish this hack. Assuming that the output contains no CDATA sections, then a trivial sed -e 's/@/\@/g' will do the job, because an @ sign can occur in - plain text (change wanted) - an attribute value (change wanted) - a comment (change wanted or harmless) - a processing instruction (potentially harmful) - a CDATA section (definitely harmful) and processing instructions are extremely rare in HTML, and tend not to contain @ signs when they do occur. Using AWK or Perl instead of sed, we could even cope with processing instructions and CDATA sections. We don't need any feeping creaturism. Tidy is complicated enough; things that can be done _outside_ Tidy using trivial scripts should be done outside Tidy. But should everyone be left to invent their own wheels? 
No, there should be a scripts/contrib/ directory with examples of problems that can be solved this way and how they can be solved.
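
As an illustration of the kind of script such a directory might hold, here is a minimal sketch of the "cope with processing instructions and CDATA sections" variant. It is written in Python rather than the AWK or Perl mentioned above, and the names escape_at and PROTECTED are made up for this example; nothing like it ships with Tidy. It assumes a processing instruction ends at the first ">" and does not try to recognise "<?" or "<![CDATA[" appearing inside a comment or attribute value.

```python
import re

# Regions in which "@" must be left untouched: CDATA sections and
# processing instructions. Assumption: a processing instruction ends at
# the first ">", which covers both the SGML form <?...> and the XML
# form <?...?>.
PROTECTED = re.compile(r"<!\[CDATA\[.*?\]\]>|<\?.*?>", re.DOTALL)


def escape_at(markup: str) -> str:
    """Replace "@" with "&#64;" everywhere except in protected regions."""
    pieces = []
    pos = 0
    for match in PROTECTED.finditer(markup):
        # Text before the protected region (plain text, attribute values,
        # comments): rewrite every "@".
        pieces.append(markup[pos:match.start()].replace("@", "&#64;"))
        # The protected region itself is copied through unchanged.
        pieces.append(match.group(0))
        pos = match.end()
    # Whatever follows the last protected region.
    pieces.append(markup[pos:].replace("@", "&#64;"))
    return "".join(pieces)
```

Wrapped so that it reads standard input and writes standard output, this could be run on Tidy's output in the same place as the sed one-liner. Note that it does nothing about point (b): any HTML-aware reader turns &#64; straight back into @.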
Received on Monday, 4 March 2002 16:53:32 UTC