RE: bug

> The application I am writing runs tidy and then programmatically extracts the
> hrefs from the resulting tidied document and spiders those hrefs. The
> spider was not replacing &amp; with & before sending the http request. I
> will do the
> replacement inside the spider library. I just assumed that urls within hrefs
> would be exactly the same after running jtidy.

The best course for such dilemmas is to run the HTML in question through a
validator.  You can use the W3C's validator or (my favorite)
www.htmlhelp.com/tools/validator  ...  in either case, it would tell you that
naked ampersands (i.e., not escaped as &amp;) are not OK, either in URLs or
anywhere else.  A validator is analogous to a final spelling checker.  It's good to
run a document through a validator, even if it's been "Tidied".  Tidy is good,
but the validator is the ultimate test.
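
For the spider side of this, here is a minimal sketch of the decoding step,
written in Python for brevity rather than in the spider's own language.  The
function name and the example URL are made up for illustration; only
html.unescape comes from the standard library.  It assumes the href value was
copied out of the markup verbatim, so the &amp; is still in escaped form.

```python
from html import unescape


def href_to_request_url(raw_href: str) -> str:
    """Decode character references in an href value taken verbatim from markup.

    Hypothetical helper: in valid HTML an ampersand inside an attribute is
    written as &amp;, so it has to be turned back into & before the URL is
    used in an HTTP request.
    """
    return unescape(raw_href)


# Example href as it would appear in a tidied document (made-up URL).
raw = "http://example.com/search?q=tidy&amp;page=2"
print(href_to_request_url(raw))  # http://example.com/search?q=tidy&page=2
```

The decoding should be applied exactly once, and only to the raw attribute
text.  If the hrefs come from a parser that already decodes attribute values,
running it a second time could mangle a query string that happens to contain
something that looks like an entity.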


/Jelks

Received on Tuesday, 23 May 2000 15:03:08 UTC