- From: Michael A. Peters <mpeters@mac.com>
- Date: Tue, 10 Mar 2009 02:04:32 -0700
- To: html-tidy@w3.org
I've used the standalone executable in the past (on Linux) when I came across documentation that didn't parse in my browser, and it worked beautifully for that, but I am now using tidy in a program for the first time. A JavaScript book I bought years ago had a CD-ROM full of material that seemed IE-centric and simply displayed oddly, which surprised me because the author of the book was all about standards. I guess he didn't author the demo CD-ROM. tidy cleaned up a lot of issues with the demo CD, so I'm very grateful and have seen the power of tidy.

I'm attempting to write my first PHP class, inspired by http://people.mozilla.org/~bsterne/content-security-policy/ (CSP from here on out). Essentially, I am trying to write a class that implements CSP on the server BEFORE the page is sent to the user. I'm doing this by walking the DOM (via DOMDocument) and removing anything that would not be allowed by the specified CSP. The first thing the class does, though, is run the page through tidy, as really bad HTML can be problematic when loaded into a DOMDocument object. After tidy has cleaned it, the class loads the HTML into the DOMDocument object.

While writing intentionally bad HTML to test the class and figure out the best set of default parameters, I noticed that if I have, say, a superfluous <title>something</title> tag somewhere, tidy moves it into the head after the first title tag, when the behavior I would prefer is that tidy simply delete it.

Is there a config option I am missing, or is that not supported? I can clean such things up myself through the DOM manipulation I do after loading the cleaned HTML, but if there is a way to have tidy do it (specifically the version that ships with CentOS 5 - libtidy-0.99.0-14.20070615.el5), that would be preferable, as I wouldn't then be writing code for something that already exists in a class I am already using.
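For reference, the DOM-side cleanup mentioned above might look roughly like the following. This is only a minimal sketch, not a tidy feature: the function name `removeExtraTitles` and the sample markup are made up for illustration, the `output-xhtml` setting is just an assumed choice of tidy option, and the tidy call is skipped if the extension is not loaded.

```php
<?php
// Hypothetical helper: keep the first <title> element and remove any later ones.
// Returns the number of elements removed.
function removeExtraTitles(DOMDocument $doc)
{
    // getElementsByTagName() returns a live list, so copy it into a plain
    // array before removing nodes; otherwise elements can be skipped.
    $titles = iterator_to_array($doc->getElementsByTagName('title'), false);
    $removed = 0;
    foreach (array_slice($titles, 1) as $extra) {
        $extra->parentNode->removeChild($extra);
        $removed++;
    }
    return $removed;
}

// Sample input (made up): a document with a superfluous second title.
$html = '<html><head><title>first</title><title>something</title></head>'
      . '<body><p>text</p></body></html>';

// Run the markup through tidy first when the extension is available.
if (function_exists('tidy_repair_string')) {
    $html = tidy_repair_string($html, array('output-xhtml' => true), 'utf8');
}

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$removed = removeExtraTitles($doc);
```

Copying the node list into an array before removal is the main design point here; the rest is ordinary DOMDocument usage and would sit alongside the other DOM walking the class already does.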
Received on Tuesday, 10 March 2009 09:21:17 UTC