
new to tidy - remove illegal duplicate tags ??

From: Michael A. Peters <mpeters@mac.com>
Date: Tue, 10 Mar 2009 02:04:32 -0700
Message-id: <49B62D20.5070007@mac.com>
To: html-tidy@w3.org
I've used the standalone executable in the past (on Linux) when I came 
across documentation that didn't parse in my browser, and it worked 
beautifully for that, but I am now using tidy in a program for the 
first time. A JavaScript book I bought years ago came with a CD-ROM 
full of material that seemed IE-centric and simply displayed oddly, 
which surprised me because the author of the book was all about 
standards. I guess he didn't author the demo CD-ROM. tidy cleaned up a 
lot of issues with the demo CD, so I'm very grateful and have seen the 
power of tidy.

I'm attempting to write my first PHP class, inspired by

http://people.mozilla.org/~bsterne/content-security-policy/ (CSP from 
here on out)

Essentially what I am trying to do is write a class that implements CSP 
on the server BEFORE the page is sent to the user.

I'm doing this by walking the DOM (via DOMDocument) and removing stuff 
that would not be allowed by the specified CSP.

The first thing the class does, though, is run the page through tidy, 
as really bad HTML can be problematic when loading into a DOMDocument 
object. After tidy has cleaned it, the class loads the HTML into the 
DOMDocument object.
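
For illustration, the tidy-then-DOMDocument step might look roughly 
like the sketch below. The function name loadTidied and the option 
values are assumptions made for this example, not settings taken from 
the class itself, and it relies on PHP's tidy extension being 
installed. Character-encoding handling is left out.

```php
<?php
// Sketch: repair markup with tidy, then load the result into DOMDocument.
// loadTidied() is a hypothetical helper; the option values are
// illustrative assumptions, not recommended defaults.
function loadTidied($html)
{
    $config = array(
        'output-xhtml'     => true, // ask tidy for well-formed XHTML
        'numeric-entities' => true, // write entities in numeric form
        'wrap'             => 0,    // disable line wrapping
    );

    // tidy_repair_string() returns the repaired markup as a string.
    $clean = tidy_repair_string($html, $config);

    // Collect libxml parse warnings internally instead of raising
    // PHP warnings for each one.
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($clean);

    libxml_clear_errors();

    return $dom;
}
```

Any DOM walking then happens on the object this returns, so the 
parser only ever sees markup that tidy has already repaired.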

When writing intentionally bad HTML to test it and figure out the best 
set of default parameters, I noticed that if I have, say, a superfluous 
<title>something</title> element somewhere, tidy will put it in the 
head after the first title element, when the behavior I would prefer is 
that tidy just delete it.

Is there a config option I am missing, or is that not supported?

I can clean such stuff up myself through the DOM manipulation I do after 
loading the cleaned HTML, but if there is a way to have tidy do it 
(specifically the version that ships with CentOS 5, 
libtidy-0.99.0-14.20070615.el5), that would be preferable, as I wouldn't 
then be writing code for what already exists in a class I am already using.
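
For reference, the DOM-side fallback I have in mind could be sketched 
as below. The function name removeExtraTitles is made up for this 
example, and it assumes the document has already been loaded into a 
DOMDocument; it is one possible approach, not something tidy provides.

```php
<?php
// Sketch: remove every <title> element except the first one.
// removeExtraTitles() is a hypothetical helper name.
function removeExtraTitles(DOMDocument $dom)
{
    // getElementsByTagName() returns a live list, so copy the nodes
    // into a plain array before removing any of them.
    $titles = array();
    foreach ($dom->getElementsByTagName('title') as $node) {
        $titles[] = $node;
    }

    // Keep the first <title>; detach all later ones from their parents.
    $removed = 0;
    foreach (array_slice($titles, 1) as $extra) {
        $extra->parentNode->removeChild($extra);
        $removed++;
    }

    // Report how many elements were removed.
    return $removed;
}
```

It would be called once, right after loading the tidied HTML and 
before the CSP checks run.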
Received on Tuesday, 10 March 2009 09:21:17 GMT
