Is Html-tidy safe for anonymous input?

From: Steelcurtain67 <reese@library4science.com>
Date: Mon, 13 Sep 2010 10:28:08 -0700 (PDT)
Message-ID: <29700797.post@talk.nabble.com>
To: html-tidy@w3.org
I am setting up a couple of web forms to process HTML output from word
processors like OpenOffice and MS Word.  I would like to pass it through
HTML Tidy to clean it up and give me more standardized code.  I am
planning on using the following code.  Essentially, I save the submitted
HTML to a file, run tidy (FreeBSD port) on it, and read back the output.

 function MyTidy($D,$ID) {
	if(DEBUG > 79) DEBUG_L(__LINE__,"MyTidy() ",$this);

	// Build a unique temp-file prefix: the caller's ID plus a base-36 uniqid.
	$TmpTidy = uniqid();
	$TmpTidy = "{$ID}_" . base_convert($TmpTidy,16,36);
	if(DEBUG > 79) DEBUG_L(__LINE__,"MyTidy TmpTidy $TmpTidy ",$this);

	// Strip backslashes from the submitted data and write it to the input file.
	$D = stripslashes($D);
	$IN = "/usr/local/www/libsci/tmp/{$TmpTidy}_IN.html";
	$OUT = "/usr/local/www/libsci/tmp/{$TmpTidy}_OUT.html";
	file_put_contents($IN,$D);
	if(DEBUG > 79) DEBUG_L(__LINE__,"MyTidy file_put_contents(\$IN,\$D);",$this);

	// Run tidy on the input file; its standard output goes to the output file.
	`/usr/local/bin/tidy -config /usr/local/etc/tidy.cfg -f /tmp/tidy.errors $IN > $OUT`;
	if(DEBUG > 79) DEBUG_L(__LINE__,"`/usr/local/bin/tidy -config /usr/local/etc/tidy.cfg -f /tmp/tidy.errors $IN > $OUT`;",$this);

	// Read the tidied HTML back and re-add slashes before returning it.
	$D = file_get_contents($OUT);
	$D = addslashes($D);
	return $D;
 }

where $D = $_POST['HTML_CODE'].
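
The one place the function above hands text to the shell is the backtick
command.  What follows is only a minimal sketch of building that command with
escapeshellarg(), assuming $ID is meant to be a short alphanumeric token (the
post does not say where $ID comes from).  build_tidy_command is a hypothetical
helper name, and the 32-character limit is likewise an assumption:

```php
<?php
// Hypothetical helper, not part of the original MyTidy(): returns the input
// path, the output path, and the tidy command line with both paths quoted.
function build_tidy_command($id, $tmpDir = '/usr/local/www/libsci/tmp')
{
    // Assumption: IDs are short alphanumeric tokens; reject anything else.
    if (!preg_match('/^[A-Za-z0-9]{1,32}$/D', $id)) {
        throw new InvalidArgumentException('unexpected ID');
    }

    // Same naming scheme as the original: ID, underscore, base-36 uniqid.
    $prefix = $id . '_' . base_convert(uniqid(), 16, 36);
    $in  = $tmpDir . '/' . $prefix . '_IN.html';
    $out = $tmpDir . '/' . $prefix . '_OUT.html';

    // escapeshellarg() quotes each path so the shell sees it as one argument.
    $cmd = '/usr/local/bin/tidy -config /usr/local/etc/tidy.cfg'
         . ' -f /tmp/tidy.errors '
         . escapeshellarg($in) . ' > ' . escapeshellarg($out);

    return array($in, $out, $cmd);
}
```

The returned command string could then be passed to the backtick operator or
shell_exec() in place of the hand-built one.  This only addresses how the
shell sees the file names; it says nothing about how tidy itself handles
hostile markup.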

So my question is: can I do this on code/text that anyone can paste into a
form box without worrying about a code exploit?  The code that operates on
the tidy output is mostly preg_replace and preg_match plus some string
concatenation.  The final result is converted with PHP's htmlentities()
before it is output, so it should be safe at that point.
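
The final escaping step described above might look roughly like this.  It is
a sketch only: escape_for_output is a hypothetical name, and the ENT_QUOTES
flag and UTF-8 charset are assumptions, since the post does not say which
arguments are passed to htmlentities():

```php
<?php
// Hypothetical wrapper around the final escaping step before output.
function escape_for_output($text)
{
    // ENT_QUOTES converts both double and single quotes as well as < > &.
    // Assumption: the processed text is UTF-8.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}
```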

I would appreciate any thoughts on this.  I am doing this for myself to help
in converting text documents to EPUB format, but there are a lot of people
who ask for help doing this, and I would like to make it public, but not at
the cost of a server compromise.  I have read a few websites regarding
security, but I am more uncertain now than I was before.
Received on Monday, 13 September 2010 21:33:24 GMT
