- From: Richard Fine <richard@gamedev.net>
- Date: Thu, 02 Jul 2009 23:19:51 +0100
- To: html-tidy@w3.org
Hello everyone,

I'm building an input sanitizer around Charles Reitzel's Tidy.NET bindings. I use it to take potentially malformed XHTML plus proprietary-namespaced tag soup and produce valid XML from it. It's working OK so far, but one of my test cases surprises me: the output is valid XML, but it isn't what I expected.

Starting from the default options, I explicitly turn on:

    input-xml
    output-xml
    force-output

and give it the input string:

    <html xmlns="http://www.w3.org/1999/xhtml"><body><b>Hello, <i<i>world!</b></body></html>

Note the '<i<i>' construct before 'world!'. I was expecting the output:

    <html xmlns="http://www.w3.org/1999/xhtml"><body><b>Hello, &lt;i<i>world!</i></b></body></html>

in which the first < of the <i<i> is encoded as an entity. Instead, I'm getting:

    <html xmlns="http://www.w3.org/1999/xhtml"><body><b>Hello, <i i="">world!</i></b></body></html>

That is, the <i<i> becomes <i i="">. Why does this happen?

(I suspect the TidyATL/Tidy.NET packages on Charles' page are a bit out of date; they're certainly missing some of the more recent options. So if this is a bug that has already been fixed, I apologise for wasting your time.)

Thanks in advance,
Richard
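To make the difference between the two outputs concrete, here is a small sketch that parses both with Python's standard `xml.etree.ElementTree` (an illustrative tool, not part of Tidy or Tidy.NET). It assumes the expected output escapes the stray `<` as `&lt;`. Both strings are well-formed XML; they differ only in where the text "Hello, <i" ends up and in the spurious empty `i` attribute on the `<i>` element.

```python
import xml.etree.ElementTree as ET

# Expected output (assumption: the stray '<' is kept as text via &lt;).
EXPECTED = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
            '<b>Hello, &lt;i<i>world!</i></b></body></html>')

# Output actually reported from Tidy: the second 'i' became an empty attribute.
ACTUAL = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
          '<b>Hello, <i i="">world!</i></b></body></html>')

# ElementTree exposes namespaced element names in {uri}local form.
NS = "{http://www.w3.org/1999/xhtml}"


def describe(doc):
    """Return (text of <b>, attributes of <i>, text of <i>) for doc.

    ET.fromstring raises ParseError if doc is not well-formed XML.
    """
    root = ET.fromstring(doc)
    b = root.find(f"{NS}body/{NS}b")
    i = b.find(f"{NS}i")
    return b.text, dict(i.attrib), i.text


print(describe(EXPECTED))  # ('Hello, <i', {}, 'world!')
print(describe(ACTUAL))    # ('Hello, ', {'i': ''}, 'world!')
```

The unprefixed `i` attribute carries no namespace (a default namespace does not apply to attributes), so it appears simply as the key `'i'`.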
Received on Friday, 3 July 2009 09:33:06 UTC