- From: leegold <leegold@speedymail.org>
- Date: Sat, 20 Nov 2010 13:22:13 -0800 (PST)
- To: html-tidy@w3.org
Hi,

I'm using tidy to clean up an HTML file before I parse it. I thought Tidy would make the process smoother: I was looking for an automated way to report and fix well-formedness problems. I am not an HTML standards expert, and I'm sure Tidy has a good reason for doing the following by default.

Given a file whose only content is:

    <table class=""datatable""></table>

I run:

    $ tidy /home/g/Desktop/scrapes/xmlwf2.xml

and get back:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
    <html>
    <head>
    <meta name="generator" content=
    "HTML Tidy for Linux/x86 (vers 7 December 2008), see www.w3.org">
    <title></title>
    </head>
    <body>
    <table class=""></table>
    </body>
    </html>

So "datatable" is removed. Why? I ask because tidy removed content here, and I'm worried about that. Is there a way to make tidy not do that? When I parse, I need all the anchors and "signs along the road" I can get as flags, if you know what I mean.

Thanks,
Lee G.
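
One possible workaround, assuming the doubled quotes are an escaping artifact (for example from a CSV-style export) rather than intended markup, would be to collapse them before running tidy, so that the value arrives as an ordinary quoted attribute. This is only a sketch of a pre-processing step, not a Tidy feature; the helper name `collapse_doubled_quotes` is hypothetical, and it deliberately handles only values without spaces:

```python
import re

# Matches attr=""value"" where value has no quotes, angle brackets,
# whitespace, or "=". Assumption: the doubled quotes are an escaping
# artifact, not intended markup. Values containing spaces are left alone
# so that adjacent empty attributes such as x="" y="" are never merged.
_DOUBLED = re.compile(r'=""([^"<>\s=]+)""')


def collapse_doubled_quotes(html: str) -> str:
    """Rewrite attr=""value"" as attr="value"; leave everything else unchanged."""
    return _DOUBLED.sub(r'="\1"', html)


if __name__ == "__main__":
    # The one-line input from the message above.
    print(collapse_doubled_quotes('<table class=""datatable""></table>'))
```

The cleaned text could then be written to a file and passed to tidy as before.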
Received on Monday, 22 November 2010 07:37:13 UTC