double quoted attribute value deleted


I'm using Tidy to clean up an HTML file before I parse it. I thought Tidy would
make the process smoother: I was looking for an automated way to report and
fix well-formedness problems. I am not an HTML standards expert, and I'm sure
Tidy has a good reason for doing the following by default.

Given a file whose only content is:  <table class=""datatable""></table>
I run:  $ tidy /home/g/Desktop/scrapes/xmlwf2.xml
and get back:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 7 December 2008), see">
<table class=""></table>

So "datatable" is removed. Why? I ask because Tidy removed content here, and
that worries me. Is there a way to make Tidy not do that?
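The removal makes more sense once you see how a lenient HTML tokenizer splits this input. The sketch below uses Python's standard html.parser purely as an illustration; it is not Tidy's code, and the class name AttrDumper is made up for the example. The first pair of quotes closes the class value immediately, so class gets an empty value, and the leftover text datatable"" is read as a separate attribute name with no value. Tidy presumably makes a similar split and then drops the unrecognized leftover, which would match the output shown.

```python
from html.parser import HTMLParser


class AttrDumper(HTMLParser):
    """Record the (tag, attributes) the stdlib tokenizer sees on each start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when the
        # attribute has no "=value" part at all
        self.tags.append((tag, attrs))


parser = AttrDumper()
parser.feed('<table class=""datatable""></table>')
parser.close()
print(parser.tags)
# [('table', [('class', ''), ('datatable""', None)])]
```

In other words, by the time any cleanup runs, "datatable" is no longer the value of class; it is a stray token of its own.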

Also, when I parse, I need every anchor and "sign along the road" I can get
as flags, if you know what I mean...
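One way to keep that information is to flag and repair the doubled quotes yourself before Tidy ever sees the file. This is a minimal heuristic sketch, not a Tidy feature: DOUBLED, find_doubled and repair_doubled are hypothetical names, and the pattern assumes the mistake always looks like name=""value"" with a value that contains no spaces, quotes or "=" (so a multi-word value such as class=""a b"" is not caught, while legitimate empty attributes like alt="" are left alone).

```python
import re

# Hypothetical pre-pass pattern: an attribute written as name=""value"",
# e.g. class=""datatable"". The value must be non-empty and free of
# whitespace, quotes, angle brackets and "=", which keeps legitimate
# sequences like alt="" title="" from matching.
DOUBLED = re.compile(r'(\s[\w:-]+)=""([^\s"<>=]+)""')


def find_doubled(html):
    """Return (line_number, matched_text) for each suspicious attribute."""
    hits = []
    for lineno, line in enumerate(html.splitlines(), start=1):
        for m in DOUBLED.finditer(line):
            hits.append((lineno, m.group(0).strip()))
    return hits


def repair_doubled(html):
    """Collapse name=""value"" to name="value"; all other text is unchanged."""
    return DOUBLED.sub(r'\1="\2"', html)


sample = '<table class=""datatable""></table>'
print(find_doubled(sample))    # [(1, 'class=""datatable""')]
print(repair_doubled(sample))  # <table class="datatable"></table>
```

The report from find_doubled can be logged as the "flags", and the repaired text can then be handed to Tidy, which should keep class="datatable" since the attribute is now well formed.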


Lee G.


Received on Monday, 22 November 2010 07:37:13 UTC