Re: double quoted attribute value deleted

On 20 November 2010 at 13:22, leegold said:

> Hi, 
> 
> I'm using Tidy to clean up an HTML file before I parse it. I thought Tidy
> would make the process smoother. I was looking for an automated way to
> report and fix well-formedness problems... I am not an HTML standards
> expert, and I'm sure Tidy has good reason for doing the following by default.
> 
> Given a file with content only:  <table class=""datatable""></table>
> I do:  $ tidy  /home/g/Desktop/scrapes/xmlwf2.xml
> I get back:
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
> <html>
> <head>
> <meta name="generator" content=
> "HTML Tidy for Linux/x86 (vers 7 December 2008), see www.w3.org">
> <title></title>
> </head>
> <body>
> <table class=""></table>
> </body>
> </html>
> 
> So, "datatable" is removed. Why? I ask because tidy removed content here
> and I'm worried about that. Is there a way to make tidy not do that? 

Tidy reads the first pair of quotes as a complete, empty value, so you 
have two pieces of information attached to <table>:
 class=""   - a valid attribute, which Tidy has retained;
 datatable"" - which is invalid and has been removed.
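
To see how a lenient parser splits that tag, here is a minimal sketch using 
Python's standard html.parser module. It is not Tidy's parser, so take it 
only as an illustration of the same reading; the class name AttrDump and 
the variable names are made up for this example.

```python
from html.parser import HTMLParser


class AttrDump(HTMLParser):
    """Collect the attribute list of every start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when the
        # parser found no '=' after the name.
        self.seen.append((tag, attrs))


parser = AttrDump()
parser.feed('<table class=""datatable""></table>')
parser.close()

tag, attrs = parser.seen[0]
for name, value in attrs:
    print(repr(name), "=", repr(value))
```

The class attribute should come out with an empty value, and the text that 
follows it as a separate, oddly named attribute of its own. That second 
piece is the part Tidy discards.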

If datatable is supposed to be a (non-standard) attribute then it needs 
an equals sign separating it from the empty value string (the ""). That 
is, you should put
 <table class="" datatable="">

If it is supposed to be the value of the class attribute then it needs to 
go inside the quotes:
 <table class="datatable">

You can't expect Tidy to guess which of these two you meant.
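
If you already know which one you meant, you can repair the input yourself 
before Tidy sees it. Below is a minimal sketch in Python. It assumes the 
doubled quotes are an export artifact (CSV-style escaping, for instance) 
and that the text between them is always the intended attribute value. The 
function name undouble_quotes is made up for this example, and values 
containing spaces or '=' are deliberately left untouched.

```python
import re

# name=""value""  ->  name="value"
# The value must be non-empty and free of quotes, whitespace, '=' and angle
# brackets, so a genuinely empty attribute followed by another attribute,
# such as class="" datatable="", is left alone.
DOUBLED = re.compile(r'([A-Za-z_:][-\w:.]*)=""([^"<>\s=]+)""')


def undouble_quotes(html: str) -> str:
    """Collapse doubled double quotes around attribute values.

    Assumption: the text between the doubled quotes is the value the
    author intended for the attribute named just before them.
    """
    return DOUBLED.sub(r'\1="\2"', html)


fixed = undouble_quotes('<table class=""datatable""></table>')
print(fixed)  # <table class="datatable"></table>
```

Write the repaired text to a file and run tidy on that file as before; the 
class value is then inside the quotes and there is nothing left to guess.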

Received on Monday, 22 November 2010 09:40:37 UTC