Re: Help with tidy? from Cory Nelson on 2004-07-07 (html-tidy@w3.org from July to September 2004)

From: Cory Nelson <phrosty@gmail.com>
Date: Wed, 7 Jul 2004 13:27:31 -0700
To: Paul Reger <paulr@olivetree.com>
Cc: html-tidy@w3.org
Message-ID: <9b1d061404070713274ab5dbc0@mail.gmail.com>

Tidy isn't meant to be used as an HTML library; you should probably use a
tag soup parser for that. Alternatively, you could use tidy to convert your
HTML to XML, then run that through XSLT to produce whatever you want.
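
In case it helps, here is a rough sketch of the tag soup route in Python (my choice of language; nothing in your setup requires it). It uses the standard library's html.parser, which is lenient about malformed markup. The TagCollector class and the SAMPLE string are made up for illustration, and a real converter would emit your target markup instead of just collecting tag names:

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects start-tag names in document order, tolerating tag soup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, including unknown ones such as <o:p>.
        self.tags.append(tag)


# Hypothetical sample resembling Word-exported HTML (not the poster's file).
SAMPLE = (
    '<html><head><link rel="stylesheet" href="a.css"></head>'
    "<body><p>Text<o:p></o:p></p><hr></body></html>"
)

collector = TagCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.tags)
```

The parser does not complain about the unclosed <link> and <hr> or about the <o:p> element; it simply reports each tag as it sees it.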

-xml makes tidy treat the input as XML. To convert from HTML to XML,
use -asxhtml.
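
If you want to drive that conversion from a script, something along these lines might work. It is only a sketch: it assumes a tidy executable is on your PATH, the function names tidy_command and html_to_xhtml are hypothetical, and the -q (quiet) flag is my addition rather than something your setup requires:

```python
import shutil
import subprocess


def tidy_command(html_path):
    """Build the argument list for an HTML -> XHTML conversion."""
    # -asxhtml: convert HTML to well-formed XHTML (the flag named above).
    # -q: quiet mode, suppresses non-essential output (an optional extra).
    return ["tidy", "-q", "-asxhtml", html_path]


def html_to_xhtml(html_path):
    """Run tidy and return (xhtml_text, diagnostics, exit_status)."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH")
    result = subprocess.run(
        tidy_command(html_path), capture_output=True, text=True
    )
    # tidy exits with 0 on success, 1 when there were warnings, and
    # 2 when there were errors, so a non-zero status is not always fatal.
    return result.stdout, result.stderr, result.returncode
```

The XHTML comes back on standard output and the diagnostics on standard error, so the two can be inspected separately before handing the result to XSLT.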

The "unexpected </head> in <link>" error (and the similar <hr> ones) is
due to <link ...> not being well-formed XML; it needs to be <link ... />.
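
You can see the same thing with any strict XML parser. A small illustration using Python's xml.etree.ElementTree, where the two markup strings and the is_well_formed helper are invented for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the first leaves <link> open, the second closes it.
UNCLOSED = '<head><link rel="stylesheet" href="a.css"></head>'
SELF_CLOSED = '<head><link rel="stylesheet" href="a.css" /></head>'


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised, for example, when </head> arrives while <link> is still open.
        return False


print(is_well_formed(UNCLOSED))
print(is_well_formed(SELF_CLOSED))
```

The first fragment is rejected because the parser reaches </head> while <link> is still open, which is the same complaint tidy makes in -xml mode.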


----- Original Message -----
From: Paul Reger <paulr@olivetree.com>
Date: Thu, 1 Jul 2004 11:37:22 -0700
Subject: Help with tidy?
To: html-tidy@w3.org

Hi,

I am a new user of Tidy. I wish to use it as the basis for a parser of
HTML documents. The parser will be part of a conversion tool to convert
from HTML to another markup language that is proprietary to our company.

I have some questions, and any help would be most appreciated. If you
could point me at documents or other code, that would be most helpful.

Tidy is reporting errors in a sample file that I am feeding it. When I
use the -xml switch, tidy reports 4 errors in the document; without the
-xml switch, it reports 1,481 errors.

When I do not include the -xml switch, tidy reports this one error
(several times):

line 1275 column 7 - Error: <o:p> is not recognized!

When I do include the -xml switch, tidy reports the following 4 errors:

line 1268 column 1 - Error: unexpected </head> in <link>
line 15717 column 1 - Error: unexpected </div> in <hr>
line 15719 column 1 - Error: unexpected </body> in <hr>
line 15721 column 1 - Error: unexpected </html> in <hr>

Thanks in advance for any help,

Paul Reger (paulr@olivetree.com)
Senior Software Engineer
Olivetree Bible Software
Got a PDA? Want a free Bible? Go to:
www.olivetree.com

Received on Wednesday, 7 July 2004 16:27:53 UTC