- From: Luc Martinez <l.martinez@alyas-group.com>
- Date: Mon, 10 May 2010 18:27:03 +0200
- To: <html-tidy@w3.org>
- Message-ID: <!&!AAAAAAAAAAAYAAAAAAAAAKENVcK/7PlOhYrkt/Y4/XrCgAAAEAAAAKsGA2p1NzRAqyYnQSBmfxkBAA>
Hi,
I use tidy in conjunction with curlPP and libxml2.
Under certain circumstances, tidy adds backslashes that make libxml2
fail.
The output of curlPP is okay.
Example:
--
Entity: line 1658: parser error : StartTag: invalid element name
<\/script><!--10/05/2010 16:43:26!--><\/body>
^
Entity: line 1658: parser error : StartTag: invalid element name
<\/script><!--10/05/2010 16:43:26!--><\/body>
^
Entity: line 1659: parser error : StartTag: invalid element name
<\/html>
^
--
The original file (before tidy) is in the attachment.
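To narrow down where the backslashes appear, one possible check is to scan both the curlPP output and the tidy output for the sequence `<\/`. The helper below is only a sketch: `findEscapedCloseTags` is a made-up name, not part of tidy, curlPP or libxml2, and it simply reports the 1-based line numbers that contain `<\/`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical diagnostic helper (not a tidy/curlPP/libxml2 function):
// returns the 1-based numbers of the lines in `text` that contain "<\/".
std::vector<std::size_t> findEscapedCloseTags(const std::string& text)
{
    std::vector<std::size_t> lines;
    std::size_t line = 1;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') {
            ++line;                       // next character starts a new line
        } else if (text.compare(i, 3, "<\\/") == 0) {
            // Report each line only once, even if it has several matches.
            if (lines.empty() || lines.back() != line)
                lines.push_back(line);
        }
    }
    return lines;
}
```

Running it once on the string passed to `html2xml` and once on the string it returns would show whether the sequence is already in the input or only appears after tidy.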
The function that uses tidy is the following (copied from the examples ;) :
--
char* Transport::html2xml(const char* input)
{
TidyBuffer output;
TidyBuffer errbuf;
int rc = -1;
Bool ok;
TidyDoc tdoc = tidyCreate(); // Initialize "document"
tidyBufInit( &output );
tidyBufInit( &errbuf );
// printf( "Tidying:\t%s\n", input );
ok=tidyOptSetBool( tdoc, TidyXmlOut, yes ); // Convert to XML
ok=tidyOptSetBool( tdoc, TidyXmlDecl, yes );
if(tidyOptSetValue( tdoc, TidyOutCharEncoding, "utf8")!=true) {
cerr << "tidyOptSetValue( tdoc, TidyOutCharEncoding, \"utf8\")!=true"
<< endl;
exit(1);
}
tidyOptSetBool(tdoc, TidyWrapScriptlets, yes);
tidyOptSetBool(tdoc, TidyWrapAsp, yes);
tidyOptSetBool(tdoc, TidyWrapJste, yes);
tidyOptSetBool(tdoc, TidyWrapPhp, yes);
tidyOptSetBool(tdoc, TidyFixBackslash, yes);
tidyOptSetBool(tdoc, TidyMark, no);
// ok=tidyOptSetBool( tdoc, TidyQuoteNbsp, no ); // Output non-breaking space as entity
ok=tidyOptSetBool( tdoc, TidyNumEntities, yes ); // See tidy/pprint.c, line 720
if ( ok )
rc = tidySetErrorBuffer( tdoc, &errbuf ); // Capture diagnostics
if ( rc >= 0 )
rc = tidyParseString( tdoc, input ); // Parse the input
if ( rc >= 0 )
rc = tidyCleanAndRepair( tdoc ); // Tidy it up!
if ( rc >= 0 )
rc = tidyRunDiagnostics( tdoc ); // Kvetch
if ( rc > 1 ) // If error, force output.
rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );
if ( rc >= 0 )
rc = tidySaveBuffer( tdoc, &output ); // Pretty Print
if(rc>=0) {
ofstream of("xhtml.htm");
of << output.bp;
of.close();
}
else
printf( "A severe error (%d) occurred.\n", rc );
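// Copy the result into a zero-filled, NUL-terminated string; the caller must free() it.
char *str;
str=(char*)calloc(output.size+1, 1);
if (str && output.bp) // bp can be NULL when nothing was written
memcpy(str, output.bp, output.size); // copy only the bytes tidy produced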
tidyBufFree( &output );
tidyBufFree( &errbuf );
tidyRelease( tdoc );
return(str);
}
--
Where did I make a mistake?
Thanks in advance.
Attachments
- text/plain attachment: 1536-comment-la-revelation-a-commence-.txt
Received on Friday, 14 May 2010 13:18:44 UTC