- From: Luc Martinez <l.martinez@alyas-group.com>
- Date: Mon, 10 May 2010 18:27:03 +0200
- To: <html-tidy@w3.org>
- Message-ID: <!&!AAAAAAAAAAAYAAAAAAAAAKENVcK/7PlOhYrkt/Y4/XrCgAAAEAAAAKsGA2p1NzRAqyYnQSBmfxkBAA>
Hi,

I use tidy in conjunction with curlPP and libxml2. Under certain circumstances, tidy adds backslashes that make libxml2 fail. The output of curlPP is okay.

Example:
--
Entity: line 1658: parser error : StartTag: invalid element name
<\/script><!--10/05/2010 16:43:26!--><\/body>
^
Entity: line 1658: parser error : StartTag: invalid element name
<\/script><!--10/05/2010 16:43:26!--><\/body>
^
Entity: line 1659: parser error : StartTag: invalid element name
<\/html>
^
--

The original file (before tidy) is in the attachment.

The function that uses tidy is the following (copied from the examples ;) :
--
char* Transport::html2xml(const char* input)
{
    TidyBuffer output;
    TidyBuffer errbuf;
    int rc = -1;
    Bool ok;

    TidyDoc tdoc = tidyCreate();                     // Initialize "document"
    tidyBufInit( &output );
    tidyBufInit( &errbuf );
    // printf( "Tidying:\t%s\n", input );

    ok=tidyOptSetBool( tdoc, TidyXmlOut, yes );      // Convert to XML
    ok=tidyOptSetBool( tdoc, TidyXmlDecl, yes );
    if(tidyOptSetValue( tdoc, TidyOutCharEncoding, "utf8")!=true)
    {
        cerr << "tidyOptSetValue( tdoc, TidyOutCharEncoding, \"utf8\")!=true" << endl;
        exit(1);
    }
    tidyOptSetBool(tdoc, TidyWrapScriptlets, yes);
    tidyOptSetBool(tdoc, TidyWrapAsp, yes);
    tidyOptSetBool(tdoc, TidyWrapJste, yes);
    tidyOptSetBool(tdoc, TidyWrapPhp, yes);
    tidyOptSetBool(tdoc, TidyFixBackslash, yes);
    tidyOptSetBool(tdoc, TidyMark, no);
    // ok=tidyOptSetBool( tdoc, TidyQuoteNbsp, no ); // Output non-breaking space as entity
    ok=tidyOptSetBool( tdoc, TidyNumEntities, yes ); // See tidy/pprint.c, line 720

    if ( ok )
        rc = tidySetErrorBuffer( tdoc, &errbuf );    // Capture diagnostics
    if ( rc >= 0 )
        rc = tidyParseString( tdoc, input );         // Parse the input
    if ( rc >= 0 )
        rc = tidyCleanAndRepair( tdoc );             // Tidy it up!
    if ( rc >= 0 )
        rc = tidyRunDiagnostics( tdoc );             // Kvetch
    if ( rc > 1 )                                    // If error, force output.
        rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );
    if ( rc >= 0 )
        rc = tidySaveBuffer( tdoc, &output );        // Pretty Print

    if(rc>=0)
    {
        ofstream of("xhtml.htm");
        of << output.bp;
        of.close();
    }
    else
        printf( "A severe error (%d) occurred.\n", rc );

    char *str;
    str=(char*)malloc(output.size+1);
    bzero(str, output.size+1);                       // Zero-fill so the copy stays NUL-terminated
    memcpy(str, output.bp, output.size);             // Copy only the bytes tidy wrote

    tidyBufFree( &output );
    tidyBufFree( &errbuf );
    tidyRelease( tdoc );
    return(str);
}
--

Where did I make a mistake?

Thanks in advance.
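
One way to narrow this down, independent of tidy itself, is to check whether the `<\/` sequences are already present in the text passed to html2xml or only appear in its return value. The sketch below is a plain string scan; the helper name `findEscapedCloseTags` is hypothetical and is not part of tidy, curlPP, or libxml2.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical diagnostic helper (an assumption for illustration only):
// returns the 1-based line number of every occurrence of the three-character
// sequence "<\/" in text. A line with two occurrences is listed twice.
std::vector<std::size_t> findEscapedCloseTags(const std::string& text)
{
    std::vector<std::size_t> lines;
    std::size_t line = 1;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') {
            ++line;                       // Track the current line number
        } else if (text.compare(i, 3, "<\\/") == 0) {
            lines.push_back(line);        // Found "<\/" starting at offset i
        }
    }
    return lines;
}
```

Running it on both the `input` argument and the returned string would show whether the reported lines (1658 and 1659 in the parser errors above) already carry the backslashes before tidy runs. If they do, the sequences come from the page itself rather than from tidy.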
Attachments
- text/plain attachment: 1536-comment-la-revelation-a-commence-.txt
Received on Friday, 14 May 2010 13:18:44 UTC