
Problem with backslash

From: Luc Martinez <l.martinez@alyas-group.com>
Date: Mon, 10 May 2010 18:27:03 +0200
To: <html-tidy@w3.org>
Message-ID: <!&!AAAAAAAAAAAYAAAAAAAAAKENVcK/7PlOhYrkt/Y4/XrCgAAAEAAAAKsGA2p1NzRAqyYnQSBmfxkBAAAAAA==@alyas-group.com>
Hi,

 

I use tidy in conjunction with curlPP and libxml2.

Under certain circumstances, tidy adds backslashes that make libxml2
fail.

The output of curlPP is okay.

Example:

--

Entity: line 1658: parser error : StartTag: invalid element name

  <\/script><!--10/05/2010 16:43:26!--><\/body>

   ^

Entity: line 1658: parser error : StartTag: invalid element name

  <\/script><!--10/05/2010 16:43:26!--><\/body>

                                        ^

Entity: line 1659: parser error : StartTag: invalid element name

<\/html>

 ^

--
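
To narrow down where the backslashes end up, it may help to scan the tidy output for the `<\/` sequence before handing it to libxml2. The following is only a minimal sketch under that assumption: `findEscapedCloseTags` is a hypothetical helper, not part of tidy or libxml2, and it uses nothing beyond the C++ standard library.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of tidy or libxml2): returns the 1-based
// (line, column) position of every "<\/" sequence found in `text`.
std::vector<std::pair<std::size_t, std::size_t>>
findEscapedCloseTags(const std::string& text)
{
    std::vector<std::pair<std::size_t, std::size_t>> hits;
    std::size_t line = 1;
    std::size_t lineStart = 0;  // offset of the first character of the current line
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') {
            ++line;
            lineStart = i + 1;
        } else if (text.compare(i, 3, "<\\/") == 0) {
            // Same sequence the parser rejects as an invalid start tag.
            hits.emplace_back(line, i - lineStart + 1);
        }
    }
    return hits;
}
```

Calling it on the string built from the tidy output buffer gives the positions to compare against the line numbers in the libxml2 messages, which should show whether the backslashes are already present before libxml2 sees the document.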

The original file (before tidy) is in the attachment.

 

The function that uses tidy is the following (copied from the examples ;) :

--

char* Transport::html2xml(const char* input)

{

  TidyBuffer output;

  TidyBuffer errbuf;

  int rc = -1;

  Bool ok;

 

  TidyDoc tdoc = tidyCreate();                     // Initialize "document"

  tidyBufInit( &output );

  tidyBufInit( &errbuf );

//  printf( "Tidying:\t%s\n", input );

 

  ok=tidyOptSetBool( tdoc, TidyXmlOut, yes );  // Convert to XML

  ok=tidyOptSetBool( tdoc, TidyXmlDecl, yes );  

  if(tidyOptSetValue( tdoc, TidyOutCharEncoding, "utf8")!=true) {

    cerr << "tidyOptSetValue( tdoc, TidyOutCharEncoding, \"utf8\")!=true" 

                 << endl;

    exit(1);

  }

 

  tidyOptSetBool(tdoc, TidyWrapScriptlets, yes);

  tidyOptSetBool(tdoc, TidyWrapAsp, yes);

  tidyOptSetBool(tdoc, TidyWrapJste, yes);

  tidyOptSetBool(tdoc, TidyWrapPhp, yes);

  tidyOptSetBool(tdoc, TidyFixBackslash, yes);

  tidyOptSetBool(tdoc, TidyMark, no);

 

//  ok=tidyOptSetBool( tdoc, TidyQuoteNbsp, no ); // Output non-breaking space as entity

  ok=tidyOptSetBool( tdoc, TidyNumEntities, yes ); // See tidy/pprint.c line 720

 

  if ( ok )

    rc = tidySetErrorBuffer( tdoc, &errbuf );      // Capture diagnostics

  if ( rc >= 0 )

    rc = tidyParseString( tdoc, input );           // Parse the input

  if ( rc >= 0 )

    rc = tidyCleanAndRepair( tdoc );               // Tidy it up!

  if ( rc >= 0 )

    rc = tidyRunDiagnostics( tdoc );               // Kvetch

  if ( rc > 1 )                                    // If error, force output.

    rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );

  if ( rc >= 0 )

    rc = tidySaveBuffer( tdoc, &output );          // Pretty Print

 

  if(rc>=0) {

    ofstream of("xhtml.htm");

    of << output.bp;

    of.close();

  }

  else

    printf( "A severe error (%d) occurred.\n", rc );

 

  char *str;

  str=(char*)malloc(output.size+1);

  bzero(str, output.size+1);

  memcpy(str, output.bp, output.size);   // copy only the bytes tidy wrote; bzero left the terminator

 

  tidyBufFree( &output );

  tidyBufFree( &errbuf );

  tidyRelease( tdoc );

  return(str);

}

--

Where did I make a mistake?

 

Thanks in advance.






Received on Friday, 14 May 2010 13:18:44 GMT
