- From: Eugeny N Dzhurinsky <bofh@redwerk.com>
- Date: Mon, 3 Nov 2008 19:50:20 +0200
- To: html-tidy@w3.org
- Message-ID: <20081103175020.GC78786@office.redwerk.com>
Hello, all!

I want to use tidy as a DLL in my Java application. I wrote JNI bindings for tidy and everything works fine, except for one issue: the converted documents, which are supposed to be XHTML 1.0 Strict, do not pass validation on w3.org. For example, if I have the tag <a href=".." target="_blank">, the target attribute is not removed.

Could you please take a look at the code below and tell me what I am missing?

#include "TidyWrapper.h"
#include <jni.h>
#include <tidy.h>
#include <buffio.h>

#define DEBUG 1
#define DUMP_DOCUMENT 1

JNIEXPORT jbyteArray JNICALL Java_xhtml_tidy_jni_TidyWrapper_convert
  (JNIEnv *env, jobject self, jobject options, jbyteArray data)
{
    TidyBuffer output = {0};
    TidyBuffer errbuf = {0};
    Bool ok;
    jboolean isCopy = 0;
    const char* input = (*env)->GetByteArrayElements(env, data, &isCopy);
    jfieldID fid;
    jclass cls;

    // get instance of class used for options
    cls = (*env)->GetObjectClass(env, options);

    TidyDoc tdoc = tidyCreate();
    tidySetErrorBuffer( tdoc, &errbuf );
    tidyOptSetBool( tdoc, TidyXhtmlOut, yes );
    tidyOptSetValue(tdoc, TidyEncoding, "UTF-8");
    tidyOptSetValue(tdoc, TidyCharEncoding, "UTF-8");
    tidyOptSetValue(tdoc, TidyDoctype, "strict");

    // parse document
    if (DEBUG)
        write(2,"Parsing document\n",17);
    if (DUMP_DOCUMENT) {
        write(2,"Original document\n",18);
        write(2,input,strlen(input));
    }
    tidyParseString( tdoc, input );

    // convert document
    tidyCleanAndRepair( tdoc );
    tidyRunDiagnostics( tdoc );
    tidyOptSetBool(tdoc, TidyForceOutput, yes);
    tidySaveBuffer( tdoc, &output );
    if (DUMP_DOCUMENT) {
        write(2,"Converted document\n",19);
        write(2,output.bp,output.size);
    }

    // create an array used to store the results
    jbyteArray results = (*env)->NewByteArray(env,output.size);
    // copy data from resulting buffer to Java array
    (*env)->SetByteArrayRegion(env,results,0,output.size,output.bp);

    tidyBufFree( &errbuf );
    tidyBufFree( &output );
    tidyRelease( tdoc );
    return results;
}

While the document gets the XHTML Strict doctype, it still contains attributes that do not conform to the XHTML Strict specification. I can see this in the debug output of the converted document right after Tidy has processed it, so it is not related to the Java code that calls the native routine.

With the tidy command-line utility, HTML documents are converted just fine. I've looked at the sources and did not find any additional settings, apart from TidyXhtmlOut, that are set by the -asxhtml command-line option.

Thank you in advance!

--
Best regards
Eugene Dzhurinsky
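A side note on the code above, separate from the validation question: the bytes returned by GetByteArrayElements are not guaranteed to end with a NUL, while both strlen(input) and tidyParseString( tdoc, input ) read until they find one. A minimal sketch of one way to guard against that is shown here. The helper name copy_as_c_string is hypothetical (it is not part of Tidy or JNI), and it uses signed char as a stand-in for jbyte so that it compiles without jni.h.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Tidy or JNI): copy `len` raw bytes into a
 * freshly allocated buffer and append a terminating NUL, so the result can be
 * passed to functions that expect a C string.
 * Returns NULL if allocation fails; the caller must free() the result. */
char *copy_as_c_string(const signed char *bytes, size_t len)
{
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    if (len > 0)
        memcpy(out, bytes, len);
    out[len] = '\0';
    return out;
}
```

In the JNI function, the length would presumably come from (*env)->GetArrayLength(env, data), and the array elements would be handed back with (*env)->ReleaseByteArrayElements once the copy has been made.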
Received on Monday, 3 November 2008 18:24:51 UTC