- From: Charles Reitzel <creitzel@rcn.com>
- Date: Mon, 11 Nov 2002 14:05:36 -0500
- To: "Moshe Plotkin" <mplotkin@hotmail.com>
- Cc: <html-tidy@w3.org>
Hi Moshe,

Just to get it straight, are you using TidyLib directly or via the ATL wrapper? It will make a difference.

If using TidyLib and the BSTR directly, I'd just use UTF16LE character encoding. If the BSTR is not NULL terminated (my memory fails me), just attach a TidyBuffer to it and call tidyParseBuffer().

    int tidyParseBSTR( TidyDoc tdoc, BSTR content )
    {
        TidyBuffer buf = {0};

        /* Wrap the BSTR's bytes without copying; the BSTR still owns them. */
        tidyBufAttach( &buf, (byte*)content, ::SysStringByteLen(content) );

        /* Encoding names are plain char strings (ctmbstr), so no _T(). */
        int rc = tidySetCharEncoding( tdoc, "UTF16LE" );
        if ( rc < 0 )
            return rc;
        return tidyParseBuffer( tdoc, &buf );
    }

hth,
Charlie

At 01:19 PM 11/11/2002 -0800, Moshe Plotkin wrote:
>B"H
>
>Actually I work with Meir Kogan, and I have the string as a BSTR, which as
>far as I understand is just wchar_t * with a length.
>The vbscript page that I am getting it from is set to use codepage 65001,
>i.e. utf8. So I am assuming it's utf8 stored in an array of wchar_t, however
>that works. I was thinking of redefining the tidy string (what's it called,
>cbmtstr?) to use wchar_t and then rely on the config file to alter the
>internal ... or to just cast the bstr to byte* and put it in the buffer.
>Or maybe I'm way off.
>
>Thanks for all the help.
>
>----- Original Message -----
>From: "Charles Reitzel" <creitzel@rcn.com>
>To: "Moshe Plotkin" <mplotkin@hotmail.com>
>Cc: <html-tidy@w3.org>
>Sent: Monday, November 11, 2002 6:27 AM
>Subject: Re: UTF8 without tempfiles
>
> > Hi Moshe,
> >
> > wchar_t is usually UTF16. What platform are you on? It
> > helps to figure out if you should use Little or Big Endian
> > unicode (UTF16LE and UTF16BE, respectively). If you can
> > manage to save your documents with a byte-order mark (two
> > bytes at the beginning of the file that indicate the byte
> > order), you can specify plain UTF16.
> >
> > For example, Intel (Windows and Linux) are LE. Sparc
> > (Solaris) and PowerPC (Mac, IBM AIX) are BE. Alpha (Linux)
> > can be either, but is usually LE.
> >
> > take it easy,
> > Charlie
> >
> > At 01:22 PM 11/10/2002 -0800, Moshe Plotkin wrote:
> > >B"H
> > >
> > > Can someone please send me a very simple example of
> > > using TidyLib with UTF8 strings.
> > >
> > >I have the data in a wchar_t* and would like to return
> > >a wchar_t*
> > >
> > >thank you very much
> >
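
The byte-order advice in this thread can be sketched as a small helper. This is only an illustration under stated assumptions: hostIsLittleEndian() and utf16EncodingName() are hypothetical names, not TidyLib functions, and the strings "utf16", "utf16le" and "utf16be" are assumed to be the spellings Tidy accepts for its char-encoding setting. The helper returns plain "utf16" when the buffer starts with a byte-order mark, and otherwise falls back to the byte order of the machine it runs on.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of TidyLib: report whether this machine
// stores the low-order byte of a 16-bit value first (little endian).
inline bool hostIsLittleEndian() {
    const std::uint16_t probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}

// Hypothetical helper, not part of TidyLib: pick an encoding name for a
// raw UTF-16 byte buffer. The returned names are assumed to match the
// values Tidy accepts for its char-encoding setting.
inline std::string utf16EncodingName(const unsigned char* data, std::size_t size) {
    if (size >= 2) {
        const bool bomLE = (data[0] == 0xFF && data[1] == 0xFE);
        const bool bomBE = (data[0] == 0xFE && data[1] == 0xFF);
        if (bomLE || bomBE) {
            // A byte-order mark is present, so the byte order can be
            // read from the data itself.
            return "utf16";
        }
    }
    // No byte-order mark: assume the data was produced on this machine.
    return hostIsLittleEndian() ? "utf16le" : "utf16be";
}
```

A BSTR produced in-process normally carries no byte-order mark, so on Windows/Intel this would return "utf16le", which matches the choice made in the reply above.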
Received on Monday, 11 November 2002 14:04:26 UTC