- From: Charles Reitzel <creitzel@rcn.com>
- Date: Mon, 11 Nov 2002 14:05:36 -0500
- To: "Moshe Plotkin" <mplotkin@hotmail.com>
- Cc: <html-tidy@w3.org>
Hi Moshe,
Just to get it straight, are you using TidyLib directly or via the ATL
wrapper? It will make a difference.
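If you are using TidyLib and the BSTR directly, I'd just use UTF16LE character
encoding. Whether or not the BSTR is NULL terminated (my memory escapes me),
every ASCII character in UTF-16 includes a zero byte, so rather than passing it
as a C string, just attach a TidyBuffer to it and call tidyParseBuffer().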
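#include <windows.h>   /* BSTR, SysStringByteLen() */
#include "tidy.h"
#include "buffio.h"    /* TidyBuffer, tidyBuf*() */
int tidyParseBSTR( TidyDoc tdoc, BSTR content )
{
  int rc;
  TidyBuffer buf;
  tidyBufInit( &buf );
  /* Wrap the BSTR's UTF-16 bytes in place (no copy); Intel is little endian. */
  tidyBufAttach( &buf, (byte*) content, SysStringByteLen(content) );
  tidySetCharEncoding( tdoc, "utf16le" );
  rc = tidyParseBuffer( tdoc, &buf );
  tidyBufDetach( &buf );  /* the BSTR still owns its memory */
  return rc;
}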
hth,
Charlie
At 01:19 PM 11/11/2002 -0800, Moshe Plotkin wrote:
>B"H
>
>Actually, I work with Meir Kogan, and I have the string as a BSTR, which as
>far as I understand is just a wchar_t * with a length.
>The VBScript page that I am getting it from is set to use codepage 65001,
>i.e. UTF-8. So I am assuming it's UTF-8 stored in an array of wchar_t, however
>that works. I was thinking of redefining the Tidy string type (what's it
>called, cbmtstr?) to use wchar_t and then relying on the config file to alter
>the internal ... or just casting the BSTR to byte* and putting it in the buffer.
>Or maybe I'm way off.
>
>Thanks for all the help.
>
>
> ----- Original Message -----
>From: "Charles Reitzel" <creitzel@rcn.com>
>To: "Moshe Plotkin" <mplotkin@hotmail.com>
>Cc: <html-tidy@w3.org>
>Sent: Monday, November 11, 2002 6:27 AM
>Subject: Re: UTF8 without tempfiles
>
>
> > Hi Moshe,
> >
> > wchar_t is UTF-16 on Windows (most Unix compilers make it
> > 32 bits wide). What platform are you on? It helps to figure
> > out whether you should use little- or big-endian Unicode
> > (UTF16LE and UTF16BE, respectively). If you can
> > manage to save your documents with a byte-order mark (two
> > bytes at the beginning of the file that indicate the byte
> > order), you can specify plain UTF16.
> >
> > For example, Intel x86 (Windows and Linux) is LE. SPARC
> > (Solaris) and PowerPC (Mac, IBM AIX) are BE. Alpha (Linux)
> > can be either, but is usually LE.
> >
> > take it easy,
> > Charlie
> >
> > At 01:22 PM 11/10/2002 -0800, Moshe Plotkin wrote:
> > >B"H
> > >
> > > Can someone please send me a very simple example of
> > > using TidyLib with UTF8 strings.
> > >
> > >I have the data in a wchar_t* and would like to return
> > >a wchar_t*
> > >
> > >Thank you very much
> >
Received on Monday, 11 November 2002 14:04:26 UTC