
Re: UTF8 without tempfiles

From: Moshe Plotkin <mplotkin@hotmail.com>
Date: Mon, 11 Nov 2002 13:19:28 -0800
To: "Charles Reitzel" <creitzel@rcn.com>
Cc: <html-tidy@w3.org>
Message-ID: <OE47Kk8a582BEsbYYN300001e6e@hotmail.com>

B"H

Actually I work with Meir Kogan, and I have the string as a BSTR, which as far
as I understand is just a wchar_t * with a length.
The VBScript page that I am getting it from is set to use codepage 65001,
i.e. UTF-8. So I am assuming it's UTF-8 stored in an array of wchar_t, however
that works. I was thinking of redefining the tidy string type (what's it
called, cbmtstr?) to use wchar_t and then rely on the config file to alter the
internal ... or to just cast the BSTR to byte* and put it in the buffer. Or
maybe I'm way off.
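
To make the "cast or convert" question concrete, here is a rough sketch of the
conversion route. It assumes the BSTR really holds UTF-16 code units in the
machine's native byte order, in which case a plain cast to byte* would hand
over UTF-16 bytes rather than UTF-8. The function name utf16_to_utf8 is made
up for illustration and is not part of TidyLib; uint16_t stands in for the
16-bit wchar_t found on Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a TidyLib function.
 * Converts len UTF-16 code units (native byte order) into UTF-8 bytes.
 * Returns the number of bytes written to dst, or (size_t)-1 if the input
 * contains an unpaired surrogate or dst is too small.
 * No terminating NUL is written. */
size_t utf16_to_utf8(const uint16_t *src, size_t len,
                     unsigned char *dst, size_t cap)
{
    size_t out = 0;
    size_t i;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < len; i++) {
        uint32_t cp = src[i];
        size_t need;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= len || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10)
                         + ((uint32_t)src[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return (size_t)-1;
        }

        /* Number of UTF-8 bytes this code point needs. */
        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (need > cap - out)
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (unsigned char)cp;
        } else if (need == 2) {
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A destination buffer of 3 * len bytes is always large enough, since each
16-bit unit yields at most three UTF-8 bytes and a surrogate pair yields four.
The resulting bytes could then go into the tidy buffer with the input
encoding set to utf8.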

thanks for all the help.


 ----- Original Message -----
From: "Charles Reitzel" <creitzel@rcn.com>
To: "Moshe Plotkin" <mplotkin@hotmail.com>
Cc: <html-tidy@w3.org>
Sent: Monday, November 11, 2002 6:27 AM
Subject: Re: UTF8 without tempfiles


> Hi Moshe,
>
> wchar_t is usually UTF16.  What platform are you on?  It helps to figure
> out if you should use Little or Big Endian unicode (UTF16LE and UTF16BE,
> respectively).  If you can manage to save your documents with a byte-order
> mark (two bytes at the beginning of the file that indicate the byte order),
> you can specify plain UTF16.
>
> For example, Intel (Windows and Linux) are LE.  Sparc (Solaris) and PowerPC
> (Mac, IBM AIX) are BE.  Alpha (Linux) can be either, but is usually LE.
>
> take it easy,
> Charlie
>
> At 01:22 PM 11/10/2002 -0800, Moshe Plotkin wrote:
> >B"H
> >
> >Can someone please send me a very simple example of using TidyLib with
> >UTF8 strings.
> >
> >I have the data in a wchar_t* and would like to return a wchar_t*
> >
> >thank you very much
>
Received on Monday, 11 November 2002 13:23:01 UTC
