- From: Matthew Stanfield <mattstan@blueyonder.co.uk>
- Date: Thu, 03 Apr 2003 19:50:05 +0100
- To: Charles Reitzel <creitzel@rcn.com>
- CC: html-tidy <html-tidy@w3.org>, html-tidy-developers <tidy-develop@lists.sourceforge.net>
Hi Charlie,

I cannot reproduce the problem with the command line tool; the char 0x98
never gets inserted, regardless of whether I use ascii, latin1, raw, or utf.

The sample files will follow in a separate email so the lists don't get
them. Anyone on the lists who wants them, please say so.

Thanks and regards,

..matthew

Charles Reitzel wrote:

> Hi Matt,
>
> Can you send a sample file w/ config? 0x98 is either an illegal
> character or a Windows 1252 "small tilde" (should be translated to
> U+02DC). Can you reproduce the problem with the command line tool? If
> so, then we can treat it as a Tidy issue. Otherwise, my bad.
>
> There is still a spurious extra newline (0xD) problem with some
> encodings. May be related.
>
> take it easy,
> Charlie
>
>
> At 06:26 PM 4/3/2003 +0100, Matthew Stanfield wrote:
>
>> Hi,
>>
>> When tidying html and outputting as xml, there is a symbol that is
>> appearing at the start of my XML files, ascii value is 0x98. How do I
>> stop it appearing?
>>
>> I assume this is the 'unicode Byte Order Mark character' that is
>> mentioned in the Tidy configuration options reference. However, if I
>> set 'output-bom' to false the symbol still appears. I've tried
>> setting all of char-encoding, input-encoding, and output-encoding
>> to: ascii, latin1, raw, and utf16. The character is always there
>> regardless of what encoding I use.
>>
>> The char is stopping tidy output as xml from being read correctly by
>> .net's C# XPathDocument class. When I manually remove the char all
>> works fine.
>>
>> I am using Charles Reitzel's COM/ATL dll.
>>
>> Many thanks and regards,
>>
>> ..matthew
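
For anyone trying to narrow down a stray leading byte like the one discussed in this thread, a minimal Python sketch follows. It is not part of Tidy or of the COM/ATL wrapper; the helper names `detect_bom` and `bytes_before_markup` are hypothetical and exist only for illustration. It checks whether the output starts with a known Unicode byte order mark and reports any bytes that precede the first `<`. The second helper assumes a single-byte or UTF-8 encoding, since it searches for the byte `0x3C` directly.

```python
import codecs

# Known byte order marks. UTF-32 is listed first because the UTF-32 LE
# BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    ("utf-32-le", codecs.BOM_UTF32_LE),
    ("utf-32-be", codecs.BOM_UTF32_BE),
    ("utf-8", codecs.BOM_UTF8),
    ("utf-16-le", codecs.BOM_UTF16_LE),
    ("utf-16-be", codecs.BOM_UTF16_BE),
]


def detect_bom(data: bytes):
    """Return the encoding name of a leading BOM, or None if there is none."""
    for name, bom in BOMS:
        if data.startswith(bom):
            return name
    return None


def bytes_before_markup(data: bytes) -> bytes:
    """Return whatever precedes the first '<' (empty if data starts with markup).

    Assumes a single-byte or UTF-8 encoding; not meaningful for UTF-16/32.
    """
    idx = data.find(b"<")
    return data if idx == -1 else data[:idx]


if __name__ == "__main__":
    # Hypothetical sample: a lone 0x98 byte in front of an XML declaration.
    sample = b"\x98<?xml version=\"1.0\"?><root/>"
    print("BOM:", detect_bom(sample))
    print("Leading bytes:", bytes_before_markup(sample).hex())
```

A lone 0x98 does not match any of the byte order marks above, which is consistent with 'output-bom' having no effect on it. Decoding that byte as Windows-1252 yields U+02DC, the "small tilde" mentioned in the quoted reply.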
Received on Thursday, 3 April 2003 13:50:49 UTC