Re: Annoying Symbol At Start Of XML Outputted File.

From: Charles Reitzel <creitzel@rcn.com>
Date: Thu, 03 Apr 2003 12:46:01 -0500
Message-Id: <4.3.2.7.2.20030403123745.01b699b8@pop.rcn.com>
To: Matthew Stanfield <mattstan@blueyonder.co.uk>
Cc: html-tidy <html-tidy@w3.org>

Hi Matt,

Can you send a sample file with your config?  0x98 is either an illegal
character or a Windows-1252 "small tilde" (which should be translated to
U+02DC).  Can you reproduce the problem with the command line tool?  If so,
then we can treat it as a Tidy issue.  Otherwise it's in my wrapper, my bad.
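
To narrow down what the leading character actually is, it can help to look at the raw bytes at the start of the output file. The following is a minimal Python sketch, not part of Tidy or the COM wrapper; the helper name `describe_leading_bytes` is hypothetical. It only distinguishes a real byte order mark from a stray high byte such as 0x98.

```python
import codecs

# Known byte order marks. UTF-32 comes first because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE BOM"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE BOM"),
    (codecs.BOM_UTF8, "UTF-8 BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE BOM"),
]


def describe_leading_bytes(data: bytes) -> str:
    """Classify the first bytes of an output file (hypothetical helper)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    if not data:
        return "empty"
    first = data[0]
    if first < 0x80:
        return "plain ASCII start"
    return "stray non-ASCII byte 0x%02X" % first


# Example: a file that starts with 0x98 followed by the XML declaration.
sample = b"\x98<?xml version=\"1.0\"?>"
print(describe_leading_bytes(sample))

# In Windows-1252, byte 0x98 decodes to U+02DC (small tilde).
print("U+%04X" % ord(bytes([0x98]).decode("cp1252")))
```

If the result is a stray byte rather than one of the BOMs, then 'output-bom' would not be expected to affect it, which matches what Matt describes.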

There is still a problem with a spurious extra carriage return (0x0D) in
some encodings.  It may be related.
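
One way to check whether a given output file is affected by the extra 0x0D is to count the line-ending bytes directly. This is only a sketch under the assumption that the problem shows up as a carriage return that is not part of a CR LF pair; the function name `count_line_endings` is hypothetical.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR LF pairs and unpaired CR / LF bytes (hypothetical helper)."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # Carriage returns that are not immediately followed by a line feed.
        "bare_cr": data.count(b"\r") - crlf,
        # Line feeds that are not immediately preceded by a carriage return.
        "bare_lf": data.count(b"\n") - crlf,
    }


# Example: "\r\r\n" contains one CR LF pair plus one extra carriage return.
print(count_line_endings(b"<html>\r\r\n</html>\r\n"))
```

A non-zero "bare_cr" count in a file that otherwise uses CR LF would point at the extra carriage return rather than at the leading 0x98 byte.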

take it easy,
Charlie


At 06:26 PM 4/3/2003 +0100, Matthew Stanfield wrote:
>Hi,
>
>When tidying HTML and outputting as XML, a symbol appears at the start
>of my XML files; its byte value is 0x98. How do I stop it from
>appearing?
>
>I assume this is the 'unicode Byte Order Mark character' that is mentioned
>in the Tidy configuration options reference. However, if I set 'output-bom'
>to false, the symbol still appears. I've tried various character encodings,
>setting each of char-encoding, input-encoding, and output-encoding to
>ascii, latin1, raw, and utf16. The character is always there regardless
>of which encoding I use.
>
>The character is stopping Tidy's XML output from being read correctly by
>.NET's C# XPathDocument class. When I manually remove the character, all
>works fine.
>
>I am using Charles Reitzel's COM/ATL dll.
>
>Many thanks and regards,
>
>..matthew
Received on Thursday, 3 April 2003 12:46:44 GMT