W3C home > Mailing lists > Public > html-tidy@w3.org > January to March 2001

Tidy uses undf. entities with -xml

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sun, 14 Jan 2001 19:34:33 +0100
To: html-tidy@w3.org
Message-ID: <vhq36tc6sd2dr8pn328h1l1onuu8jiof5t@4ax.com>
Hi,

If I've got a XML document that includes e.g. an &nbsp; UTF-8 encoded,
and i tidy it up with -xml I get the HTML entity &nbsp; instead of the
korrekt UTF-8 sequence. The XML document isn't wellformed any longer and
therefore unusable. Using the -utf8 command line argument doesn't change
this behaivour.

Tidy must include something like

<!DOCTYPE html [
  <!ENTITY nbsp "&#160;">
]>

or use UTF-8 encoding or numeric character references when in XML mode.
-- 
Björn Höhrmann ^ mailto:bjoern@hoehrmann.de ^ http://www.bjoernsworld.de
am Badedeich 7 ° Telefon: +49(0)4667/981028 ° http://bjoern.hoehrmann.de
25899 Dagebüll # PGP Pub. KeyID: 0xA4357E78 # http://learn.to/quote [!]e
<x>&#73; &#x2665; &#x2640;, &#x266B; &#x26; &#88;&#77;&#76; &#x263A;</x>
Received on Sunday, 14 January 2001 13:33:55 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:45 GMT