patch: nbsp in -xml mode (was: Re: removing &nbsp)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 25 May 2001 00:40:47 +0200
To: "Randy Waki" <rwaki@flipdog.com>
Cc: <html-tidy@w3.org>
Message-ID: <8r1rgts7mq94rn506716oobpgj66ud27fs@4ax.com>
* Randy Waki wrote:
>> I am trying to convert an HTML document to XML. Initially the
>> conversion seemed fine, until I used the XT parser to parse the
>> resulting XML output. I then got the following error:
>>     xml:154: reference to undefined entity "nbsp"

>Using your config file, I get a character with hex value A0, which is
>correct for latin-1 encoding.  I don't know why you're getting "&nbsp;".
>However, if I delete the "quote-nbsp=no" from your config file, I get
>"&#160;" instead, which may be what you want (we use Tidy to output
>XHTML this way and it has been working just fine).


Tidy inserts the &nbsp; entity in -xml mode:

  % tidy -xml

No config file is involved. I already reported this bug some months ago.
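
For context on the error quoted above: XML 1.0 predefines only five entities (lt, gt, amp, apos, quot), so a document without a DTD that declares nbsp cannot reference it. A minimal sketch of that rule in C follows. It is not taken from Tidy or XT, and the helper name is_xml_predefined_entity is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The five entities predefined by XML 1.0. Any other named entity,
   including nbsp, must be declared in a DTD before it is referenced. */
static const char *const xml_predefined[] = { "lt", "gt", "amp", "apos", "quot" };

/* Hypothetical helper: returns 1 if name is predefined in XML, else 0. */
int is_xml_predefined_entity(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof xml_predefined / sizeof xml_predefined[0]; i++) {
        if (strcmp(name, xml_predefined[i]) == 0)
            return 1;
    }
    return 0;
}
```

A numeric character reference such as &#160; needs no declaration, which is why it is safe in XML output.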

Possible solutions:

  * let pprint.c:PPrintChar() also check for XmlOut, or
  * let -xml set NumEntities = yes

Patch for the first solution:

% diff -u -p -u ..\original\pprint.c pprint.c
--- ..\original\pprint.c        Fri Jul 28 17:57:56 2000
+++ pprint.c    Fri May 25 00:34:06 2001
@@ -404,7 +404,7 @@ static void PPrintChar(uint c, uint mode
                 AddC('&', linelen++);

-                if (NumEntities)
+                if (NumEntities || XmlOut)
                     AddC('#', linelen++);
                     AddC('1', linelen++);
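
The effect of the changed condition can be sketched in isolation. This is only an illustration, not Tidy's actual code: the NbspOptions struct and nbsp_reference() are hypothetical stand-ins for the NumEntities and XmlOut globals that PPrintChar() tests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two Tidy options used in the patch. */
typedef struct {
    int num_entities; /* nonzero: numeric entities were requested */
    int xml_out;      /* nonzero: XML output was requested (-xml) */
} NbspOptions;

/* Returns the reference that would be written for character 160 under
   the patched condition (NumEntities || XmlOut). */
const char *nbsp_reference(const NbspOptions *opts)
{
    assert(opts != NULL);
    return (opts->num_entities || opts->xml_out) ? "&#160;" : "&nbsp;";
}
```

With both options off, the named entity is still used, so HTML output is unchanged. Turning on either option selects the numeric form.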

PS: As usual, the patch is against the 04 August release.
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
Received on Thursday, 24 May 2001 18:39:29 UTC
