
special characters in comments getting 'mangled'

From: Fred <fred@gloryofgod.com>
Date: Fri, 12 Jul 2002 10:55:31 -0700 (PDT)
To: html-tidy@w3.org
Message-ID: <Pine.LNX.4.44.0207121021530.10810-100000@drizzle.com>

I have run into a problem while tidying up some HTML that has 
comments in it.  Maybe this could turn into a feature request?

The comment looks something like 
<!-- <o:tag>Coulomb's law</o:tag> -->
except that the ' is a chr 146.  In other words, a 'smart apostrophe' or 
'curly apostrophe'
(yes, this is output from Word, if you are curious).

This character is getting changed to something else. My text editor 
indicates it is a chr 25.  Word (opened as raw text) indicates chr 
13.  I am not convinced that it is either of these because of the tools I 
used.  I can be more careful and find out if I need to, but it is 
definitely no longer a chr 146.  It may be the Unicode equivalent of a 
smart apostrophe?

If chr 146 is in regular text outside a comment, it becomes &#8217;
which is great. 
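
For reference, here is a minimal Python sketch of the mapping described 
above.  It assumes the file is in Windows-1252 (my assumption, since that 
is what Word normally writes); Python is used only for illustration and 
is not part of tidy.  It shows that chr 146 corresponds to code point 
8217, and that the low byte of 8217 is 25, which may be why my editor 
reports chr 25, though that is only a guess.

```python
# Sketch only: inspects the character, it does not call tidy.
raw = bytes([146])                 # the byte Word wrote for the apostrophe
ch = raw.decode("cp1252")          # assumption: the source is Windows-1252
code_point = ord(ch)               # U+2019 RIGHT SINGLE QUOTATION MARK
entity = "&#%d;" % code_point      # numeric reference form, as seen outside comments
low_byte = code_point & 0xFF       # what is left if only the low 8 bits survive
utf8_bytes = ch.encode("utf-8")    # the three-byte UTF-8 form of the same character

print(code_point, entity, low_byte, utf8_bytes)
```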

Is there a way I can get tidy to convert the character in the same way as 
if it were not in a comment, or else to leave it alone?
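
Failing that, one workaround I could imagine is a preprocessing step run 
before tidy.  The sketch below is only that: the function name 
escape_comment_chars is hypothetical, it assumes Windows-1252 input, and 
it is not a tidy feature.  Note that numeric references inside a comment 
are not expanded by parsers, so the comment would literally contain the 
text &#8217;.

```python
import re

# Matches each HTML comment, including ones that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

def escape_comment_chars(html_bytes, encoding="cp1252"):
    """Hypothetical helper: rewrite non-ASCII characters inside comments
    as numeric character references; text outside comments is untouched.
    Assumes every byte is defined in the given encoding."""
    text = html_bytes.decode(encoding)

    def to_refs(match):
        return "".join(
            c if ord(c) < 128 else "&#%d;" % ord(c)
            for c in match.group(0)
        )

    return COMMENT_RE.sub(to_refs, text)

sample = b"<!-- <o:tag>Coulomb\x92s law</o:tag> -->"
result = escape_comment_chars(sample)
print(result)
```

The result is a Python string, so it would still need to be written out 
in whatever encoding tidy is told to expect.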

I did search the archives and tried different config file settings.
I used Björn's command-line version (last updated 7 April, 2002).
These are the config settings I used last (I tried many variations):
tidy-mark: no
doctype: omit
output-xml: yes
output-xhtml: yes
add-xml-decl: yes
write-back: yes
quiet: yes
show-warnings: no
wrap: 0
assume-xml-procins: yes
quote-nbsp: no
quote-marks: yes

Hopefully I am not just being a bothersome newbie.

Thanks
Fred
Received on Friday, 12 July 2002 13:55:34 GMT
