
Re: special characters in comments getting 'mangled'

From: Dave Raggett <dsr@w3.org>
Date: Fri, 11 Oct 2002 19:03:23 +0100 (BST)
To: Bjoern Hoehrmann <derhoermi@gmx.net>
cc: html-tidy@w3.org
Message-ID: <Pine.LNX.4.44.0210111901120.1678-100000@hazel>

On Fri, 11 Oct 2002, Bjoern Hoehrmann wrote:

> * Dave Raggett wrote:
> >> >I have run into a problem while tidying up some html that has
> >> >comments in it.  Maybe this can turn into a requested feature??
> >> >
> >> >the comment looks something like
> >> ><!-- <o:tag>Coulomb's law</o:tag> -->
> >> >Except that the ' is a chr 146.  In other words, a 'smart apostrophe' or
> >> >'curly apostrophe'
> >> >(yes this is output from Word if you are curious)
> >> >
> >> >This character is getting changed to something else. In my text editor it
> >> >indicates it is a chr 25.
> >>
> >> Yes. That's a bug. I added it to the bug tracker, see
> >>
> >>   http://tidy.sf.net/issue/621671
> >
> >Is it a bug?
> 
> To convert 0x92 to 0x19 is a bug, yes.

Why?  If you are converting a broken document (invalid characters)
into a valid document with the equivalent Unicode characters and
a Unicode character set, surely this is in direct alignment with
the goals of HTML Tidy?
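A minimal Python sketch of the conversion under discussion, assuming the input bytes really are windows-1252 (the usual encoding of Word's output on Western Windows systems). Byte 0x92 decodes to U+2019 RIGHT SINGLE QUOTATION MARK. Keeping only the low 8 bits of that code point is one plausible way to end up with 0x19, the chr 25 reported above; whether Tidy actually does this is an assumption. Both function names are hypothetical and are not part of Tidy.

```python
def decode_word_bytes(raw: bytes) -> str:
    """Decode bytes assumed to be windows-1252 into Unicode text."""
    return raw.decode("cp1252")


def truncate_to_byte(text: str) -> bytes:
    """Hypothetical buggy step: keep only the low 8 bits of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


raw = b"Coulomb\x92s law"
text = decode_word_bytes(raw)
print(hex(ord(text[7])))          # 0x2019, the correct Unicode equivalent
print(truncate_to_byte(text)[7])  # 25, i.e. 0x19, the mangled value
```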

If the OUTPUT charset is the Windows charset (windows-1252), then yes
it would be a bug to convert these characters to the Unicode equivalents.
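To illustrate why the output charset matters, here is a small Python sketch (the helper name is hypothetical) that encodes the same U+2019 for three output charsets: windows-1252 gives back the original byte 0x92, UTF-8 gives a three-byte sequence, and ASCII falls back to a numeric character reference.

```python
def encode_for_output(text: str, charset: str) -> bytes:
    """Encode text for the chosen output charset, falling back to
    numeric character references for characters the charset lacks."""
    return text.encode(charset, errors="xmlcharrefreplace")


apostrophe = "\u2019"
print(encode_for_output(apostrophe, "cp1252"))  # b'\x92'
print(encode_for_output(apostrophe, "utf-8"))   # b'\xe2\x80\x99'
print(encode_for_output(apostrophe, "ascii"))   # b'&#8217;'
```

In each case the character survives intact. Only a step that drops the high bits, rather than encoding properly, produces 0x19.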

-- 
 Dave Raggett <dsr@w3.org> or <dave.raggett@openwave.com>
 W3C lead for voice/multimodal. http://www.w3.org/People/Raggett 
 tel/fax: +44 1225 866240 (or 867351) +44 771 213 7629 (GSM)
Received on Friday, 11 October 2002 14:04:45 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 5 February 2014 23:39:48 UTC