- From: Dave Raggett <dsr@w3.org>
- Date: Fri, 11 Oct 2002 19:03:23 +0100 (BST)
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- cc: html-tidy@w3.org
On Fri, 11 Oct 2002, Bjoern Hoehrmann wrote:

> * Dave Raggett wrote:
> >> >I have run into a problem while tidying up some html that has
> >> >comments in it. Maybe this can turn into a requested feature??
> >> >
> >> >the comment looks something like
> >> ><!-- <o:tag>Coulomb's law</o:tag> -->
> >> >Except that the ' is a chr 146. In other words, a 'smart apostropy' or
> >> >'curly apostropy'
> >> >(yes this is output from word if you are curious)
> >> >
> >> >This character is getting changed to something else. In my text editor it
> >> >indicates it is a chr 25.
> >>
> >> Yes. That's a bug. I added it to the bug tracker, see
> >>
> >> http://tidy.sf.net/issue/621671
> >
> >Is it a bug?
>
> To convert 0x92 to 0x19 is a bug, yes.

Why? If you are converting a broken document (invalid characters)
into a valid document with the equivalent Unicode characters and a
Unicode character set, surely this is in direct alignment with the
goals of HTML Tidy?

If the OUTPUT charset is the Windows charset, then yes it would be
a bug to convert these characters to the Unicode equivalents.

-- 
Dave Raggett <dsr@w3.org> or <dave.raggett@openwave.com>
W3C lead for voice/multimodal. http://www.w3.org/People/Raggett
tel/fax: +44 1225 866240 (or 867351) +44 771 213 7629 (GSM)
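
The byte values discussed above can be checked with a minimal Python sketch. It assumes the Word output is encoded as Windows-1252 (Python's "cp1252" codec), which the thread does not state explicitly. The truncation step is only one plausible explanation for how a "chr 25" could appear; it is not a confirmed description of what HTML Tidy does internally.

```python
# Byte 0x92 (decimal 146), as reported for the Word "smart apostrophe".
# Assumption: the input document is Windows-1252 encoded.
raw = bytes([0x92])

# Decoding as cp1252 yields U+2019 RIGHT SINGLE QUOTATION MARK.
ch = raw.decode("cp1252")
code_point = ord(ch)

# Hypothetical failure mode: keeping only the low byte of the code
# point gives 0x19 (decimal 25), the value seen in the text editor.
truncated = code_point & 0xFF

# Output in a Unicode charset: the character is kept, encoded as UTF-8.
utf8_bytes = ch.encode("utf-8")

# Output in the Windows charset: the original byte should be preserved.
cp1252_bytes = ch.encode("cp1252")

print(hex(code_point), hex(truncated), utf8_bytes, cp1252_bytes)
```

Under these assumptions, mapping 0x92 to U+2019 for Unicode output is the equivalent-character conversion described above, whereas emitting the single byte 0x19 is not a valid representation of that character in either charset.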
Received on Friday, 11 October 2002 14:04:45 UTC