Re: Unicode conference papers

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Wed, 22 Nov 2006 15:10:07 +0900
Message-Id: <6.0.0.20.2.20061122145934.0b4fdb10@localhost>
To: John Cowan <cowan@ccil.org>, Richard Ishida <ishida@w3.org>
Cc: "'Mark Davis'" <mark.davis@icu-project.org>, "'Unicode'" <unicode@unicode.org>, www-international@w3.org

At 08:36 06/11/22, John Cowan wrote:
>
>Richard Ishida scripsit:
>
>> 2. what is doubly-encoded utf-8?
>
>Text encoded as UTF-8, then reinterpreted using an 8-bit encoding (often
>Latin-1 or Windows-1252), and then incorrectly re-encoded as UTF-8 a
>second time.

Yes. The W3C site has quite a few of these too, though fortunately
they are usually limited to single characters such as the copyright
sign. Here's an example:
http://www.w3.org/2001/Annotea/User/Papers.html
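
To make the mechanism concrete, here is a minimal Python sketch of what
happens to a single copyright sign. It assumes Latin-1 as the intermediate
8-bit encoding; the variable names are only illustrative, and the snippet
is not derived from the page above.

```python
# Sketch: how one copyright sign becomes doubly-encoded UTF-8.
# Assumption: the intermediate 8-bit encoding is Latin-1 (ISO-8859-1).
original = "\u00a9"                      # COPYRIGHT SIGN

utf8_once = original.encode("utf-8")     # correct UTF-8: bytes C2 A9
misread = utf8_once.decode("latin-1")    # each byte read as one character
utf8_twice = misread.encode("utf-8")     # second, unwanted UTF-8 encoding

print(utf8_once)                         # b'\xc2\xa9'
print(utf8_twice)                        # b'\xc3\x82\xc2\xa9'
print(utf8_twice.decode("utf-8"))        # what a reader sees: 'Â©'
```

The two bytes of the correct encoding turn into four, and a UTF-8 aware
reader sees a capital A with circumflex in front of the copyright sign.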

They often arise when the download path and the upload path handle
character encoding information differently.
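
As a rough sketch of such a mismatch, the following Python simulates a
download that assumes Latin-1 followed by an upload that converts to
UTF-8, and then tries to undo the damage. The functions download, upload
and undo_double_encoding are hypothetical names for illustration, not any
real tool's API, and the repair only handles the Latin-1 case.

```python
def download(stored: bytes, assumed_charset: str) -> str:
    """Download path: decode stored bytes with the charset the client assumes."""
    return stored.decode(assumed_charset)


def upload(text: str, charset: str) -> bytes:
    """Upload path: encode the edited text with the charset the server expects."""
    return text.encode(charset)


def undo_double_encoding(data: bytes) -> bytes:
    """Reverse one round of Latin-1 misinterpretation if that yields valid UTF-8.

    Returns the input unchanged when the reversal does not apply.
    Heuristic only: text that legitimately contains sequences such as
    U+00C2 followed by U+00A9 would be "repaired" by mistake.
    """
    try:
        candidate = data.decode("utf-8").encode("latin-1")
        candidate.decode("utf-8")  # the result must itself be valid UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return data
    return candidate


stored = "\u00a9 W3C".encode("utf-8")    # correct UTF-8 on the server
edited = download(stored, "latin-1")     # charset information lost on the way down
damaged = upload(edited, "utf-8")        # converted to UTF-8 on the way up

print(damaged)                           # b'\xc3\x82\xc2\xa9 W3C'
print(undo_double_encoding(damaged))     # b'\xc2\xa9 W3C'
```

Correctly encoded input is left alone, because reversing it would not
produce valid UTF-8. A Windows-1252 intermediate step would need a
different reverse mapping.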

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Wednesday, 22 November 2006 22:26:55 GMT
