Re: Seeking test data with bogus byte sequences from Henri Sivonen on 2007-06-19 (public-html@w3.org from June 2007)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Mon, 25 Jun 2007 08:45:58 +0900
To: Karl Dubost <karl@w3.org>, Richard Ishida <ishida@w3.org>, Felix Sasaki <fsasaki@w3.org>
Cc: www-international@w3.org, public-html@w3.org

Hello Karl,

For UTF-8, I would suggest starting with
(Sections 3-5).

Regards,   Martin.

At 15:09 07/06/19, Karl Dubost wrote:
>Richard, Felix,
>do you have this handy?
>Could you reply on this thread on the public-html mailing-list?
>Many thanks,
>[[[
>I could use test documents that are otherwise small conforming HTML5  
>documents in encoding where a character may take more than one byte  
>(with the encoding declared using the BOM or <meta charset='...'>)  
>except that they contain a byte sequence that is bogus for the  
>declared encoding: non-shortest-form UTF-8, unpaired surrogates in  
>UTF-16, broken Shift_JIS with the kind of brokenness you could get in  
>Shift_JIS (I don't know what exactly I should be testing with non-UTF  
>encodings). If someone already has this kind of test data, please let  
>me know. Thanks.
>]]]-- Seeking test data with bogus byte sequences from Henri Sivonen  
>on 2007-06-19 (public-html@w3.org from June 2007)
>Tue, 19 Jun 2007 06:07:22 GMT
>Karl Dubost - http://www.w3.org/People/karl/
>W3C Conformance Manager, QA Activity Lead
>   QA Weblog - http://www.w3.org/QA/
>      *** Be Strict To Be Cool ***
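
As a rough illustration of the three kinds of bogus input named in the quoted request (non-shortest-form UTF-8, an unpaired surrogate in UTF-16, a broken Shift_JIS sequence), here is a minimal Python sketch. The names html_doc, BOGUS_CASES and decodes_strictly are hypothetical and not taken from this thread, and the byte sequences are just one example of each kind, not an exhaustive test suite. Python's strict codecs are used only as a convenient validity check; they are not the decoders a browser would use, which may substitute replacement characters instead of failing.

```python
def html_doc(charset_label: str) -> bytes:
    """ASCII-only start of a small HTML document that declares its
    encoding with <meta charset='...'> (hypothetical helper)."""
    markup = f"<!DOCTYPE html><meta charset='{charset_label}'><title>t</title><p>"
    return markup.encode("ascii")


# One deliberately invalid document per encoding. The bogus bytes are
# appended at the end so the rest of each document stays well-formed.
BOGUS_CASES = {
    # 0xC0 0xAF is a two-byte, non-shortest-form encoding of U+002F ('/').
    "utf-8": html_doc("utf-8") + b"\xC0\xAF",
    # Little-endian BOM (FF FE), markup in UTF-16LE, then the high
    # surrogate U+D800 followed by 'x' instead of a low surrogate.
    "utf-16": (
        b"\xFF\xFE"
        + "<!DOCTYPE html><title>t</title><p>".encode("utf-16-le")
        + b"\x00\xD8"
        + "x".encode("utf-16-le")
    ),
    # 0x82 is a Shift_JIS lead byte; the data ends before any trail byte.
    "shift_jis": html_doc("shift_jis") + b"\x82",
}


def decodes_strictly(data: bytes, encoding: str) -> bool:
    """Return True if Python's strict decoder accepts the bytes."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for encoding, data in BOGUS_CASES.items():
        print(encoding, "accepted" if decodes_strictly(data, encoding) else "rejected")
```

Each value in BOGUS_CASES could be written to a file with open(path, "wb") to obtain a test document; for non-UTF encodings the choice of which broken sequences matter remains an open question, as the quoted message itself notes.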

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Monday, 25 June 2007 00:21:11 UTC
