Re: Seeking test data with bogus byte sequences from Henri Sivonen on 2007-06-19 (public-html@w3.org from June 2007)

Hello Karl,

For UTF-8, I would suggest starting with
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
(Sections 3-5).
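
If you want to generate a few more cases yourself, something like the
following Python sketch might be a starting point. It is only an
illustration and is not taken from the file above: the names make_doc,
CASES and is_rejected are made up for this example, and the three byte
sequences are just one possible choice per encoding. The check at the
end uses Python's own strict decoders, which only tells you the bytes
are malformed, not how an HTML5 parser should recover from them.

```python
# Sketch only: all names below are hypothetical, chosen for this example.

def make_doc(encoding: str, bogus: bytes, bom: bytes = b"", meta: bool = True) -> bytes:
    """Wrap a bogus byte sequence in a small HTML document in `encoding`."""
    declaration = f"<meta charset='{encoding}'>" if meta else ""
    head = f"<!DOCTYPE html>{declaration}<title>test</title><p>"
    tail = "</p>"
    # The markup is pure ASCII, so it encodes cleanly in all three encodings;
    # only the `bogus` bytes in the middle are malformed.
    return bom + head.encode(encoding) + bogus + tail.encode(encoding)


CASES = {
    # '/' (U+002F) written as two bytes instead of one: non-shortest-form UTF-8.
    "utf-8": make_doc("utf-8", b"\xc0\xaf"),
    # High surrogate U+D800 followed by 'A' instead of a low surrogate.
    # The encoding is declared by the little-endian BOM, not by <meta>.
    "utf-16-le": make_doc("utf-16-le", b"\x00\xd8A\x00", bom=b"\xff\xfe", meta=False),
    # 0x82 is a Shift_JIS lead byte; the '<' (0x3C) of "</p>" that follows
    # is not a valid trail byte.
    "shift_jis": make_doc("shift_jis", b"\x82"),
}


def is_rejected(data: bytes, encoding: str) -> bool:
    """Return True if Python's strict decoder refuses `data`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    for encoding, doc in CASES.items():
        print(encoding, "rejected:", is_rejected(doc, encoding))
```

Writing each value of CASES to a file in binary mode would give one
small test document per encoding.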

Regards,   Martin.

At 15:09 07/06/19, Karl Dubost wrote:
>
>Richard, Felix,
>
>do you have this handy?
>Could you reply on this thread on the public-html mailing-list?
>
>Many thanks,
>
>[[[
>I could use test documents that are otherwise small conforming HTML5  
>documents in an encoding where a character may take more than one byte  
>(with the encoding declared using the BOM or <meta charset='...'>)  
>except that they contain a byte sequence that is bogus for the  
>declared encoding: non-shortest-form UTF-8, unpaired surrogates in  
>UTF-16, broken Shift_JIS with the kind of brokenness you could get in  
>Shift_JIS (I don't know what exactly I should be testing with non-UTF  
>encodings). If someone already has this kind of test data, please let  
>me know. Thanks.
>]]]-- Seeking test data with bogus byte sequences from Henri Sivonen  
>on 2007-06-19 (public-html@w3.org from June 2007)
>http://lists.w3.org/Archives/Public/public-html/2007Jun/0402.html
>Tue, 19 Jun 2007 06:07:22 GMT
>
>
>-- 
>Karl Dubost - http://www.w3.org/People/karl/
>W3C Conformance Manager, QA Activity Lead
>   QA Weblog - http://www.w3.org/QA/
>      *** Be Strict To Be Cool ***
>


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

Received on Monday, 25 June 2007 00:21:11 UTC