
Re: Seeking test data with bogus byte sequences from Henri Sivonen on 2007-06-19 (public-html@w3.org from June 2007)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Mon, 25 Jun 2007 08:45:58 +0900
Message-Id: <6.0.0.20.2.20070625084412.023f6ec0@localhost>
To: Karl Dubost <karl@w3.org>, Richard Ishida <ishida@w3.org>, Felix Sasaki <fsasaki@w3.org>
Cc: www-international@w3.org, public-html@w3.org

Hello Karl,

For UTF-8, I would suggest starting with
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
(Sections 3-5).
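
As a rough illustration of the kind of test document Henri describes below,
here is a minimal Python sketch. It is not taken from the file above; the
helper names, the tiny HTML wrapper, and the specific byte values are
assumptions chosen for illustration only. It builds one UTF-8 document
containing a non-shortest-form sequence and one UTF-16 document (declared
via the BOM) containing an unpaired surrogate, and checks that a strict
decoder rejects both.

```python
# Sketch only: helper names and the HTML wrapper are hypothetical.
PREFIX = b"<!DOCTYPE html><meta charset='utf-8'><title>t</title><p>"
SUFFIX = b"</p>"


def utf8_overlong_doc() -> bytes:
    # 0xC0 0xAF is a two-byte, non-shortest-form encoding of U+002F ('/').
    return PREFIX + b"\xc0\xaf" + SUFFIX


def utf16_lone_surrogate_doc() -> bytes:
    # The little-endian BOM (FF FE) declares the encoding.
    # 00 D8 is the code unit 0xD800, a high surrogate; the next code unit
    # is '<' rather than a low surrogate, so the surrogate is unpaired.
    head = "<!DOCTYPE html><title>t</title><p>".encode("utf-16-le")
    tail = "</p>".encode("utf-16-le")
    return b"\xff\xfe" + head + b"\x00\xd8" + tail


def is_valid(data: bytes, encoding: str) -> bool:
    # True if the bytes decode cleanly under a strict decoder.
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(is_valid(utf8_overlong_doc(), "utf-8"))
    print(is_valid(utf16_lone_surrogate_doc(), "utf-16"))
```

Shift_JIS is left out here on purpose, since the original request is itself
unsure what exactly should be tested for non-UTF encodings.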

Regards,   Martin.

At 15:09 07/06/19, Karl Dubost wrote:
>
>Richard, Felix,
>
>do you have this handy?
>Could you reply on this thread on the public-html mailing-list?
>
>Many thanks,
>
>[[[
>I could use test documents that are otherwise small conforming HTML5  
>documents in an encoding where a character may take more than one byte  
>(with the encoding declared using the BOM or <meta charset='...'>)  
>except that they contain a byte sequence that is bogus for the  
>declared encoding: non-shortest-form UTF-8, unpaired surrogates in  
>UTF-16, broken Shift_JIS with the kind of brokenness you could get in  
>Shift_JIS (I don't know what exactly I should be testing with non-UTF  
>encodings). If someone already has this kind of test data, please let  
>me know. Thanks.
>]]]-- Seeking test data with bogus byte sequences from Henri Sivonen  
>on 2007-06-19 (public-html@w3.org from June 2007)
>http://lists.w3.org/Archives/Public/public-html/2007Jun/0402.html
>Tue, 19 Jun 2007 06:07:22 GMT
>
>
>-- 
>Karl Dubost - http://www.w3.org/People/karl/
>W3C Conformance Manager, QA Activity Lead
>   QA Weblog - http://www.w3.org/QA/
>      *** Be Strict To Be Cool ***
>
>
>
>


#-#-#  Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Monday, 25 June 2007 00:21:11 GMT
