- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Mon, 25 Jun 2007 08:45:58 +0900
- To: Karl Dubost <karl@w3.org>, Richard Ishida <ishida@w3.org>, Felix Sasaki <fsasaki@w3.org>
- Cc: www-international@w3.org, public-html@w3.org
Hello Karl,

For UTF-8, I would suggest starting with
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
(Sections 3-5).

Regards, Martin.

At 15:09 07/06/19, Karl Dubost wrote:
>
>Richard, Felix,
>
>do you have this handy?
>Could you reply on this thread on the public-html mailing-list?
>
>Many thanks,
>
>[[[
>I could use test documents that are otherwise small conforming HTML5
>documents in encoding where a character may take more than one byte
>(with the encoding declared using the BOM or <meta charset='...'>)
>except that they contain a byte sequence that is bogus for the
>declared encoding: non-shortest-form UTF-8, unpaired surrogates in
>UTF-16, broken Shift_JIS with the kind of brokenness you could get in
>Shift_JIS (I don't know what exactly I should be testing with non-UTF
>encodings). If someone already has this kind of test data, please let
>me know. Thanks.
>]]]-- Seeking test data with bogus byte sequences from Henri Sivonen
>on 2007-06-19 (public-html@w3.org from June 2007)
>http://lists.w3.org/Archives/Public/public-html/2007Jun/0402.html
>Tue, 19 Jun 2007 06:07:22 GMT
>
>
>--
>Karl Dubost - http://www.w3.org/People/karl/
>W3C Conformance Manager, QA Activity Lead
>   QA Weblog - http://www.w3.org/QA/
>      *** Be Strict To Be Cool ***
>
>
>
>

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
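
To make the kinds of bogus sequences named in the quoted request concrete, here is a minimal Python sketch that builds one sample of each (non-shortest-form UTF-8, an unpaired UTF-16 surrogate, a truncated Shift_JIS character) and checks that a strict decoder rejects it. It is not taken from the test file linked above. The constant and function names are made up for illustration, and the truncated lead byte is only one assumed form of Shift_JIS brokenness.

```python
# Hypothetical sample data; every name here is an assumption for illustration.

# Non-shortest-form (overlong) UTF-8: U+002F "/" spread over two bytes
# instead of the single byte 0x2F.
OVERLONG_SLASH = b"\xc0\xaf"

# Unpaired surrogate in UTF-16LE: the high surrogate U+D800 (bytes 00 D8)
# followed by an ordinary "A" rather than a low surrogate.
LONE_HIGH_SURROGATE_UTF16LE = b"\x00\xd8" + "A".encode("utf-16-le")

# Broken Shift_JIS (assumed form): a lead byte with no trail byte.
TRUNCATED_SJIS = b"\x82"


def is_rejected(data: bytes, encoding: str) -> bool:
    """Return True if Python's strict decoder refuses the byte string."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return True
    return False


def make_test_document(bogus: bytes, charset: str = "utf-8") -> bytes:
    """Wrap a bogus byte sequence in a small HTML document.

    Only suitable for ASCII-compatible encodings, because the markup
    itself is written out as ASCII bytes.
    """
    head = (
        f"<!DOCTYPE html><meta charset='{charset}'><title>t</title><p>"
    ).encode("ascii")
    return head + bogus + b"</p>"


if __name__ == "__main__":
    samples = [
        ("utf-8", OVERLONG_SLASH),
        ("utf-16-le", LONE_HIGH_SURROGATE_UTF16LE),
        ("shift_jis", TRUNCATED_SJIS),
    ]
    for encoding, data in samples:
        print(encoding, data.hex(), "rejected:", is_rejected(data, encoding))
```

A UTF-16 test document would need the markup encoded as UTF-16 as well (plus a BOM), which this sketch leaves out. Whether a particular HTML parser treats these bytes as fatal or substitutes U+FFFD is a separate question from what Python's codecs do here.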
Received on Monday, 25 June 2007 00:21:11 UTC