Seeking test data with bogus byte sequences

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 19 Jun 2007 08:50:52 +0300
Message-Id: <180FF85C-036F-401C-A2CA-2AC634006A23@iki.fi>
To: HTML WG <public-html@w3.org>

I could use test documents that are otherwise small, conforming HTML5
documents in an encoding where a character may take more than one byte
(with the encoding declared using the BOM or <meta charset='...'>),
except that they contain a byte sequence that is bogus for the
declared encoding: non-shortest-form UTF-8, unpaired surrogates in
UTF-16, or Shift_JIS broken in the ways Shift_JIS data actually gets
broken (I don't know exactly what I should be testing with non-UTF
encodings).
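
To make the request concrete, here is a rough sketch of how such test
documents might be generated. Everything in it is an assumption made for
illustration and not existing test data: the function names, the minimal
document skeleton, and the specific bogus bytes (0xC0 0xAF as a
non-shortest-form '/', a lone 0xD800 code unit, and a Shift_JIS lead byte
0x82 followed by a space). The Shift_JIS case in particular is only one
guess at a plausible kind of breakage.

```python
# Sketch only: builds small HTML documents that each contain one byte
# sequence that is invalid for the declared encoding. All names and byte
# choices here are illustrative assumptions.

PREFIX = "<!DOCTYPE html><meta charset='{enc}'><title>t</title><p>"
SUFFIX = "</p>"


def utf8_non_shortest() -> bytes:
    """UTF-8 document containing an overlong encoding of U+002F '/'."""
    head = PREFIX.format(enc="utf-8").encode("ascii")
    # 0xC0 0xAF would decode to '/' if non-shortest forms were allowed.
    return head + b"\xC0\xAF" + SUFFIX.encode("ascii")


def utf16le_unpaired_surrogate() -> bytes:
    """UTF-16LE document (declared by BOM) with a lone high surrogate."""
    bom = b"\xFF\xFE"
    head = "<!DOCTYPE html><title>t</title><p>".encode("utf-16-le")
    # Code unit 0xD800 (little-endian bytes 00 D8) is followed by 'a',
    # which is not a low surrogate, so the surrogate is unpaired.
    lone_high = b"\x00\xD8"
    return bom + head + lone_high + "a</p>".encode("utf-16-le")


def shift_jis_bad_trail() -> bytes:
    """Shift_JIS document with a lead byte followed by an invalid trail."""
    head = PREFIX.format(enc="shift_jis").encode("ascii")
    # 0x82 is a lead byte; 0x20 (space) is outside the trail-byte range.
    return head + b"\x82\x20" + SUFFIX.encode("ascii")


def is_rejected(data: bytes, encoding: str) -> bool:
    """Return True if Python's strict decoder refuses the byte string."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    cases = [
        ("utf-8", utf8_non_shortest()),
        ("utf-16", utf16le_unpaired_surrogate()),
        ("shift_jis", shift_jis_bad_trail()),
    ]
    for encoding, data in cases:
        print(encoding, "rejected:", is_rejected(data, encoding))
```

Python's strict decoders are used here only as a sanity check that the
bytes really are malformed. They are not an HTML5 parser, so how a given
parser reports or recovers from these sequences would still need to be
tested separately.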

If someone already has this kind of test data, please let me know.  
Thanks.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 19 June 2007 05:47:56 UTC
