- From: Philip Jägenstedt <philipj@opera.com>
- Date: Wed, 25 Aug 2010 09:28:12 +0200
Here's the script used: http://pastebin.com/KhdsydzJ

Input was determined to be valid UTF-8 if text.decode('utf-8') didn't raise an exception, same for ASCII. I haven't tried to analyze what other encodings were used.

Philip

On Tue, 24 Aug 2010 21:47:14 +0200, Kevin Marks <kevinmarks at gmail.com> wrote:

> When you say 'invalid utf8' what were you seeing? win1252 encoding of
> accents? or illegal unicode characters like 0x80 ?
>
> On Tue, Aug 24, 2010 at 4:20 AM, Philip Jägenstedt
> <philipj at opera.com> wrote:
>
>> As mentioned deep in another thread, I've gotten hold of a big batch of
>> SRT files and have collected some statistics, which may help inform
>> decisions on the WebSRT format. Many thanks to OpenSubtitles for
>> providing the data.
>>
>> http://blog.foolip.org/2010/08/20/srt-research/
>>
>> --
>> Philip Jägenstedt
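For reference, the validity check described in the reply (try to decode, treat an exception as invalid) can be sketched roughly as follows. This is not the pastebin script itself; it is a minimal Python 3 sketch, and the function name classify_encoding and its return labels are assumptions made for illustration.

```python
def classify_encoding(data: bytes) -> str:
    """Label raw bytes as 'ascii', 'utf-8', or 'other'.

    Hypothetical helper: the name and labels are illustrative only.
    ASCII is tried first because every ASCII file is also valid UTF-8.
    """
    for encoding in ("ascii", "utf-8"):
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            # Decoding failed, so the input is not valid in this encoding.
            continue
        return encoding
    # Neither decode succeeded; the actual encoding is not analyzed here.
    return "other"


if __name__ == "__main__":
    samples = [
        b"plain subtitle text",
        "caf\u00e9".encode("utf-8"),
        "caf\u00e9".encode("cp1252"),
        b"\x80",
    ]
    for sample in samples:
        print(repr(sample), classify_encoding(sample))
```

Under this check, both cases Kevin asks about (a windows-1252 accent such as the lone byte 0xE9, and a stray 0x80) fail to decode as UTF-8, so the check alone cannot tell them apart.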
Received on Wednesday, 25 August 2010 00:28:12 UTC