[whatwg] SRT research

From: Philip Jägenstedt <philipj@opera.com>
Date: Wed, 25 Aug 2010 09:28:12 +0200
Message-ID: <op.vhzgdae7sr6mfa@nog>
Here's the script used: http://pastebin.com/KhdsydzJ

Input was determined to be valid UTF-8 if text.decode('utf-8') didn't  
raise an exception, same for ASCII. I haven't tried to analyze what other  
encodings were used.
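
A minimal sketch of that check, assuming Python 3 (where the raw file  
contents are a bytes object) rather than the Python 2 call quoted above.  
The function name classify_encoding is hypothetical, and the actual script  
at the pastebin link may be structured differently:

```python
def classify_encoding(data: bytes) -> str:
    """Label raw subtitle bytes as 'ascii', 'utf-8', or 'other'.

    Hypothetical helper: input counts as valid for an encoding if
    decoding it does not raise an exception.
    """
    # ASCII is tried first because every ASCII file is also valid UTF-8.
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # Neither decode succeeded; the real encoding is not analyzed here.
    return "other"


if __name__ == "__main__":
    samples = {
        "plain": b"1\n00:00:01,000 --> 00:00:02,000\nHello\n",
        "utf-8 accents": "Jägenstedt".encode("utf-8"),
        "windows-1252 accents": "Jägenstedt".encode("cp1252"),
        "lone 0x80 byte": b"\x80",
    }
    for label, raw in samples.items():
        print(label, "->", classify_encoding(raw))
```

Under this check, Windows-1252 accented letters and a stray 0x80 byte both  
fail to decode as UTF-8, so the test by itself cannot tell those two cases  
apart.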

Philip

On Tue, 24 Aug 2010 21:47:14 +0200, Kevin Marks <kevinmarks at gmail.com>  
wrote:

> When you say 'invalid utf8' what were you seeing? win1252 encoding of
> accents? or illegal unicode characters like 0x80 ?
>
> On Tue, Aug 24, 2010 at 4:20 AM, Philip Jägenstedt
> <philipj at opera.com> wrote:
>
>> As mentioned deep in another thread, I've gotten hold of a big batch of
>> SRT files and have collected some statistics, which may help inform
>> decisions on the WebSRT format. Many thanks to OpenSubtitles for
>> providing the data.
>>
>> http://blog.foolip.org/2010/08/20/srt-research/
>>
>> --
>> Philip Jägenstedt
Received on Wednesday, 25 August 2010 00:28:12 UTC
