Re: WebVTT feedback

While we can't do away with script detection, we can strongly encourage people to tag things properly, especially if we don't have time to specify uniform sniffing.

On Jun 5, 2012, at 15:06 , Ian Hickson wrote:

> On Wed, 7 Dec 2011, Philip Jägenstedt wrote:
>> On Tue, 06 Dec 2011 01:38:14 +0100, Ian Hickson <ian@hixie.ch> wrote:
>>> On Sat, 3 Dec 2011, Philip Jägenstedt wrote:
>>>> 
>>>> We're going to be doing the same script detection heuristics that we 
>>>> do on web pages. Differentiating between simplified Chinese, 
>>>> traditional Chinese and Japanese isn't particularly hard.
>>> 
>>> Can we define these for interoperability, or are they proprietary? (I 
>>> don't imagine people writing their own small WebVTT implementations 
>>> are going to know how to do this if we don't have a spec.)
>> 
>> Yes, we'd love for script detection heuristics to be specified, both for 
>> HTML and WebVTT.
>> 
>> I'm not an expert on this, but basically what we do is traverse the 
>> input (as Unicode code points) and count the number of hits for different 
>> buckets of script families and see which wins. For separating simplified 
>> and traditional Chinese (and maybe Japanese) that have a lot of overlap, 
>> I believe we look for common characters that are unique to each script 
>> (like 国 or 國) and see which class of characters wins.
>> 
>> What would be the appropriate way to proceed?
> 
> I considered trying to spec this myself, but I don't have the bandwidth to 
> take on something that size at the moment. I think the best way to proceed 
> would be for someone to write a specification that defines the algorithm 
> that does the script detection, and then for me to update HTML and WebVTT 
> to plug into that algorithm.
> 
> -- 
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
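
For reference, the bucket-counting approach Philip describes above could be sketched roughly as follows. This is only an illustration under stated assumptions, not any browser's actual heuristic: the code point ranges chosen for each bucket, the tiny SIMPLIFIED_ONLY and TRADITIONAL_ONLY sets, the rule that any kana implies Japanese, and the names bucket() and guess_script() are all hypothetical. A real implementation would need much larger tables and some thresholds.

```python
from collections import Counter

# Hypothetical discriminator sets: a handful of common characters whose
# forms differ between simplified and traditional Chinese.
SIMPLIFIED_ONLY = set("国这们说时")
TRADITIONAL_ONLY = set("國這們說時")


def bucket(ch):
    """Map one character to a coarse script bucket (assumed ranges)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:      # Hiragana and Katakana blocks
        return "kana"
    if 0x4E00 <= cp <= 0x9FFF:      # CJK Unified Ideographs
        return "han"
    if 0xAC00 <= cp <= 0xD7A3:      # Hangul syllables
        return "hangul"
    if 0x0400 <= cp <= 0x04FF:      # Cyrillic
        return "cyrillic"
    if cp < 0x0250 and ch.isalpha():  # Basic Latin plus Latin extensions
        return "latin"
    return None                     # digits, punctuation, everything else


def guess_script(text):
    """Count hits per bucket, pick the winner, then split the Han case."""
    counts = Counter()
    simplified = traditional = 0
    for ch in text:
        b = bucket(ch)
        if b is not None:
            counts[b] += 1
        if ch in SIMPLIFIED_ONLY:
            simplified += 1
        elif ch in TRADITIONAL_ONLY:
            traditional += 1

    if not counts:
        return "unknown"

    winner = counts.most_common(1)[0][0]
    if winner in ("han", "kana"):
        # Assumption: any kana at all means Japanese.
        if counts["kana"] > 0:
            return "japanese"
        if simplified > traditional:
            return "simplified-chinese"
        if traditional > simplified:
            return "traditional-chinese"
        return "han-ambiguous"
    return winner
```

With these assumed tables, guess_script("这是中国") gives "simplified-chinese", guess_script("這是中國") gives "traditional-chinese", and guess_script("これは日本語です") gives "japanese". Short cues with no discriminating characters come back as "han-ambiguous", which is where explicit language tagging would have to take over.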

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Friday, 8 June 2012 00:23:47 UTC