
[whatwg] Fwd: Discussing WebSRT and alternatives/improvements

From: Philip Jägenstedt <philipj@opera.com>
Date: Tue, 24 Aug 2010 12:49:58 +0200
Message-ID: <op.vhxu1ko7sr6mfa@philip-pc.gothenburg.osa>
On Tue, 24 Aug 2010 04:32:21 +0200, Silvia Pfeiffer  
<silviapfeiffer1 at gmail.com> wrote:

> On Mon, Aug 23, 2010 at 6:55 PM, Philip Jägenstedt
> <philipj at opera.com> wrote:
>
>>>> Aside: WebSRT can't contain binary data, only UTF-8 encoded text.
>>>
>>> It sure can. Just base-64 encode it. I'm not saying it's a good thing,
>>> but if somebody really has an urge...
>>
>> Sure, this would be a metadata track. Sites have no reason to offer
>> download links to it, and if anyone gets hold of such a file it would
>> quickly be evident that it's useless.
>
>
> After a user has seen the crap on screen. I'm just saying: it's a legal
> WebSRT file and really not compatible with any existing infrastructure
> for SRT.

A fair point. The alternatives I can see are (1) using an incompatible  
format so that the user sees nothing or (2) adding a header that indicates  
that the track is metadata.

In order to tell the user to stop wasting their time with this file, I  
think (1) is clearly worse. (2) is absolutely an option, but it will only  
make a difference to software that understands this header and if the  
header is optional it will likely often be omitted. A dialog saying "this  
is a metadata track, you can't watch it" is slightly friendlier than a  
screen full of crap, but they are both pretty effective at getting the  
message across.
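
To make the "screen full of crap" case concrete, here is a minimal Python sketch of what a base-64 metadata cue would look like. The function name, the timings and the payload are made up for illustration and are not taken from any spec or tool.

```python
import base64

def metadata_cue(index, start, end, payload):
    """Build one SRT-style cue whose text is base-64 encoded binary data.

    Hypothetical helper: a legacy player has no way to know that the
    cue text is not meant to be displayed.
    """
    text = base64.b64encode(payload).decode("ascii")
    return f"{index}\n{start} --> {end}\n{text}\n"

payload = b"\x00\x01\xfe\xff"  # arbitrary binary data
cue = metadata_cue(1, "00:00:01,000", "00:00:02,000", payload)

# A metadata-aware consumer decodes the third line of the cue; a plain
# SRT player would simply render that line as subtitle text.
decoded = base64.b64decode(cue.splitlines()[2])
```

The file stays syntactically ordinary, which is exactly why existing players would show the encoded text rather than reject it.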

>>>> If we define WebSRT in a way that can handle >99% of existing content
>>>> and degrade gracefully (enough) when using new features in old software,
>>>> it seems reasonable to do. If lots of software developers cry foul, then
>>>> perhaps we should reconsider. It seems to me, though, that actually
>>>> researching and defining a good algorithm for parsing SRT would be of
>>>> use to others than just browsers.
>>>
>>> How is that different from moving away from SRT? If everyone has to
>>> change their parsing of SRT to accommodate a new spec, then that is a new
>>> format.
>>
>> Not everyone has to change their parsers immediately, many will
>> continue to work. However, if someone wants to support SRT in a
>> compatible way, it's very helpful to have a spec, assuming that WebSRT
>> is actually compatible enough with existing SRT content.
>>
>> This is quite similar to HTML4 vs HTML5. There are lots of mostly
>> compatible HTML parsers, but HTML5 defines a single parsing algorithm,
>> and slow convergence towards that is a good thing.
>>
>
> No, no, no! It is not at all similar to HTML4 and HTML5. A Web browser
> cannot suddenly stop working for a Web page, just because it has some
> extra functionality in it. Thus, the HTML format has been developed such
> that it can be extended without breaking existing stuff. We can guarantee
> that no browser will break because that is the way in which the format has
> been specified.
>
> No such thing has happened for SRT and there is simply no way to
> guarantee that all new WebSRT files will work in all existing SRT software,
> because SRT has not been specified as an extensible format and because
> there is no agreement between all parties that have implemented SRT
> support as to how extensions should be made.
>
> We can introduce such a thing for WebSRT, but we cannot claim it for SRT.

You are right, existing SRT parsers are probably far less interoperable  
than HTML parsers were before HTML5.

Existing content demands that SRT parsers handle at least <i>, <b>, <font>
and <u> in some manner, even if it is by ignoring them. Any parsers that
treat SRT as plain text don't even work with today's content, so I don't
think they should be considered at all. The question, then, is whether
parsers that handle the mentioned markup also ignore <1>, <ruby> and <rt>.
I haven't tested it, but I assume that some will ignore them and some won't.
What percentage of the media player market would have to handle this
correctly for these extensions to be OK, in your opinion?
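
The two behaviours I'm assuming can be sketched in a few lines of Python. This is not modelled on any real player: the regular expression, the function name and the drop_unknown switch are all hypothetical, and real parsers certainly differ in the details.

```python
import re

# Tags that existing SRT content already uses (per the paragraph above).
KNOWN_TAGS = {"i", "b", "u", "font"}

# Anything that looks like a tag: <name ...> or </name>.
TAG_RE = re.compile(r"</?([A-Za-z0-9]+)[^<>]*>")

def render_plain(cue_text, drop_unknown):
    """Return the text a hypothetical legacy player would display."""
    def replace(match):
        name = match.group(1).lower()
        if name in KNOWN_TAGS or drop_unknown:
            return ""            # tag is understood, or silently ignored
        return match.group(0)    # unknown tag leaks through as literal text
    return TAG_RE.sub(replace, cue_text)

cue = "<1>Hello <i>there</i>, <ruby>base<rt>note</rt></ruby>"
lenient = render_plain(cue, drop_unknown=True)   # ignores every tag
strict = render_plain(cue, drop_unknown=False)   # only strips known tags
```

A player of the first kind degrades gracefully (the ruby annotation just runs into the base text), while the second kind shows the new markup to the user verbatim.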

>> If the SRT ecosystem is so fragile that it cannot tolerate any extension
>> whatsoever, then we should stay far away from it. It just seems that's  
>> not
>> the case.
>
>
> How do we know that everyone that uses SRT now really wants to use WebSRT
> instead and wants to take part in the new ecosystem that we are
> introducing? We make some pretty big assumptions about what everyone who
> is not a Web browser vendor wants to do with SRT. That doesn't make the
> existing SRT ecosystem fragile - but it makes it an existing environment
> that needs to be respected.

At this point, what is your recommendation? The following ideas have been  
on the table:

* Change the file extension to something other than .srt.

I don't have an opinion; browsers ignore the file extension anyway.

* Change the MIME type to something other than text/srt.

I doubt it makes any difference, as most software that deals with SRT today
has no concept of MIME types. No matter what, I'd want exactly one MIME type,
or alternatively to make browsers ignore the MIME type completely.

* Add a header to WebSRT to make it uniquely identifiable.

The header would have to be mandatory and browsers would have to reject  
files that don't have it. Such files would be compatible with some  
existing software and break some, depending on how they sniff. We could  
also put metadata in such a header.

* Make something deliberately incompatible with SRT.

It doesn't make a big difference to browsers implementing the format. We'd  
be replacing something that mostly works in existing players with  
something that never works.
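
To illustrate why a mandatory header would work in some existing software and break other software "depending on how they sniff", here is a rough Python sketch. The "WEBSRT" magic line and all three functions are hypothetical; no header syntax has been proposed in this thread, and real players sniff in many other ways.

```python
import re

HEADER = "WEBSRT"  # hypothetical magic line, not from any spec

# An SRT-style timing line such as "00:00:01,000 --> 00:00:02,000".
TIMING_RE = re.compile(
    r"\d+:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d+:\d{2}:\d{2}[,.]\d{3}"
)

def browser_accepts(text):
    """A browser that requires the header as the very first line."""
    lines = text.lstrip("\ufeff").splitlines()
    return bool(lines) and lines[0].strip() == HEADER

def strict_legacy_accepts(text):
    """A player that expects the first non-blank line to be a cue number."""
    for line in text.splitlines():
        if line.strip():
            return line.strip().isdigit()
    return False

def lenient_legacy_accepts(text):
    """A player that only looks for a timing line somewhere in the file."""
    return any(TIMING_RE.search(line) for line in text.splitlines())

plain_srt = "1\n00:00:01,000 --> 00:00:02,000\nHello\n"
with_header = HEADER + "\n\n" + plain_srt
```

With these assumptions, the file with the header passes the browser and the lenient player but fails the strict one, while the plain SRT file passes both players and is rejected by the browser.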



Here's the SRT research I promised:  
http://blog.foolip.org/2010/08/20/srt-research/

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Tuesday, 24 August 2010 03:49:58 UTC
