[whatwg] Fwd: Discussing WebSRT and alternatives/improvements

On Wed, 25 Aug 2010 14:39:00 +0200, Silvia Pfeiffer  
<silviapfeiffer1 at gmail.com> wrote:

> On Wed, Aug 25, 2010 at 7:20 PM, Philip Jägenstedt
> <philipj at opera.com> wrote:
>

>>>> The question, then, is if parsers that handle the mentioned markup also
>>>> ignore <1>, <ruby> and <rt>. I haven't tested it, but I assume that some
>>>> will ignore it and some won't. How many percent of the media player market
>>>> would have to handle this correctly for these extensions to be OK, in your
>>>> opinion?
>>>>
>>>
>>>
>>> If a single one breaks, it would be bad IMO because the expectations of the
>>> users of that software will be broken even if it may just be a small
>>> percentage of users and we have no influence on the upgrade path of that
>>> software - in particular if it is proprietary.
>>>
>>
>> Neither a new file extension, MIME type nor header is enough to stop some
>> implementations from treating it as SRT and breaking. The only remaining
>> option, AFAICT, is making the format fundamentally incompatible with SRT.
>> Is it worth it?
>
>
> If it has a different file extension and a different mime type and even a
> different header, I don't think any existing software will open it as SRT.
> Why would it think that a random file is a SRT file? It would need to be an
> application that accepts absolutely anything that you give it as SRT and
> then that software has more fundamental problems.

I renamed an SRT file to .wsrt and added WEBSRT on a line before the cues,
and it still plays just fine in MPlayer, using
`mplayer video.ogv -sub subs.wsrt`. VLC won't open a subtitle file with a
.wsrt extension, but the same file (with a WEBSRT header) works with the
extension srt or txt. Totem is the other way around: the file extension
doesn't matter, but it rejects files with a header.

The results are hardly consistent, but at least one player exists for which
it's not enough to change the file extension and add a header. If we want
to make sure that no content is treated as SRT by any application, the
format must be more incompatible.
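
For anyone who wants to repeat the test with other players, here is a
minimal sketch of how such a test file could be produced. It is not the
procedure actually used above; the function name and file names are
hypothetical, and the only things taken from the test are the .wsrt
extension and a WEBSRT line placed before the cues.

```python
from pathlib import Path


def make_wsrt_test_file(srt_path: str, out_path: str, header: str = "WEBSRT") -> None:
    """Copy an SRT file, putting a single header line before the first cue.

    Whether a blank line should follow the header is not specified in the
    thread; this sketch puts the cues directly on the next line.
    """
    cues = Path(srt_path).read_text(encoding="utf-8", errors="replace")
    Path(out_path).write_text(f"{header}\n{cues}", encoding="utf-8")
```

The resulting file can then be tried in each player, for example with
`mplayer video.ogv -sub subs.wsrt`, and copied to other extensions
(.srt, .txt) to separate the effect of the header from the effect of the
file name.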

>>>> At this point, what is your recommendation? The following ideas have been
>>>> on the table:
>>>>
>>>> * Change the file extension to something other than .srt.
>>>>
>>>> I don't have an opinion; browsers ignore the file extension anyway.
>>>>
>>>>
>>> Yes, I think we should definitely have a new file extension.
>>>
>>
>> I'll leave this to others to decide, but since browsers have no concept of
>> file extensions, just using .srt will work. If the format is SRT-like it's
>> likely at least some files will use .srt in practice.
>
>
> All SRT files in practice use the .srt extension - it is typically how these
> formats are identified by applications. Just because *nix ignores file
> extensions mostly for identifying file types doesn't mean that applications
> do. Again, I believe strongly that re-using the same file extension is the
> one biggest pain we can inflict on the community.

As shown above, several popular (?) media players ignore or give little  
weight to the file extension.

>>>> * Change the MIME type to something other than text/srt.
>>>>
>>>> I doubt it makes any difference, as most software that deals with SRT
>>>> today has no concept of MIME types. No matter what, I'd want exactly 1
>>>> MIME type or alternatively make browsers ignore the MIME type completely.
>>>>
>>>>
>>> You're right in that existing SRT software probably doesn't deal much with
>>> a SRT mime type. Right now text/x-srt or text/srt is sometimes used for SRT
>>> files, but often text/plain is also in use and more likely from a Web
>>> server. Since this is the space where Web browsers play, I am not overly
>>> fussed, though I think logically text/websrt makes more sense with a .wsrt
>>> extension. Then, also SRT files can be served as text/websrt to allow them
>>> to take part in the WebSRT infrastructure if indeed they will continue to
>>> be valid WebSRT files.
>>>
>>
>> Is there anything you expect would break if WebSRT files were served as
>> text/srt?
>
>
> I'm asking because I don't know how anal Web browsers are about mime types.
> I would think a Web browser should accept WebSRT and SRT files in text/plain
> format as well as WebSRT files in text/websrt format and SRT files in
> text/srt format. Would something break if they even came as text/html? I
> would expect that it makes a difference when these are loaded directly as a
> resource for display (e.g. when you directly go to
> http://example.com/mycaptions.wsrt), but not when used through a <track>
> element, where WebSRT is the baseline format and thus is expected.

It's actually easier for a browser to ignore the MIME type than it is to  
be strict about it, at least when the format is easily identified by  
sniffing (sniffing code is needed anyway for local files). WebSRT isn't  
very easy to sniff, so that would be an argument in favor of a mandatory  
magic header.
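
To illustrate the difference, here is a rough sketch of both kinds of
sniffing. Nothing in it comes from a spec or from any browser: the function
names, the timing-line pattern and the 20-line window are all assumptions
made for the example, and "WEBSRT" as the magic string is just the header
used in my test above.

```python
import re

# Assumed shape of an SRT-style timing line, e.g. "00:00:01,000 --> 00:00:04,000".
TIMING_RE = re.compile(
    r"^\d{1,2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{3}"
)


def sniff_with_header(data: bytes, magic: bytes = b"WEBSRT") -> bool:
    """With a mandatory magic header, sniffing is one comparison on line 1."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        data = data[3:]
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    return first_line == magic


def sniff_without_header(data: bytes, max_lines: int = 20) -> bool:
    """Without a header, fall back to a heuristic: is there a timing line
    somewhere in the first few lines? This can be fooled in both directions."""
    text = data.decode("utf-8", errors="replace")
    return any(TIMING_RE.match(line.strip()) for line in text.splitlines()[:max_lines])
```

The first function gives a definite answer after reading one line; the
second has to guess how far to look and what a cue looks like, which is the
part that makes headerless WebSRT awkward to sniff.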

The main reason to care about the MIME type is some kind of "doing the
right thing" by not letting people get away with misconfigured servers.
Sometimes I feel it's just a waste of everyone's time, though; it would
generally be less work for both browsers and authors to not bother.

>>>> * Add a header to WebSRT to make it uniquely identifiable.
>>>>
>>>> The header would have to be mandatory and browsers would have to reject
>>>> files that don't have it. Such files would be compatible with some existing
>>>> software and break some, depending on how they sniff. We could also put
>>>> metadata in such a header.
>>>>
>>>>
>>> Yes, I think we need to introduce a header. Maybe we can hide all the
>>> structure in what SRT recognizes as comments (i.e. start the lines with
>>> ";"). But I believe we need some hints like the @profile to identify the
>>> type of the cues and the <link> to link to a style sheet, and we need
>>> metadata like the <meta> element of HTML headers.
>>>
>>
>> I had no idea that semicolon was used for comments in SRT, is this usage
>> widespread? Does it work in most players?
>
>
> I thought it was, but maybe it was just introduced for WebSRT. It is not
> tested in Hixie's SRT research[2]. Can you take a quick look through your
> SRT file collection if there are any? I'm probably wrong about this seeing
> as it's not mentioned in the wiki page for SRT [3].
>
> [2] http://wiki.whatwg.org/wiki/SRT_research
> [3] http://en.wikipedia.org/wiki/SubRip

OK, I grepped the 10000 files. Only 15 had any lines beginning with a  
semicolon, and by manual inspection it doesn't look like any of them are  
clearly intended as comments (it's hard to tell, all are in foreign  
languages). None of them were at the very beginning of the file.
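
For anyone with their own collection of SRT files, a check along these lines
can be sketched in Python. This is not the command I actually ran; the
function name, the "*.srt" pattern and the directory layout are assumptions
for the example.

```python
from pathlib import Path


def semicolon_report(directory: str, pattern: str = "*.srt"):
    """List files containing any line that begins with ';'.

    Returns (file name, first_line_is_semicolon) pairs, where the second
    value says whether the first non-empty line of the file starts with ';'.
    """
    hits = []
    for path in sorted(Path(directory).rglob(pattern)):
        # utf-8-sig drops a leading byte order mark; undecodable bytes are replaced.
        lines = path.read_text(encoding="utf-8-sig", errors="replace").splitlines()
        if any(line.startswith(";") for line in lines):
            first = next((line for line in lines if line.strip()), "")
            hits.append((path.name, first.startswith(";")))
    return hits
```

The hits still need manual inspection to judge whether a semicolon line is
meant as a comment or is just dialogue that happens to start with one.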

>>>> * Make something deliberately incompatible with SRT.
>>>>
>>>> It doesn't make a big difference to browsers implementing the format. We'd
>>>> be replacing something that mostly works in existing players with something
>>>> that never works.
>>>>
>>>>
>>> That was the idea of WMML and I took that path because I thought it would
>>> be advantageous for other Web applications, such as those built on libxml2,
>>> expat, php's SimpleXML, pyexpat for python, Nokogiri for ruby etc. But I
>>> really like the idea of WebSRT to allow arbitrary metadata in the cues
>>> without having to put it into CDATA sections.
>>>
>>> I don't mind creating a format that is still somewhat compatible with SRT.
>>> We don't have to force incompatibility - but we should also not have it
>>> restrict us. In either case, it is a new format.
>>>
>>
>> I'm not trying to be annoying, but this seems to clash with your preference
>> to not break any existing software. Anything that resembles SRT *will* be
>> treated as SRT in some existing players.
>
>
> No, I think that's a misconception. I think most players test the file
> extension and maybe the mime type before opening a file as srt. A quick test
> in VLC on my Mac shows that when I go to "Subtitle -> Open File" I am not
> allowed to open anything that doesn't have an extension that VLC accepts -
> they get filtered out. Thus, what the actual file looks like really doesn't
> matter - what matters is what it sells itself as through the file extension,
> the mime type, or some magic identifier at the beginning of the file. Which
> is used depends on your OS and your application.

Some applications use the file extension, some rely on sniffing, and at
least GStreamer (and thus Totem) weighs both of these together.
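
The three strategies can be sketched as follows. This is purely
illustrative: it does not reproduce the code of VLC, MPlayer or GStreamer,
and the function names, the weights (0.3 and 0.7) and the threshold are
made up for the example.

```python
import re
from pathlib import PurePath

# Assumed shape of an SRT timing line, e.g. "00:00:01,000 --> 00:00:04,000".
TIMING_RE = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def looks_like_srt(text: str) -> bool:
    """Strict content sniff: a cue number followed by a timing line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return len(lines) >= 2 and lines[0].isdigit() and bool(TIMING_RE.match(lines[1]))


def by_extension(name: str, text: str) -> bool:
    """Trust the file name only; the content is never inspected."""
    return PurePath(name).suffix.lower() == ".srt"


def by_sniffing(name: str, text: str) -> bool:
    """Trust the content only; the file name is ignored."""
    return looks_like_srt(text)


def weighted(name: str, text: str, threshold: float = 0.5) -> bool:
    """Combine both signals with hypothetical weights."""
    score = 0.3 * by_extension(name, text) + 0.7 * by_sniffing(name, text)
    return score >= threshold
```

With these definitions, a plain SRT file renamed to subs.wsrt is rejected by
the extension check but accepted by the other two, while a file named
subs.srt that starts with a WEBSRT line is accepted only by the extension
check. A new extension and a header each stop some strategies, but neither
stops all of them.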

I think we've made some interesting finds in this thread, but we're  
starting to go in circles by now. Perhaps we should give it a rest until  
we get input from a third party. A medal to anyone who has followed it  
this far :)

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Wednesday, 25 August 2010 07:39:45 UTC