[whatwg] Fwd: Discussing WebSRT and alternatives/improvements

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Thu, 26 Aug 2010 01:40:08 +1000
Message-ID: <AANLkTi=yQanFNMan_BathcmBX_vysAL5euXxf+q-T87x@mail.gmail.com>
On Thu, Aug 26, 2010 at 12:39 AM, Philip Jägenstedt <philipj at opera.com> wrote:

> On Wed, 25 Aug 2010 14:39:00 +0200, Silvia Pfeiffer <
> silviapfeiffer1 at gmail.com> wrote:
>
>> On Wed, Aug 25, 2010 at 7:20 PM, Philip Jägenstedt <philipj at opera.com>
>> wrote:
>>
>>
>>>>> The question, then, is if parsers that handle the mentioned markup also
>>>>> ignore <1>, <ruby> and <rt>. I haven't tested it, but I assume that
>>>>> some
>>>>> will ignore it and some won't. How many percent of the media player
>>>>> market
>>>>> would have to handle this correctly for these extensions to be OK, in
>>>>> your
>>>>> opinion?
>>>>>
>>>>>
>>>>
>>>> If a single one breaks, it would be bad IMO because the expectations of
>>>> the
>>>> users of that software will be broken even if it may just be a small
>>>> percentage of users and we have no influence on the upgrade path of that
>>>> software - in particular if it is proprietary.
>>>>
>>>>
>>> Neither a new file extension, MIME type nor header is enough to stop some
>>> implementations from treating it as SRT and breaking. The only remaining
>>> option, AFAICT, is making the format fundamentally incompatible with SRT.
>>> Is
>>> it worth it?
>>>
>>
>>
>> If it has a different file extension and a different mime type and even a
>> different header, I don't think any existing software will open it as SRT.
>> Why would it think that a random file is a SRT file? It would need to be
>> an
>> application that accepts absolutely anything that you give it as SRT and
>> then that software has more fundamental problems.
>>
>
> I renamed a SRT file to .wsrt and added WEBSRT on a line before the cues
> and it still plays just fine in MPlayer, using `mplayer video.ogv -sub
> subs.wsrt`.



I wouldn't count command-line applications for this: you can throw just
about anything at a command-line application, and that is a good thing,
because it may just work, as it did here. But a command line is a
controlled environment, used by somebody who knows what they are doing, so
it is unlikely to cause problems and confusion.



> VLC won't open a subtitle file with .wsrt extension, but the same file
> (with a WEBSRT header) works with the extension srt or txt.


Again - that's a good thing and exactly what I would prefer. If you know
what you are doing and you know your file is probably just going to work,
you can consciously decide to fall back to SRT.



> Totem is the other way around, the file extension doesn't matter, but it
> rejects files with a header.
>

That's just proof that it's a different file format.



> The results are hardly consistent, but at least one player exists for which
> it's not enough to change the file extension and add a header. If we want to
> make sure that no content is treated as SRT by any application, the format
> must be more incompatible.


You misunderstand my intent. I am by no means suggesting that no WebSRT
content is treated as SRT by any application. All I am asking for is a
different file extension and a different mime type and possibly a magic
identifier such that *authoring* applications (and authors) can clearly
designate this to be a different format, in particular if they include new
features. Then a *playback application* has the chance to identify them as a
different format and provide a specific parser for it, instead of failing
like Totem. They can also decide to extend their existing SRT parser to
support both WebSRT and SRT. And I also have no issue with a user deciding
to give a WebSRT file a go by renaming it to .srt.

By keeping WebSRT and SRT as different formats we give the applications a
choice to support either, or both in the same parser. If we don't, we force
them to deal in a single parser with all the oddities of SRT formats as well
as all the extra features and all the extensibility of WebSRT.
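
As a rough illustration of the choice described above, here is a minimal
sketch of how a playback application might tell the two formats apart
before picking a parser. The ".wsrt" extension, the "text/websrt" MIME type
and the "WEBSRT" magic line are only the proposals from this thread, and the
function name is hypothetical; nothing here is a settled specification.

```python
import os

# Assumed identifiers, taken from the proposals in this thread.
WEBSRT_MAGIC = "WEBSRT"
WEBSRT_MIME = "text/websrt"
SRT_MIMES = ("text/srt", "text/x-srt")


def detect_format(filename, mime_type=None, first_line=""):
    """Return "websrt", "srt" or "unknown" for a subtitle resource."""
    ext = os.path.splitext(filename)[1].lower()
    # A magic first line is the strongest hint, so check it first.
    if first_line.lstrip("\ufeff").strip() == WEBSRT_MAGIC:
        return "websrt"
    if mime_type == WEBSRT_MIME or ext == ".wsrt":
        return "websrt"
    if mime_type in SRT_MIMES or ext == ".srt":
        return "srt"
    return "unknown"


print(detect_format("subs.wsrt"))                      # websrt
print(detect_format("subs.srt", first_line="WEBSRT"))  # websrt
print(detect_format("subs.srt", first_line="1"))       # srt
```

An application that wants one parser for both formats can map both results
to the same code path; one that wants to keep them apart can dispatch on the
returned string.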



>
>>>>> At this point, what is your recommendation? The following ideas have been
>>>>> on the table:
>>>>>
>>>>> * Change the file extension to something other than .srt.
>>>>>
>>>>> I don't have an opinion, browsers ignore the file extension anyway.
>>>>>
>>>>>
>>>> Yes, I think we should definitely have a new file extension.
>>>>
>>>>
>>> I'll leave this to others to decide, but since browsers have no concept
>>> of
>>> file extensions, just using .srt will work. If the format is SRT-like
>>> it's
>>> likely at least some files will use .srt in practice.
>>>
>>
>>
>> All SRT files in practice use the .srt extension - it is typically how
>> these
>> formats are identified by applications. Just because *nix ignores file
>> extensions mostly for identifying file types doesn't mean that
>> applications
>> do. Again, I believe strongly that re-using the same file extension is the
>> one biggest pain we can inflict on the community.
>>
>
> As shown above, several popular (?) media players ignore or give little
> weight to the file extension.


I don't think that's a fair sample - as I said, on Linux and on the
command-line things are different. I have a GUI mplayer here and it reacts
like VLC - doesn't let me open .wsrt files. The vast majority of
applications on Windows and the Mac make their decision on whether they
support files based on the file extension.

Assuming we pick the same file extension and we now have a new application
that only supports WebSRT parsing, we will make a large bunch of existing
valid SRT files invalid - not only those that are not in UTF-8, but also
those with <font>..</font> and <u>...</u>. I do wonder if the text between
the <font> start and end element and inside the <u>..</u> may even get
removed because of lack of support for these.
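
To make the concern concrete, here is a minimal sketch of a check that flags
the two problems named above in an existing SRT file. It assumes that a
WebSRT-only parser requires UTF-8 and does not know <font> or <u>; the
function name is hypothetical.

```python
import re

# Assumption: <font> and <u> are the legacy SRT tags a WebSRT-only
# parser would not recognize, as discussed in this thread.
LEGACY_TAG = re.compile(r"</?(font|u)\b[^>]*>", re.IGNORECASE)


def websrt_problems(raw):
    """List reasons why the bytes of an SRT file may not be valid WebSRT."""
    problems = []
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
        # Latin-1 decoding never fails, so the tag scan can still run.
        text = raw.decode("latin-1")
    tags = sorted({m.group(1).lower() for m in LEGACY_TAG.finditer(text)})
    for tag in tags:
        problems.append("uses <%s>" % tag)
    return problems


cue = b"1\n00:00:01,000 --> 00:00:02,000\n<u>Hello</u>\n"
print(websrt_problems(cue))  # ['uses <u>']
```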



>
>>>>> * Change the MIME type to something other than text/srt.
>>>>>
>>>>> I doubt it makes any difference, as most software that deal with SRT
>>>>> today
>>>>> have no concept of MIME types. No matter what I'd want exactly 1 MIME
>>>>> type
>>>>> or alternatively make browsers ignore the MIME type completely.
>>>>>
>>>>>
>>>> You're right in that existing SRT software probably doesn't deal much
>>>> with
>>>> a
>>>> SRT mime type. Right now text/x-srt or text/srt is sometimes used for
>>>> SRT
>>>> files, but often text/plain is also in use and more likely from a Web
>>>> server. Since this is the space where Web browsers play, I am not overly
>>>> fussed, though I think logically text/websrt makes more sense with a
>>>> .wsrt
>>>> extension. Then, also SRT files can be served as text/websrt to allow
>>>> them
>>>> to take part in the WebSRT infrastructure if indeed they will continue
>>>> to
>>>> be
>>>> valid WebSRT files.
>>>>
>>>>
>>> Is there anything you expect would break if WebSRT files were served as
>>> text/srt?
>>>
>>
>>
>> I'm asking because I don't know how anal Web browsers are about mime
>> types.
>> I would think a Web browser should accept WebSRT and SRT files in
>> text/plain
>> format as well as WebSRT files in text/websrt format and SRT files in
>> text/srt format. Would something break if they even came as text/html? I
>> would expect that it makes a difference when these are loaded directly as
>> a
>> resource for display (e.g. when you directly go to
>> http://example.com/mycaptions.wsrt), but not when used through a <track>
>> element, where WebSRT is the baseline format and thus is expected.
>>
>
> It's actually easier for a browser to ignore the MIME type than it is to be
> strict about it, at least when the format is easily identified by sniffing
> (sniffing code is needed anyway for local files). WebSRT isn't very easy to
> sniff, so that would be an argument in favor of a mandatory magic header.
>
> The main reason to care about the MIME type is some kind of "doing the
> right thing" by not letting people get away with misconfigured servers.
> Sometimes I feel it's just a waste of everyone's time though, it would
> generally be less work for both browsers and authors to not bother.


That is happening with video right now. Eric from Apple just shared a nice
accessibility video, but because he has no access to the setup of the server
at apple.com, he cannot reconfigure it to support the video/ogg mime type,
with the result that in Firefox I cannot watch it even though there is an Ogg
file present.

I think it would be best if we let people get away with serving a WebSRT
file as text/plain, while still asking that text/websrt be used.
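
A minimal sketch of that lenient policy, assuming "text/websrt" as the
preferred type and treating text/plain, text/srt and text/x-srt as
tolerated. The function name and the warning text are hypothetical, and this
is not how any browser is known to behave.

```python
PREFERRED = "text/websrt"  # proposed type, not registered
TOLERATED = {"text/plain", "text/srt", "text/x-srt"}


def accept_track(content_type):
    """Return (accepted, warning) for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime == PREFERRED:
        return True, None
    if mime in TOLERATED:
        return True, "served as %s; %s is preferred" % (mime, PREFERRED)
    return False, "unsupported type: %s" % (mime or "(none)")


print(accept_track("text/websrt; charset=utf-8"))  # (True, None)
print(accept_track("text/plain"))
print(accept_track("video/ogg"))
```

The warning could go to a developer console, so that a misconfigured server
is noticed without the captions being rejected.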



>>>>> * Add a header to WebSRT to make it uniquely identifiable.
>>>>>
>>>>> The header would have to be mandatory and browsers would have to reject
>>>>> files that don't have it. Such files would be compatible with some
>>>>> existing
>>>>> software and break some, depending on how they sniff. We could also put
>>>>> metadata in such a header.
>>>>>
>>>>>
>>>> Yes, I think we need to introduce a header. Maybe we can hide all the
>>>> structure in what SRT recognizes as comments (i.e. start the lines with
>>>> ";").
>>>> But I believe we need some hints like the @profile to identify the type
>>>> of
>>>> the cues and the <link> to link to a style sheet, and we need metadata
>>>> like
>>>> the <meta> element of HTML headers.
>>>>
>>>>
>>> I had no idea that semicolon was used for comments in SRT, is this usage
>>> widespread? Does it work in most players?
>>>
>>
>>
>> I thought it was, but maybe it was just introduced for WebSRT. It is not
>> tested in Hixie's SRT research[2]. Can you take a quick look through your
>> SRT file collection if there are any? I'm probably wrong about this seeing
>> as it's not mentioned in the wiki page for SRT [3].
>>
>> [2] http://wiki.whatwg.org/wiki/SRT_research
>> [3] http://en.wikipedia.org/wiki/SubRip
>>
>
> OK, I grepped the 10000 files. Only 15 had any lines beginning with a
> semicolon, and by manual inspection it doesn't look like any of them are
> clearly intended as comments (it's hard to tell, all are in foreign
> languages). None of them were at the very beginning of the file.


Ah, that actually makes for another incompatibility of WebSRT and SRT: such
lines are regarded as comments in WebSRT when they probably aren't in SRT.
It seems increasingly that the only thing that WebSRT and SRT still have in
common is the "-->" character sequence. As a friend of mine in a11y recently
said: "I was hoping to never have to stare at "-->" ever again..." We could
indeed go all the way and define a much more different format, though I
don't think it will create implementations as quickly as an SRT-based but
changed format.
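
For anyone who wants to repeat the semicolon survey quoted above on their
own collection, here is a minimal sketch of the two checks involved: which
lines begin with ";", and whether the first non-blank line does. The
function names are hypothetical, and the sketch makes no claim about how
the original grep was run.

```python
def semicolon_lines(text):
    """Return (line_number, line) pairs for lines that begin with ';'."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if line.startswith(";")
    ]


def starts_with_semicolon(text):
    """True if the first non-blank line of the file begins with ';'."""
    for line in text.splitlines():
        if line.strip():
            return line.startswith(";")
    return False


sample = "1\n00:00:01,000 --> 00:00:02,000\n; not a comment?\nHello\n"
print(semicolon_lines(sample))        # [(3, '; not a comment?')]
print(starts_with_semicolon(sample))  # False
```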



> I think we've made some interesting finds in this thread, but we're
> starting to go in circles by now. Perhaps we should give it a rest until we
> get input from a third party. A medal to anyone who has followed it this far
> :)


I was surprised how much new stuff we have added in the last 2 days. But I
agree - we are starting to go in circles. :)

Cheers,
Silvia.
Received on Wednesday, 25 August 2010 08:40:08 UTC