[whatwg] Fwd: Discussing WebSRT and alternatives/improvements

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 25 Aug 2010 22:39:00 +1000
Message-ID: <AANLkTi=dv32PvUj4xAEF=sCcjcrHXeMUz=7k1TkFfqRN@mail.gmail.com>
On Wed, Aug 25, 2010 at 7:20 PM, Philip Jägenstedt <philipj at opera.com> wrote:

> On Wed, 25 Aug 2010 09:16:56 +0200, Silvia Pfeiffer <
> silviapfeiffer1 at gmail.com> wrote:
>
>> On Tue, Aug 24, 2010 at 8:49 PM, Philip Jägenstedt <philipj at opera.com> wrote:
>>
>>> On Tue, 24 Aug 2010 04:32:21 +0200, Silvia Pfeiffer <silviapfeiffer1 at gmail.com> wrote:
>>>
>>>> On Mon, Aug 23, 2010 at 6:55 PM, Philip Jägenstedt <philipj at opera.com> wrote:
>>>>
>>>>>>> Aside: WebSRT can't contain binary data, only UTF-8 encoded text.
>>>>>>
>>>>>> It sure can. Just base-64 encode it. I'm not saying it's a good thing,
>>>>>> but if somebody really has an urge...
>>>>>
>>>>> Sure, this would be a metadata track. Sites have no reason to offer
>>>>> download links to it, and if anyone gets hold of such a file it would
>>>>> quickly be evident that it's useless.
>>>>
>>>> After a user has seen the crap on screen. I'm just saying: it's a legal
>>>> WebSRT file and really not compatible with any existing infrastructure
>>>> for SRT.
>>>
>>> A fair point. The alternatives I can see are (1) using an incompatible
>>> format so that the user sees nothing or (2) adding a header that
>>> indicates that the track is metadata.
>>>
>>> In order to tell the user to stop wasting their time with this file, I
>>> think (1) is clearly worse. (2) is absolutely an option, but it will only
>>> make a difference to software that understands this header, and if the
>>> header is optional it will likely often be omitted. A dialog saying "this
>>> is a metadata track, you can't watch it" is slightly friendlier than a
>>> screen full of crap, but they are both pretty effective at getting the
>>> message across.
>>>
>>
>> Yeah, I'm totally for adding a hint as to what format is in the cue. Then,
>> a WebSRT file can be identified as to what it contains.
>>
>
> OK, but note that a browser would ignore this and trust what <track kind>
> says. I wouldn't want the kind to change after the external track is loaded;
> it would make the UI confusing if a captions track disappeared from the menu
> as soon as it was loaded because it internally claims to be metadata.


Yes, I have no problem with that, though I believe we have overloaded @kind
with too much meaning, as I already mentioned earlier [1]. I think it would
make more sense to split the different dimensions into different attributes:
- @type or @format for the format of the cue
- @kind for its semantic meaning (subtitle, caption, karaoke etc.); one
track could even satisfy several needs, so this would be a list of kinds
- and finally the visual rendering problem, which could possibly be solved
by providing a link to a div or p where the data should be rendered
instead of the default location. Right now, audio and metadata tracks get no
rendering at all and I see that as a problem.


[1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-July/027356.html
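To make the proposed split concrete, here is a minimal Python sketch that reads a `<track>` element and keeps the three dimensions apart. The attribute names are assumptions for illustration only: `type` (or `format`) for the cue format, a space-separated list in `kind`, and a hypothetical `target` attribute naming the element to render into. None of these is specified anywhere; `song.wsrt` and `#lyrics` are made-up values.

```python
from html.parser import HTMLParser


class TrackAttributes(HTMLParser):
    """Collect each <track> element's attributes, split by dimension.

    Hypothetical attributes: `type`/`format` (cue format), a space-separated
    `kind` list (semantic roles) and `target` (where to render).
    """

    def __init__(self):
        super().__init__()
        self.tracks = []

    def handle_starttag(self, tag, attrs):
        if tag != "track":
            return
        a = dict(attrs)
        self.tracks.append({
            "src": a.get("src"),
            # Format of the cue payload (proposed @type or @format).
            "format": a.get("type") or a.get("format"),
            # One track may serve several roles, so @kind becomes a list.
            "kinds": (a.get("kind") or "").split(),
            # Element to render into instead of the default (hypothetical).
            "target": a.get("target"),
        })


parser = TrackAttributes()
parser.feed('<video src="song.webm">'
            '<track src="song.wsrt" type="text/websrt"'
            ' kind="subtitles karaoke" target="#lyrics">'
            '</video>')
print(parser.tracks)
```

With this split, a karaoke track can also be offered as subtitles, and where it renders no longer has to be inferred from `kind`.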



>
>>> The question, then, is whether parsers that handle the mentioned markup
>>> also ignore <1>, <ruby> and <rt>. I haven't tested it, but I assume that
>>> some will ignore it and some won't. What percentage of the media player
>>> market would have to handle this correctly for these extensions to be OK,
>>> in your opinion?
>>>
>>
>>
>> If a single one breaks, it would be bad IMO, because the expectations of
>> the users of that software will be broken even if it may just be a small
>> percentage of users, and we have no influence on the upgrade path of that
>> software, in particular if it is proprietary.
>>
>
> Neither a new file extension, MIME type nor header is enough to stop some
> implementations from treating it as SRT and breaking. The only remaining
> option, AFAICT, is making the format fundamentally incompatible with SRT.
> Is it worth it?


If it has a different file extension, a different MIME type and even a
different header, I don't think any existing software will open it as SRT.
Why would it think that a random file is an SRT file? It would need to be an
application that accepts absolutely anything you give it as SRT, and then
that software has more fundamental problems.


>
>>> At this point, what is your recommendation? The following ideas have been
>>> on the table:
>>>
>>> * Change the file extension to something other than .srt.
>>>
>>> I don't have an opinion; browsers ignore the file extension anyway.
>>>
>>>
>> Yes, I think we should definitely have a new file extension.
>>
>
> I'll leave this to others to decide, but since browsers have no concept of
> file extensions, just using .srt will work. If the format is SRT-like it's
> likely at least some files will use .srt in practice.


In practice, all SRT files use the .srt extension - it is typically how
these formats are identified by applications. Just because *nix mostly
ignores file extensions when identifying file types doesn't mean that
applications do. Again, I strongly believe that re-using the same file
extension is the single biggest pain we can inflict on the community.



>
>>> * Change the MIME type to something other than text/srt.
>>>
>>> I doubt it makes any difference, as most software that deals with SRT
>>> today has no concept of MIME types. No matter what, I'd want exactly 1
>>> MIME type, or alternatively make browsers ignore the MIME type completely.
>>>
>>>
>> You're right in that existing SRT software probably doesn't deal much with
>> an SRT MIME type. Right now text/x-srt or text/srt is sometimes used for
>> SRT files, but text/plain is also often in use, and more likely from a Web
>> server. Since this is the space where Web browsers play, I am not overly
>> fussed, though I think logically text/websrt makes more sense with a .wsrt
>> extension. Then SRT files can also be served as text/websrt to allow them
>> to take part in the WebSRT infrastructure, if indeed they will continue to
>> be valid WebSRT files.
>>
>
> Is there anything you expect would break if WebSRT files were served as
> text/srt?


I'm asking because I don't know how strict Web browsers are about MIME
types. I would think a Web browser should accept WebSRT and SRT files served
as text/plain, as well as WebSRT files served as text/websrt and SRT files
served as text/srt. Would something break if they even came as text/html? I
would expect it to make a difference when these are loaded directly as a
resource for display (e.g. when you go directly to
http://example.com/mycaptions.wsrt), but not when used through a <track>
element, where WebSRT is the baseline format and is thus expected.
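A rough Python sketch of the policy described above, assuming the browser only consults the MIME type when a caption file is opened directly and always parses `<track>` resources as WebSRT. `text/websrt` is the type proposed in this thread, not a registered one, and the `"text"` and `"download"` outcomes are hypothetical placeholders for "display as plain text" and "don't render".

```python
# Hypothetical policy: the MIME type only matters when a caption file is
# opened directly; a <track> element always parses its resource as WebSRT.
# "text/websrt" is the type proposed in this thread, not a registered one.
DIRECT_VIEW = {
    "text/plain": "text",    # show the file as plain text
    "text/srt": "srt",
    "text/x-srt": "srt",
    "text/websrt": "websrt",
}


def handle_resource(mime_type: str, via_track_element: bool) -> str:
    """Return how a browser might treat a caption resource (sketch only)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    essence = mime_type.split(";", 1)[0].strip().lower()
    if via_track_element:
        # WebSRT is the baseline format for <track>, so the declared type
        # is not consulted here.
        return "websrt"
    # Unknown types fall back to a hypothetical "download" outcome.
    return DIRECT_VIEW.get(essence, "download")


print(handle_resource("text/html", via_track_element=True))
print(handle_resource("text/websrt; charset=utf-8", via_track_element=False))
print(handle_resource("text/plain", via_track_element=False))
```

Under this policy, serving WebSRT as text/srt or even text/html would not break `<track>` at all; it would only change what happens when someone opens the file directly.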


>
>>> * Add a header to WebSRT to make it uniquely identifiable.
>>>
>>> The header would have to be mandatory and browsers would have to reject
>>> files that don't have it. Such files would be compatible with some
>>> existing software and break some, depending on how they sniff. We could
>>> also put metadata in such a header.
>>>
>>>
>> Yes, I think we need to introduce a header. Maybe we can hide all the
>> structure in what SRT recognizes as comments (i.e. lines starting with
>> ";"). But I believe we need some hints like the @profile to identify the
>> type of the cues, the <link> to link to a style sheet, and metadata like
>> the <meta> element of HTML headers.
>>
>
> I had no idea that semicolon was used for comments in SRT, is this usage
> widespread? Does it work in most players?


I thought it was, but maybe it was just introduced for WebSRT. It is not
tested in Hixie's SRT research [2]. Can you take a quick look through your
SRT file collection to see if there are any? I'm probably wrong about this,
seeing as it's not mentioned in the Wikipedia page for SRT [3].

[2] http://wiki.whatwg.org/wiki/SRT_research
[3] http://en.wikipedia.org/wiki/SubRip
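If the header were hidden in ";" lines as floated in the quoted message, a WebSRT parser could peel it off before reading cues. This Python sketch assumes, purely for illustration, that header lines come first, start with ";" and hold `key: value` pairs; the keys `profile`, `stylesheet` and `language` are hypothetical stand-ins for the @profile, <link> and <meta> hints. Whether existing SRT players skip such lines is exactly the open question here, so this only covers the WebSRT side.

```python
def split_header(text):
    """Separate a hypothetical ';'-prefixed header from the cues.

    Assumes header lines come first, each starting with ';' and holding
    'key: value'. The convention and the key names are illustrative only.
    """
    header = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines) and lines[i].startswith(";"):
        key, sep, value = lines[i][1:].partition(":")
        if sep:  # ';' lines without a colon are treated as plain comments
            header[key.strip().lower()] = value.strip()
        i += 1
    # Skip blank lines between the header and the first cue.
    while i < len(lines) and not lines[i].strip():
        i += 1
    return header, "\n".join(lines[i:])


sample = """; profile: text/plain
; stylesheet: captions.css
; language: en

1
00:00:01,000 --> 00:00:04,000
Hello world
"""

header, cues = split_header(sample)
print(header)
print(cues)
```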



>>> * Make something deliberately incompatible with SRT.
>>>
>>> It doesn't make a big difference to browsers implementing the format.
>>> We'd be replacing something that mostly works in existing players with
>>> something that never works.
>>>
>>>
>> That was the idea of WMML, and I took that path because I thought it would
>> be advantageous for other Web applications, such as those built on
>> libxml2, expat, PHP's SimpleXML, pyexpat for Python, Nokogiri for Ruby
>> etc. But I really like the idea of WebSRT allowing arbitrary metadata in
>> the cues without having to put it into CDATA sections.
>>
>> I don't mind creating a format that is still somewhat compatible with SRT.
>> We don't have to force incompatibility, but we should also not let
>> compatibility restrict us. In either case, it is a new format.
>>
>
> I'm not trying to be annoying, but this seems to clash with your preference
> to not break any existing software. Anything that resembles SRT *will* be
> treated as SRT in some existing players.


No, I think that's a misconception. I think most players check the file
extension and maybe the MIME type before opening a file as SRT. A quick test
in VLC on my Mac shows that when I go to "Subtitle -> Open File" I am not
allowed to open anything that doesn't have an extension VLC accepts - such
files get filtered out. Thus, what the actual file looks like really doesn't
matter - what matters is what it sells itself as through the file extension,
the MIME type, or some magic identifier at the beginning of the file. Which
one is used depends on your OS and your application.
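The three signals mentioned above (file extension, MIME type, magic identifier) could be combined roughly as follows. This Python sketch is one possible policy under stated assumptions: `.wsrt` and `text/websrt` are the values floated in this thread, `WEBSRT` as a first-line magic string is an invented example, and the precedence order (magic, then MIME type, then extension) is an arbitrary choice, since real applications differ by OS and player.

```python
from pathlib import Path

# Hypothetical identifiers: ".wsrt" and "text/websrt" were floated in this
# thread; the "WEBSRT" first-line magic string is an invented example.
EXTENSIONS = {".srt": "srt", ".wsrt": "websrt"}
MIME_TYPES = {"text/srt": "srt", "text/x-srt": "srt", "text/websrt": "websrt"}
MAGIC = {b"WEBSRT": "websrt"}
UTF8_BOM = b"\xef\xbb\xbf"


def identify(filename=None, mime_type=None, head=b""):
    """Guess a caption format from whatever signals are available.

    Precedence (magic, then MIME type, then extension) is one possible
    policy; real players differ by OS and application.
    """
    # Ignore a UTF-8 byte order mark before checking the magic identifier.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    for magic, fmt in MAGIC.items():
        if head.startswith(magic):
            return fmt
    if mime_type:
        fmt = MIME_TYPES.get(mime_type.split(";", 1)[0].strip().lower())
        if fmt:
            return fmt
    if filename:
        return EXTENSIONS.get(Path(filename).suffix.lower())
    return None


print(identify(filename="clip.srt"))
print(identify(filename="clip.srt", mime_type="text/websrt"))
print(identify(filename="clip.txt"))
```

An open dialog that filters by extension, as in the VLC test, corresponds to calling `identify` with only `filename`: a file named `clip.txt` is rejected whatever it contains.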


Cheers,
Silvia.
Received on Wednesday, 25 August 2010 05:39:00 UTC
