- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Wed, 25 Aug 2010 17:16:56 +1000
On Tue, Aug 24, 2010 at 8:49 PM, Philip Jägenstedt <philipj at opera.com> wrote:

> On Tue, 24 Aug 2010 04:32:21 +0200, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>
>> On Mon, Aug 23, 2010 at 6:55 PM, Philip Jägenstedt
>> <philipj at opera.com> wrote:
>>
>>>>> Aside: WebSRT can't contain binary data, only UTF-8 encoded text.
>>>>
>>>> It sure can. Just base-64 encode it. I'm not saying it's a good
>>>> thing, but if somebody really has an urge...
>>>
>>> Sure, this would be a metadata track. Sites have no reason to offer
>>> download links to it, and if anyone gets hold of such a file it would
>>> quickly be evident that it's useless.
>>
>> After a user has seen the crap on screen. I'm just saying: it's a legal
>> WebSRT file and really not compatible with any existing infrastructure
>> for SRT.
>
> A fair point. The alternatives I can see are (1) using an incompatible
> format so that the user sees nothing or (2) adding a header that
> indicates that the track is metadata.
>
> In order to tell the user to stop wasting their time with this file, I
> think (1) is clearly worse. (2) is absolutely an option, but it will
> only make a difference to software that understands this header and if
> the header is optional it will likely often be omitted. A dialog saying
> "this is a metadata track, you can't watch it" is slightly friendlier
> than a screen full of crap, but they are both pretty effective at
> getting the message across.

Yeah, I'm totally for adding a hint as to what format is in the cue.
Then, a WebSRT file can be identified as to what it contains.

>>>>> If we define WebSRT in a way that can handle >99% of existing
>>>>> content and degrade gracefully (enough) when using new features in
>>>>> old software, it seems reasonable to do. If lots of software
>>>>> developers cry foul, then perhaps we should reconsider.
>>>>> It seems to me, though, that actually researching and defining a
>>>>> good algorithm for parsing SRT would be of use to others than just
>>>>> browsers.
>>>>
>>>> How is that different from moving away from SRT? If everyone has to
>>>> change their parsing of SRT to accommodate a new spec, then that is a
>>>> new format.
>>>
>>> Not everyone has to change their parsers immediately, many will
>>> continue to work. However, if someone wants to support SRT in a
>>> compatible way, it's very helpful to have a spec, assuming that WebSRT
>>> is actually compatible enough with existing SRT content.
>>>
>>> This is quite similar to HTML4 vs HTML5. There are lots of mostly
>>> compatible HTML parsers, but HTML5 defines a single parsing algorithm,
>>> and slow convergence towards that is a good thing.
>>
>> No, no, no! It is not at all similar to HTML4 and HTML5. A Web browser
>> cannot suddenly stop working for a Web page, just because it has some
>> extra functionality in it. Thus, the HTML format has been developed
>> such that it can be extended without breaking existing stuff. We can
>> guarantee that no browser will break because that is the way in which
>> the format has been specified.
>>
>> No such thing has happened for SRT and there is simply no way to
>> guarantee that all new WebSRT files will work in all existing SRT
>> software, because SRT has not been specified as an extensible format
>> and because there is no agreement between all parties that have
>> implemented SRT support as to how extensions should be made.
>>
>> We can introduce such a thing for WebSRT, but we cannot claim it for
>> SRT.
>
> You are right, existing SRT parsers are probably far less interoperable
> than HTML parsers were before HTML5.
>
> Existing content demands that SRT parsers handle at least <i>, <b>,
> <font> and <u> in some manner, even if it is by ignoring it.
> Any parsers that treat SRT as plain text don't even work with today's
> content, so I don't think they should be considered at all.

You've just defined what SRT is. I would actually define SRT as the
plain text format and the <i>, <b>, <font> and <u> markup as extensions.

> The question, then, is if parsers that handle the mentioned markup also
> ignore <1>, <ruby> and <rt>. I haven't tested it, but I assume that some
> will ignore it and some won't. How many percent of the media player
> market would have to handle this correctly for these extensions to be
> OK, in your opinion?

If a single one breaks, it would be bad IMO because the expectations of
the users of that software will be broken even if it may just be a small
percentage of users and we have no influence on the upgrade path of that
software - in particular if it is proprietary.

>>> If the SRT ecosystem is so fragile that it cannot tolerate any
>>> extension whatsoever, then we should stay far away from it. It just
>>> seems that's not the case.
>>
>> How do we know that everyone that uses SRT now really wants to use
>> WebSRT instead and wants to take part in the new ecosystem that we are
>> introducing? We make some pretty big assumptions about what everyone
>> who is not a Web browser vendor wants to do with SRT. That doesn't make
>> the existing SRT ecosystem fragile - but it makes it an existing
>> environment that needs to be respected.
>
> At this point, what is your recommendation? The following ideas have
> been on the table:
>
> * Change the file extension to something other than .srt.
>
> I don't have an opinion, browsers ignore the file extension anyway.

Yes, I think we should definitely have a new file extension.

> * Change the MIME type to something other than text/srt.
>
> I doubt it makes any difference, as most software that deals with SRT
> today has no concept of MIME types.
> No matter what I'd want exactly 1 MIME type or alternatively make
> browsers ignore the MIME type completely.

You're right in that existing SRT software probably doesn't deal much
with an SRT MIME type. Right now text/x-srt or text/srt is sometimes used
for SRT files, but often text/plain is also in use and more likely from a
Web server. Since this is the space where Web browsers play, I am not
overly fussed, though I think logically text/websrt makes more sense with
a .wsrt extension. Then, also SRT files can be served as text/websrt to
allow them to take part in the WebSRT infrastructure if indeed they will
continue to be valid WebSRT files.

Incidentally, it is a problem if WebSRT files are served as text/plain,
i.e. will the browser not identify them as subtitle files?

> * Add a header to WebSRT to make it uniquely identifiable.
>
> The header would have to be mandatory and browsers would have to reject
> files that don't have it. Such files would be compatible with some
> existing software and break some, depending on how they sniff. We could
> also put metadata in such a header.

Yes, I think we need to introduce a header. Maybe we can hide all the
structure in what SRT recognizes as comments (i.e. start the lines with
";"). But I believe we need some hints like the @profile to identify the
type of the cues and the <link> to link to a style sheet, and we need
metadata like the <meta> element of HTML headers.

> * Make something deliberately incompatible with SRT.
>
> It doesn't make a big difference to browsers implementing the format.
> We'd be replacing something that mostly works in existing players with
> something that never works.

That was the idea of WMML and I took that path because I thought it would
be advantageous for other Web applications, such as those built on
libxml2, expat, PHP's SimpleXML, pyexpat for Python, Nokogiri for Ruby
etc.
But I really like the idea of WebSRT to allow arbitrary metadata in the
cues without having to put it into CDATA sections. I don't mind creating
a format that is still somewhat compatible with SRT. We don't have to
force incompatibility - but we should also not have it restrict us. In
either case, it is a new format.

> Here's the SRT research I promised:
> http://blog.foolip.org/2010/08/20/srt-research/

That is awesome work. I knew that most SRT files didn't use UTF-8, but I
didn't know that we would make such a large percentage of files that are
currently parsed by SRT software be incompatible. It is good data to
have.

Cheers,
Silvia.
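
Editorial illustration: the mandatory-header option discussed in this message (a header hidden in lines starting with ";", carrying a hint such as @profile, with files lacking the header being rejected) could be sketched roughly as below. This is only a sketch under stated assumptions. The magic line "; WebSRT", the "; @key value" syntax, and the function name sniff_websrt are all hypothetical; the thread does not settle on any actual header syntax.

```python
import re

# Hypothetical magic line; nothing in the thread fixes an actual syntax.
MAGIC = "; WebSRT"


def sniff_websrt(text):
    """Return header metadata as a dict if the text starts with the
    hypothetical magic line, or None so the caller can reject the file."""
    lines = text.lstrip("\ufeff").splitlines()  # drop a leading BOM, if any
    if not lines or lines[0].strip() != MAGIC:
        return None
    meta = {}
    for line in lines[1:]:
        if not line.startswith(";"):
            break  # the header ends at the first non-comment line
        # Assumed form of a hint line: "; @key value"
        m = re.match(r";\s*@(\w+)\s+(.*)", line)
        if m:
            meta[m.group(1)] = m.group(2).strip()
    return meta


# A metadata track whose cue payload is base-64 text ("hello" encoded).
sample = (
    "; WebSRT\n"
    "; @profile metadata\n"
    "\n"
    "1\n"
    "00:00:01,000 --> 00:00:02,000\n"
    "aGVsbG8=\n"
)

print(sniff_websrt(sample))  # {'profile': 'metadata'}
print(sniff_websrt("1\n00:00:01,000 --> 00:00:02,000\nHello\n"))  # None
```

A plain SRT file has no magic line and is rejected here, which is the trade-off Philip describes: a mandatory header makes the file identifiable, but how existing players react to the extra lines depends on how they sniff.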
Received on Wednesday, 25 August 2010 00:16:56 UTC