- From: Philip Jägenstedt <philipj@opera.com>
- Date: Fri, 22 Oct 2010 13:09:24 +0200
On Fri, 22 Oct 2010 11:45:24 +0200, Simon Pieters <simonp at opera.com> wrote:

> On Fri, 22 Oct 2010 11:21:44 +0200, Silvia Pfeiffer
> <silviapfeiffer1 at gmail.com> wrote:
>
>> Since the attributes in <track> are a hint, probably what is available
>> in the file should overrule what is in the <track> attributes. It is
>> the same for the @charset attribute, which is overruled to utf-8 for
>> WebSRT IIRC.
>
> No, charset="" overrules the encoding for WebSRT per spec.

We should just remove charset="" from the spec.

>>>> * add a means to add comments
>>>>
>>>> e.g.
>>>> // Lines starting with // are comments
>>>
>>> So far the web has two comment syntaxes: <!-- SGML style --> and
>>> /* CSS style */, so if we need comments I think we should pick one
>>> of these.
>
> Actually there are three more in javascript:
>
> // line comment
> <!-- line comment
> --> line comment
>
> http://wiki.whatwg.org/wiki/Web_ECMAScript#HTML_comments
>
>
>> I'm not fussed. I thought your analysis pointed to //, which is also
>> nicer because it takes the full line into account without a need for
>> end tags. Also, it is common from C++ and other programming languages.
>> But I don't really mind - we just need a decision and reasons for why.
>
> Using <!-- --> is a bad idea since the WebSRT syntax already uses -->. I
> don't see the need for multiline comments.

Right. If we must have comments I think I'd prefer /* ... */ since both
CSS and JavaScript have it, and I can't see that single-line comments
will be easier from a parser perspective.

>>> Anyway, I agree that at least a magic header like "WebSRT" is needed
>>> because of the horrors of legacy SRT parsing.
>
> I don't see why we can't just consume the legacy and support it in
> WebSRT. Part of the point with WebSRT is to support the legacy. If we
> don't want to support the legacy, then the format can be made a lot
> cleaner.

Did you read
<http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-October/028799.html>
and look at <http://ale5000.altervista.org/subtitles.htm>? Do you think
it's a good idea to make WebSRT an extension of ale5000-SRT? My opinion
is that it's not a very good idea, which of course means we can simplify
some aspects of the format. For example, we don't need to allow both ,
and . as the millisecond separator, and the time parsing in general can
be made more sane.

>>> Breaking SRT compat means that we can go back to requiring UTF-8 as
>>> the encoding. However, UTF-8 does complicate the magic header a bit
>>> due to the possibility of a BOM [1]. While it would be nice to
>>> forbid the use of a BOM, I expect we'd then see lots of frustration
>>> from authors whose editors automatically insert it...
>>>
>>> [1] http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
>>
>> I'm happy to enforce UTF-8 on WebSRT. The @charset can work for other
>> formats. I didn't know about the BOM problem - but having read it, I
>> would think it makes sense to forbid it. What tools do and how they
>> deal with erroneous files is a different matter.
>
> Forbidding it would be the frustration. Consider editing a WebSRT file
> in Notepad, and then suddenly it doesn't work anymore. Instead we should
> allow the BOM. (WebSRT already allows the BOM.)

This means that it's trickier to use "WebSRT" as the magic bytes, but I
agree it's probably the better trade-off.

-- 
Philip Jägenstedt
Core Developer
Opera Software
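
To make the BOM trade-off above concrete, here is a minimal sketch of a
magic-header check that tolerates an optional UTF-8 BOM. The function
name `has_magic_header` is hypothetical, and "WebSRT" is only the magic
string floated in this thread, not a settled spec value.

```python
import codecs

# Magic string proposed in this thread; an assumption, not a spec value.
MAGIC = b"WebSRT"


def has_magic_header(data: bytes) -> bool:
    """Return True if data starts with MAGIC, optionally preceded by a UTF-8 BOM."""
    # Editors such as Notepad may prepend EF BB BF; skip it if present.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.startswith(MAGIC)
```

The cost is that a sniffer can no longer compare the first six bytes
directly; it has to look at up to nine bytes and accept two forms.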
Received on Friday, 22 October 2010 04:09:24 UTC