[whatwg] Timed tracks: feedback compendium

From: Philip Jägenstedt <philipj@opera.com>
Date: Fri, 22 Oct 2010 13:09:24 +0200
Message-ID: <op.vky49yqmsr6mfa@kirk>
On Fri, 22 Oct 2010 11:45:24 +0200, Simon Pieters <simonp at opera.com> wrote:

> On Fri, 22 Oct 2010 11:21:44 +0200, Silvia Pfeiffer  
> <silviapfeiffer1 at gmail.com> wrote:
>
>> Since the attributes in <track> are a hint, probably what is available
>> in the file should overrule what is in the <track> attributes. It is
>> the same for the @charset attribute, which is overruled to utf-8 for
>> WebSRT IIRC.
>
> No, charset="" overrules the encoding for WebSRT per spec.

We should just remove charset="" from the spec.

>>>> * add a means to add comments
>>>>
>>>> e.g.
>>>> // Lines starting with // are comments
>>>
>>> So far the web has two comment syntaxes: <!-- SGML style --> and
>>> /* CSS style */, so if we need comments I think we should pick one of
>>> these.
>
> Actually there are three more in javascript:
>
> // line comment
> <!-- line comment
> --> line comment
>
> http://wiki.whatwg.org/wiki/Web_ECMAScript#HTML_comments
>
>
>> I'm not fussed. I thought your analysis pointed to //, which is also
>> nicer because it takes the full line into account without a need for
>> end tags. Also, it is common from C++ and other programming languages.
>> But I don't really mind - we just need a decision and reasons for why.
>
> Using <!-- --> is a bad idea since the WebSRT syntax already uses -->. I  
> don't see the need for multiline comments.

Right. If we must have comments I think I'd prefer /* ... */ since both  
CSS and JavaScript have it, and I can't see that single-line comments will  
be easier from a parser perspective.

>>> Anyway, I agree that at least a magic header like "WebSRT" is needed
>>> because of the horrors of legacy SRT parsing.
>
> I don't see why we can't just consume the legacy and support it in  
> WebSRT. Part of the point with WebSRT is to support the legacy. If we  
> don't want to support the legacy, then the format can be made a lot  
> cleaner.

Did you read  
<http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-October/028799.html>  
and look at <http://ale5000.altervista.org/subtitles.htm>?

Do you think it's a good idea to make WebSRT an extension of ale5000-SRT?
My opinion is that it's not a very good idea, in which case we can of
course simplify some aspects of the format. For example, we don't need to
allow both , and . as the millisecond separator, and the time parsing in
general can be made more sane.
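
To illustrate what "more sane" time parsing might look like, here is a rough sketch only. It assumes a fixed hh:mm:ss.mmm layout with a single separator; choosing "." is purely for illustration, since nothing has been decided, and the name parse_timestamp is hypothetical.

```python
import re

# Assumption: exactly one millisecond separator is allowed.
# "." is picked here only for illustration; "," would work the same way.
MS_SEPARATOR = "."

# Fixed-width fields: two-digit hours, minutes and seconds (the latter two
# limited to 00-59), then the separator and exactly three millisecond digits.
_TIMESTAMP = re.compile(
    r"([0-9]{2}):([0-5][0-9]):([0-5][0-9])"
    + re.escape(MS_SEPARATOR)
    + r"([0-9]{3})"
)


def parse_timestamp(text):
    """Return the time in milliseconds, or None if text is not strictly formatted."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
```

With these assumptions "00:01:02.500" is accepted, while "00:01:02,500" and "0:1:2.5" are rejected outright instead of being guessed at, which is the kind of leniency legacy SRT parsers end up needing.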

>>> Breaking SRT compat means that we can
>>> go back to requiring UTF-8 as the encoding. However, UTF-8 does
>>> complicate the magic header a bit due to the possibility of a BOM [1].
>>> While it would be nice to forbid the use of a BOM, I expect we'd then
>>> see lots of frustration from authors whose editors automatically
>>> insert it...
>>>
>>> [1] http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
>>
>> I'm happy to enforce UTF-8 on WebSRT. The @charset can work for other
>> formats. I didn't know about the BOM problem - but having read it, I
>> would think it makes sense to forbid it. What tools do and how they
>> deal with erroneous files is a different matter.
>
> Forbidding it would be the frustration. Consider editing a WebSRT file  
> in Notepad, and then suddenly it doesn't work anymore. Instead we should  
> allow the BOM. (WebSRT already allows the BOM.)

This means that it's trickier to use "WebSRT" as the magic bytes, but I  
agree it's probably the better trade-off.
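
The extra work is small, though. As a minimal sketch, assuming the file is UTF-8 and the magic string is literally "WebSRT" at the very start (the function name has_websrt_magic is hypothetical, and none of this is spec text):

```python
import codecs

# Assumption: the magic header is the ASCII string "WebSRT" at the start of
# the file, optionally preceded by a UTF-8 byte order mark.
MAGIC = b"WebSRT"


def has_websrt_magic(data: bytes) -> bool:
    """Return True if data starts with the magic string, ignoring a leading BOM."""
    if data.startswith(codecs.BOM_UTF8):
        # Skip the three BOM bytes (EF BB BF) that editors such as Notepad insert.
        data = data[len(codecs.BOM_UTF8):]
    return data.startswith(MAGIC)
```

A sniffer would then have to look at up to nine bytes rather than six, but a file saved with a BOM would keep working.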

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Friday, 22 October 2010 04:09:24 UTC