[whatwg] Fwd: Discussing WebSRT and alternatives/improvements from Silvia Pfeiffer on 2010-08-11 (public-whatwg-archive@w3.org from August 2010)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 11 Aug 2010 23:38:32 +1000
Message-ID: <AANLkTinXZVNW_2dcnYfZRn5H6+forKgkFX2VX8kpgtt3@mail.gmail.com>
On Wed, Aug 11, 2010 at 10:30 PM, Philip Jägenstedt <philipj at opera.com> wrote:

> On Wed, 11 Aug 2010 01:43:01 +0200, Silvia Pfeiffer <
> silviapfeiffer1 at gmail.com> wrote:
>
>> On Tue, Aug 10, 2010 at 7:49 PM, Philip Jägenstedt <philipj at opera.com>
>> wrote:
>>
>> I have checked the parsing spec, and
>> http://www.whatwg.org/specs/web-apps/current-work/#tag-open-state indeed
>> implies that a tag starting with a number is a parse error. Both the
>> timestamps and the voice markers thus seem to be problems when going with
>> an innerHTML parser. Is there a way to resolve this? I mean: I'd quite
>> happily drop the voice markers for a <span @class>, but I am not sure what
>> to do about the timestamps. We could do what I did in WMML and introduce a
>> <t> element with the timestamp as an @at attribute, but that is again more
>> verbose. We could also introduce an @at attribute in <span>, which would
>> then at least end up in the DOM and can be dealt with specially.
>>
>
> What should numerical voices be replaced with? Personally I'd much rather
> write <philip> and <silvia> to mark up a conversation between us two, as I
> think it'd be quite hard to keep track of the numbers if editing subtitles
> with many different speakers. However, going with that and using an HTML
> parser is quite a hack. Names like <mark> and <li> may already have special
> parsing rules or default CSS.
>

In HTML it is <span class="philip">..</span> and <span
class="silvia">...</span>. I don't see anything wrong with that. And it's
only marginally longer than <philip> ... </philip> and <silvia>...</silvia>.



> Going with HTML in the cues, we either have to drop voices and inner
> timestamps or invent new markup, as HTML can't express either. I don't think
> either of those are really good solutions, so right now I'm not convinced
> that reusing the innerHTML parser is a good way forward.


I don't see a need for the voices - they already have markup in HTML, see
above. But I do wonder about the timestamps. I'd much rather keep the
innerHTML parser if we can, but I don't know enough about how the timestamps
could be introduced without breaking the parser. Maybe with a data- attribute?
Maybe <span data-t="00:00:02.100">...</span>?
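
To make the two suggestions above concrete, here is a rough sketch of how
voices as <span class> and inner timestamps as a data-t attribute could be
read back out of a cue. It uses Python's html.parser, which is only a
stand-in and not the HTML5 innerHTML parser; the names CueSpanCollector,
collect_spans and parse_timestamp are made up for illustration, and data-t
is just the attribute floated above, not anything specified.

```python
from html.parser import HTMLParser


class CueSpanCollector(HTMLParser):
    """Collects voice (class), timestamp (data-t) and text of each <span>."""

    def __init__(self):
        super().__init__()
        self.spans = []    # spans that have been closed, in closing order
        self._open = []    # stack of spans that are still open

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            attributes = dict(attrs)
            self._open.append({
                "voice": attributes.get("class"),   # e.g. "philip"
                "t": attributes.get("data-t"),      # hypothetical attribute
                "text": "",
            })

    def handle_data(self, data):
        # Text is credited to the innermost open span only.
        if self._open:
            self._open[-1]["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._open:
            self.spans.append(self._open.pop())


def parse_timestamp(value):
    """Convert 'HH:MM:SS.mmm' into seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def collect_spans(cue_html):
    collector = CueSpanCollector()
    collector.feed(cue_html)
    collector.close()
    return collector.spans
```

Feeding it '<span class="philip">Hello</span>' gives one entry with voice
"philip", and a span carrying data-t="00:00:02.100" can be turned into
seconds with parse_timestamp. Both pieces of information end up as ordinary
attributes, which is the point of the suggestion.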



>
>>>> Think for example about the case where we had a requirement that a double
>>>> newline starts a new cue, but now we want to introduce a means where the
>>>> double newline is escaped and can be made part of a cue.
>>>>
>>>> Other formats keep track of their version, such as MS Word files. It is
>>>> to be hoped that most new features can be introduced without breaking
>>>> backwards compatibility and we can write the parsing requirements such
>>>> that certain things will be ignored, but in and of itself, WebSRT doesn't
>>>> provide for this extensibility. Right now, there is for example
>>>> extensibility with the "WebSRT settings parsing" (that's the stuff behind
>>>> the timestamps) where further "setting:value" settings can be introduced.
>>>> But for example the introduction of new "cue identifiers" (that's the <>
>>>> marker at the start of a cue) would be difficult without a version
>>>> string, since anything that doesn't match the given list will just be
>>>> parsed as a cue-internal tag and thus end up as part of the cue text
>>>> where plain text parsing is used.
>>>>
>>>>
>>> The bug I filed suggested allowing arbitrary voices, to simplify the
>>> parser and to make future extensions possible. For a web format I think
>>> this is a better approach than versioning. I haven't done a full review of
>>> the parser, but there are probably more places where it could be more
>>> forgiving so as to allow future tweaking.
>>>
>>
>> That's a good approach and will reduce the need for breaking
>> backwards compatibility. In an XML-based format that need is 0, while with
>> a text format where the structure is ad hoc, that need can never be
>> reduced to 0. That's what I am concerned about and that's why I think we
>> need a version identifier. If we end up never using or changing the
>> version identifier, so much the better. But I'd much rather we have it now
>> and can identify what specification a file adheres to than not be able to
>> do so later.
>>
>
> Perhaps I'm too influenced by HTML and its failed attempts at versioning,
> but I think that if you want to know which version of a spec a document is
> written against, you can run it through a parser for each version. This
> doesn't tell you the author intent, but I'm not sure that's very interesting
> to know. If the author thinks it's important, perhaps it can be put in a
> comment in the header.


I was most concerned about non-backwards-compatible changes here, but let's
not repeat the discussion I had with Anne. Let's rather focus on making sure
we have some means of extending WebSRT in future, should the need arise.



>
>>>>> On the other hand, keeping the same extension and (unregistered) MIME
>>>>> type as SRT has plenty of benefits, such as immediately being able to
>>>>> use existing SRT files in browsers without changing their file extension
>>>>> or MIME type.
>>>>>
>>>>>
>>>>
>>>> There is no harm in browsers accepting both MIME types if they are sure
>>>> they can parse old SRT as well as new WebSRT. But these two formats are
>>>> different enough that they should be given a different extension and
>>>> MIME type. I do not see a single advantage in stealing the MIME type of
>>>> an existing format for a new specification.
>>>>
>>>>
>>> But there's no spec for the old SRT; the only thing one could do is
>>> parse it with a WebSRT parser.
>>>
>>
>>
>> I can write that spec in an afternoon and register the MIME type with
>> IANA. That really isn't a problem. People have managed to write correct
>> SRT files without having a spec, because it's so trivial. Creating a spec
>> is just a formality. For now, the Wikipedia page really is sufficient.
>>
>
> Having a separate spec isn't really useful unless we expect people to
> implement it. Perhaps some new implementations would follow the spec, but
> browsers sure wouldn't implement two different parsers.


As I also said to Anne: I wouldn't want to implement an SRT parser. SRT
support would and should just fall out as a side benefit of implementing
WebSRT. It's not important for the browsers to make a distinction between
SRT and WebSRT, but it is important to everyone else who is trying to manage
their data.
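
As a rough illustration of "one parser, two labels", the sketch below
accepts both text/srt and text/websrt and hands either to the same cue
parser, so plain SRT falls out as a subset. Everything in it is an
assumption made for the example: the function names, accepting both "," and
"." before the milliseconds, and treating whatever follows the end timestamp
as whitespace-separated settings. It is nowhere near a spec-conformant
parser.

```python
import re

# Both labels are accepted; the label matters to tools, not to the parser.
ACCEPTED_TYPES = {"text/srt", "text/websrt"}

# "HH:MM:SS,mmm --> HH:MM:SS,mmm" followed by optional trailing settings.
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s+-->\s+"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})(.*)"
)


def _seconds(hours, minutes, seconds, millis):
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(text):
    """Split on blank lines and read one cue per block."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                groups = match.groups()
                cues.append({
                    "start": _seconds(*groups[0:4]),
                    "end": _seconds(*groups[4:8]),
                    "settings": groups[8].split(),   # empty for plain SRT
                    "text": "\n".join(lines[index + 1:]),
                })
                break   # lines before the timing line (cue number) are ignored
    return cues


def load_captions(mime_type, text):
    """Same parser for either type; unknown types are rejected."""
    if mime_type not in ACCEPTED_TYPES:
        raise ValueError(f"unsupported caption type: {mime_type}")
    return parse_cues(text)
```

A browser built like this never needs to tell the two apart, while an
authoring tool or a transcoding pipeline can still look at the label (or at
whether any cue has non-empty settings) to decide what it is able to handle.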



>
>>> That would make text/srt and text/websrt synonymous, which is kind of
>>> pointless.
>>>
>>
>>
>> No, it's only pointless if you are a browser vendor. For everyone else it
>> is a huge advantage to be able to choose between a guaranteed simple
>> format and a complex format with all the bells and whistles.
>>
>>
>>
>>> The advantages of taking text/srt are that all existing software to create
>>> SRT can be used to create WebSRT
>>>
>>
>>
>> That's not strictly true. If they load a WebSRT file that was created by
>> some other software for further editing and that WebSRT file uses advanced
>> WebSRT functionality, the authoring software will break.
>>
>
> Right, especially settings appended after the timestamps are quite likely
> to be stripped when saving the file.


Or may even break the software if it's badly implemented, or may end up
inside the cue text - just like the other control instructions, which will
end up as plain text inside the cue. You won't believe how many people have
pointed out to me that my SRT test parser exposed <i> tag markup in the
cue text rather than interpreting it, when I was experimenting with applying
SRT cues in an HTML div without touching the cue text content. Extraneous
markup is really annoying.



>>> and servers that already send text/srt don't need to be updated. In either
>>> case I think we should support only one MIME type.
>>>
>>
>>
>> What's the harm in supporting two MIME types but using the same parser to
>> parse them?
>>
>
> Most content will most likely be plain old SRT without voices, <ruby> or
> similar. People will create them using existing software with the .srt
> extension and serve them using the text/srt MIME type. When they later
> decide to add some <ruby> or similar, it will just work without changing the
> extension or MIME type. The net result is that text/srt and text/websrt mean
> exactly the same thing, making it a wasted effort.


From a Web browser perspective, yes. But not from a caption authoring
perspective. At first, I would author an SRT file. Later, I want to add some
fancy stuff. So, I load it into the application again. Then I add the fancy
stuff. It tells me that I cannot save it as SRT, but have to save it as
WebSRT, so I don't lose the information. Good! Now, the pipeline that I have
set up for transcoding SRT files and burning them onto video, and which
cannot yet deal with WebSRT, will not accept the WebSRT file. Good again! It
makes me extend my pipeline or go to the provider and upgrade my software, so
I get the full feature support and the correct rendering. Excellent.



>
>> Do you find MPlayer's behavior annoying because by rescaling already
>> rendered text, the text loses resolution and becomes less readable? This
>> is definitely not the behaviour I am after.
>>
>
> Scaling with the video is annoying with small videos, as the text ends up
> being huge in fullscreen. I assume we're going to do scaling as well as we
> can, so that's not an argument in either direction.
>
> I'll have to withdraw any opinion for now, I don't know how to best deal
> with this.



Yes, I can imagine that on small video it's bad to scale the text down with
the video, since it becomes unreadable. I thought that a solution would be
to define the screen size for which the text was written and then scale the
text with the video. But maybe there is a function that needs to be applied
where there is a minimum font size below which one cannot go and a maximum
font size above which it's bad, too. It seems that scaling text at the same
rate as video is not appropriate. I wonder if there is an optimal function
that people have found to be best? Worth doing some experiments I guess.
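
The simplest function of that kind would be linear scaling with a floor and
a ceiling. The sketch below is only that: the function name, the parameters
and the 12px/48px limits are placeholders picked for the example, not values
anyone has tested or proposed.

```python
def scaled_font_size(authored_px, authored_height, video_height,
                     min_px=12.0, max_px=48.0):
    """Scale the font with the video, but never below min_px or above max_px.

    authored_px:     font size the captions were written for
    authored_height: video height (in px) the captions were written for
    video_height:    height (in px) the video is actually displayed at
    min_px, max_px:  placeholder limits; real values would need experiments
    """
    scaled = authored_px * video_height / authored_height
    return max(min_px, min(max_px, scaled))
```

With these placeholder limits, 24px text authored for a 480px-high video
stays at 24px at that size, bottoms out at 12px on a 120px-high thumbnail,
and tops out at 48px in a 1920px-high fullscreen window. Whether a clamp is
enough, or whether the scaling in between should be slower than linear, is
exactly what the experiments would have to show.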

Cheers,
Silvia.
Received on Wednesday, 11 August 2010 06:38:32 UTC