[whatwg] Fwd: Discussing WebSRT and alternatives/improvements from Silvia Pfeiffer on 2010-08-11 (public-whatwg-archive@w3.org from August 2010)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Thu, 12 Aug 2010 00:24:35 +1000
Message-ID: <AANLkTik2g+WHMh6QrMR+hirHtYuc0x7=jKFzS+6ioO3W@mail.gmail.com>
On Wed, Aug 11, 2010 at 11:45 PM, Anne van Kesteren <annevk at opera.com> wrote:

> On Wed, 11 Aug 2010 15:09:34 +0200, Silvia Pfeiffer <
> silviapfeiffer1 at gmail.com> wrote:
>
>> HTML and CSS have predefined structures within which their languages grow
>> and are able to grow. WebSRT has newlines to structure the format, which
>> is clearly not very useful for extensibility. No matter how we turn this,
>> the XML background of HTML and the name-value background of CSS provide them
>> with in-built extensibility, which WebSRT does not have.
>>
>
> The parser has the "bad cue loop" concept for ignoring supposedly bogus
> lines. Seems extensible to me.



Hmm, that's for ignoring lines that don't match the "-->" pattern. It could
work: ignore anything that's inside a WebSRT file and not a cue.
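As a rough illustration of that idea (a sketch only, not the parsing
algorithm from the spec): split the file into blank-line-separated blocks
and drop every block that has no "-->" timing line. The names TIMING and
parse_cues, and the simplified timestamp pattern, are assumptions made for
this example.

```python
import re

# Simplified timing line: "HH:MM:SS,mmm --> HH:MM:SS,mmm", optionally
# followed by trailing text. This pattern is an assumption, not the spec's.
TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}[,.]\d{3})\s+-->\s+(\d{2}:\d{2}:\d{2}[,.]\d{3})(.*)$"
)


def parse_cues(text):
    """Return (start, end, cue_text) tuples; skip blocks with no timing line."""
    cues = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # Index of the first line that looks like a timing line, if any.
        idx = next((i for i, line in enumerate(lines) if TIMING.match(line)), None)
        if idx is None:
            continue  # not a cue: silently ignored
        match = TIMING.match(lines[idx])
        cues.append((match.group(1), match.group(2), "\n".join(lines[idx + 1:])))
    return cues
```

With a parser of this shape, an unknown block (say, a future header or
comment syntax) is skipped by older implementations rather than breaking
them.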

I tend to think of caption files as composed of the following broad
components:
* header-data that is information that applies to the complete file, which
tends to be setup data (such as language, charset, stylesheet link etc) and
metadata (name-value pairs)
* a list of cues, which have their own structure:
  ** start and end time
  ** per-cue header-type data such as more setup data, positioning, text
size etc
  ** the cue text itself (in various structured formats, potentially with
time markers for roll-on presentation)
* comments that can be made at any location

As long as we can make sure we're extensible within these broader areas, I
*think* we should be ok.
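
The components listed above could be modelled roughly as follows. This is
only a sketch of the broad areas, not a proposed format; the class and field
names (CaptionFile, Cue, settings, and so on) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cue:
    start: float                                   # start time in seconds
    end: float                                     # end time in seconds
    settings: dict = field(default_factory=dict)   # per-cue setup: positioning, text size, ...
    text: str = ""                                 # cue text, possibly with markup or time markers


@dataclass
class CaptionFile:
    header: dict = field(default_factory=dict)     # file-wide setup data and name-value metadata
    cues: list = field(default_factory=list)       # ordered list of Cue objects
    comments: list = field(default_factory=list)   # free-form comments found anywhere in the file
```

Each area that holds open-ended name-value data (header, settings) is where
new features could be added without changing the overall structure.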



>
>>> Sure, that's why the tools should be updated to support the standard
>>> format instead rather than each having their own variant of SRT.
>>>
>>
>> They don't have their own variant of SRT - they only have their own
>> parsers.
>>
>
> That comes down to the same thing in my opinion. This is like saying
> browsers did not all have their own variant of HTML4.


From an author's point of view, they were not writing multiple different Web
pages, but only trying to accommodate the quirks of each browser in one
page. So, no, I wouldn't regard them as having different versions of HTML4.




>> Some will tolerate crap at the end of the "-->" line. Others won't. That's
>> no break of "conformance" to the basic "spec" as given in
>> http://en.wikipedia.org/wiki/SubRip#SubRip_text_file_format . They all
>> interoperate on the basic SRT format. But they don't interoperate on the
>> WebSRT format. That's why WebSRT has to be a new format.
>>
>
> By that reasoning HTML5 would have had to be a new format too. And CSS 2.1
> as opposed to CSS 2, etc.


They interoperate by their sheer structure. It has been made sure that old
browsers will ignore the new additions because there is a structured means
to grow them. So, no, I believe they are different cases.



>
>>> (And if they really just take in text like that they should at least run
>>> some kind of validation so not all kinds of garbage can get in.)
>>>
>>
>> That's not a requirement of the "spec". Its requirement is to render
>> whatever characters are given in cues. That's why it is so simple.
>>
>
> But it is not so simple because various extensions are out there in the
> wild and are used so the concerns you have with respect to WebSRT already
> apply.


There are two versions out there: the plain ones without markup and the ones
with <i>, <b>, <u> and <font>. Nothing else exists. Those could be called
quirks of the same format. I would prefer if SRT meant only the stuff
without any markup at all, which is supported by everyone who supports SRT.
The thing is, WebSRT isn't even backwards compatible with the quirky SRT
extension: it doesn't support <u> and <font>. So, it's neither backwards nor
forwards compatible.
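
For illustration, reducing the quirky variant to the plain, markup-free form
is a small transformation. The sketch below strips only the four tags named
above; the function name strip_srt_markup and the regular expression are
assumptions for this example, not part of any SRT tool.

```python
import re

# Matches opening and closing <i>, <b>, <u> and <font ...> tags only.
SRT_TAG = re.compile(r"</?(?:i|b|u|font)(?:\s[^>]*)?>", re.IGNORECASE)


def strip_srt_markup(cue_text):
    """Remove the common SRT markup tags, leaving all other text untouched."""
    return SRT_TAG.sub("", cue_text)
```

Anything else that happens to contain angle brackets is left alone, which
matches the plain-SRT rule of rendering whatever characters are given.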



>> Sure. All I need to do is rename the file. Not much trouble at all. Better
>> than believing I can just copy stuff from others since it's apparently the
>> same format and then it breaks the SRT environment that I already have and
>> that works.
>>
>
> At least with the copy approach you would still see something in your SRT
> environment. The <ruby> bits would just be ignored or some such.
>

Preferably, I would be using a captioning application which will make me
aware that I am just now adding features that the format that I used for
saving doesn't support. So it gives me the choice of either losing those
features or upgrading to the better format. It's what all text processors
do, too, so people are used to it. And they know to stick to the more
capable formats.



>
>>>> That's already part of Ian's proposal: it already supports multiple
>>>> different approaches of parsing cues. No extra complexity here.
>>>>
>>>
>>> Actually that is not true. There is only one approach to parsing in Ian's
>>> proposal.
>>>
>>
>> At the moment, cues can have one of two different types of content:
>> (see
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#syntax-0
>>
>> [...]
>>
>>
>> So that means in essence two different parsers.
>>
>
> Per the parser section there is only one. See the end of
>
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#parsing-0


Yeah, I think there's something missing in the spec.

Cheers,
Silvia.
Received on Wednesday, 11 August 2010 07:24:35 UTC