Re: [Fwd: Last Call Announcement: DFXP 1.0] from Jack Jansen on 2009-06-28 (public-media-fragment@w3.org from June 2009)

From: Jack Jansen <Jack.Jansen@cwi.nl>
Date: Sun, 28 Jun 2009 22:35:20 +0200
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Media Fragment <public-media-fragment@w3.org>
Message-Id: <1C6C0C6E-3D59-4DD4-84A6-F1869C65F471@cwi.nl>

On  27-Jun-2009, at 08:04 , Silvia Pfeiffer wrote:

> I've reviewed the document on the plane to New York (to the Open Video
> Conference) and am about to write a review blog post and a reply
> email. I will give it as general feedback, not from our group.
>
> I think DFXP satisfies all the requirements we could have as a group
> toward a caption / timed text format, namely:
> * providing a language descriptor for the file (-> lang attribute in  
> tt element)
> * providing means of directly accessing markers for time fragments (->
> id attribute on timed text paragraphs)
>
> The only thing maybe missing is to provide a track name for when the
> DFXP file is included in a media file, but that is not as important
> and can be done during the muxing-process (e.g. using skeleton in
> Ogg).
>
> Anything else I forgot to make sure is available?

One of the things I'm not sure about (and unfortunately I don't have  
the time to check out the whole document... 100+ pages for a subtitle  
format...) is their timing constructs. Originally (5 years ago) they  
started with a SMIL timing model. Then they toned it down. But: I have  
no idea where they stopped.

A full SMIL timing model may be sub-optimal from a Media Fragments  
point of view. A SMIL document cannot be indexed uniquely with, say,  
#t=10,20, because what happens at t=10 may not be completely specified  
(being dependent on user input, for example). And even if the underlying
document has fully specified timing (i.e. no event-based timing or
media-based timing at all), selecting the portion that corresponds to
#t=10,20 would require a full SMIL parser and execution engine.

For SMIL this is to be expected: asking for #t=10,20 for a time-based  
composition document is a bit like asking for #xywh=10,10,100,100 of  
an SVG document: it would be non-trivial for simple cases and impossible
in the general case.

When we (we==the SYMM group) designed SMILText we tried to be very
careful to design a format that is linear in time, and with timing
constructs only at the outer level. This was done specifically so that  
selecting the portion corresponding to #t=10,20 would not require a  
full SMILText parser, but simply inspecting the toplevel elements for  
their begin and next attributes and either copying or skipping the  
whole element and its substructure (and giving up if event-based
timing was used).
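
To make the copy-or-skip idea concrete, here is a minimal sketch in
Python. It does not use real SMILText markup: the <captions>/<p>
document, the function names, and the reading of "begin" as an absolute
time in seconds and "next" as an offset from the previous element are
all assumptions made for illustration. The point is only that a single
pass over the toplevel children is enough, with no timing engine.

```python
import xml.etree.ElementTree as ET


def parse_seconds(value):
    """Parse a plain clock value such as '12' or '12.5s'.

    Anything that is not a number is treated as event-based timing,
    and we give up by raising ValueError.
    """
    try:
        return float(value.rstrip("s"))
    except ValueError:
        raise ValueError(f"unsupported (possibly event-based) timing: {value!r}")


def select_fragment(xml_text, start, end):
    """Return the toplevel elements that are active during [start, end)."""
    children = list(ET.fromstring(xml_text))

    # Pass 1: resolve the begin time of every toplevel element.
    # Assumption: 'begin' is absolute, 'next' is relative to the previous
    # element, and an element with neither starts with the previous one.
    times = []
    clock = 0.0
    for child in children:
        if "begin" in child.attrib:
            clock = parse_seconds(child.attrib["begin"])
        elif "next" in child.attrib:
            clock += parse_seconds(child.attrib["next"])
        times.append(clock)

    # Pass 2: an element stays active until the next one begins.
    # Copy it whole (with its substructure) if it overlaps the range,
    # otherwise skip it whole.
    kept = []
    for i, child in enumerate(children):
        item_end = times[i + 1] if i + 1 < len(children) else float("inf")
        if times[i] < end and item_end > start:
            kept.append(child)
    return kept


# Hypothetical document; resolved begin times are 0, 8, 15 and 25 seconds.
DOC = """
<captions>
  <p begin="0">Intro</p>
  <p next="8">Second</p>
  <p begin="15">Third</p>
  <p next="10">Fourth</p>
</captions>
"""

print([p.text for p in select_fragment(DOC, 10, 20)])  # -> ['Second', 'Third']
```

Nothing below the toplevel is ever inspected for timing, which is
what keeps the selection cheap.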

I'm pretty sure the same is true for CMML: it should also be possible
to extract a temporal subsection without a full-blown CMML execution
engine (but correct me if I'm overly optimistic here).

I'm not convinced it's possible to extract a temporal fragment of a  
DFXP file without a full execution engine. But: maybe they've defined  
profiles or something in the meantime that allow this (fragmenting
is not the only use case that would benefit from a subset that is  
linear and easily parsed and such).
--
Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma  
Goldman

Received on Sunday, 28 June 2009 20:36:02 UTC