- From: Nigel Megitt <nigel.megitt@bbc.co.uk>
- Date: Wed, 29 Oct 2014 01:44:56 +0000
- To: TTWG <public-tt@w3.org>
- CC: "dahl@conversational-technologies.com" <dahl@conversational-technologies.com>, fantasai <fantasai.lists@inkedblade.net>
- Message-ID: <5941EAB8802D6745A7D363D7B37BD1F75C4E5424@BGB01XUD1012.national.core.bbc.co.uk>
Thanks again to all who attended today's TTWG meeting. The minutes for both 27th October and 28th October are concatenated and available in HTML format at http://www.w3.org/2014/10/27-tt-minutes.html We made one resolution today: RESOLUTION: We will support the EBU line padding proposal with the combination of padding on inline content elements and box-decoration-break. The review period for this resolution ends on 11th November according to our Decision Process. Today's minutes in text format: <trackbot> Meeting: Timed Text Working Group Teleconference <trackbot> Date: 28 October 2014 <scribe> chair: nigel <scribe> scribeNick: nigel <inserted> Day 2 - Tuesday 28th October Introductions Observers: Noria Sakamoto - Toshiba, interested in broadcasting in TTML Jerome Cho, LG Electronics - wants to meet FCC regulations for accessibility with TTML, WebVTT etc. Francois Daoust, W3C, just observing. Courtney Kennedy: Engineering Manager at Apple, responsible for subtitles. Cyril Concolato, University in Paris, GPAC/MP4Box etc <pal> Pierre Lemieux, supported by MovieLabs / IMSC1 editor Debbie Dahl, Chair of Multimodal Interaction Group, observing. Interested in synergies between timed text and EMMA standard Kazuhiro Hoya, Fuji TV interested in UHDTV, which will adopt TTML for closed caption Nigel Megitt, BBC, Chair of TTWG Glenn Adams, Skynav. Editor of TTML. WebVTT draft updated [38]http://dev.w3.org/html5/webvtt/webvtt-staged-snapshot.html [38] http://dev.w3.org/html5/webvtt/webvtt-staged-snapshot.html glenn: Looks good to me. Cyril: LGTM. When can it be published? nigel: Tuesday next week at the earliest, depending on staff Cyril: How will we publicise it? nigel: I expect Dave Singer to publicise it to the charter dependency groups, W3C and external, and to the W3C liaisons. ... Dave has also suggested that we publicise it socially at TPAC too. 
Spec restyling work (fantasai) fantasai: The spec templates and styling over time have become outdated - we can use some cool web technologies to make specs more readable ... The scope of the design is not a back-end web app, just HTML/CSS. We want a design for desktop, mobile and print, in that order. It will change the markup of the headings ... and the boilerplate in the Status text that is data, not paragraph text. We'll push legalese to the bottom. The abstract should be 2-3 sentences above the fold, then ... the TOC available without having to scroll it. Then URLs, issues, feedback etc should be at the top in a more compact format. So the scope of it is to ... redo markup, boilerplate, styling. We'll look at styling, clean up the stylesheet to be more readable, make sure that the document is still quick-scannable ... A lot of styling that is ad hoc, like fragments, example code, could be harmonised across all the W3C specs. This will take a while - it's a side project for me. We want to take into ... account what the WGs need. ... The functional questions are [the ones on the agenda]. We also want general feedback on what to consider, e.g. protocol-relative links so we can switch to https. ... or always-visible TOC. ... The first question is a subjective/emotional one - what should the style express, in terms of values. ... If we used primary colours and comic sans it would feel like a toy not a spec. But if we used a parchment background and old style font it would look old-fashioned. ... They're not appropriate for W3C, - we want to know what is appropriate though, in terms of how they feel. glenn: Do you have any templates or ideas? fantasai: We have a proof of concept but we don't know where it's going just yet. It's going to be experimental - design by consensus. We're asking for ideas from the community. glenn: Part of the problem is that you have different audiences for different specs. 
fantasai: We haven't had feedback on different styles for different document types/audiences. We want them to fit together and feel like they belong together. glenn: One of the problems is that we have a lot of history. fantasai: We want to make sure that every group has a working toolset - tell us what you use in response. Cyril: Some specs have a developer view and an author view. fantasai: Put that under question 6 'what else we should know/consider' nigel: Can we answer the questions? ... Q1. 3-5 adjectives Cyril: I have no idea. ddahl: Authoritative glenn: consistent ddahl: comprehensive glenn: One problem is that documents don't have the same styling. A lot of it is editor-specific. ... The variation may cause some problems. I've also worked with ISO and ITU which crank out format-consistent specs that are somewhat impenetrable. nigel: open, welcoming? glenn: It would be nice to use newer styling mechanisms. You can't push the envelope too far without hitting browser variations. courtney: clean, modern. ... Additional considerations: Should be something that will work for low vision people, using magnifiers, voice-over etc. nigel: URLs? Cyril: TTML1, WebVTT, IMSC 1 nigel: +1 ... Do we have documentation of our markup conventions? glenn: Not really - we use XMLSpec as a technology (from 1999). Others use respec (WebVTT and IMSC 1). ... It's very unlikely that we will adopt respec for TTML. ... In TTML1 we have a conventions section in the document. XMLSpec and Respec are separately documented for markup. nigel: Spec processing tools? We've looked at those already. Are there any more? group: no more nigel: Do we have any goals? Cyril: To have the table of contents kept visible when scrolling, for easy navigation. courtney: +1 Cyril: In some PDF viewers and Word, I like that searched-for words are listed as occurrences on a separate panel. courtney: Better search would be good. ... 
When I find a page, it's hard to see the structure of the whole thing and relations with other specs. nigel: I hate it when clicking on references takes you to the Reference section not the thing being referenced. Cyril: +1 What's the point in it? nigel: What about making defined terms links to other places where they're used? Cyril: +1 ... A way to list normative (testable) statements in the spec, to generate test suites automatically would be great. CSS has that I think. tidoust: It was the packaging spec format. glenn: That was done by adding markup to every paragraph. Extracting assertions from a spec is a hard to automate, complex process. Maintaining it becomes quite challenging. ... Plus it's not a science. Declarative statements (X is Y) can be viewed as normative in some places, then if (X is required) implies (Y is required). It's a nice idea, but hard to do. ... People offer paid services to do that, because it's complex. nigel: Is there anything else we should know/consider? ... Cyril mentioned Developer view/Author view earlier. <tidoust> [FWIW, I was referring to the fact that the Packaged Web Apps (Widgets) spec was written in a way that allowed the extraction of test assertions, see: [39]http://www.w3.org/TR/widgets/ and the test suite: [40]http://dev.w3.org/2006/waf/widgets/test-suite/#user-agent ] [39] http://www.w3.org/TR/widgets/ [40] http://dev.w3.org/2006/waf/widgets/test-suite/#user-agent Cyril: I think there's an HTML 5 spec (might be the WhatWG spec) that does differing views. glenn: There's a lot of advantage to marking up specs to allow automatic extraction. For example, IDL fragments have conventions that allow some tools to automatically pull out all the IDL ... to generate a test generation process. We don't have APIs in TTML at the moment. If CSS had followed a similar convention for how properties were defined, and HTML had ... 
followed conventions for how elements and attributes were defined, then a similar tool could have been used. They didn't adopt a convention though, so it's a manual task. ... Those are the kinds of tools that it might be useful to consider. We could mark up elements and attributes. In the original markup I used a few syntactic conventions to assist. ... For example ID attributes. I use some specific conventions for how identifiers are presented. ... I have never documented it anywhere? TTML2 Timing relationship with related media objects. glenn: I can walk us through this. The issue originally came up from an example TTML document with some negative time expressions. ... I immediately pointed out that you can't do that! I did wonder why they are putting -ve time expressions in a TTML document. courtney: Caption authors may use different timecode from the video editors. glenn: Based on that I thought it would be useful to have an offset from authored time expressions to some useful point in the media, and allow the player or processor ... to use that offset to achieve synchronisation rather than mandating precise synchronisation between TTML times and the media times. ... As I explored that some issues came up. One was the difference between the Origin of the document timeline and the Begin of the document timeline, and whether they ... are different times or the same time. I looked at SMIL and SVG time semantics to try to ascertain what was used there. I also reviewed the earlier TTML1 work. ... We have a concept in TTML that has its own terminology definition in TTML2, Root temporal extent. ... When this talks about beginning or ending, does it mean beginning of the coordinate space or the first timestamp in the document. ... Say a TTML document has a body with begin="10:00:00". Is the origin "10:00:00" or 0s in the document. I eventually tentatively concluded that for document time coordinate ... 
spaces is always zero and the beginning of the document is always zero. That doesn't mean it's the timestamp of the first timed element in the document. ... Everything in SVG and SMIL is predicated on the default begin being zero, in the coordinate space of the document. Recently I came round to understanding that the document ... time origin and the document begin point in time are the same. Then if I want to synchronise a document with some media, then what point am I synchronising? The origin of the ... media timeline or the begin. Let's say for example, I have a related media object that starts at 5 hours into the media timeline. The first timestamp in the media is 5:00:00 (5 hours). ... What do I want to synchronise the 10 hour time in the document with, in the media? There are 2 options. One is to have zero in the document time coordinate space correspond to ... zero in the media time coordinate space. Another is to say that there's an offset between the document time coordinate space and the media time coordinate space, and that offset ... is between the two origins of the coordinate spaces. A 3rd option is to pin the origin of the document coordinate space (zero) to the begin of the media time coordinate space (5 hours). ... That latter one doesn't seem to be quite so correct. ... [draws a picture] nigel: Is this predicated on timebase="media"? glenn: Let's assume that. The general answer may extend to continuous SMPTE timebase too. ... I have two entities, a video and a document. Each has a timeline - the video content has a timeline and the document body has a timeline. ... The root temporal extent of the document is the timed beginning of the document to the timed ending of the document. The choices seem to be the origin or the start of the ... first timed element. ... I think the logical begin is always zero, if begin is not specified. ... So this is the document time reference synchronisation point. 
What do we want to tie it to - the beginning of the media or the origin of the video time coordinate space? ... My thinking has evolved on this. Originally I thought that Begin(body) would be synchronised with Begin(media). Then the offset would be between those two points. ... The more I thought about it the less viable it seemed to be. Eventually I came to the conclusion that we should synchronise Origin(document) and Origin(media). ... Then if they happen to correspond, and both the video and the body say 10h in their own coordinate spaces then they would line up. ... i.e. they would be isomorphic time spaces with zero offset. ... One of the interesting example issues is: what if the playback rates differ between the media and the document. What happens to dilations or contractions in the timelines? ... It seems like if they're both synchronised with the zero point then any modification of the playback speed, as long as they're coordinated, would work out pretty well. ... It means that you can simply multiply the coordinates with the playback rate. <Cyril> scribeNick: Cyril nigel: (describing an email sent earlier) ... I analyzed it a different way ... with all possible combinations of timebase and related media ... 3 different timebases in TTML: media; SMPTE; and clock ... what "relationship" means in the temporal extent definition ... if there is no related media, there is no relationship, that's easy ... the root temporal extent is from begin to end of document ... they can be unconstrained ... begin is origin and end is infinity ... if clock time is used, there might be a relationship with some media ... nothing here contradicts Glenn ... example: tape with every frame with timecode ... if you are using media times, the origin of the document is the begin of the media ... you expect the origin of the document to be equivalent to the begin of the media ... 5s in the document means 5s in the media ... 
the root temporal extent is constrained by begin media/end media glenn: SMIL and SVG make the difference between the specified and active time interval ... the question: is root temporal extent meant to express the active interval or the extent of the time coordinate space of the document ? nigel: the next limitation is when you have media with SMPTE timecode and SMPTE timecode in the document ... the document times and media times are actually the same ... so no offset applies here ... for marker mode = continuous, this is equivalent to saying origin(document)=origin(media) ... however the rule as stated also works for marker mode = discontinuous ... i.e. when document times = media times ... next: media with SMPTE time codes and clock in the document ... the only interpretation is that the document times are supposed to be equivalent to clock times when you play the media ... ex: document time says 10:05, but starts playing at 10:03 ... the use cases for this are strange glenn: wall clock values are converted to times by subtracting the wall clock start time of the document (according to SMIL) nigel: consistent with my interpretation ? glenn: yes nigel: then there is a category of media with no SMPTE time codes, but with time ... same as glenn, the origin of the document is equivalent to the origin of the media ... again the framing that glenn talked about applies ... the active time cannot go outside of the playback glenn: the media active interval is as if it was a parent of the document active interval nigel: if audio is continuing but video is not, the viewer is continuing, you should be presenting subtitles ... this is an implementation case ... any offset that needs to be applied will be externally ... I don't want to duplicate what is already in MP4 files for instance glenn: the timeoffset I came up with makes it easier ... if the house that made the media did not have the media in hand, they can provide the offset nigel: is that a hypothetical case ? 
courtney: no ... different houses will have different conventions ... when someone gives content to iTunes, they give a video ... later on they'll get European or Asian conventions nigel: I don't understand the convention courtney: sometimes people don't want to use zero nigel: in SMPTE time code yes <nigel> scribeNick: nigel Cyril: I understand Courtney's use case. The TTML document doesn't reference the media itself. So it will be used in some external context with the media. ... For example an MP4 file, MSE, DASH. All of those have timestamp offset facilities, so I'm puzzled why we're talking about this here. glenn: Those are all different systems with different ways to express the offsets. If it only can be carried outside the document it might get lost. ... It's useful to have it in the document as a reference point to express the intent of the author. We often need to export things from in the document to outside the document. Cyril: So you want to export from the document some time reference? glenn: yes Cyril: That is fine. glenn: Courtney isn't the only person to bring this up - I've had other reasons to add this over the years. courtney: complex production workflows do mean that we sometimes need to do this. Cyril: These examples seem to be overly complicated. glenn: I think nigel wanted to cover all the cases, which is a useful exercise. ... Neither of us defined what BEGIN(document) and ORIGIN(document) actually meant, which is a problem talking about this! glenn: Is it the time of the first thing in the time frame or the origin of the framing time. nigel: the next row is where the media doesn't have timecode but the document has SMPTE timecode, which may start at some arbitrary point according to convention. ... In that case I can see that an offset would be useful, to say "the start of the media is at e.g. 10:00:00". I'm less comfortable doing this with media timebase, but it's quite closely related. 
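The idea of declaring that "the start of the media is at e.g. 10:00:00" can be illustrated with a minimal sketch. It assumes plain "hh:mm:ss" clock-time strings (no frame counts) and a hypothetical author-declared value, here called media_begin, that names the point in the document's own coordinate space corresponding to the start of the media. Neither the function names nor the parameter come from TTML; they are illustrative only.

```python
def clock_to_seconds(value: str) -> float:
    """Convert an "hh:mm:ss" or "hh:mm:ss.fff" clock-time string to seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def document_to_media_time(document_time: str, media_begin: str = "00:00:00") -> float:
    """Map a document time onto the media timeline.

    media_begin is a hypothetical, author-declared value: the time in the
    document's own coordinate space that corresponds to the start of the media.
    """
    return clock_to_seconds(document_time) - clock_to_seconds(media_begin)


# A caption authored at 10:00:05 against a 10:00:00 start convention
# lands 5 seconds into the media.
print(document_to_media_time("10:00:05", media_begin="10:00:00"))  # 5.0
```

With the default media_begin of "00:00:00" the two timelines coincide, which matches the case where zero in the document is zero in the media. A negative result means the caption falls before the start of the media.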
Cyril: The same problem will arise with WebVTT. ... The general problem is how to carry in-band the time value of the begin of the media in the document timeline. nigel: Do houses really begin at 300s? courtney: I only see this as a real world problem with TTML, not WebVTT. glenn: I've seen examples with media timeBase. For example, taking into account a pre-roll of 13s. courtney: This could also apply to WebVTT at some point in the future. glenn: I added a few notes. I need 2 questions answered. ... My hypothesis is that the label BEGIN(document) means the origin of the document, i.e. zero on the document time coordinate space. ... I believe that's most accurate in relation to SMIL and SVG. ... This is not the time of the body. ... Now the question is what we call the Root Temporal Interval - does it also start at the origin, or at the body. We may need to distinguish the active root temporal interval ... from the overall unqualified root temporal interval. ... I want to see if the group can agree that hypothesis. ... Then I need a decision on which of the 3 models to use to describe the timing relationships. ... 1) The two origins sync up. courtney: I don't see how that solves it. Cyril: That's the only one that works! nigel: +1 glenn: In that case the offset is the difference between the origins. ... 2) The origin of the document syncs up with the beginning of the media. ... This one seems more natural to me because 10 hours means 10 hours into the video. nigel: Not if the timebase is smpte! You have to enumerate all the options. glenn: 3) Begin(body) is begin(media). I don't think this one works too well. ... When you use media times instead of timestamps then you mainly want 2). But with SMPTE timecodes in the media it seems like 1) may be more applicable. nigel: I think that's right. Cyril: There's another way to do this - what happens if you have an audio track with some offset too? ... 
In the MP4 and DASH case, and all the others I know, you only care about the media itself. The TTML document has an anchor point, e.g. 10 hours if that's the begin of the media. ... Then you use that to anchor it onto the timeline. In MP4 if the video has a big gap at the beginning, you use an offset to say when the beginning should occur. Same with the audio. ... The TTML document should just give its anchor point as the time value in its coordinate space that corresponds to the beginning of itself. nigel: +1 that's the proposal I made too. glenn: So you're saying a media begin point as opposed to an offset in the document timeline? nigel: yes. glenn: So if zero in the document is zero in the media the media offset is 0 ... And for SMPTE timecode with the 10:00:00 convention the value would be 10:00:00. ... I like that suggestion because it seems to work regardless of the timeBase. Have you worked through any play rate differences? nigel: I'm not confident that I've worked through all the playrate consequences. Cyril: The solution seems to be found - it needs to be liaised back to MPEG because it affects the carriage of TTML in MP4. You'd have to store it in the MP4 file. glenn: Couldn't you just look in the TTML document? Cyril: Let's say your document has an offset of 10 hours - will the first sample say 10 hours or zero - is an edit list required? glenn: In SMIL you can have captions that start before the media and end after, but get effectively truncated. Why would it affect the carriage in MP4? You can still look inside the document. Cyril: When you stream/seek/segment the document you don't want to look inside it. glenn: I think I have enough guidance on this to move forward on resolving it. nigel: Let's take a break - back at 11. IMSC 1 CR Exit criteria action-345? <trackbot> action-345 -- Nigel Megitt to Make request to philip and silvia to change living standard to editor's draft. 
-- due 2014-11-03 -- PENDINGREVIEW <trackbot> [41]http://www.w3.org/AudioVideo/TT/tracker/actions/345 [41] http://www.w3.org/AudioVideo/TT/tracker/actions/345 close action-345 <trackbot> Closed action-345. nigel: sorry that was the wrong action but it was done! action-346? <trackbot> action-346 -- Nigel Megitt to Scribe notes on cr exit criteria for imsc 1 based on meeting debate -- due 2014-11-03 -- PENDINGREVIEW <trackbot> [42]http://www.w3.org/AudioVideo/TT/tracker/actions/346 [42] http://www.w3.org/AudioVideo/TT/tracker/actions/346 close action-346 <trackbot> Closed action-346. nigel: Goes through scribed notes - group makes edits ... Conclusion is: Our criteria for exiting CR will be: Provide an implementation report describing at least 2 independent implementations for every feature of IMSC 1 not already present in TTML1, based on implementer-provided test results for tests and sample content provided by this group. We will not require that implementations are publicly available but encourage them to be so. We will not exit CR before January 16th 2016 at the earliest. pal: That's enough for me to edit the SOTD in the CR draft - I'll need to get respec.js to allow this custom paragraph. ... The next CR draft may include this text in a weird style just to get around respec.js. nigel adjourns meeting for lunch - restart at 1300 group reconvenes Change Proposals Reviewing change proposals at [43]https://www.w3.org/wiki/TTML/ChangeProposalIndex [43] https://www.w3.org/wiki/TTML/ChangeProposalIndex Change Proposal 15 [44]https://www.w3.org/wiki/TTML/changeProposal015 [44] https://www.w3.org/wiki/TTML/changeProposal015 glenn: margin - this would be very easy to add since there's a straight mapping to CSS. I haven't had enough feedback that it's needed. nigel: There's nothing from EBU courtney: margin isn't permitted in WebVTT either. glenn: The described use case, for indenting 2nd and subsequent lines, wouldn't be supported by margin anyway. 
It's really a hanging indent. ... We don't have any indent support, hanging or otherwise. nigel: Is there a related issue for margin? glenn: I don't think so. ... I'll edit this on the fly now. ... marked as WONTFIX. ... box-decoration-break. This got moved in CSS to 'fragmentation' <glenn> [45]http://dev.w3.org/csswg/css-break/ [45] http://dev.w3.org/csswg/css-break/ <glenn> [46]http://www.w3.org/TR/css3-break/ [46] http://www.w3.org/TR/css3-break/ glenn: but the short name is still css-break! It was last published as a WD in TR on January 16. <glenn> [47]http://www.w3.org/TR/2012/WD-css3-break-20120823/ [47] http://www.w3.org/TR/2012/WD-css3-break-20120823/ glenn: Mozilla seems to have an implementation of this that's working. nigel: Even in today's draft the property and value combination still exist. glenn: We have two options in TTML2 syntax: either use box-decoration-break directly or go ahead and use something simpler like linePadding and map it to this CSS property. ... The latter disconnects it as a feature from this particular instantiation. nigel: That's the normal way we do it, but I can see that with padding specified on content elements then it would be a duplication to add it a second time through linePadding. glenn: In that case I proposed adding support for box-decoration-break in addition to padding on inline content elements that can now be specified. PROPOSAL: support the EBU line padding proposal with the combination of padding on inline content elements and box-decoration-break. RESOLUTION: We will support the EBU line padding proposal with the combination of padding on inline content elements and box-decoration-break. issue-286: (TTWG F2F today) We will support the EBU line padding proposal with the combination of padding on inline content elements and box-decoration-break. <trackbot> Notes added to issue-286 Extend the background area behind rendered text to improve readability. 
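As a rough illustration of the resolution above, the sketch below shows the kind of CSS a processor might emit for an inline element with line padding: horizontal padding on the inline box plus box-decoration-break set to clone, so that the padding and background repeat on every line fragment. The function names and the "0.5em" value are hypothetical, and the mapping is an assumption for illustration rather than anything the group specified.

```python
from __future__ import annotations


def line_padding_css(padding: str) -> dict[str, str]:
    """Return CSS declarations for an inline box padded on every line."""
    return {
        # Horizontal padding on the inline box that carries the background.
        "padding-left": padding,
        "padding-right": padding,
        # "clone" repeats padding and background on each line fragment,
        # instead of only at the very start and end of the inline box.
        "box-decoration-break": "clone",
    }


def to_style_attribute(declarations: dict[str, str]) -> str:
    """Serialise declarations as the value of an HTML style attribute."""
    return "; ".join(f"{name}: {value}" for name, value in declarations.items())


print(to_style_attribute(line_padding_css("0.5em")))
```

Without box-decoration-break: clone, CSS applies the padding only at the start of the first line and the end of the last line of a wrapped inline box, which is why padding alone does not cover the line padding use case.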
glenn: border - we've added border and made it applicable to both region and certain content elements - body, div, p and span. ... One of the open questions is that border in css is a short hand for specifying the width height and colour of all the borders simultaneously, not each border separately. ... As well as this super-shorthand border property, there is the border-width, border-style and border-color properties, which allow those values to be specified on all or any from 1-4 borders separately. ... Then finally there are 12 long hand versions for each of these plus -top -right -bottom and -left. ... I've implemented the shorthand, but we could go for the more longhand versions. nigel: We should check what's needed to match 708 window styles. courtney: The FCC regulations don't go to the level of granularity of this. glenn: I think this note came up when we were doing SDP-US - there has been a request in the past to describe which 708 features are supported in TTML. courtney: That's not the same as a requirement. I don't know of any examples of subtitles with borders on them. ... 708 says borders may be raised, depressed, uniform or drop-shadow. ... I don't see anything about styling the different sides separately. glenn: It's not clear what the mapping is for all those values. box-shadow in CSS may apply where drop-shadow is required in 708. ... They also introduced border-radius. nigel: Let's move on from this - I think we've done enough. glenn: line stacking strategy. I don't think we need to do anything on this right now - I put this in originally, so I'll mark it as under review by me. ... region anchor points - this was a proposal from Sean to have an auto keyword for the origin and extent properties on regions. ... I believe there's something like this in WebVTT. ... TTML doesn't have these at the moment. Sean was the champion and we don't have any other champion or requirement for this right now. ... 
I would say we should not take any action on this right now. nigel: I agree - it's unclear even how the proposal maps to the WebVTT way of positioning and sizing regions. glenn: text outline vs text shadow. When we defined textOutline in TTML1 CSS was also working on an outline property. ... the new CSS definition of drop shadow allows you to specify multiple shadows simultaneously. courtney: the FCC regulation requires text edge attributes: normal, raised, depressed, uniform and drop-shadow. glenn: TTML1's textOutline offers thickness and blur radius. You'd have to have multiple simultaneous shadows to achieve raised and depressed styles. courtney: Authoring that would be complex. glenn: XSL-FO defined a text shadow property even though CSS had not done so. We ended up calling it textOutline and we also limited it to just one dimension, not two. ... It's now officially defined in the CSS 3 text decoration module, called text-shadow. It takes 2 or 3 length specifications. <glenn> [48]http://dev.w3.org/csswg/css-text-decor-3/#text-shadow-property [48] http://dev.w3.org/csswg/css-text-decor-3/#text-shadow-property glenn: What we could do is define some new keywords that the processor can map. That makes it easier for the author to choose amongst the different choices including raised and depressed. courtney: That seems like a nicer way to do it. nigel: +1 glenn: There are two questions: firstly, should we change the name from textOutline to textShadow? I would say no. We can just define the mapping semantics, and already have different naming anyway. Proposal: retain the attribute name textOutline. glenn: Proposal: add two new keywords for raised and depressed to meet FCC requirements and define mappings. ... There's a third proposal to add a 3rd optional length specification. This would allow separate definition of offset in x and y as well as blur. ... 
I see that textOutline doesn't offer a shadow, but a uniform outline that expands by the required length around the glyph. ... I need to think about this some more. ... Now I recall why we thought about adding a new attribute called textShadow, to allow this. I don't want to take away textOutline and remove backwards compatibility. ... either we enlarge the definition of textOutline to make it include shadow, or add a new textShadow property. I need to review it and see if I can come up with a proposal that works. nigel: We're getting behind on the agenda. We'll come back to this later. Change Proposal 25 - Distribution [49]https://www.w3.org/wiki/TTML/changeProposal025 [49] https://www.w3.org/wiki/TTML/changeProposal025 <scribe> scribeNick: courtney nigel: tab to autocomplete is great! nigel: topic is combining groups of documents cyril: in the tool mp4box, if you import ttml files to mp4 and concatenate more than one ttml file, then extracting the track should give you a combined ttml document. glenn: xml:id uniqueness - a similar problem exists in ISD creation as described in the document combining proposal. ... btw, do you have an example of a specification for a merge algorithm? an xml syntax? nigel: rules are laid out in presentation glenn: so you wouldn't have some way for documents to specify a set of rules that it can follow? nigel: no, there would be an external set of rules glenn: would there be any content support required - additional metadata, etc? nigel: no glenn: you could exclude documents that contain elements with mixed content nigel: perhaps, but that might be difficult because you could not use break spans within a sample. <courtney_> nigel: normalize whitespace for comparison of samples. <courtney_> nigel: to compare elements, need to transform their times into a common timeline. <courtney_> glenn: you could translate to the isd space first prior to comparison. <courtney_> nigel: not sure what is possible with that approach. 
<courtney_> glenn: this is a transformation process, could be a separate spec from TTML. <glenn> ACTION: glenn to check if timeContainer explicitly has no semantics with timeBase smpte, markerMode discontinuous [recorded in [50]http://www.w3.org/2014/10/27-tt-minutes.html#action05] <trackbot> Created ACTION-347 - Check if timecontainer explicitly has no semantics with timebase smpte, markermode discontinuous [on Glenn Adams - due 2014-11-04]. <courtney_> glenn: does the ttp:documentGroup type <xsd:NCName> proposal match the syntax of Id in XML? <courtney_> glenn: yes it does match <courtney_> what's the motivating use case for this? <courtney_> nigel: to archive live created subtitles documents and to be able to create distributable time constrained segmented documents for streaming <courtney_> Cyril: I'm not convinced there is a need for standardization here yet. <courtney_> pal: is there a need for a standard when dealing with a private archive where the owner controls what goes in and what comes out? <courtney_> courtney_: ttml requires lots of small files for captioning live events, and this seems like a limitation to me <courtney_> pal: no it doesn't have to be, streaming inherently involves lots of files ISD formalisation <nigel> scribeNick: nigel glenn: shows a terminal window! Invokes some code (ttx.jar) with a command line specifying the external-duration and an input ttml file ... Looks at TTML input document, that would present as 0-1s: Foo, 1-2s: Bar, 2s-[unbounded]: Baz ... Code validates the input and then writes out 3 Intermediate Synchronic Documents (ISDs). ... looks at output documents. isd elements in new isd namespace, with begin and end on the top level element. group questions status of this work. glenn: TTML1 defines ISDs but no serialisation of them, nor are all semantics fully defined. In TTML2 ED there's an annex that defines these, with a syntax for ISD. ... This is a proposal with the option for change. 
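The splitting step demonstrated above can be sketched in a few lines: cut the timeline at every begin and end time, then list what is active in each resulting interval. This is a simplified illustration under stated assumptions, not the algorithm in ttx.jar or in the TTML2 annex. It ignores regions, styles and nesting, and the tuple layout and function name are hypothetical.

```python
import math
from typing import List, Tuple

# (begin, end, text) in seconds; math.inf marks an unbounded end.
Timed = Tuple[float, float, str]


def split_into_intervals(elements: List[Timed]) -> List[Tuple[float, float, List[str]]]:
    """Cut the timeline wherever any element begins or ends, and list what is active."""
    boundaries = sorted({t for begin, end, _ in elements for t in (begin, end)})
    intervals = []
    for start, stop in zip(boundaries, boundaries[1:]):
        active = [text for begin, end, text in elements if begin <= start and stop <= end]
        if active:  # skip gaps where nothing is displayed
            intervals.append((start, stop, active))
    return intervals


# The example from the demonstration: 0-1s Foo, 1-2s Bar, 2s-unbounded Baz.
example = [(0.0, 1.0, "Foo"), (1.0, 2.0, "Bar"), (2.0, math.inf, "Baz")]
for start, stop, active in split_into_intervals(example):
    print(start, stop, active)
```

For the three-element example this yields three intervals, one per ISD. Overlapping elements produce extra intervals in which more than one element is active.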
pal: If we say any TTML document can be split into a number of ISDs, why isn't each ISD itself a TTML document? Why introduce a new structure?
glenn: Some good reasons. One: the constraints are different in an ISD document than in a TTML document. For example region elements have a body child.
nigel: You could create the ISDs as individual TTML documents prior to rehoming the body to each region and resolving the styles, as another option.
glenn: I explicitly wanted to put the ISD into a different namespace to reduce confusion. I realised when I started to formalise this that if I started with tt:tt and made it
... polymorphic then it would be much harder for people, and parsers, to understand.
pal: +1 for that
... More fundamentally, can the ISD format be mapped into a TTML document?
glenn: I don't know - that wasn't in my thought process.
... There are two reasons for doing this work. One is to create HTML versions - you have to convert into a time-flattened version of the original TTML and apply the styles, and resolve region references.
... I wanted to make sure that process was fully articulated, which is essential to move forward.
... The other strong reason is to make it easily mappable into the cue structure of HTML text track. My model for each of these ISDs is one cue.
... Microsoft in the past tried to put a TTML1 document into a cue. It wasn't standardised anywhere. I want to have a good story for generic mapping of TTML into a sequence of cues
... that fit into the TextTrack model. So my motivation was that each ISD should be representable as a cue, and furthermore to be distributable as a sequence of ISDs.
pal: How can I turn this back into a TT document?
glenn: I don't know - I didn't want to do that.
pal: So you've effectively created a new format.
glenn: It was my intention to make this a new format that could be used for distribution.
... The other option would be to heavily profile TTML to allow it to be distributable.
By the way, there is already more than one kind of document specified by TTML.
... My proposal would be to use the same MIME type and a different document type within that.
pal: My initial feedback is this introduces a new format in a world that has too many already!
nigel: There are multiple steps here. The first is to formalise something that's only conceptual for describing an algorithm in TTML1; the second is to make it a serialisable format.
... If your end goal is a TextTrack cue why not go all the way?
pal: My feedback is that we should store these as TTML documents.
glenn: Not only has timing been flattened in this process but also styles. The only styles that are expressed here are those that are not the initial values.
... [shows ISD output] There's an attribute and element called "css" meaning computed style set. Coding wise, this has been an important step for validating our algorithm.
... Notice that it still uses the TTML namespace - it copies the body into the region element; there can be more than one region in the isd.
group expresses some reservations about defining a new format
glenn: There are some questions: 1. Is it important and useful to define a serialisation format for ISD?
... I think it's both. It would help in many ways and reduce the discussion about streamability.
Cyril: It's not TTML anymore, so it's not streaming TTML. It's streaming something else.
nigel: We have a wider environment in which organisations are creating and distributing TTML documents and writing players. There's no problem splitting temporally on the client side,
... so creating a new format where the temporal division happens server side doesn't seem to be necessary.
group adjourns for a break
back at 1600
Multimodal Interaction - Debbie Dahl
nigel: Introduces Debbie as an Observer who has some requirements for multimodal interaction and thought the solution space may involve TTML.
ddahl: Introduces EMMA 1.0 [51]http://www.w3.org/TR/emma/ [51] http://www.w3.org/TR/emma/ ddahl: Emma represents captured user input in different formats. ... Now considering capturing system output as well as user input. It's helpful to have inputs and outputs in the same format for processing, debugging and analytics. ... Could be static, defined ahead of time ... Generated dynamically by an intelligent system ... EMMA is an XML language. We're thinking about capturing output. [shows an example] ... This example happens to have an ssml message in it, the <speak> element. SSML has lots of available complexity, not used in this example. ... Then other multimedia things might go along with it, such as HTML and other kinds of multimedia output - SVG, whatever seems appropriate to the application. ... My original question was: if we want to speech synthesise some output or synchronise pre-recorded audio with some other kind of multimedia, e.g. video, an animation of ... planes going across a map etc. ... How could we take advantage of the work done in TTML to make life easier for us in multimodal interaction to synchronise multimedia outputs generated in real time by interactive systems. pal: If what's generated is audiovisual, that's a possibility. ... maybe you want to provide captions. courtney: there could be a series of responses with timings. ddahl: You could say "There are flights to Boston from Denver..." and then ask a follow-up question. When you ask what time of day would you be interested in flying, at that point ... maybe you display a form. courtney: If you had some animation that shows a map, and you know it will play for 3 seconds, and then in 3 seconds post your next question? ddahl: Yes. Would it be as simple as incorporating TTML maybe wrapped around another element. nigel: Thinking about the concepts in TTML, there's a timeline against which things could be synchronised, plus styled and positioned text. There is an issue raised for associating ... 
audio representations of text, but at present the spec describes visual rendering only. ... It could be that SMIL is a good place to go. pal: It's certainly more flexible. ... Is this purely semantic or is there a playback requirement? ddahl: In the vision, there's a system that renders the captured input into something human-understandable. In the end there would be playback. pal: The more you're interested in playback the closer you are to TTML or WebVTT which are intended to be used for playback. courtney: If you're synchronising other kinds of media it's a good choice. ddahl: I guess you could use TTML and SMIL? glenn: That's right - TTML is designed to be referenced by the <text> element in SMIL. In the abstract for TTML we say: ... "In addition to being used for interchange among legacy distribution content formats, TTML Content may be used directly as a distribution format, providing, for example, a standard content format to reference from a <track> element in an HTML5 document, or a <text> or <textstream> media element in a [SMIL 3.0] document. " pal: If you'd like to display text or captions over audio or video you should use TTML. nigel: If you want to display any text that changes over time then TTML is a good fit. Probably the time modes in TTML are rich enough to support any use case you're likely to have. <Cyril> scribeNick: Cyril glenn: you can refer to TTML1 today, because TTML1 is REC ... if you need features of TTML2, you'd have to wait ddahl: we've done work on what we call "output timestamps", for when it is planned vs. when it happened ... when it actually happens is easier glenn: TTML does not care about when it happens ... we say when we want it to happen ... we use presentation time stamp in the MPEG sense ... however we have one mode where we use the SMPTE time base ... using SMPTE Time Code along with the video ... you can think of them as labels ... 
when one of these labels in the video appears, that is when the matching TTML element is active
(glenn explaining the different modes SMPTE, Timestamps and clocks)
scribe: they derive from SMIL
... we use a subset of SMIL, not repeat for instance
ddahl: we might want to do something that does not have anything at all to do with text, like pictures and music
glenn: we plan to add support for images and possibly audio in TTML 2
... we will definitely not support video in TTML
nigel: the use case for audio is audio description
... generally created by the same company
glenn: we don't want to turn TTML into SMIL light
nigel: at the moment you have simple SSML
... but if you start having details in SSML
... this is closer to the processor than the human
... I wonder if we couldn't go in that direction in TTML adding emotion, pronunciation, ...
... like a format called PLS (Pronunciation Lexicon Specification)
... this wouldn't affect the TTML document structure at all
... that could guide audio synthesis
... There is also EmotionML that is interesting
... a big use of TTML is for captions and subtitles
... but they are just text, without expression
... EmotionML gives you some information
... but how do you present that emotion?
glenn: like emoji
courtney: there are conventions also
... describing the way the text was spoken (not the emotion)
... that's an interesting idea to explore
... the most artful captions I have seen describe the way the text was pronounced
ddahl: EmotionML has different vocabularies
... there is a standard vocabulary, but you can add your own
courtney: if you were too heavy handed in the way you describe the emotion, it could be condescending to the hard of hearing
... you'd have to do it artfully
ddahl: some people may have a processor to process EmotionML
nigel: currently the emotions are in the text, forcing everyone to view them ...
if you capture emotion and pronunciation, would that suffice to synthesize speech?
ddahl: you would need prosody or other aspects
nigel: no one has brought this use case to TTML first
<nigel> [52]http://www.w3.org/TR/emotionml/
[52] http://www.w3.org/TR/emotionml/
ddahl: I had an example of annotating a video with EmotionML
glenn: TTML allows you to mix any content if it is in a different namespace
Cyril: Example 2 of annotation of videos in the EmotionML spec seems to have problems (use of ? instead of #, use of "file:" instead of "file:///")
(ddahl shows a demo)
nigel: you can either add external content to TTML or extend TTML
Cyril: you might want to consider using a separate track
... not merging it in the TTML document but using a separate track in the HTML sense
nigel: there does not seem to be any action on this for us at the moment
ddahl: I came looking for information and I'll bring that back to my group
<nigel> scribeNick: nigel
Change Proposal 14 Audio Rendering
[53]https://www.w3.org/wiki/TTML/changeProposal014
[53] https://www.w3.org/wiki/TTML/changeProposal014
nigel: I was going to propose as per CP14 that we consider adding PLS and EmotionML into TTML but it seems that we do not need to: foreign namespace content can already
... be added with no spec changes.
issue-10?
<trackbot> issue-10 -- Allowing pointers to pre-rendered audio forms of elements -- open
<trackbot> [54]http://www.w3.org/AudioVideo/TT/tracker/issues/10
[54] http://www.w3.org/AudioVideo/TT/tracker/issues/10
nigel: Issue 10 proposes adding a pointer to an external audio file, which is the analogue to a pre-rendered graphic image.
... CP14 is a Priority 3 on our list, so I don't think we should spend any more time on it right now.
... Instead, we should go through the Priority 1 CPs and resolve any outstanding questions so that we can complete the TTML2 deliverable.
glenn: Let's return to CP15. We were up to shrink fit
... We don't have a champion for shrink fit and no issue, so I propose to do nothing.
... font face rule - we do have an issue for that. I'm not sure if we need the fontFaceFormat attribute.
... This implies that there's a fallback loading system that would pick the source that it knows how to process.
... That would introduce something new in TTML2, which is the ability to refer to resources outside the document.
... There's a way to get around it, which is to use a data URL, i.e. embed the data as BASE64 encoded characters in the document.
nigel: I prefer external references for fonts because they allow caching.
glenn: There's a similar issue for backgroundImage resources.
... I don't have any open issues on this one.
... multiple row alignment
... I haven't worked through the possibility of using flexbox. I tried to generate some samples and they seemed to produce the same results.
... I don't want to introduce all of flexbox into TTML2. My current thinking is to define a TTML-specific property given the semantics according to the proposal.
... As part of the mapping to HTML it could potentially be mapped to flexbox.
... So I need to define a new property that is named appropriately and provides these semantics.
nigel: How will we reference the pre-existing similar feature in IMSC 1 and EBU-TT?
glenn: I don't mind drawing attention to this with a Note if people think that's useful - it's just editorial work.
... Superscript and subscript: I think I've already closed that.
... This issue is closed.
nigel: marks it as closed on the CP.
glenn: Change Proposal 16 - Style conditional
... I need to review this. I thought this change proposal had to do with an informal proposal I made where I described a condition attribute on some elements, where ...
the value of the condition attribute is an expression in a simple expression language, whose evaluation, if false, would result in the element being excluded from presentation, otherwise ... treated as though there were no condition attribute present. This came out of the forcedDisplay discussion. ... I was going to have a simple expression language that at minimum looks like a list of functions in CSS where the names of those functions would be drawn from a list of ... predefined function list, e.g. "parameter(parameterName)" with some defined built-in parameters like "forced" so if you want to exclude some content based on this parameter ... being false then you would have a condition="parameter(forced)=true" that would be evaluated during the rendering and presentation process (specifically in the ISD generation process). ... I need to reread this CP and think about it - the proposal seems to have used something more like a media query expression. Sean wrote this originally I think. ... At minimum I want the condition mechanism to support forced semantics. Beyond that I don't have a real agenda. ... Other uses might include language. nigel: Another is where you may want to preferentially display images vs text under some circumstances. jdsmith: Are the only conditional inputs for this smooth animation and 4:3/16:9 video format? This looks like conditional styling. glenn: I'm talking about more general conditional expressions ... It's an interesting idea to consider feature support conditionality. ... CP17 Default styles ... This is mostly closed. It remains to be defined what a pixel means. issue-179? <trackbot> issue-179 -- Interpreting the pixel measure -- open <trackbot> [55]http://www.w3.org/AudioVideo/TT/tracker/issues/179 [55] http://www.w3.org/AudioVideo/TT/tracker/issues/179 pal: I think most people with CFF and SMPTE-TT think that when they author the document the video object has a certain number of pixels, and those are the ones they refer to. ... 
They literally relate to the encoded pixels, those in the AVC stream for example.
glenn: Those pixels don't have a size at that point. Then they get mapped into a display pixel which does have a size.
pal: And my 640x480 then gets mapped to a display pixel on my 1280x720 display.
glenn: And at that stage the pixel has a concrete size.
pal: That's my understanding.
glenn: The rendered pixel is dependent on lots of other variables.
Cyril: But that's not the coded pixel either. In AVC for instance you code a pixel, an RGB or YUV value or whatever. Then you stretch that according to the pixel aspect ratio.
... and then you may apply a clean aperture, to cut out some of the image, to make it the right multiple of macroblock size. So my guess is that people authoring TTML base it on
... the result of this process, taking the output of this decoding process, then applying the sample aspect ratio, then any cropping.
glenn: I think this is an open question, it's not necessarily like that. For example in SD video you often have 720 pixels per line but you only display 704 pixels, so there's an 8 pixel buffer on either side
... to allow for overruns. So what we were describing is a 0-719 coordinate space whereas what you were describing was a 0-703 coordinate space.
Cyril: Yes, if the video was cropped then you'd have some invisible text.
pal: Some codecs have the ability to store a power of 2 number of pixels, internally. Then on the output it has internal cropping to put back the right value from the input.
... So is it literally the power of 2 internal array or the output of the decoder?
Cyril: Yes, to give an example, in an MP4 file you have 3 sizes:
... 1. The size of the buffer that needs to be allocated.
... 2. The result of applying sample aspect ratio and cropping.
... 3. Applying a possible scale to the result, usually not done.
... So in an MP4 file you have the sample entry width and height, the clap and pasp width and height, and track header width and height.
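To make Cyril's three sizes concrete, here is a small Python sketch under stated assumptions. It does not parse an MP4 file: `VideoSizes` and `mp4_sizes` are hypothetical names, the numbers are an illustrative SD case (720x480 coded, 704x480 clean aperture, 10:11 sample aspect ratio), and the crop is applied before the horizontal scale, which is one possible reading of "applying sample aspect ratio and cropping".

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class VideoSizes:
    coded: tuple      # 1. buffer the decoder allocates (sample entry)
    presented: tuple  # 2. after clean aperture and sample aspect ratio
    track: tuple      # 3. optional extra scale (track header)


def mp4_sizes(entry_w, entry_h, clap_w, clap_h,
              pasp_h_spacing, pasp_v_spacing, tkhd_w, tkhd_h):
    """Derive the three sizes from values a caller has already read
    from the sample entry, clap, pasp and track header boxes."""
    sar = Fraction(pasp_h_spacing, pasp_v_spacing)
    # Assumption: only the width is rescaled, the height stays as cropped.
    presented_w = round(clap_w * sar)
    return VideoSizes(
        coded=(entry_w, entry_h),
        presented=(presented_w, clap_h),
        track=(tkhd_w, tkhd_h),
    )


sizes = mp4_sizes(720, 480, 704, 480, 10, 11, 640, 480)
print(sizes)
```

With these inputs the coded size and the presented size differ in width only, which is why the choice of reference for a TTML pixel changes where authored coordinates land.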
pal: That first one you mentioned is what comes out of the decoder. That's what I think of when I think of stored pixels. glenn: We want to pick one and go with it. pal: I'm happy that we're not talking about a display pixel. Cyril: I agree it's not the 3rd one. If you really want to use that you should use the same metrics to the TTML result. ... The only choice is 'output of decoder' or output of sample aspect ratio plus cropping. pal: I'd take out anything that's dependent on ISO BMFF. glenn: +1 pal: I think people have been using decoder output pixels with no further transformation. nigel: We need to find what's common across all formats. glenn: The source buffer is common. pal: I'd use that as a strawman. glenn: Previously we said 'pixel as defined in XSL-FO'. But the definition there is ambiguous - it can be device dependent or what CSS says. CSS says 96 pixels per inch, ... but it doesn't say which inch applies. They have some angle-based visual model including distance, to compute that. It's complicated. I think the CSS people gave up on it and made ... it an absolute dimension. They did that because it's what most people actually use in implementations. We have a similar scenario - most people use a different interpretation ... from what's in the spec, for whatever reason. pal: Yes, they see a video dimension size and go for that. glenn: I think on TTML1 we should add an erratum that defines pixel, and then use it normatively in TTML2. nigel: What's the proposal? glenn: We have a tentative proposal to make pixel a 'stored pixel' (pal's term) or 'coded sample' (Cyril's term). pal: Let me throw another one in the pot. glenn: I like 'coded sample' because it avoids circularity of definition. nigel: Can I clarify that we're talking about no tts:extent being specified on tt:tt? glenn: I have to do something different based on whether or not there's a related media object. ... Otherwise there's another definition. 
nigel: I'm worried that we end up with ambiguity between tt:tt@tts:extent and the related media object. glenn: That's a different problem, that we also need to deal with at the same time. nigel: This is exactly the same as the root temporal extent problem before - we need to relate the root spatial extent to an external display rectangle. Cyril: I've checked H264 AVC and HEVC and they both define a picture as an array of samples, and they both also define a message to carry sample aspect ratio. ... So the video may have one shape, and the TTML may define a different shape rectangle. pal: That's right, it's also something we should talk about. Cyril: [draws a picture] Decoder produces something with a Width and Height (within the decoder). Then you apply Sample Aspect Ratio (also within the decoder). ... and then, when you display something you may scale it, upsample/crop it etc. glenn: The array size is the same before and after applying sample aspect ratio? Cyril: No, the height is the same but the width may change. ... If you author using the W and H, and there's anamorphic conversion going on then you may need to apply some positioning of the TTML extent onto the video. ... The 'coded samples' are the ones before scaling with the sample aspect ratio. ... I don't know what to call the samples after scaling with the sample aspect ratio. 'scaled sample'? courtney: I'd call them 'square pixels' Cyril: that's not how AVC calls them, though it may make sense. pal: As a strawman can we use 'coded sample'? ... the anamorphic 'pixels' courtney: I think it makes sense to use the coded samples from the file because then the video in the file and the dimensions of the captions are consistent with one another, using the same metrics. pal: my proposal is use the term 'coded sample', get feedback on that as an errata. glenn: I'll probably refer to some MPEG document for the definition of 'coded sample'. I'm going to say that a TTML pixel is a 'coded sample'. 
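The following Python sketch illustrates the 'coded sample' strawman under stated assumptions. The function names are hypothetical, and the 720x480 picture with a 10:11 sample aspect ratio is an example value, not something from the meeting. It shows that a position authored in coded samples keeps the same relative place in the picture, while its horizontal position in square pixels depends on the sample aspect ratio, as Cyril describes.

```python
from fractions import Fraction


def coded_to_relative(x, y, coded_w, coded_h):
    """Express a position given in coded samples as fractions of the picture."""
    return (Fraction(x, coded_w), Fraction(y, coded_h))


def coded_to_square(x, y, sar):
    """Map a coded-sample position to square pixels.
    Assumption: only the horizontal axis is scaled by the sample aspect ratio."""
    return (x * sar, y)


sar = Fraction(10, 11)  # example anamorphic ratio, an assumption
centre = coded_to_relative(360, 240, 720, 480)
square = coded_to_square(352, 100, sar)
print(centre, square)
```

Authoring against the coded samples means the caption coordinates and the video in the file use the same metrics, so the relative position survives whatever scaling the display applies later.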
Cyril: It would be good to provide examples.
glenn: one interesting scenario is that tts:extent doesn't match the coded sample size of the related video object. Another is where they do match.
... The third is where there's no tts:extent, but there is a new ttp:aspectRatio property.
pal: IMSC says either do 'matching pixel aspect ratio' or 'define aspect ratio' but not both.
Cyril: I think we agree but I want to check: If I take an anamorphic video, before capture by the camera an object has a particular shape. Then after capture it's 'squished' to be thinner.
... then after decoding it gets restretched to its original shape. And what's stored, from the perspective of the coding specification, is the squished shape.
... So 'square pixels' is something that depends on your perspective.
courtney: So where is the pixel aspect ratio square?
Cyril: It's after 'unsquishing'.
group agrees terminology
pal: In IMSC 1, I will make a revision based on this erratum. It should probably say that the goal is never to have to use tts:extent and always create resolution independent subtitles.
Cyril: Can I check that there's no impact caused by interlaced and progressive video?
glenn: We assume it's been deinterlaced.
courtney: +1
nigel: In TTML2 how does this impact on viewport-related widths and heights?
... Do we need to be concerned about the aspect ratio of the related video there too?
glenn: That's what I'm working on at the moment.
Cyril: SVG lets you specify a viewbox and relate viewport coordinates to that too.
glenn: I'm not sure if we need that too - maybe.
... That's all I need for CP17
Issue-210?
<trackbot> Issue-210 -- The values for alpha in rgba() don't correspond to CSS3 Color definitions -- open
<trackbot> [56]http://www.w3.org/AudioVideo/TT/tracker/issues/210
[56] http://www.w3.org/AudioVideo/TT/tracker/issues/210
glenn: CP17 - we allowed 0-255 alpha values but CSS3 defines a 0-1 scale. So there's an ambiguity if the value 1 is used.
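The ambiguity glenn describes can be seen in a short Python sketch. The function name `ttml_rgba_to_css` and the three-decimal formatting are assumptions made for illustration; the division by 255 is one plausible mapping from the 0-255 alpha range into the CSS3 0-1 range, not text agreed at the meeting.

```python
def ttml_rgba_to_css(r, g, b, a):
    """Convert rgba components on a 0-255 scale to a CSS3 rgba() string,
    whose alpha runs from 0 to 1."""
    for component in (r, g, b, a):
        if not 0 <= component <= 255:
            raise ValueError("components must be in the range 0-255")
    alpha = a / 255  # assumed linear mapping of 0-255 onto 0-1
    return f"rgba({r}, {g}, {b}, {alpha:.3f})"


nearly_transparent = ttml_rgba_to_css(255, 255, 255, 1)
opaque = ttml_rgba_to_css(255, 255, 255, 255)
print(nearly_transparent, opaque)
```

An alpha of 1 on the 0-255 scale is almost fully transparent, whereas the same literal read as CSS3 is fully opaque, so a defined mapping removes the doubt about which is meant.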
courtney: Are the types different? Are the expressions differentiable by the decimal point?
glenn: In CSS3 you don't need the decimal if alpha value is 1. So you can't infer anything there.
... I think we just define the mapping into CSS, because then it's well defined.
nigel: I'd argue it's well defined already but just needs to be clarified.
issue-225?
<trackbot> issue-225 -- tts:fontSize as percentage of container dimensions -- open
<trackbot> [57]http://www.w3.org/AudioVideo/TT/tracker/issues/225
[57] http://www.w3.org/AudioVideo/TT/tracker/issues/225
pal: TTML1 doesn't really say what you're supposed to do with pixelAspectRatio.
glenn: That's right - it's used to define authorial intent. It doesn't say how that should be applied.
... But we may need to reference that in the new verbiage.
... I put that in originally because PNG has a chunk that allows pixel aspect ratio to be defined.
pal: I think if we go down the path of coded samples then it might be good to make sure that pixelAspectRatio is set.
nigel: Can we resolve this by removing vmin and vmax?
pal: +1
issue-225: (f2f meeting) We agreed to remove vmin and vmax.
<trackbot> Notes added to issue-225 tts:fontSize as percentage of container dimensions.
nigel: CP25. At a minimum that comes down to adding a documentGroup identifier.
glenn: I'm happy to do that.
nigel: CP5?
glenn: There are too many details to discuss that. It involves converting to ISD! So I have to define that mapping to satisfy that.
Wrap-up
nigel: Thanks everyone - we've covered a huge amount over two days, including:
... agreeing to publish WebVTT
... the MIME type extension
... Reviewing the IMSC 1 review comments and agreeing the CR exit criteria
... thinking about the feelings of our specs
... Considering the relationship between TTML and related video objects both spatially and temporally
... going through the TTML2 change proposals
... and we even had time to think about multimodal interaction!
...
adjourns meeting

Summary of Action Items

[NEW] ACTION: cyril Draft a WG note explaining the differences and relationships between the various versions of TTML [recorded in [58]http://www.w3.org/2014/10/27-tt-minutes.html#action02]
[NEW] ACTION: glenn to check if timeContainer explicitly has no semantics with timeBase smpte, markerMode discontinuous [recorded in [59]http://www.w3.org/2014/10/27-tt-minutes.html#action05]
[NEW] ACTION: glenn to update point (1) of section 3.1 in ttml2 to refer to a new annex that defines new processorProfiles MIME type parameter [recorded in [60]http://www.w3.org/2014/10/27-tt-minutes.html#action01]
[NEW] ACTION: nigel Make request to Philip and Silvia to change Living Standard to Editor's Draft. [recorded in [61]http://www.w3.org/2014/10/27-tt-minutes.html#action03]
[NEW] ACTION: nigel Scribe notes on CR exit criteria for IMSC 1 based on meeting debate [recorded in [62]http://www.w3.org/2014/10/27-tt-minutes.html#action04]

[End of minutes]
__________________________________________________________

Minutes formatted by David Booth's [63]scribe.perl version 1.138 ([64]CVS log) $Date: 2014-10-29 01:38:26 $

[63] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
[64] http://dev.w3.org/cvsweb/2002/scribe/
Received on Wednesday, 29 October 2014 01:45:53 UTC