
RE: new issue? dfxp and language selection

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Fri, 5 Dec 2008 12:05:07 +0000
To: John Birch <john.birch@screen.subtitling.com>, Daniel Weck <daniel.weck@gmail.com>
CC: "Glenn A. Adams" <gadams@xfsi.com>, Public TTWG List <public-tt@w3.org>
Message-ID: <90EEC9D914694641A8358AA190DACB3D2FCE9A1E04@EA-EXMSG-C334.europe.corp.microsoft.com>

If we wish to target content at different audiences, the best vehicle DFXP has for that, to my mind, is the region element: regions can be switched on and off with tts:display. Applying a user stylesheet to do such switching seems in keeping with accepted HTML usage. On the subject of HTML, I very rarely see a multilingual site attempt to combine its languages in one page; that would be a nightmare to maintain. If we are thinking of embedding DFXP in HTML, I see no reason why separate files would not be appropriate.

Note that John's example below is incorrect in a number of ways; a more correct example is given below which uses no new features beyond DFXP, uses xml:lang solely to indicate natural language and achieves what I believe he is attempting. Moreover this approach would be more readily able to capture the typical typographical approaches used in the various territories.

I don't have ccPlayer to hand, but based on the description I have heard, I expect this example would actually work in that player too, as it ignores regions and displays content depending on xml:lang; however, I would not wish to encourage such non-standard behaviour.

    xml:lang =""
    xmlns:tts="http://www.w3.org/2006/10/ttaf1#style" >
        <ttm:title>Multi lingual example</ttm:title>
            This file contains four complete language examples.
            Users would filter appropriately by switching on the relevant region.
            Sound effects can be switched independently of language.
        <style id="s1">...</style>
        <style id="s2">...</style>
            <region xml:id="soundEffect"
            <region xml:id="frenchLanguageSubtitles"
            <region xml:id="englishLanguageSubtitles"
            <region xml:id="americanLanguageSubtitles"
            <region xml:id="québécquoisLanguageSubtitles"
    <body timeContainer="par">
        <div timeContainer="seq" xml:lang="fr-fr" region="frenchLanguageSubtitles" ttm:title="Titre en français">
            <p ttm:role="sound" region="soundEffect" dur="1s">FANFARE!</p>
            <p dur="1s">Ce texte est en français.</p>
        </div>
        <div timeContainer="seq" xml:lang="fr-ca" region="québécquoisLanguageSubtitles" ttm:title="Titre en québécquois">
            <p ttm:role="sound" region="soundEffect" dur="1s">FANFARE!</p>
            <p dur="1s">Ce texte est en québécquois.</p>
        </div>
        <div timeContainer="seq" xml:lang="en-GB" region="englishLanguageSubtitles" ttm:title="Title in English">
            <p ttm:role="sound" region="soundEffect" dur="1s">TYRE SCREECH!</p>
            <p dur="1s">Quick! Put the body in the boot!</p>
        </div>
        <div timeContainer="seq" xml:lang="en-us" region="americanLanguageSubtitles" ttm:title="Title in English">
            <p ttm:role="sound" region="soundEffect" dur="1s">TYRE SCREECH!</p>
            <p dur="1s">Quick! Put the body in the trunk!</p>
        </div>
    </body>
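
The region switching described above could be sketched roughly as follows. This is only an illustration, not a player implementation: the helper name switch_regions, the cut-down document, and the choice of the 2006/10 TTAF namespace URIs are assumptions made for the example. It sets tts:display to "auto" on the regions a user has switched on and to "none" on all others.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for this sketch (2006/10 TTAF drafts).
TT = "http://www.w3.org/2006/10/ttaf1"
TTS = "http://www.w3.org/2006/10/ttaf1#style"
XML = "http://www.w3.org/XML/1998/namespace"

# Hypothetical cut-down document: layout only, region attributes omitted.
DOC = f"""<tt xmlns="{TT}" xmlns:tts="{TTS}" xml:lang="">
  <head><layout>
    <region xml:id="soundEffect"/>
    <region xml:id="frenchLanguageSubtitles"/>
    <region xml:id="englishLanguageSubtitles"/>
  </layout></head>
  <body/>
</tt>"""


def switch_regions(root, enabled):
    """Set tts:display to 'auto' on enabled regions and 'none' on the rest."""
    for region in root.iter(f"{{{TT}}}region"):
        region_id = region.get(f"{{{XML}}}id")
        state = "auto" if region_id in enabled else "none"
        region.set(f"{{{TTS}}}display", state)
    return root


# A user who wants British English subtitles plus sound effects.
root = switch_regions(
    ET.fromstring(DOC), {"englishLanguageSubtitles", "soundEffect"}
)
display = {
    r.get(f"{{{XML}}}id"): r.get(f"{{{TTS}}}display")
    for r in root.iter(f"{{{TT}}}region")
}
```

Because sound effects live in their own region, leaving "soundEffect" out of the enabled set turns them off without touching the language choice.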

Sean Hayes
Media Accessibility Strategist
Accessibility Business Unit

Office:  +44 118 909 5867,
Mobile: +44 7875 091385

-----Original Message-----
From: John Birch [mailto:john.birch@screen.subtitling.com]
Sent: 04 December 2008 16:46
To: Daniel Weck
Cc: Sean Hayes; Glenn A. Adams; Public TTWG List
Subject: RE: new issue? dfxp and language selection

It's a good explanation, but I fear I'm not quite getting my point across.

Two selection scenarios are common in subtitling.

A) Target-audience language selection, probably at a level immediately below body, between multiple 'functionally equivalent' yet language-differentiated divs.
B) Removal of inline content because of user preference. For example, in a movie with hard-of-hearing subtitles, a user may wish to turn off the subtitles pertaining to sound effects, but retain those relating to speech. Note: this can be done with the current spec using the ttm:role attribute.
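
Scenario B might be handled by a processor along these lines. This is a minimal sketch under stated assumptions: the helper name drop_role, the tiny sample document, and the 2006/10 namespace URIs are chosen for illustration only. It removes every element whose ttm:role matches the role the user has turned off.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for this sketch (2006/10 TTAF drafts).
TT = "http://www.w3.org/2006/10/ttaf1"
TTM = "http://www.w3.org/2006/10/ttaf1#metadata"

# Hypothetical sample: one sound-effect subtitle and one speech subtitle.
DOC = f"""<tt xmlns="{TT}" xmlns:ttm="{TTM}">
  <body>
    <div>
      <p ttm:role="sound">TYRE SCREECH!</p>
      <p>Quick! Put the body in the boot!</p>
    </div>
  </body>
</tt>"""


def drop_role(root, role):
    """Remove every element whose ttm:role attribute equals `role`."""
    # Snapshot the tree first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get(f"{{{TTM}}}role") == role:
                parent.remove(child)
    return root


root = drop_role(ET.fromstring(DOC), "sound")
remaining = [p.text for p in root.iter(f"{{{TT}}}p")]
```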

I agree that DFXP should include a marker that makes an explicit statement about intent.
E.g. This content is intended for French speakers.
Or perhaps go further... E.g. This content is intended for French speakers who are also deaf (although this can be finessed using the role attribute).

I agree with Sean that the same type of selection that might be achieved by language matching and switch constructs can be achieved by processing, PROVIDED that sufficient markup exists in the document to identify content with sufficient granularity.

So my suggestion would be

   <sequence ttm:lang="fr" title="Titre en français">
     <p ttm:role="sound">FANFARE!</p>
     <p>Ce texte est en français.</p>
     <p ttm:lang="fr-CA">Ce texte est en québécquois.</p>
   </sequence>

   <sequence ttm:lang="en" title="Title in English">
     <p ttm:role="sound">TYRE SCREECH!</p>
     <p>Quick! Put the body in the boot!</p>
     <p ttm:lang="en-US">Quick! Put the body in the trunk!</p>
   </sequence>

BUT what is interesting here is that the two text strings (excluding the sound effect representation) ARE equivalents.
What is certain is that BOTH should NOT be displayed. Perhaps some form of alt. markup is required :-)

BTW Of course I'm assuming fr = fr-fr and en = en-en :-)

Best regards,


John Birch | Screen Subtitling Systems Ltd | Strategic Partnerships Manager
Main Line : +44 (0)1473 831700 | Ext : 270 | Office :
Mobile: +44 (0)7919 558380 | Fax: +44 (0)1473 830078
john.birch@screen.subtitling.com | www.screen.subtitling.com
The Old Rectory, Claydon Church Lane, Claydon, Ipswich, IP6 0EQ, United Kingdom


-----Original Message-----
From: Daniel Weck [mailto:daniel.weck@gmail.com]
Sent: 04 December 2008 16:09
To: John Birch
Cc: Hayes Sean; Glenn A. Adams; Public TTWG List
Subject: Re: new issue? dfxp and language selection

On 4 Dec 2008, at 15:07, John Birch wrote:
> JB>> Generic XML can be processed using internal content and external
> criteria. I personally view switches as being a way of pre-coding
> common processing operations - but I view it as ~dangerous~ to only
> allow those pre-coded choices to be made in order to remain
> 'conformant'.

I see what you mean: you see it as some kind of "anti-pattern", in reference to software development :)

Now, let's consider this fictitious, yet relevant sample:

<text xml:lang="en">
   <sequence xml:lang="fr" title="Titre en français">
     <p>Texte en français.</p>
     <p xml:lang="fr-CA">Texte en québécquois.</p>
     <p xml:lang="en-GB">Text in British English.</p>
   </sequence>
   <p>Text in (unspecified) English.</p>
</text>

If "xml:lang" were to be processed by user-agents as a content selection criterion, there would be a number of issues:

1) Clearly, content selection wasn't the original intent of the author. It is obvious that here, the "xml:lang" attributes decorate the elements merely to indicate the locale of the content. With the above XML snippet, XPath and the lang() function can be used, for example, to pre-process the content (e.g. XSLT transform) or to alter it dynamically (e.g. "highlight any English text in bright yellow"). This kind of processing performed by the user-agent seems perfectly reasonable.
On the other hand, my instinctive subjective assumption is that content pruning is not the desired goal. To remove this ambiguity, the TT/DFXP distribution format for captions should provide more than just a hint, it should clearly specify the intent (IMHO). This would promote re-using content across multiple processors.

2) The "xml:lang" attribute applies to an entire XML fragment, until it is overridden. In a content selection scenario, this nesting ability prompts a number of questions. For example, what happens if the user-agent locale is set to "fr": should the top-level "text" element be totally ignored/pruned, or should the "sequence" be processed and the following "p" ignored? My personal systematic / scientific mind is in favor of the former, but I know authors who would "feel" that the latter is right.

3) What about more complex selection criteria? Let's say that I want to mark a piece of text as "suitable for all flavors of French except Canadian": using a (fictitious) 'matchLanguage' attribute, I could write matchLanguage="fr AND NOT fr-CA". Note: the comma-separated values in the SMIL systemLanguage attribute represent Boolean OR logic, so there are limitations in the selection model.

4) What about fallback logic, so that if no suitable language is matched, a specific XML fragment is enabled? In SMIL, the 'switch' element offers this mechanism, which enriches the default selection model based on the combinatory attribute value.
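
Points 3 and 4 can be illustrated with a simplified model of SMIL-style selection. This is a sketch, not a conformant SMIL implementation: the function names, the tuple representation of switch children, and the fallback text are hypothetical. It treats a comma-separated systemLanguage value as OR logic, lets a user preference match on a whole-tag prefix, and picks the first acceptable child of a switch, with an untested child acting as the fallback.

```python
def matches_system_language(value, user_langs):
    """True if any user language equals, or is a '-'-delimited prefix of,
    any tag in the comma-separated value (OR logic across the list)."""
    offered = [tag.strip().lower() for tag in value.split(",") if tag.strip()]
    for user in (u.lower() for u in user_langs):
        for tag in offered:
            if tag == user or tag.startswith(user + "-"):
                return True
    return False


def select_from_switch(children, user_langs):
    """Return the content of the first acceptable child, as a switch would.
    A child whose test value is None has no test and always matches."""
    for system_language, content in children:
        if system_language is None:
            return content
        if matches_system_language(system_language, user_langs):
            return content
    return None


# Hypothetical switch children: (systemLanguage value, content).
CHILDREN = [
    ("fr-FR, fr-CA", "Ce texte est en français."),
    ("en-GB, en-US", "Quick! Put the body in the boot!"),
    (None, "(fallback content)"),
]

french = select_from_switch(CHILDREN, ["fr"])
english = select_from_switch(CHILDREN, ["en-US"])
other = select_from_switch(CHILDREN, ["de"])
```

Nothing in this model can express "fr AND NOT fr-CA": a list of tags can only widen the set of matching users, which is the limitation point 3 describes.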

I feel that a proper "content control" mechanism would address these concerns. Otherwise, I am not convinced that TT/DFXP will sufficiently eliminate the ambiguities that user-agent implementors and content authors (or developers of production tools) will face, and I would recommend clearly stating that xml:lang is not designed for content selection, and reflecting that in user-agent conformance guidelines.

> JB>> If we did not have existent implementations then I would be
> proposing two language attributes. One to allow a language specific
> instance of a DFXP document (i.e. the true xml:lang sense) and another
> - perhaps ttm:lang, to define the language used in sections of the
> document.

The "xml:lang" attribute from XML 1.0 and 1.1 can do both scenarios you mention. "xml:lang" is not meant to be limited to the document instance as far as I know. The "lang" versus "xml:lang" mess has been fixed in XHTML 1.1 IIRC; isn't that a good trend to follow?
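
To make the inheritance behaviour concrete, here is a small sketch that computes the effective language of each element in the fictitious sample from earlier in this message. The helper name effective_langs is an assumption, and the closing sequence tag is placed where the discussion of the "following p" implies it belongs. A value set on the document element acts as the document-level language, and any descendant can override it for its own subtree.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<text xml:lang="en">
  <sequence xml:lang="fr" title="Titre en français">
    <p>Texte en français.</p>
    <p xml:lang="fr-CA">Texte en québécquois.</p>
    <p xml:lang="en-GB">Text in British English.</p>
  </sequence>
  <p>Text in (unspecified) English.</p>
</text>"""


def effective_langs(elem, inherited=""):
    """Yield (tag, language) pairs in document order; an element without
    its own xml:lang inherits the value of its nearest ancestor."""
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_langs(child, lang)


result = list(effective_langs(ET.fromstring(SAMPLE)))
```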

Regards, Dan

Received on Friday, 5 December 2008 12:06:45 UTC
