Re: new issue? dfxp and language selection

On 4 Dec 2008, at 15:07, John Birch wrote:
> JB>> Generic XML can be processed using internal content and external
> criteria. I personally view switches as being a way of pre-coding
> common processing operations - but I view it as ~dangerous~ to only
> allow those pre-coded choices to be made in order to remain
> 'conformant'.

I see what you mean: you see it as some kind of "anti-pattern", in  
reference to software development :)

Now, let's consider this fictitious, yet relevant sample:

<text xml:lang="en">
   <sequence xml:lang="fr" title="Titre en français">
     <p>Texte en français.</p>
     <p xml:lang="fr-CA">Texte en québécois.</p>
     <p xml:lang="en-GB">Text in British English.</p>
   </sequence>
   <p>Text in (unspecified) English.</p>
</text>

If "xml:lang" were to be processed by user-agents as a content
selection criterion, there would be a number of issues:

1) Clearly, content selection wasn't the original intent of the
author. Here, the "xml:lang" attributes obviously decorate the
elements merely to indicate the locale of the content. With the
above XML snippet, XPath and the lang() function can be used, for
example, to pre-process the content (e.g. with an XSLT transform) or
to alter it dynamically (e.g. "highlight any English text in bright
yellow"). This kind of processing by the user-agent seems perfectly
reasonable. On the other hand, my instinctive, subjective assumption
is that content pruning is not the desired goal. To remove this
ambiguity, the TT/DFXP distribution format for captions should provide
more than just a hint: it should clearly specify the intent (IMHO).
This would promote re-using content across multiple processors.

2) The "xml:lang" attribute applies to an entire XML fragment, until  
it is overridden. In a content selection scenario, this nesting  
ability prompts a number of questions. For example, what happens if  
the user-agent locale is set to "fr": should the top-level "text"  
element be totally ignored/pruned, or should the "sequence" be  
processed and the following "p" ignored? My personal systematic /
scientific mind is in favor of the former, but I know authors who  
would "feel" that the latter is right.

3) What about more complex selection criteria? Let's say that I want
to mark a piece of text as "suitable for all flavors of French except
Canadian": using a (fictitious) 'matchLanguage' attribute, I could
write matchLanguage="fr AND NOT fr-CA". Note: the comma-separated
values in the SMIL systemLanguage attribute represent Boolean OR
logic, so there are limitations in the selection model.

4) What about fallback logic, so that if no suitable language is
matched, then a specific XML fragment is enabled? In SMIL, the
'switch' element offers this mechanism, which enriches the default
selection model based on the combinatory attribute value.
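
Points 3 and 4 can be sketched together in a few lines of Python. This
is only an illustration under stated assumptions: 'matchLanguage' is
the fictitious attribute from point 3, the helper names are made up
for this example, the prefix-matching rule is a simplification rather
than SMIL's exact algorithm, and the fallback is modelled loosely on
the idea of a 'switch' where the first acceptable candidate wins.

```python
def range_matches(language_range, locale):
    """True if locale equals the range or extends it with a '-' subtag.

    Simplified prefix matching, assumed here for illustration only.
    """
    language_range, locale = language_range.lower(), locale.lower()
    return locale == language_range or locale.startswith(language_range + "-")


def or_list_matches(value, locale):
    """Comma-separated list: a single matching entry is enough (OR logic)."""
    return any(range_matches(item.strip(), locale) for item in value.split(","))


def match_language(expression, locale):
    """Evaluate a hypothetical expression such as 'fr AND NOT fr-CA'.

    Supports AND, OR and a leading NOT, without parentheses.
    """
    def factor(text):
        text = text.strip()
        if text.startswith("NOT "):
            return not range_matches(text[4:].strip(), locale)
        return range_matches(text, locale)

    return any(
        all(factor(part) for part in term.split(" AND "))
        for term in expression.split(" OR ")
    )


def select_with_fallback(candidates, locale):
    """Return the first candidate whose test passes.

    candidates is a list of (expression_or_None, content) pairs in
    document order; an expression of None always passes, so placing it
    last turns it into the fallback.
    """
    for expression, content in candidates:
        if expression is None or match_language(expression, locale):
            return content
    return None


captions = [
    ("fr AND NOT fr-CA", "Texte en français."),
    ("fr-CA", "Texte en québécois."),
    (None, "Text in (unspecified) English."),
]

for locale in ("fr-BE", "fr-CA", "de"):
    print(locale, "->", select_with_fallback(captions, locale))
```

With these assumptions, "fr-BE" selects the generic French text,
"fr-CA" selects the Canadian one, and "de" falls through to the
untested English entry. An OR-only list such as "fr, fr-CA" has no way
of expressing the exclusion in the first entry.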

I feel that a proper "content control" mechanism would address these
concerns. Otherwise, I am not convinced that TT/DFXP will sufficiently
eliminate the ambiguities that user-agent implementors and content
authors (or developers of production tools) will face, and I would
recommend clearly stating that xml:lang is not designed for content
selection, and reflecting that in the user-agent conformance guidelines.
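
To make the ambiguity from point 2 concrete, here is a minimal Python
sketch of the two readings, applied to the sample above. The function
names and the prefix-matching rule are assumptions chosen for
illustration; neither reading is claimed to be what TT/DFXP specifies.

```python
import xml.etree.ElementTree as ET

# Clark-notation key under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<text xml:lang="en">
   <sequence xml:lang="fr" title="Titre en français">
     <p>Texte en français.</p>
     <p xml:lang="fr-CA">Texte en québécois.</p>
     <p xml:lang="en-GB">Text in British English.</p>
   </sequence>
   <p>Text in (unspecified) English.</p>
</text>"""


def matches(language, locale):
    """True if the content language equals the locale or refines it."""
    language, locale = language.lower(), locale.lower()
    return language == locale or language.startswith(locale + "-")


def prune_top_down(element, locale, inherited=None):
    """Strict reading: a non-matching element hides its whole subtree."""
    language = element.get(XML_LANG, inherited)
    if language is None or not matches(language, locale):
        return []
    kept = [element.text.strip()] if element.tag == "p" else []
    for child in element:
        kept += prune_top_down(child, locale, language)
    return kept


def prune_per_leaf(element, locale, inherited=None):
    """Lenient reading: only the effective language of each 'p' counts."""
    language = element.get(XML_LANG, inherited)
    if element.tag == "p":
        keep = language is not None and matches(language, locale)
        return [element.text.strip()] if keep else []
    kept = []
    for child in element:
        kept += prune_per_leaf(child, locale, language)
    return kept


root = ET.fromstring(SAMPLE)
print("strict :", prune_top_down(root, "fr"))
print("lenient:", prune_per_leaf(root, "fr"))
```

For a locale of "fr", the strict reading keeps nothing because the
top-level "text" element is English, while the lenient reading keeps
both French paragraphs. The same document gives two different
results depending on which reading an implementor picks.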

> JB>> If we did not have existent implementations then I would be
> proposing two language attributes. One to allow a language specific
> instance of a DFXP document (i.e. the true xml:lang sense) and
> another - perhaps ttm:lang, to define the language used in sections
> of the document.

The "xml:lang" attribute from XML 1.0 and 1.1 can cover both scenarios
you mention. As far as I know, "xml:lang" is not meant to be limited
to the document instance. The "lang" versus "xml:lang" mess was fixed
in XHTML 1.1 IIRC; isn't that a good trend to follow?
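
The way a single attribute covers both the document level and
individual sections can be illustrated with a minimal Python sketch.
The tiny "doc" sample and the helpers effective_languages and
in_language are assumptions made up for this example, and in_language
only approximates what the XPath lang() function does.

```python
import xml.etree.ElementTree as ET

# Clark-notation key under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: one document-level language, one section override.
DOCUMENT = """<doc xml:lang="en">
  <p>Hello.</p>
  <p xml:lang="fr">Bonjour.</p>
</doc>"""


def effective_languages(element, inherited=None):
    """Yield (element, language) pairs, inheriting xml:lang from ancestors."""
    language = element.get(XML_LANG, inherited)
    yield element, language
    for child in element:
        yield from effective_languages(child, language)


def in_language(language, wanted):
    """Rough stand-in for XPath lang(): exact match or 'wanted-' prefix."""
    if language is None:
        return False
    language, wanted = language.lower(), wanted.lower()
    return language == wanted or language.startswith(wanted + "-")


root = ET.fromstring(DOCUMENT)
resolved = [(el.tag, lang) for el, lang in effective_languages(root)]
english = [el.text for el, lang in effective_languages(root)
           if el.tag == "p" and in_language(lang, "en")]
print(resolved)
print(english)
```

The first "p" inherits "en" from the document element and the second
overrides it with "fr", so no second attribute is needed to tell the
document language apart from the language of a section.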

Regards, Dan

Received on Thursday, 4 December 2008 16:09:15 UTC