ISSUE-296 (xml:lang constraints in IMSC): Remove xml:lang placement restrictions from IMSC [IMSC]

From: Timed Text Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Thu, 07 Nov 2013 13:50:48 +0000
Message-Id: <E1VePyy-0000If-CA@shauna.w3.org>
To: public-tt@w3.org

Raised by: Nigel Megitt
On product: IMSC

The EBU XML Subtitles group raises the following concern regarding the constraints on placement of xml:lang within IMSC:

The use of xml:lang in IMSC is contrary to the accepted use (and recommended best practice) of this attribute as defined in the XML standard [1]. The IMSC document [2] states in section 4.4 Language that “All instances of the xml:lang attribute within a subtitle document SHALL have identical values.”
[1] http://www.w3.org/TR/REC-xml/
[2] https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml-ww-profiles/ttml-ww-profiles.html#language
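To make the scope of the quoted IMSC rule concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It collects every xml:lang value in a document and tests whether they are all identical. The function names and the sample document are hypothetical illustrations, not part of IMSC or any W3C tooling. The sample labels a French phrase inside English text, as the XML best practice recommends, and therefore fails the rule.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def distinct_xml_lang_values(ttml_text: str) -> set[str]:
    """Collect every distinct xml:lang value in the document (root included)."""
    root = ET.fromstring(ttml_text)
    return {el.get(XML_LANG) for el in root.iter() if XML_LANG in el.attrib}


def satisfies_identical_lang_rule(ttml_text: str) -> bool:
    """True if all xml:lang instances share one value (the quoted IMSC 4.4 rule)."""
    return len(distinct_xml_lang_values(ttml_text)) <= 1


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p>He said <span xml:lang="fr">bonjour</span> and left.</p>
    </div>
  </body>
</tt>"""

print(sorted(distinct_xml_lang_values(SAMPLE)))  # ['en', 'fr']
print(satisfies_identical_lang_rule(SAMPLE))  # False
```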

In the W3C document “Best Practices for XML Internationalization” [3], published as a W3C Working Group Note on 13 February 2008 (over 5 years ago, and without subsequent amendment or correction), section 3 “When Authoring XML Content” states:

“Authors of XML content should consider the following best practices:”

Best Practice: Specifying the language of content
Use xml:lang (or its equivalent in your schema) on the root element of the document, and on each element where the language of the content changes.

[3] http://www.w3.org/TR/xml-i18n-bp/#AuthoringTime
Clearly, best practice is to use xml:lang to identify the language of any element within a document whose language differs from that of its surrounding elements.
This best practice advice is further reinforced by another W3C document, “xml:lang in XML document schemas” [4], in the section “When to use xml:lang”:
Content directly associated with the XML document (either contained within the document directly or considered part of the document when it is processed or rendered) should use the xml:lang attribute to indicate the language of that content. xml:lang should be reserved for content authors to directly label any natural language content they may have.
[4] http://www.w3.org/International/questions/qa-when-xmllang

xml:lang is defined by XML 1.0 as a common attribute that can be used to indicate the language of any element's contents. This includes any human readable text, as well as other content (such as embedded objects like images or sound files) contained by the element in which it appears. The xml:lang value applies to any sub-elements contained by the element. It also applies to attribute values associated with the element and sub-elements (though using natural language in attributes is not best practice). The value of the xml:lang attribute is a language tag defined by BCP 47.
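The inheritance behaviour described above can be illustrated with a minimal Python sketch using the standard library's xml.etree.ElementTree. It walks the tree and computes each element's effective language: an explicit xml:lang overrides the value inherited from the parent, and children inherit whatever their parent resolves to. The function name and sample document are hypothetical; attribute-value language and BCP 47 tag validation are deliberately not handled.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(ttml_text: str) -> list[tuple[str, str]]:
    """Return (local element name, effective language) for every element, in document order."""
    root = ET.fromstring(ttml_text)
    result: list[tuple[str, str]] = []

    def walk(el: ET.Element, inherited: str) -> None:
        # An explicit xml:lang overrides the inherited value;
        # xml:lang="" means no language information is available.
        lang = el.get(XML_LANG, inherited)
        result.append((el.tag.split("}")[-1], lang))
        for child in el:
            walk(child, lang)  # children inherit this element's effective language

    walk(root, "")  # "" stands for "no language information"
    return result


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p>Good evening.</p>
      <p xml:lang="fr">Bonsoir, <span xml:lang="en">Mr Smith</span>.</p>
    </div>
  </body>
</tt>"""

for tag, lang in effective_languages(SAMPLE):
    print(tag, lang)
# tt en
# body en
# div en
# p en
# p fr
# span en
```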
We propose that the IMSC document should be corrected to accurately reflect the established guidelines.

There is additionally a specific use case for permitting multiple languages to be indicated within content: distributed subtitle documents may be used for other purposes, such as processing by text-to-speech engines to generate 'spoken subtitles', in which case language-appropriate speech synthesis models may be required depending on the content.
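As an illustration of this use case, here is a minimal Python sketch that splits the text of a document's body into runs by effective language, so that each run could be routed to a language-appropriate synthesis model. It assumes a downstream text-to-speech engine that accepts one language per request; the function name is hypothetical and no real speech API is called. It handles mixed content, where text following a child element returns to the parent's language.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TT_NS = "{http://www.w3.org/ns/ttml}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_runs(ttml_text: str) -> list[tuple[str, str]]:
    """Split the text of a TTML body into (language, text) runs in reading order."""
    root = ET.fromstring(ttml_text)
    body = root.find(TT_NS + "body")
    if body is None:
        return []
    runs: list[tuple[str, str]] = []

    def emit(lang: str, text: Optional[str]) -> None:
        text = " ".join((text or "").split())  # collapse layout whitespace
        if not text:
            return
        if runs and runs[-1][0] == lang:
            # Merge adjacent text in the same language into one run.
            runs[-1] = (lang, runs[-1][1] + " " + text)
        else:
            runs.append((lang, text))

    def walk(el: ET.Element, inherited: str) -> None:
        lang = el.get(XML_LANG, inherited)  # explicit xml:lang overrides inheritance
        emit(lang, el.text)  # text before the first child is in this element's language
        for child in el:
            walk(child, lang)
            emit(lang, child.tail)  # text after a child is back in this element's language

    # The body inherits the root's xml:lang unless it declares its own.
    walk(body, root.get(XML_LANG, ""))
    return runs


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p>He said <span xml:lang="fr">bonjour</span> and left.</p>
    </div>
  </body>
</tt>"""

for lang, text in language_runs(SAMPLE):
    print(lang, text)
# en He said
# fr bonjour
# en and left.
```

Whitespace is normalised and same-language fragments are joined with a single space, which is a simplification; a real pipeline would need to follow the document's own whitespace handling rules.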

This is related to ISSUE-295.
Received on Thursday, 7 November 2013 13:50:49 UTC