
RE: application/ttml+xml charset and encoding

From: Michael Dolan <mdolan@newtbt.com>
Date: Thu, 17 Jan 2013 08:28:45 -0800
To: <public-tt@w3.org>
Message-ID: <00a101cdf4cf$ba33ffb0$2e9bff10$@newtbt.com>
This is a bit off-topic, especially in light of our decision today, but…


Unfortunately, in addition to encoding values, RFC 3023 also uses the charset values of “iso-2022-jp” and “8859-1”; and the default value of charset (for text/xml) is “us-ascii”, so the semantics of the field are quite clearly mixed up in RFC 3023, and it is not just that the field name should have been “encoding”.


Can a decoder arrive at the right answer most of the time with such an overloading of this field (which I think is your point)?  Yes.




From: Glenn Adams [mailto:glenn@skynav.com] 
Sent: Thursday, January 17, 2013 8:04 AM
To: Michael Dolan
Cc: public-tt@w3.org
Subject: Re: application/ttml+xml charset and encoding


a similar equivalence is indicated in HTML5, see



On Thu, Jan 17, 2013 at 7:47 AM, Glenn Adams <glenn@skynav.com> wrote:

the "charset" parameter of MIME types and the "encoding" parameter for the XML declaration are effectively (if not identically) synonymous; in general (but not always) they map to both a character repertoire (a character set) and an on the wire encoding of strings that employ that repertoire


On Wed, Jan 16, 2013 at 2:00 PM, Michael Dolan <mdolan@newtbt.com> wrote:

(per my AI and for discussion in tomorrow’s meeting)


TTML 1.0 defines a media type “application/ttml+xml” in Appendix C:


We are in the process of submitting a media type registration to IANA, revised from what was published in TTML 1.0.


Both the parameter and encoding considerations sections of the registration refer to “application/xml” (section 3.2) defined in RFC 3023, “XML Media Types”:



An optional charset parameter is defined.  The value of charset is entirely unconstrained.  RFC 3023 seems to mix charset (e.g. 8859-1) with encoding (e.g. utf-8) which adds a layer of confusion.


Character encoding requirements for XML in general are in section 4.3.3 of the XML 1.0 spec (RFC 3023 cites XML 1.0):


and optionally, the algorithm defined in the informative Appendix F:



There are a variety of scenarios for which the charset/encoding cannot be determined. So, in the end, there is no deterministic way to deduce the charset/encoding from the file alone.  The media type charset parameter or some other external signaling means is required.  Most file systems do not include this metadata.  This makes file exchange problematic.
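
To make the limits of file-only detection concrete, here is a minimal Python sketch in the spirit of the Appendix F approach (byte order mark first, then the XML declaration). The function name sniff_xml_encoding is hypothetical, the EBCDIC and unusual-byte-order cases are left out, and the result is only what the bytes claim: a file that was transcoded without updating its declaration, or that carries no declaration at all, can still be mislabelled.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration in an
# ASCII-compatible byte stream.
_DECL_ENCODING = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)

def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML entity's encoding from its leading bytes only."""
    # 4-byte BOMs must be tested before the 2-byte ones they begin with.
    if data.startswith(b"\x00\x00\xfe\xff"):
        return "utf-32-be"
    if data.startswith(b"\xff\xfe\x00\x00"):
        return "utf-32-le"
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # No BOM, but "<?" is visibly encoded in two-byte units.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # ASCII-compatible stream: trust the declaration if it names an encoding.
    match = _DECL_ENCODING.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Nothing declared: fall back to the XML 1.0 default. This is a guess,
    # not a verification of the actual bytes.
    return "utf-8"
```

The final fallback is where the problem described above shows up: a legacy-encoded file with no declaration is reported as utf-8, and only external signaling (such as the charset parameter) could say otherwise.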


RFC 3023 makes some specific comments and recommendations in this area:


Although listed as an optional parameter, the use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the charset of the XML MIME entity.


"utf-8" [RFC2279] and "utf-16" [RFC2781] are the recommended values, representing the UTF-8 and UTF-16 charsets, respectively.  These charsets are preferred since they are supported by all conforming processors of [XML].


I recommend that TTWG follow the RFC 3023 recommendation and clarify that the “application/ttml+xml” media type be constrained to utf-8 and utf-16 encoding only. Given the mixing of semantics for “charset”, I recommend we remain silent on that optional parameter, since, with this constraint, explicit signaling is not required. The other encoding considerations of RFC 3023 still apply.
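
As an illustration of why explicit signaling becomes unnecessary under this constraint, here is a minimal Python sketch of a decoder that accepts only UTF-8 and UTF-16. The name decode_ttml is hypothetical, and the sketch assumes (per XML 1.0) that a UTF-16 entity begins with a byte order mark; it is not a proposed normative algorithm.

```python
import codecs

def decode_ttml(data: bytes) -> str:
    """Decode a document that is constrained to UTF-8 or UTF-16."""
    # A UTF-16 entity is assumed to start with a BOM; the "utf-16" codec
    # reads the BOM to pick the byte order and strips it from the result.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return data.decode("utf-16")
    # Everything else must be valid UTF-8; "utf-8-sig" strips an optional
    # BOM and raises UnicodeDecodeError on bytes that are not UTF-8.
    return data.decode("utf-8-sig")
```

With only two permitted encodings, the leading bytes are enough to choose between them, and anything else (for example an ISO-8859-1 file containing non-ASCII characters) is normally rejected with a decode error rather than silently misread.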







Michael A DOLAN

TBT, Inc.    PO Box 190

Del Mar, CA 92014

(m) 858-882-7497



Received on Thursday, 17 January 2013 16:29:24 UTC
