- From: Michael Dolan <mdolan@newtbt.com>
- Date: Wed, 16 Jan 2013 13:00:14 -0800
- To: <public-tt@w3.org>
- Message-ID: <015e01cdf42c$7cfbfd80$76f3f880$@newtbt.com>
(per my AI and for discussion in tomorrow's meeting)

TTML 1.0 defines a media type "application/ttml+xml" in Appendix C: https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml10/spec/ttaf1-dfxp.html#media-type-registration

We are in the process of submitting a media type registration to IANA, revised from what was published in TTML 1.0. Both the parameter and encoding considerations sections of the registration refer to "application/xml" (section 3.2) defined in RFC 3023, "XML Media Types": http://www.rfc-editor.org/rfc/rfc3023.txt

An optional charset parameter is defined. The value of charset is entirely unconstrained. RFC 3023 seems to mix charset (e.g. 8859-1) with encoding (e.g. utf-8), which adds a layer of confusion.

Character encoding requirements for XML in general are in section 4.3.3 of the XML 1.0 spec (RFC 3023 cites XML 1.0): http://www.w3.org/TR/REC-xml/#charencoding and, optionally, the algorithm defined in the informative Appendix F: http://www.w3.org/TR/REC-xml/#sec-guessing

There are a variety of scenarios for which the charset/encoding cannot be determined. So, in the end, there is no deterministic way to deduce the charset/encoding from the file alone. The media type charset parameter or some other external signaling means is required. Most file systems do not include this metadata. This makes file exchange problematic.

RFC 3023 makes some specific comments and recommendations in this area:

Although listed as an optional parameter, the use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the charset of the XML MIME entity. "utf-8" [RFC2279] and "utf-16" [RFC2781] are the recommended values, representing the UTF-8 and UTF-16 charsets, respectively. These charsets are preferred since they are supported by all conforming processors of [XML].
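To illustrate the detection problem described above, here is a minimal Python sketch of the kind of byte-level guessing that the informative Appendix F of XML 1.0 describes. It is a simplification, not the full algorithm: the function name `sniff_encoding` and the source labels it returns are hypothetical, and it only looks at a byte order mark, an unmarked UTF-16 pattern, and an ASCII-compatible encoding declaration. The "default" result is an assumption based on XML's rule for entities with neither a BOM nor a declaration, and the "declaration" result cannot be verified from the bytes alone, which is why external signaling or a constrained set of encodings helps.

```python
import codecs
import re

# Byte order marks, UTF-32 first so its BOM is not mistaken for UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Encoding declaration in an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data):
    """Guess (encoding, source) from the leading bytes alone (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, "bom"
    # UTF-16 without a BOM: "<?" appears interleaved with NUL bytes.
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le", "pattern"
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be", "pattern"
    match = _DECL.match(data)
    if match:
        # The declared name is only a claim; the bytes cannot confirm it.
        return match.group(1).decode("ascii").lower(), "declaration"
    # Neither BOM nor declaration: XML expects UTF-8 in this case.
    return "utf-8", "default"
```

A document restricted to utf-8 or utf-16 would always fall into the "bom", "pattern", or "default" branches (or declare one of those two names), so a receiver would not need to trust an arbitrary declared charset.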
I recommend that TTWG follow the RFC 3023 recommendation and clarify that the "application/ttml+xml" media type be constrained to the utf-8 and utf-16 encodings only. Given the mixing of semantics for "charset", I recommend we remain silent on that optional parameter, since, with this constraint, explicit signaling is not required. The other encoding considerations of RFC 3023 still apply.

Regards,

Mike

Michael A DOLAN
TBT, Inc.
PO Box 190
Del Mar, CA 92014
(m) 858-882-7497
Received on Wednesday, 16 January 2013 21:00:49 UTC