- From: Michael Dolan <mdolan@newtbt.com>
- Date: Thu, 17 Jan 2013 08:28:45 -0800
- To: <public-tt@w3.org>
- Message-ID: <00a101cdf4cf$ba33ffb0$2e9bff10$@newtbt.com>
This is a bit off-topic, especially in light of our decision today, but…

Unfortunately, in addition to encoding values, RFC 3023 also uses the charset values of “iso-2022-jp” and “8859-1”; and the default value of charset (for text/xml) is “us-ascii”, so the semantics of the field are quite clearly mixed up in 3023, and it is not just that the field name should have been “encoding”.

Can a decoder arrive at the right answer most of the time with such an overloading of this field (which I think is your point)? Yes.

Mike

From: Glenn Adams [mailto:glenn@skynav.com]
Sent: Thursday, January 17, 2013 8:04 AM
To: Michael Dolan
Cc: public-tt@w3.org
Subject: Re: application/ttml+xml charset and encoding

a similar equivalence is indicated in HTML5, see http://www.w3.org/TR/html5/single-page.html#attr-meta-http-equiv-content-type

On Thu, Jan 17, 2013 at 7:47 AM, Glenn Adams <glenn@skynav.com> wrote:

the "charset" parameter of MIME types and the "encoding" parameter for the XML declaration are effectively (if not identically) synonymous; in general (but not always) they map to both a character repertoire (a character set) and an on-the-wire encoding of strings that employ that repertoire

On Wed, Jan 16, 2013 at 2:00 PM, Michael Dolan <mdolan@newtbt.com> wrote:

(per my AI and for discussion in tomorrow’s meeting)

TTML 1.0 defines a media type “application/ttml+xml” in Appendix C:

https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml10/spec/ttaf1-dfxp.html#media-type-registration

We are in the process of submitting a media type registration to IANA, revised from what was published in TTML 1.0.

Both the parameter and encoding considerations sections of the registration refer to “application/xml” (section 3.2) defined in RFC 3023, “XML Media Types”:

http://www.rfc-editor.org/rfc/rfc3023.txt

An optional charset parameter is defined. The value of charset is entirely unconstrained. RFC 3023 seems to mix charset (e.g. 8859-1) with encoding (e.g. utf-8), which adds a layer of confusion.

Character encoding requirements for XML in general are in section 4.3.3 of the XML 1.0 spec (RFC 3023 cites XML 1.0):

http://www.w3.org/TR/REC-xml/#charencoding

and optionally, the algorithm defined in the informative Appendix F:

http://www.w3.org/TR/REC-xml/#sec-guessing

There are a variety of scenarios for which the charset/encoding cannot be determined. So, in the end, there is no deterministic way to deduce the charset/encoding from the file alone. The media type charset parameter or some other external signaling means is required. Most file systems do not include this metadata. This makes file exchange problematic.

RFC 3023 makes some specific comments and recommendations in this area:

    Although listed as an optional parameter, the use of the charset
    parameter is STRONGLY RECOMMENDED, since this information can be
    used by XML processors to determine authoritatively the charset of
    the XML MIME entity.

    "utf-8" [RFC2279] and "utf-16" [RFC2781] are the recommended values,
    representing the UTF-8 and UTF-16 charsets, respectively. These
    charsets are preferred since they are supported by all conforming
    processors of [XML].

I recommend that TTWG follow the RFC 3023 recommendation and clarify that the “application/ttml+xml” media type be constrained to utf-8 and utf-16 encoding only. Given the mixing of semantics for “charset”, I recommend we remain silent on that optional parameter, since, with this constraint, explicit signaling is not required. The other encoding considerations of RFC 3023 still apply.

Regards,

Mike

Michael A DOLAN
TBT, Inc.
PO Box 190
Del Mar, CA 92014
(m) 858-882-7497
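
To illustrate why explicit signaling becomes unnecessary once documents are constrained to utf-8 and utf-16, here is a minimal Python sketch. It is not part of the original message. It assumes the input is already known to be either UTF-8 or UTF-16, and it relies on the XML 1.0 rule (section 4.3.3) that UTF-16 entities must begin with a byte order mark while UTF-8 entities may. The function names sniff_ttml_encoding and decode_ttml are hypothetical.

```python
import codecs


def sniff_ttml_encoding(data: bytes) -> str:
    """Return a Python codec name for a document assumed to be UTF-8 or UTF-16.

    Assumption: no other encodings occur. Without that constraint this check
    is not sufficient (for example, the UTF-32 LE byte order mark also starts
    with the UTF-16 LE byte order mark).
    """
    # XML 1.0 requires a byte order mark on UTF-16 entities.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order
    # A byte order mark is optional on UTF-8 entities.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the leading BOM
    # No byte order mark: under the assumed constraint this must be UTF-8.
    return "utf-8"


def decode_ttml(data: bytes) -> str:
    """Decode document bytes using the sniffed encoding."""
    return data.decode(sniff_ttml_encoding(data))
```

With the constraint in place, the byte order mark alone is enough to choose between the two encodings, so neither the charset parameter nor the XML encoding declaration has to be consulted. Without the constraint, the same bytes could belong to encodings this sketch does not consider.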
Received on Thursday, 17 January 2013 16:29:24 UTC