- From: Glenn Adams <glenn@skynav.com>
- Date: Sun, 20 Jan 2013 11:50:07 -0700
- To: Michael Dolan <mdolan@newtbt.com>
- Cc: public-tt@w3.org
- Message-ID: <CACQ=j+cxE5EMTiLJevTszewJ_yoT4q659WYonVSx66H_K1VfGQ@mail.gmail.com>
On Thu, Jan 17, 2013 at 9:28 AM, Michael Dolan <mdolan@newtbt.com> wrote:

> This is a bit off-topic, especially in light of our decision today, but…
>
> Unfortunately, in addition to encoding values, 3023 also uses the charset
> values of “iso-2022-jp” and “8859-1”; and the default value of charset (for
> text/xml) is “us-ascii”, so the semantics of the field are quite clearly
> mixed up in 3023, and it is not just that the field name should have been
> “encoding”.

8859-1 and us-ascii are labels for both character sets and encodings.
iso-2022-jp is a label for an encoding that supports multiple character sets.
Historically, there has never been a careful distinction between character set
(repertoire) and encoding. So I think you are attempting to ask for something
(clarity of usage) that has never been there.

> Can a decoder arrive at the right answer most of the time with such an
> overloading of this field (which I think is your point)? Yes.

I'm not sure what you are asking. If charset is set to any of these three
values { 8859-1, iso-2022-jp, us-ascii }, there is no ambiguity about what is
to be decoded. If the author or transport misspecified the charset, then that
is a different problem.
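
A minimal Python sketch of that last point, assuming the charset labels are
mapped onto Python's built-in codecs. The helper name decode_with_charset is
hypothetical, and the IANA label "iso-8859-1" stands in for the shorthand
"8859-1" used above:

```python
def decode_with_charset(data: bytes, charset: str) -> str:
    """Decode bytes with the codec registered for a charset label."""
    # Strict error handling: a wrong label raises instead of guessing.
    return data.decode(charset, errors="strict")


# Each of the three labels selects exactly one decoding procedure.
latin = decode_with_charset(b"caf\xe9", "iso-8859-1")
ascii_text = decode_with_charset(b"plain", "us-ascii")
japanese = decode_with_charset("日本語".encode("iso-2022-jp"), "iso-2022-jp")

# A mislabeled entity is a separate problem; here it fails loudly.
try:
    decode_with_charset(b"caf\xe9", "us-ascii")
    mislabeled_ok = True
except UnicodeDecodeError:
    mislabeled_ok = False
```

Whether the label names a repertoire, an encoding, or both makes no difference
to the decoder here: the label alone determines how the bytes are read.
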
> Mike
>
> *From:* Glenn Adams [mailto:glenn@skynav.com]
> *Sent:* Thursday, January 17, 2013 8:04 AM
> *To:* Michael Dolan
> *Cc:* public-tt@w3.org
> *Subject:* Re: application/ttml+xml charset and encoding
>
> a similar equivalence is indicated in HTML5, see
>
> http://www.w3.org/TR/html5/single-page.html#attr-meta-http-equiv-content-type
>
> On Thu, Jan 17, 2013 at 7:47 AM, Glenn Adams <glenn@skynav.com> wrote:
>
> the "charset" parameter of MIME types and the "encoding" parameter for the
> XML declaration are effectively (if not identically) synonymous; in general
> (but not always) they map to both a character repertoire (a character set)
> and an on the wire encoding of strings that employ that repertoire
>
> On Wed, Jan 16, 2013 at 2:00 PM, Michael Dolan <mdolan@newtbt.com> wrote:
>
> (per my AI and for discussion in tomorrow’s meeting)
>
> TTML 1.0 defines a media type “application/ttml+xml” in Appendix C:
>
> https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml10/spec/ttaf1-dfxp.html#media-type-registration
>
> We are in the process of submitting a media type registration to IANA,
> revised from what was published in TTML 1.0.
>
> Both the parameter and encoding considerations sections of the
> registration refer to “application/xml” (section 3.2) defined in RFC 3023,
> “XML Media Types”:
>
> http://www.rfc-editor.org/rfc/rfc3023.txt
>
> An optional charset parameter is defined. The value of charset is
> entirely unconstrained. RFC 3023 seems to mix charset (e.g. 8859-1) with
> encoding (e.g. utf-8) which adds a layer of confusion.
>
> Character encoding requirements for XML in general are in section 4.3.3 of
> the XML 1.0 spec (RFC 3023 cites XML 1.0):
>
> http://www.w3.org/TR/REC-xml/#charencoding
>
> and optionally, the algorithm defined in the informative Appendix F:
>
> http://www.w3.org/TR/REC-xml/#sec-guessing
>
> There are a variety of scenarios for which the charset/encoding cannot be
> determined. So, in the end, there is no deterministic way to deduce the
> charset/encoding from the file alone. The media type charset parameter or
> some other external signaling means is required. Most file systems do not
> include this metadata. This makes file exchange problematic.
>
> RFC 3023 makes some specific comments and recommendations in this area:
>
> Although listed as an optional parameter, the use of the charset parameter
> is STRONGLY RECOMMENDED, since this information can be used by XML
> processors to determine authoritatively the charset of the XML MIME entity.
>
> "utf-8" [RFC2279] and "utf-16" [RFC2781] are the recommended values,
> representing the UTF-8 and UTF-16 charsets, respectively. These charsets
> are preferred since they are supported by all conforming processors of
> [XML].
>
> I recommend that TTWG follow the RFC 3023 recommendation and clarify that
> the “application/ttml+xml” media type be constrained to utf-8 and utf-16
> encoding only. Given the mixing of semantics for “charset”, I recommend we
> remain silent on that optional parameter, since, with this constraint,
> explicit signaling is not required. The other encoding considerations of
> RFC 3023 still apply.
>
> Regards,
>
> Mike
>
> Michael A DOLAN
> TBT, Inc. PO Box 190
> Del Mar, CA 92014
> (m) 858-882-7497
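
The recommendation quoted above (constrain the media type to utf-8 and utf-16
so that explicit signaling is not required) can be sketched in Python. This
assumes the document really is limited to those two encodings and that UTF-16
entities begin with a byte order mark, as XML 1.0 section 4.3.3 requires. The
function names are hypothetical:

```python
import codecs


def detect_utf_encoding(data: bytes) -> str:
    """Return "utf-16" or "utf-8" for a document limited to those encodings."""
    # XML 1.0 requires UTF-16 entities to start with a byte order mark.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # Under the stated constraint, everything else is UTF-8.
    return "utf-8"


def decode_document(data: bytes) -> str:
    """Decode a document using only the bytes of the file itself."""
    if detect_utf_encoding(data) == "utf-16":
        # The "utf-16" codec reads the BOM to pick the byte order.
        return data.decode("utf-16")
    # "utf-8-sig" drops an optional UTF-8 byte order mark if present.
    return data.decode("utf-8-sig")


doc = '<?xml version="1.0"?><tt xmlns="http://www.w3.org/ns/ttml"/>'
from_utf8 = decode_document(doc.encode("utf-8"))
from_utf16 = decode_document(doc.encode("utf-16"))
```

Without the constraint this shortcut does not hold: other encodings would need
the XML declaration, the Appendix F heuristics, or an external charset
parameter to be identified.
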
Received on Sunday, 20 January 2013 18:50:56 UTC