Re: application/ttml+xml charset and encoding

The "charset" parameter of MIME types and the "encoding" parameter of the
XML declaration are effectively (if not identically) synonymous; in general
(but not always) they map to both a character repertoire (a character set)
and an on-the-wire encoding of strings that employ that repertoire.
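
To make the repertoire/encoding distinction concrete, here is a small
illustrative Python sketch (not taken from TTML, RFC 3023 or XML 1.0). It
encodes one string under three encodings and shows that the bytes differ,
then shows that decoding UTF-8 bytes under the wrong label silently yields
mojibake rather than an error. The function name encode_all is
hypothetical, and Python codec names stand in for IANA charset labels.

```python
"""Illustrative sketch: one character repertoire, several wire encodings.

Assumption: Python codec names ("utf-8", "utf-16-be", "iso-8859-1") are
used as stand-ins for the corresponding IANA charset labels.
"""


def encode_all(text: str, encodings: list[str]) -> dict[str, bytes]:
    """Encode the same text under each named encoding (hypothetical helper)."""
    return {enc: text.encode(enc) for enc in encodings}


if __name__ == "__main__":
    samples = encode_all("café", ["utf-8", "utf-16-be", "iso-8859-1"])
    for enc, data in samples.items():
        # Same four characters, different byte sequences on the wire.
        print(f"{enc:<11} {len(data)} bytes: {data.hex(' ')}")

    # Decoding UTF-8 bytes under the wrong label does not fail; it just
    # produces the wrong characters, so the receiver needs reliable signaling.
    print(samples["utf-8"].decode("iso-8859-1"))  # cafÃ©
```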

On Wed, Jan 16, 2013 at 2:00 PM, Michael Dolan <mdolan@newtbt.com> wrote:

> (per my AI and for discussion in tomorrow’s meeting)
>
> TTML 1.0 defines a media type “application/ttml+xml” in Appendix C:
>
> https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml10/spec/ttaf1-dfxp.html#media-type-registration
>
> We are in the process of submitting a media type registration to IANA,
> revised from what was published in TTML 1.0.
>
> Both the parameter and encoding considerations sections of the
> registration refer to “application/xml” (section 3.2) defined in RFC 3023,
> “XML Media Types”:
>
> http://www.rfc-editor.org/rfc/rfc3023.txt
>
> An optional charset parameter is defined.  The value of charset is
> entirely unconstrained.  RFC 3023 seems to mix charset (e.g. ISO-8859-1)
> with encoding (e.g. utf-8), which adds a layer of confusion.
>
> Character encoding requirements for XML in general are in section 4.3.3 of
> the XML 1.0 spec (RFC 3023 cites XML 1.0):
>
> http://www.w3.org/TR/REC-xml/#charencoding
>
> and, optionally, the algorithm defined in the informative Appendix F:
>
> http://www.w3.org/TR/REC-xml/#sec-guessing
>
> There are a variety of scenarios in which the charset/encoding cannot be
> determined from the content, so in the end there is no deterministic way
> to deduce it from the file alone: the media type charset parameter or some
> other external signaling means is required.  Most file systems do not
> store this metadata, which makes file exchange problematic.
>
> RFC 3023 makes some specific comments and recommendations in this area:
>
> Although listed as an optional parameter, the use of the charset parameter
> is STRONGLY RECOMMENDED, since this information can be used by XML
> processors to determine authoritatively the charset of the XML MIME entity.
>
> "utf-8" [RFC2279] and "utf-16" [RFC2781] are the recommended values,
> representing the UTF-8 and UTF-16 charsets, respectively.  These charsets
> are preferred since they are supported by all conforming processors of
> [XML].
>
> I recommend that TTWG follow the RFC 3023 recommendation and clarify that
> the “application/ttml+xml” media type is constrained to the utf-8 and
> utf-16 encodings only. Given the mixing of semantics for “charset”, I
> recommend we remain silent on that optional parameter, since, with this
> constraint, explicit signaling is not required. The other encoding
> considerations of RFC 3023 still apply.
>
> Regards,
>
>                 Mike
>
> Michael A DOLAN
>
> TBT, Inc.    PO Box 190
>
> Del Mar, CA 92014
>
> (m) 858-882-7497
>
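
For discussion, a rough Python sketch of how a receiver might resolve the
encoding of an application/ttml+xml entity under the proposed utf-8/utf-16
constraint. A charset parameter, when present, is treated as authoritative
(RFC 3023 section 3.2); otherwise the first bytes are examined in the
spirit of XML 1.0 section 4.3.3 and Appendix F. It is deliberately
simplified: it checks only byte order marks and an ASCII-compatible
encoding declaration, and ignores UCS-4, EBCDIC and BOM-less UTF-16. The
names ALLOWED_ENCODINGS, sniff_encoding, resolve_encoding and
check_ttml_entity are hypothetical.

```python
import re
from typing import Optional

# Hypothetical: models the proposed "utf-8 and utf-16 only" constraint.
ALLOWED_ENCODINGS = {"utf-8", "utf-16"}

# Encoding declaration inside an ASCII-compatible XML declaration,
# e.g. <?xml version="1.0" encoding="UTF-8"?>
_ENCODING_DECL = re.compile(
    rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_encoding(data: bytes) -> str:
    """Simplified subset of XML 1.0 Appendix F: BOM first, then the
    encoding declaration, else the XML default of UTF-8."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    match = _ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


def resolve_encoding(data: bytes, charset: Optional[str] = None) -> str:
    """A charset parameter, if supplied, wins (RFC 3023 section 3.2);
    otherwise fall back to sniffing the entity itself."""
    if charset:
        return charset.strip().lower()
    return sniff_encoding(data)


def check_ttml_entity(data: bytes, charset: Optional[str] = None) -> str:
    """Resolve the encoding and reject anything outside the constraint."""
    enc = resolve_encoding(data, charset)
    if enc not in ALLOWED_ENCODINGS:
        raise ValueError(f"encoding {enc!r} not permitted by the constraint")
    return enc


if __name__ == "__main__":
    print(check_ttml_entity(b'<?xml version="1.0" encoding="UTF-8"?><tt/>'))
    print(check_ttml_entity(b"\xff\xfe<\x00t\x00t\x00"))
    print(check_ttml_entity(b"<tt/>"))
```

Because XML 1.0 requires UTF-16 entities to begin with a byte order mark,
restricting the media type to UTF-8 and UTF-16 means the first bytes alone
distinguish the two, which is the sense in which explicit signaling would
not be needed.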

Received on Thursday, 17 January 2013 14:48:17 UTC