Re: application/ttml+xml charset and encoding

From: Glenn Adams <glenn@skynav.com>
Date: Thu, 17 Jan 2013 09:03:35 -0700
Message-ID: <CACQ=j+cQe9Uv9N7ch7iydWt+oXBuUpgvMc8Cb6dV1L4VoJME+A@mail.gmail.com>
To: Michael Dolan <mdolan@newtbt.com>
Cc: public-tt@w3.org
a similar equivalence is indicated in HTML5, see

http://www.w3.org/TR/html5/single-page.html#attr-meta-http-equiv-content-type

On Thu, Jan 17, 2013 at 7:47 AM, Glenn Adams <glenn@skynav.com> wrote:

> the "charset" parameter of MIME types and the "encoding" parameter for the
> XML declaration are effectively (if not identically) synonymous; in general
> (but not always) they map to both a character repertoire (a character set)
> and an on-the-wire encoding of strings that employ that repertoire
>
>
> On Wed, Jan 16, 2013 at 2:00 PM, Michael Dolan <mdolan@newtbt.com> wrote:
>
>> (per my action item and for discussion in tomorrow’s meeting)
>>
>> TTML 1.0 defines a media type “application/ttml+xml” in Appendix C:
>>
>> https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml10/spec/ttaf1-dfxp.html#media-type-registration
>>
>> We are in the process of submitting a media type registration to IANA,
>> revised from what was published in TTML 1.0.
>>
>> Both the parameter and encoding considerations sections of the
>> registration refer to “application/xml” (section 3.2) defined in RFC 3023,
>> “XML Media Types”:
>>
>> http://www.rfc-editor.org/rfc/rfc3023.txt
>>
>> An optional charset parameter is defined.  The value of charset is
>> entirely unconstrained.  RFC 3023 seems to mix charset (e.g. 8859-1) with
>> encoding (e.g. utf-8), which adds a layer of confusion.
>>
>> Character encoding requirements for XML in general are in section 4.3.3 of
>> the XML 1.0 spec (RFC 3023 cites XML 1.0):
>>
>> http://www.w3.org/TR/REC-xml/#charencoding
>>
>> and optionally, the algorithm defined in the informative Appendix F:
>>
>> http://www.w3.org/TR/REC-xml/#sec-guessing
>>
>> There are a variety of scenarios for which the charset/encoding cannot be
>> determined. So, in the end, there is no deterministic way to deduce the
>> charset/encoding from the file alone.  The media type charset parameter or
>> some other external signaling means is required.  Most file systems do not
>> include this metadata.  This makes file exchange problematic.
>>
>> RFC 3023 makes some specific comments and recommendations in this area:
>>
>> Although listed as an optional parameter, the use of the charset
>> parameter is STRONGLY RECOMMENDED, since this information can be used by
>> XML processors to determine authoritatively the charset of the XML MIME
>> entity.
>>
>> "utf-8" [RFC2279] and "utf-16" [RFC2781] are the recommended values,
>> representing the UTF-8 and UTF-16 charsets, respectively.  These charsets
>> are preferred since they are supported by all conforming processors of
>> [XML].
>>
>> I recommend that TTWG follow the RFC 3023 recommendation and clarify that
>> the “application/ttml+xml” media type be constrained to the utf-8 and utf-16
>> encodings only. Given the mixing of semantics for “charset”, I recommend we
>> remain silent on that optional parameter, since, with this constraint,
>> explicit signaling is not required. The other encoding considerations of RFC
>> 3023 still apply.
>>
>> Regards,
>>
>>                 Mike
>>
>> Michael A DOLAN
>> TBT, Inc.    PO Box 190
>> Del Mar, CA 92014
>> (m) 858-882-7497
>>
>
>
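
As an illustration of the proposal in the quoted message, here is a minimal Python sketch of how a processor might decide between utf-8 and utf-16 from the document bytes alone, and how it might read the optional charset parameter when one is supplied. The function names sniff_ttml_encoding and charset_param are hypothetical. The detection covers only a small subset of the informative Appendix F of XML 1.0, and rejecting anything other than utf-8 and utf-16 reflects the constraint proposed above, not anything the TTML specification is known to require.

```python
import codecs
import re
from email.message import Message
from typing import Optional

# Encoding declaration inside an XML declaration, read as ASCII bytes.
_XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_ttml_encoding(data: bytes) -> str:
    """Return 'utf-8' or 'utf-16' for the document bytes, else raise ValueError.

    Hypothetical helper: assumes the media type is constrained to these two
    encodings, as proposed in the quoted message.
    """
    # The UTF-32 LE BOM begins with the UTF-16 LE BOM, so rule it out first.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        raise ValueError("UTF-32 is outside the proposed constraint")
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No BOM, but '<?' laid out as 16-bit code units (an Appendix F pattern).
    if data.startswith((b"\x00<\x00?", b"<\x00?\x00")):
        return "utf-16"
    match = _XML_DECL_ENCODING.match(data)
    if match is None:
        # XML 1.0 section 4.3.3: no BOM and no declaration means UTF-8.
        return "utf-8"
    declared = match.group(1).decode("ascii").lower()
    if declared != "utf-8":
        raise ValueError("encoding %r is outside the proposed constraint" % declared)
    return declared


def charset_param(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()
```

Under this sketch, a file with no external metadata is still classified unambiguously, which is the sense in which explicit signaling would no longer be required. When a charset parameter is present, a caller could compare charset_param(...) against the sniffed value and treat a mismatch as an error.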
Received on Thursday, 17 January 2013 16:04:26 GMT
