
RE: application/ttml+xml charset and encoding

From: Michael Dolan <mdolan@newtbt.com>
Date: Thu, 17 Jan 2013 07:01:01 -0800
To: <public-tt@w3.org>
Message-ID: <007101cdf4c3$78b26650$6a1732f0$@newtbt.com>
Yes, there is a mapping, sometimes, kind of.  But it is sloppy and they are not the same thing.  Minimally, it confuses the allowable tokens.  Setting it to a proper charset value such as “unicode.1.1”  is not equivalent to a proper encoding value of “utf-8” or “utf-16”.

 

Anyway, if we decide to constrain the encoding, then we can effectively ignore it.

 

                Mike

 

From: Glenn Adams [mailto:glenn@skynav.com] 
Sent: Thursday, January 17, 2013 6:47 AM
To: Michael Dolan
Cc: public-tt@w3.org
Subject: Re: application/ttml+xml charset and encoding

 

the "charset" parameter of MIME types and the "encoding" parameter for the XML declaration are effectively (if not identically) synonymous; in general (but not always) they map to both a character repertoire (a character set) and an on the wire encoding of strings that employ that repertoire

On Wed, Jan 16, 2013 at 2:00 PM, Michael Dolan <mdolan@newtbt.com> wrote:

(per my AI and for discussion in tomorrow’s meeting)

 

TTML 1.0 defines a media type “application/ttml+xml” in Appendix C:

https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml10/spec/ttaf1-dfxp.html#media-type-registration 

We are in the process of submitting a media type registration to IANA, revised from what was published in TTML 1.0.

 

Both the parameter and encoding considerations sections of the registration refer to “application/xml” (section 3.2) defined in RFC 3023, “XML Media Types”:

http://www.rfc-editor.org/rfc/rfc3023.txt 

 

An optional charset parameter is defined.  The value of charset is entirely unconstrained.  RFC 3023 seems to mix charset (e.g. 8859-1) with encoding (e.g. utf-8), which adds a layer of confusion.

 

Character encoding requirements for XML in general are in section 4.3.3 of the XML 1.0 spec (RFC 3023 cites XML 1.0):

http://www.w3.org/TR/REC-xml/#charencoding 

and optionally, the algorithm defined in the informative Appendix F:

http://www.w3.org/TR/REC-xml/#sec-guessing

 

There are a variety of scenarios for which the charset/encoding cannot be determined. So, in the end, there is no deterministic way to deduce the charset/encoding from the file alone.  The media type charset parameter or some other external signaling means is required.  Most file systems do not include this metadata.  This makes file exchange problematic.

 

RFC 3023 makes some specific comments and recommendations in this area:

 

Although listed as an optional parameter, the use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the charset of the XML MIME entity.

 

"utf-8" [RFC2279] and "utf-16" [RFC2781] are the recommended values, representing the UTF-8 and UTF-16 charsets, respectively.  These charsets are preferred since they are supported by all conforming processors of [XML].

 

I recommend that TTWG follow the RFC 3023 recommendation and clarify that the “application/ttml+xml” media type be constrained to utf-8 and utf-16 encoding only. Given the mixing of semantics for “charset”, I recommend we remain silent on that optional parameter, since, with this constraint, explicit signaling is not required. The other encoding considerations of RFC 3023 still apply.
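
To illustrate why explicit signaling becomes unnecessary under that constraint, here is a minimal Python sketch. It assumes the document is known to be either UTF-8 or UTF-16, and it relies on the XML 1.0 rule that UTF-16 entities begin with a byte order mark. The function names detect_ttml_encoding and decode_ttml are hypothetical, chosen for this example only; nothing here comes from TTML or RFC 3023.

```python
import codecs


def detect_ttml_encoding(data: bytes) -> str:
    """Return "utf-16" or "utf-8" for a document ASSUMED to use one of the two.

    Assumption: no other encodings occur. A UTF-32 little-endian BOM, for
    example, also starts with FF FE and would be misreported as UTF-16.
    """
    # XML 1.0 requires UTF-16 entities to start with a byte order mark.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # Everything else is treated as UTF-8, with or without the EF BB BF mark.
    return "utf-8"


def decode_ttml(data: bytes) -> str:
    """Decode the bytes to text and drop any byte order mark."""
    if detect_ttml_encoding(data) == "utf-16":
        # The "utf-16" codec reads the BOM to choose the byte order.
        return data.decode("utf-16")
    # "utf-8-sig" strips a leading UTF-8 BOM when one is present.
    return data.decode("utf-8-sig")
```

A receiver that cannot make this two-encoding assumption would still need the charset parameter or some other external signaling.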

 

Regards,

 

                Mike

 

 

Michael A DOLAN

TBT, Inc.    PO Box 190

Del Mar, CA 92014

(m) 858-882-7497

 

 
Received on Thursday, 17 January 2013 15:01:37 GMT
