application/ttml+xml charset and encoding

(per my action item and for discussion in tomorrow's meeting)

 

TTML 1.0 defines a media type "application/ttml+xml" in Appendix C:

https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml10/spec/ttaf1-dfxp.html#media-type-registration

We are in the process of submitting a media type registration to IANA,
revised from what was published in TTML 1.0.

 

Both the parameter and encoding considerations sections of the registration
refer to "application/xml" (section 3.2) defined in RFC 3023, "XML Media
Types":

http://www.rfc-editor.org/rfc/rfc3023.txt 

 

An optional charset parameter is defined.  The value of charset is entirely
unconstrained.  RFC 3023 seems to mix charset (e.g. 8859-1) with encoding
(e.g. utf-8), which adds a layer of confusion.

 

Character encoding requirements for XML in general are in section 4.3.3 of
the XML 1.0 spec (RFC 3023 cites XML 1.0):

http://www.w3.org/TR/REC-xml/#charencoding 

and optionally, the algorithm defined in the informative Appendix F:

http://www.w3.org/TR/REC-xml/#sec-guessing

 

There are a variety of scenarios for which the charset/encoding cannot be
determined. So, in the end, there is no deterministic way to deduce the
charset/encoding from the file alone.  The media type charset parameter or
some other external signaling means is required.  Most file systems do not
include this metadata.  This makes file exchange problematic.

 

RFC 3023 makes some specific comments and recommendations in this area:

 

Although listed as an optional parameter, the use of the charset parameter
is STRONGLY RECOMMENDED, since this information can be used by XML
processors to determine authoritatively the charset of the XML MIME entity.

 

"utf-8" [RFC2279] and "utf-16" [RFC2781] are the recommended values,
representing the UTF-8 and UTF-16 charsets, respectively.  These charsets
are preferred since they are supported by all conforming processors of
[XML].

 

I recommend that TTWG follow the RFC 3023 recommendation and clarify that
the "application/ttml+xml" media type be constrained to utf-8 and utf-16
encoding only. Given the mixing of semantics for "charset", I recommend we
remain silent on that optional parameter, since, with this constraint,
explicit signaling is not required. The other encoding considerations of
RFC 3023 still apply.
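
As an illustration of why the constraint removes the need for signaling,
here is a minimal Python sketch of a conformance check.  The function name
and the regular expression are my own assumptions, and it is deliberately
simplified (it does not try to rule out UTF-32, for example).  It relies on
the XML 1.0 rule that UTF-16 entities begin with a byte order mark, so
anything without one has to decode as UTF-8.

```python
import re

# Hypothetical helper pattern: pulls the encoding name out of an XML declaration.
_DECL = re.compile(
    rb"<\?xml[^>]*?\bencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)

def is_utf8_or_utf16(data: bytes) -> bool:
    """Return True if the document bytes fit the proposed utf-8/utf-16 constraint."""
    # A UTF-16 byte order mark: accept if the rest decodes as UTF-16.
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        try:
            data.decode("utf-16")
        except UnicodeDecodeError:
            return False
        return True
    # Otherwise treat it as UTF-8, skipping an optional UTF-8 BOM.
    body = data[3:] if data.startswith(b"\xef\xbb\xbf") else data
    # A declaration naming any other encoding violates the constraint.
    m = _DECL.match(body)
    if m and m.group(1).lower() != b"utf-8":
        return False
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

No charset parameter is consulted anywhere in the check, which is the point
of the recommendation.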

 

Regards,

 

                Mike

 

 

Michael A DOLAN

TBT, Inc.    PO Box 190

Del Mar, CA 92014

(m) 858-882-7497

 

Received on Wednesday, 16 January 2013 21:00:49 UTC