Re: MIME type for XML
Terry Allen <email@example.com> writes:
>RFC 1874 on SGML Media Types defines both text and application
>for SGML, with some language that may or may not be relevant;
>the main idea appears to be to provide fallback to text/plain
Got it -- thanks Terry. The fact that the RFC states preference for
US-ASCII was kinda my point. Since the default charset for XML is not
US-ASCII, I don't think the assumption in RFC 1874 is valid or useful in
XML. And yes, I think the RFC should be changed, not XML. From the XML 1.0
    This specification depends on the international standard ISO/IEC 10646
    and the technically identical Unicode Standard, Version 2.0, which
    define the encodings and meanings of the characters which make up
    XML text data.
Relevant quotes from RFC 1874 follow. Section 2.1 describes text/sgml as
being employed when the content is meant to be human-readable:
    MIME type name: Text
    MIME subtype name: SGML
    Required parameters: none
    Optional parameters: charset, SGML-bctf, SGML-boot
    Encoding considerations: may be encoded
    Security considerations: see section 4 below
    Published specification: ISO 8879:1986
    Person and email address to contact for further information:
    E. Levinson <ELevinson@Accurate.com>

    The Text/SGML media-type can be employed when the contents of the
    SGML entity is intended to be read by a human and is in a readily
    comprehensible form. That is the content can be easily discerned by
    someone without SGML display software. Each record in the SGML
    entity, delimited by record start (RS) and record end (RE) codes,
    must correspond to a line in the Text/SGML body part.

    SGML entities that do not meet the above requirements should use the
A document in UCS-4 Arabic is certainly intended to be read by a human. The
problem doesn't seem to be the use of RS and RE per se; it's their
transformation into multibyte Unicode equivalents.
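To make the RS/RE point concrete, here is a minimal Python sketch. It assumes RS is U+000A and RE is U+000D (as in the SGML reference concrete syntax) and uses Python's "utf-32-be" codec as a stand-in for UCS-4. The sample record and the helper name `encoded_width` are hypothetical, not taken from either spec.

```python
# Assumption: RS = U+000A and RE = U+000D, per the SGML reference concrete syntax.
RS, RE = "\n", "\r"

def encoded_width(ch: str, encoding: str) -> int:
    """Return how many bytes a single character occupies in the given encoding."""
    return len(ch.encode(encoding))

# Hypothetical record: RS, the Arabic word "salam", then RE.
record = RS + "\u0633\u0644\u0627\u0645" + RE

# Two consecutive records, so an RE is immediately followed by an RS.
ucs4_bytes = (record + record).encode("utf-32-be")  # stand-in for UCS-4

# A byte-oriented text/plain agent looks for the two-byte sequence CR LF.
ascii_line_end = b"\r\n"
found_in_ucs4 = ascii_line_end in ucs4_bytes

print(encoded_width(RS, "us-ascii"), encoded_width(RS, "utf-32-be"))
print("CR LF byte pair present in UCS-4 data:", found_in_ucs4)
```

Each record delimiter is one byte in US-ASCII but four bytes in UCS-4, so an agent scanning for a one-byte-per-character line ending never sees the record boundary.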
[...describing the 'charset' parameter...]
    charset   The charset parameter for Text/SGML is defined in
              [RFC-1521], the valid values and their meaning are
              registered by the Internet Assigned Numbers
              Authority (IANA) [RFC-1590]. The default charset
              value for all Text content-types is "us-ascii"
              The charset parameter is provided to permit non-
              SGML capable systems to provide reasonable
              behavior when Text/SGML defaults to Text/Plain.
              SGML capable systems will use the SGML-bctf param-
What needs changing is the default charset of MIME 'text/*', from ISO 646 to
ISO/IEC 10646, rather than forcing a UCS-4 document instance into an
'application/*' MIME type. Otherwise,
MIME is inextricably bound to US-ASCII, which seems a mistake. I'm sure
someone more qualified than I has argued this out in the MIME/SGML WGs.
XML may simply be among the first applications requiring this type of i18n
modification to what are gradually becoming outdated specs.
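The "us-ascii" default quoted above can be illustrated with a short Python sketch built on the standard `email.message` module. The helper name `effective_charset` is hypothetical, and the fallback rule it encodes is only the default described in the RFC excerpt, not a complete MIME implementation.

```python
from email.message import Message
from typing import Optional

def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset a receiver would assume for a Content-Type value.

    Assumption: text/* without an explicit charset falls back to "us-ascii",
    as the quoted RFC text states; other top-level types get no default.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_maintype() == "text":
        return msg.get_content_charset(failobj="us-ascii")
    return msg.get_content_charset()

print(effective_charset("text/sgml"))
print(effective_charset('text/sgml; charset="ISO-10646-UCS-4"'))
print(effective_charset("application/sgml"))
```

A sender who omits the parameter is treated as having sent US-ASCII, so a UCS-4 entity labelled text/sgml has to carry an explicit charset to be read correctly.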
Murray Altheim, Program Manager
Spyglass, Inc., Cambridge, Massachusetts
"Give a monkey the tools and he'll eventually build a typewriter."