- From: Murray Altheim <murray@spyglass.com>
- Date: Mon, 2 Dec 1996 20:50:02 -0400
- To: Terry Allen <tallen@fsc.fujitsu.com>
- Cc: crm@ebt.com, w3c-sgml-wg@w3.org
Terry Allen <tallen@fsc.fujitsu.com> writes:
>RFC 1874 on SGML Media Types defines both text and application
>for SGML, with some language that may or may not be relevant;
>the main idea appears to be to provide fallback to text/plain
>from text/sgml.
>
>ftp://ds.internic.net/rfc/rfc1874.txt

Got it -- thanks Terry. The fact that the RFC states a preference for
US-ASCII was kinda my point. Since the default charset for XML is not
US-ASCII, I don't think the assumption in RFC 1874 is valid or useful
for XML. And yes, I think the RFC should be changed, not XML. From the
XML 1.0 spec:

   This specification depends on the international standard ISO/IEC
   10646 and the technically identical Unicode Standard, Version 2.0,
   which define the encodings and meanings of the characters which
   make up XML text data.

Relevant quotes from RFC 1874 follow. Section 2.1 describes text/sgml
as the type to use when the content is meant to be human-readable:

   2.1. Text/SGML

   MIME type name: Text
   MIME subtype name: SGML
   Required parameters: none
   Optional parameters: charset, SGML-bctf, SGML-boot
   Encoding considerations: may be encoded
   Security considerations: see section 4 below
   Published specification: ISO 8879:1986
   Person and email address to contact for further information:
      E. Levinson <ELevinson@Accurate.com>

   The Text/SGML media-type can be employed when the contents of the
   SGML entity is intended to be read by a human and is in a readily
   comprehensible form. That is the content can be easily discerned by
   someone without SGML display software. Each record in the SGML
   entity, delimited by record start (RS) and record end (RE) codes,
   must correspond to a line in the Text/SGML body part. SGML entities
   that do not meet the above requirements should use the
   Application/SGML media-type.

A document in UCS-4 Arabic is certainly intended to be read by a
human. The problem doesn't seem to be the use of RS and RE per se;
it's their transformation into multibyte Unicode equivalents.
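A small Python sketch of that RS/RE point, using a hypothetical
two-record Arabic entity. It assumes the SGML reference concrete
syntax (RS = character 10, RE = character 13) and uses Python's
utf-32-be codec as a stand-in for big-endian UCS-4; for these
characters the two produce the same octets. Each record boundary
becomes a four-octet sequence, and a reader applying a US-ASCII
default sees NULs everywhere:

```python
# Sketch only: the sample text is hypothetical, and "utf-32-be" stands in
# for big-endian UCS-4 (identical octets for the characters used here).

# Function characters as assigned in the SGML reference concrete syntax.
RS = "\x0a"  # record start: character 10 (LF)
RE = "\x0d"  # record end: character 13 (CR)

# Two records of Arabic text: "marhaba" and "salam".
records = ["\u0645\u0631\u062d\u0628\u0627", "\u0633\u0644\u0627\u0645"]

# Each record is delimited by RS ... RE, as RFC 1874 describes.
entity = "".join(RS + rec + RE for rec in records)
octets = entity.encode("utf-32-be")

print(len(entity), "characters ->", len(octets), "octets")  # 13 characters -> 52 octets
print(RE.encode("utf-32-be").hex())  # 0000000d: one record end, four octets

# Every octet here happens to be below 0x80, so a us-ascii decode "succeeds",
# but the result is padded with NULs and is not readable as plain text.
as_ascii = octets.decode("us-ascii")
print(as_ascii.count("\x00"), "NUL characters")  # 30 NUL characters
```

The decode is deliberately allowed to succeed: the failure mode for a
text/plain fallback is not an error but unreadable output.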
[...describing the 'charset' parameter...]

   charset

   The charset parameter for Text/SGML is defined in [RFC-1521], the
   valid values and their meaning are registered by the Internet
   Assigned Numbers Authority (IANA) [RFC-1590]. The default charset
   value for all Text content-types is "us-ascii" [RFC-1521]. The
   charset parameter is provided to permit non-SGML capable systems to
   provide reasonable behavior when Text/SGML defaults to Text/Plain.
   SGML capable systems will use the SGML-bctf parameter.

What needs changing is the definition of MIME 'text/*', from ISO 646
to ISO/IEC 10646, not forcing a UCS-4 document instance into an
'application/*' MIME type. Otherwise, MIME remains inextricably bound
to US-ASCII, which seems a mistake. I'm sure someone more qualified
than I has argued this out in the MIME/SGML WGs. XML may simply be
among the first applications requiring this kind of i18n modification
to what are gradually becoming outdated specs.

Murray

Murray Altheim, Program Manager
Spyglass, Inc., Cambridge, Massachusetts
email: <mailto:murray@spyglass.com>
http: <http://www.cm.spyglass.com/murray/murray.html>
"Give a monkey the tools and he'll eventually build a typewriter."
Received on Monday, 2 December 1996 20:47:43 UTC