- From: Garret Wilson <garret@globalmentor.com>
- Date: Mon, 17 Dec 2007 08:15:33 -0800
- To: "Sean B. Palmer" <sean@miscoranda.com>
- CC: Eric Prud'hommeaux <eric@w3.org>, ietf-types@iana.org, Tim Berners-Lee <timbl@w3.org>, "Daniel W. Connolly" <connolly@w3.org>, Dave Beckett <dave@dajobe.org>, Lee Feigenbaum <lee@thefigtrees.net>, Graham Klyne <GK@ninebynine.org>, Dan Brickley <danbri@danbri.org>, www-archive@w3.org
Sean B. Palmer wrote:
> On Dec 17, 2007 3:22 PM, Garret Wilson <garret@globalmentor.com> wrote:
>
>> There exists serious concern regarding the use of a text top-level
>> type for N3. See the recent discussion on www-rdf-comments.
>>
>
> Eric and I discussed that in some detail prior to and subsequent to
> the start of this thread. One thing that I don't understand is what
> you said here:
>
> [[[
> I suppose that, with the popular understanding that RFC 2046 requires a
> default character set of US-ASCII if there is no charset parameter, then
> it's almost as true as if RFC 2046 said so explicitly.
> ]]] - http://lists.w3.org/Archives/Public/www-rdf-comments/2007OctDec/0017
>
> It seems very clear to me that RFC 2046 states explicitly that
> US-ASCII is required if there is no charset parameter. Here are the
> relevant quotes:
>
>   The default character set, which must be assumed in the absence
>   of a charset parameter, is US-ASCII.
>
>   ...
>
>   Note that the character set used, if anything other than US-ASCII,
>   must always be explicitly specified in the Content-Type field.
>
> The way I read that, that doesn't leave any room for a text/anything
> specification setting its own default.
>

In the excerpt you presented, "The default character set...", it must be asked, "the default character set of what?" My literal reading of RFC 2046 (which may not be correct) led me to believe that this only applied to text/plain. Let's look at the whole section, including the parts you left out:

    A critical parameter that may be specified in the Content-Type field
    for "text/plain" data is the character set. This is specified with a
    "charset" parameter, as in:

      Content-type: text/plain; charset=iso-8859-1

    Unlike some other parameter values, the values of the charset
    parameter are NOT case sensitive. The default character set, which
    must be assumed in the absence of a charset parameter, is US-ASCII.

The first sentence led me to believe that we are talking about "text/plain" and "text/plain" only. Therefore, "The default character set" to me indicated "The default character set of text/plain". Immediately following this is the paragraph:

    The specification for any future subtypes of "text" must specify
    whether or not they will also utilize a "charset" parameter, and may
    possibly restrict its values as well. For other subtypes of "text"
    than "text/plain", the semantics of the "charset" parameter should be
    defined to be identical to those specified here for "text/plain",
    i.e., the body consists entirely of characters in the given charset.
    In particular, definers of future "text" subtypes should pay close
    attention to the implications of multioctet character sets for their
    subtype definitions.

So my interpretation was that 1) the default character set of text/plain is US-ASCII, and 2) other "text" subtypes may define differently whether and how they utilize a charset parameter. Again, this may be an incorrect reading; as I mentioned, even if it is literally correct, if the rest of the community understands it to mean that the charset defaults to US-ASCII for all text/* types, the literal meaning, which is itself not unambiguous, is probably moot.

> As for the CRLF requirement, that CRLF and *only* CRLF be used for
> line breaks, Dan Brickley commented in response to that that text/xml
> was widely regarded troublesome; but it's not clear from his citations
> that CRLF has anything to do with the troublesome nature, only charset
> defaulting.
>

I don't see how this could cause many problems in practice; I was just pointing out that, technically, XML does not follow the RFC 2046 requirements, because it has its own rules about CR, LF, and CRLF.

> It seems that most of the problem, as you mentioned in the
> www-rdf-comments thread, is that the text subtree is simply broken.
> RFC 2046 just wasn't written to deal with the Unicode world.
> Check out the following, for example:
>
>   A SINGLE character set that can be used
>   universally for representing all of the world's languages in Internet
>   mail would be preferrable. Unfortunately, existing practice in
>   several communities seems to point to the continued use of multiple
>   character sets in the near future. A small number of standard
>   character sets are, therefore, defined for Internet use in this
>   document.
>
> And it defines US-ASCII and ISO-8859-X. It's not RFC 2046's fault that
> it wasn't prescient, but it's *out-of-date* now and perhaps ought to
> be obsoleted so that text/* can be used as intended rather than as
> we're currently forced?
>

That is what I would prefer. The other option is *not* to obsolete the RFC 2046 text types, which means that no one uses them, making them de facto obsolete anyway. In fact, I could easily be persuaded just to ignore the US-ASCII and CRLF parts if everyone else were to do the same. But surely someone could take the time to write up another RFC making this explicit.

> But of course there is the question of what MIME implementations will
> do and what problems, possibly serious, it would cause to, for
> example, make utf-8 the new text/* default. It would need a lot of
> discussion and a new RFC.
>

The problem with the computer standards process is that it has become so bureaucratic and slow that it's hard for anything significant to happen to solve problems in any short time frame. I applaud anyone who would push for a new RFC to update RFC 2046; that's the Right Thing To Do here. Unfortunately, I wouldn't place my bets on this happening anytime soon. (It might beat XHTML 2.0 out the door, though.)

> Note that TimBL has never, as far as I know, suggested disregarding
> the charset defaulting requirement, just the CRLF requirement which he
> mightn't even be aware of.
> And as it seems that the charset defaulting
> is the thing that most people are anxious about, I'd be happy for
> text/rdf+n3; charset=utf-8 or text/n3; charset=utf-8 to go forwards,
> even ignoring the fact that it disregards the CRLF requirement.
>

I don't like that option; it's too bulky and feels like a hack. I'd prefer that either application/* were used, or the RFC 2046 default character set were ignored. Either would be a better solution.

Garret
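The two readings of RFC 2046 debated in this exchange, plus the CRLF point, can be contrasted in a short Python sketch. It is illustrative only: `effective_charset`, `to_mime_canonical`, `xml_normalize_line_ends` and the `HYPOTHETICAL_SUBTYPE_DEFAULTS` table (including its `text/n3` -> `utf-8` entry) are hypothetical names and values, not taken from any registration or library. Only `email.message.Message` comes from the Python standard library.

```python
from email.message import Message
from typing import Optional

# HYPOTHETICAL per-subtype defaults that a future "text" registration might
# declare under the literal reading. Illustrative only; not from any registry.
HYPOTHETICAL_SUBTYPE_DEFAULTS = {"text/n3": "utf-8"}


def effective_charset(content_type: str, literal_reading: bool = False) -> Optional[str]:
    """Return the charset a receiver would assume for a Content-Type value.

    literal_reading=False: common reading, US-ASCII for every text/* type.
    literal_reading=True: US-ASCII only for text/plain; other text subtypes
    defer to their own registration (None if this sketch doesn't know it).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value, consistent with RFC 2046's
    # statement that charset values are not case sensitive.
    explicit = msg.get_content_charset()
    if explicit is not None:
        return explicit
    mime_type = msg.get_content_type()
    if not mime_type.startswith("text/"):
        return None  # no text/* charset default applies
    if mime_type == "text/plain" or not literal_reading:
        return "us-ascii"
    return HYPOTHETICAL_SUBTYPE_DEFAULTS.get(mime_type)


def to_mime_canonical(text: str) -> str:
    """RFC 2046 canonical text form: every line break is CRLF.

    Assumption: lone CR and lone LF are both treated as line breaks.
    """
    lf_only = text.replace("\r\n", "\n").replace("\r", "\n")
    return lf_only.replace("\n", "\r\n")


def xml_normalize_line_ends(text: str) -> str:
    """XML 1.0 end-of-line handling: CRLF and lone CR both become LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    print(effective_charset("text/plain; charset=ISO-8859-1"))  # iso-8859-1
    print(effective_charset("text/n3"))                         # us-ascii
    print(effective_charset("text/n3", literal_reading=True))   # utf-8
    print(effective_charset("application/rdf+xml"))             # None
    print(repr(to_mime_canonical("a\nb")))                      # 'a\r\nb'
```

Under the common reading, a bare `text/n3` is US-ASCII, which is why an explicit `charset=utf-8` parameter would be needed. Under the literal reading, the subtype's own registration would decide. The two line-ending functions show why XML's LF normalization and the MIME CRLF canonical form pull in opposite directions.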
Received on Monday, 17 December 2007 16:17:51 UTC