Re: Media types for RDF languages N3 and Turtle from Sean B. Palmer on 2007-12-17 (www-archive@w3.org from December 2007)

From: Sean B. Palmer <sean@miscoranda.com>
Date: Mon, 17 Dec 2007 15:51:29 +0000
To: "Garret Wilson" <garret@globalmentor.com>
Cc: "Eric Prud'hommeaux" <eric@w3.org>, ietf-types@iana.org, "Tim Berners-Lee" <timbl@w3.org>, "Daniel W. Connolly" <connolly@w3.org>, "Dave Beckett" <dave@dajobe.org>, "Lee Feigenbaum" <lee@thefigtrees.net>, "Graham Klyne" <GK@ninebynine.org>, "Dan Brickley" <danbri@danbri.org>, www-archive@w3.org
Message-ID: <b6bb4d890712170751w630a3bd0x42c9a4c72e5ca4b1@mail.gmail.com>

On Dec 17, 2007 3:22 PM, Garret Wilson <garret@globalmentor.com> wrote:

> There exists serious concern regarding the use of a text top-level
> type for N3. See the recent discussion on www-rdf-comments.

Eric and I discussed that in some detail prior to and subsequent to
the start of this thread. One thing that I don't understand is what
you said here:

[[[
I suppose that, with the popular understanding that RFC 2046 requires a
default character set of US-ASCII if there is no charset parameter, then
it's almost as true as if RFC 2046 said so explicitly.
]]] - http://lists.w3.org/Archives/Public/www-rdf-comments/2007OctDec/0017

It seems very clear to me that RFC 2046 states explicitly that
US-ASCII is required if there is no charset parameter. Here are the
relevant quotes:

   The default character set, which must be assumed in the absence
   of a charset parameter, is US-ASCII.

   ...

   Note that the character set used, if anything other than US-ASCII,
   must always be explicitly specified in the Content-Type field.

As I read it, that leaves no room for a text/anything specification
to set its own default.
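
To make that reading concrete, here is a rough sketch of what a strict
RFC 2046 reader would do with a Content-Type value. The function name
effective_charset is made up for illustration, and it leans on
Python's standard email package to parse the header; real MIME
implementations may well behave differently.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Charset a strict RFC 2046 reader would assume (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()  # None when there is no charset parameter
    if declared is not None:
        return declared  # already lower-cased by the email package
    if msg.get_content_maintype() == "text":
        # RFC 2046: the default for text/* is US-ASCII, with no
        # allowance for a subtype to pick its own default.
        return "us-ascii"
    return None  # non-text types are outside this rule


if __name__ == "__main__":
    for value in ("text/n3", "text/n3; charset=utf-8", "text/rdf+n3"):
        print(value, "->", effective_charset(value))
```

Under that reading, an unlabelled text/n3 body is US-ASCII no matter
what the N3 specification itself would like the default to be.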

As for the CRLF requirement, that CRLF and *only* CRLF be used for
line breaks: Dan Brickley commented in response that text/xml is
widely regarded as troublesome, but it's not clear from his citations
that CRLF has anything to do with the trouble, only charset
defaulting.
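
For what it's worth, the CRLF rule is easy to state in code. The
following is only a sketch with invented names (uses_only_crlf,
to_crlf), not anything taken from a MIME library: it treats any bare
CR or bare LF as a violation of the canonical text/* form.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
_BARE_BREAK = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def uses_only_crlf(body: bytes) -> bool:
    """True when every line break in body is exactly CRLF."""
    return _BARE_BREAK.search(body) is None


def to_crlf(body: bytes) -> bytes:
    """Rewrite LF, CR, and CRLF line breaks as CRLF."""
    unified = body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    n3 = b"@prefix : <#> .\n:a :b :c .\n"
    print(uses_only_crlf(n3))           # typical Unix-style N3 file
    print(uses_only_crlf(to_crlf(n3)))  # after canonicalisation
```

A typical N3 file saved on a Unix system fails the check until it is
canonicalised, which is the sense in which serving it as text/* as-is
disregards the requirement.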

It seems that most of the problem, as you mentioned in the
www-rdf-comments thread, is that the text subtree is simply broken.
RFC 2046 just wasn't written to deal with the Unicode world. Check out
the following, for example:

   A SINGLE character set that can be used
   universally for representing all of the world's languages in Internet
   mail would be preferrable.  Unfortunately, existing practice in
   several communities seems to point to the continued use of multiple
   character sets in the near future.  A small number of standard
   character sets are, therefore, defined for Internet use in this
   document.

And it defines US-ASCII and ISO-8859-X. It's not RFC 2046's fault that
it wasn't prescient, but it's *out-of-date* now and perhaps ought to
be obsoleted so that text/* can be used as intended rather than in the
way we're currently forced to use it?

But of course there is the question of what MIME implementations will
do and what problems, possibly serious, it would cause to, for
example, make utf-8 the new text/* default. It would need a lot of
discussion and a new RFC.
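
One small piece of that question can be checked mechanically. Since
US-ASCII is a subset of UTF-8, a body that is valid under the current
default decodes to the same text under a utf-8 default; the risk lies
with bodies that are unlabelled but not actually ASCII. The helper
below is a hypothetical illustration of that comparison, nothing more.

```python
def decodes_same(body: bytes, old: str = "us-ascii", new: str = "utf-8") -> bool:
    """True when body decodes to identical text under both defaults."""
    try:
        return body.decode(old) == body.decode(new)
    except UnicodeDecodeError:
        # The body is not valid in at least one of the two charsets.
        return False


if __name__ == "__main__":
    print(decodes_same(b"@prefix : <#> ."))              # pure ASCII
    print(decodes_same(b"caf\xe9", old="iso-8859-1"))    # unlabelled Latin-1
```

The second case stands in for an implementation that has quietly been
treating unlabelled text as ISO-8859-1: the same bytes are not valid
UTF-8, so a changed default would break it.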

Note that TimBL has never, as far as I know, suggested disregarding
the charset defaulting requirement, just the CRLF requirement, which he
mightn't even be aware of. And as it seems that the charset defaulting
is the thing that most people are anxious about, I'd be happy for
text/rdf+n3; charset=utf-8 or text/n3; charset=utf-8 to go forwards,
even though that disregards the CRLF requirement.

Thanks,

-- 
Sean B. Palmer, http://inamidst.com/sbp/

Received on Monday, 17 December 2007 15:51:40 UTC