Re: N-Triples MIME type should not be text/plain -- comment on RDF Test Cases. from Garret Wilson on 2007-11-05 (www-rdf-comments@w3.org from October to December 2007)

From: Garret Wilson <garret@globalmentor.com>
Date: Sun, 04 Nov 2007 17:01:01 -0800
To: Dan Brickley <danbri@danbri.org>
CC: Graham Klyne <GK@ninebynine.org>, Tim Berners-Lee <timbl@w3.org>, www-rdf-comments@w3.org
Message-ID: <472E6B4D.2010808@globalmentor.com>

Dan Brickley wrote:
>
> My understanding is that text/xml is widely considered problematic,
> eg. http://annevankesteren.nl/2005/03/text-xml
> http://www.w3.org/TR/xhtml-media-types/ quoting from
> http://www.rfc-editor.org/rfc/rfc3023.txt

Tracking down exactly *why* text/* is considered problematic has been
difficult, at least for me, but I think it comes down to 1) default
interpretation by the browser, and 2) allowed/default encoding, which
you reference above.

From RFC 3023 Section 3:

   The top-level media type "text" has some restrictions on MIME
   entities and they are described in [RFC2045] and [RFC2046].  In
   particular, the UTF-16 family, UCS-4, and UTF-32 are not allowed
   (except over HTTP[RFC2616], which uses a MIME-like mechanism).  Thus,
   if an XML document or external parsed entity is encoded in such
   character encoding schemes, it cannot be labeled as text/xml or
   text/xml-external-parsed-entity (except for HTTP).

Others (including your first reference above) have also mentioned that 
the character set of all text/* entries defaults to US-ASCII if the 
charset parameter is not supplied.

Now, if the quoted paragraph from RFC 3023 is true, then there's no more 
argument---under no conditions should N3 use a text/* content type. But 
I have trouble finding this explicitly in RFC 2045 or RFC 2046. RFC 2046
Section 4.1.2 *does* say that the default for charset "for 'text/plain'
data" is US-ASCII, but to me it is ambiguous whether this applies to
other text/* subtypes as well.
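
As a rough illustration of the popular reading (not something the RFCs
spell out in this form), the sketch below assumes that every text/*
subtype without a charset parameter falls back to US-ASCII. The function
names effective_charset and decodes_as_labeled are hypothetical; only the
standard-library email.message.Message calls are real.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Charset a strict receiver might assume for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()  # None when no charset parameter
    if explicit is not None:
        return explicit
    # ASSUMPTION: the US-ASCII default covers all of text/*, not only
    # text/plain. This is the disputed reading, not a settled rule.
    if msg.get_content_maintype() == "text":
        return "us-ascii"
    return None  # no default assumed here for other top-level types


def decodes_as_labeled(payload: bytes, content_type: str) -> bool:
    """True if the payload decodes under the charset implied by its label."""
    charset = effective_charset(content_type)
    if charset is None:
        raise ValueError("no charset rule assumed for this media type")
    try:
        payload.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


# Norwegian text sent as UTF-8 bytes.
norwegian = "blåbærsyltetøy".encode("utf-8")
unlabeled_ok = decodes_as_labeled(norwegian, "text/plain")
labeled_ok = decodes_as_labeled(norwegian, "text/plain; charset=UTF-8")
```

Under that assumption, the unlabeled text/plain payload fails to decode,
while the same bytes with an explicit charset=UTF-8 parameter decode fine.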

I suppose that, with the popular understanding that RFC 2046 requires a 
default character set of US-ASCII if there is no charset parameter, then 
it's almost as true as if RFC 2046 said so explicitly. But that leads to 
uncomfortable conclusions: if nothing but unadorned text should use a
text/* top-level type, and if all text/* subtypes default to
US-ASCII, then I can't think of a single use for the text/* top-level
type---not even for plain text, which should probably be
application/plaintext.

Can't someone just put out another RFC saying that text/* subtypes 
besides text/plain may specify default encodings other than US-ASCII, or 
something similar? Are we really stuck for the rest of computing 
eternity with a specification decision that doesn't even support 
Norwegian, much less Mandarin?

You and Graham are making good arguments (well, I guess I brought them
up too on this thread) that old specifications bring gotchas that would
prevent us from using a text/* top-level type for N3. But those reasons
have some larger ramifications which make me uncomfortable.

Garret

P.S. Arg---why do simple decisions have to be so difficult? Damn you, 
US-ASCII!

P.P.S. I want a byte to be 32 bits too, while you're at it. ;)

Received on Monday, 5 November 2007 01:02:17 UTC