- From: Garret Wilson <garret@globalmentor.com>
- Date: Sun, 04 Nov 2007 17:01:01 -0800
- To: Dan Brickley <danbri@danbri.org>
- CC: Graham Klyne <GK@ninebynine.org>, Tim Berners-Lee <timbl@w3.org>, www-rdf-comments@w3.org
Dan Brickley wrote:
>
> My understanding is that text/xml is widely considered problematic,
> eg. http://annevankesteren.nl/2005/03/text-xml
> http://www.w3.org/TR/xhtml-media-types/ quoting from
> http://www.rfc-editor.org/rfc/rfc3023.txt

Tracking down exactly *why* text/* is considered problematic has been difficult, at least for me, but I think it comes down to 1) default interpretation by the browser, and 2) allowed/default encoding, which you reference above. From RFC 3023 Section 3:

    The top-level media type "text" has some restrictions on MIME
    entities and they are described in [RFC2045] and [RFC2046]. In
    particular, the UTF-16 family, UCS-4, and UTF-32 are not allowed
    (except over HTTP[RFC2616], which uses a MIME-like mechanism).
    Thus, if an XML document or external parsed entity is encoded in
    such character encoding schemes, it cannot be labeled as text/xml
    or text/xml-external-parsed-entity (except for HTTP).

Others (including your first reference above) have also mentioned that the character set of all text/* entries defaults to US-ASCII if the charset parameter is not supplied.

Now, if the quoted paragraph from RFC 3023 is true, then there's no more argument---under no conditions should N3 use a text/* content type. But I have trouble finding this explicitly in RFC 2045 or RFC 2046. RFC 2046 Section 4.1.2 *does* say that the default for charset "for 'text/plain' data" is US-ASCII, but to me it is ambiguous whether this applies to text/* subtypes as well.

I suppose that, with the popular understanding that RFC 2046 requires a default character set of US-ASCII if there is no charset parameter, then it's almost as true as if RFC 2046 said so explicitly. But that leads to uncomfortable conclusions: if nothing but unadorned text should use a text/* top-level type, and if all text/* top-level types default to US-ASCII, then I can't think of a single use for the text/* top-level type---not even for plain text, which should probably be application/plaintext.
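
To make the default-charset concern concrete, here is a minimal Python sketch. It assumes the popular reading described above, namely that any text/* type without a charset parameter defaults to US-ASCII; RFC 2046 only says this explicitly for text/plain. The helper names effective_charset and can_decode, and the media types text/n3 and application/n3, are hypothetical and used only for illustration.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset a receiver would assume for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()  # lowercased charset parameter, or None
    if declared is not None:
        return declared
    # Assumption: the popular reading of RFC 2046, applied to all of text/*.
    if msg.get_content_maintype() == "text":
        return "us-ascii"
    return None  # no MIME-level default assumed outside text/*


def can_decode(payload: bytes, charset: str) -> bool:
    """Report whether the payload is valid in the given charset."""
    try:
        payload.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# A Norwegian word, serialized as UTF-8.
norwegian = "blåbærsyltetøy".encode("utf-8")

for header in ("text/n3", "text/n3; charset=utf-8", "application/n3"):
    charset = effective_charset(header)
    readable = can_decode(norwegian, charset) if charset else None
    print(header, charset, readable)
```

Under these assumptions, the unlabeled text/n3 case is treated as US-ASCII and the Norwegian sample does not decode, while the same bytes are fine once a charset parameter is supplied or the type moves out of text/*.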
Can't someone just put out another RFC saying that text/* subtypes besides text/plain may specify default encodings other than US-ASCII, or something similar? Are we really stuck for the rest of computing eternity with a specification decision that doesn't even support Norwegian, much less Mandarin?

You and Graham are making good arguments (well, I guess I brought them up too on this thread) that old specifications bring gotchas that would prevent us from using a text/* top-level type for N3. But those reasons have some larger ramifications which make me uncomfortable.

Garret

P.S. Arg---why do simple decisions have to be so difficult? Damn you, US-ASCII!

P.P.S. I want a byte to be 32 bits too, while you're at it. ;)
Received on Monday, 5 November 2007 01:02:17 UTC