Re: Definitions: RFC 2046 and application/*

On Thu, 2012-03-08 at 10:36 +0000, Andy Seaborne wrote:
> RFC 2046 defines application/* and text/*.
> 
> The only default charset rules are for text/* only, not for application/*.
> 
> ==== Proposal
> 
> We can preserve existing behavior exactly.

+1     This looks to me like it will work fine.

I know I expressed support for using the text/* tree in the meeting, so
that browsers will show the text when people click on links to it, but I
think that may be less important than allowing publishers to publish
UTF-8 without having to figure out how to specify a charset and
remember to do it every time.

     -- Sandro

> 1/ We state that N-Triples data is currently served 
> as text/plain and the default charset in this case is therefore ASCII.
> 
> 2/ We register a new MIME type, application/n-triples, default charset 
> UTF-8 (see below for rationale)
> 
> 3/ (Optional) We could also register text/n-triples (default charset ASCII).
> 
> cf. application/xml and text/xml.
> 
> ==== Rationale
> 
> == RFC 2046
> 
> I went to RFC 2046 which I think defines application/*
> 
> [[RFC 2046
> 4.5.3.  Other Application Subtypes
> 
>     It is expected that many other subtypes of "application" will be
>     defined in the future.  MIME implementations must at a minimum treat
>     any unrecognized subtypes as being equivalent to "application/octet-
>     stream".
> ]]
> 
> that's the nearest I could find to a general statement about 
> application/*.  There isn't anything about a default charset.  If no 
> charset is given, it's octets.
> 
> If it's octets, the interpretation is up to the content-type 
> registration.  That can be to name a default or require a charset 
> parameter be present.  A default seems better.
> 
> == text/*
> 
> Only text/* has any special rules, and the defaulting rule is text/* 
> specific.
> [see section 4.1 of RFC 2046]
> 
> 1) the default for text/plain is us-ascii
> 
> 2) other subtypes must default to us-ascii
> 
> 3) Unrecognised text/* subtypes can be treated as text/plain
> 
> 4) Types with unrecognised charsets are treated as
>     application/octet-stream.
> 
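A minimal sketch of how a receiver might apply the defaulting rules quoted above, in Python. The names `effective_charset`, `REGISTERED_DEFAULTS` and `KNOWN_CHARSETS` are hypothetical, and the UTF-8 default for application/n-triples is only the proposal in this thread, not an existing registration. Rule 3 (falling back to text/plain for unrecognised text subtypes) is left out to keep it short.

```python
from email.message import Message

# Hypothetical per-type defaults taken from media type registrations.
# The application/n-triples entry reflects the proposal, nothing more.
REGISTERED_DEFAULTS = {
    "application/n-triples": "utf-8",
}

# Illustrative set of charsets this receiver claims to understand.
KNOWN_CHARSETS = {"us-ascii", "utf-8", "iso-8859-1"}


def effective_charset(content_type):
    """Return (media_type, charset); a charset of None means raw octets."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()      # lower-cased type/subtype
    charset = msg.get_content_charset()      # lower-cased, or None if absent
    major = media_type.split("/")[0]

    if charset is not None:
        if major == "text" and charset not in KNOWN_CHARSETS:
            # Rule 4: unrecognised charset, so treat the body as octets.
            return "application/octet-stream", None
        return media_type, charset

    if major == "text":
        # Rules 1 and 2: text/* without a charset defaults to us-ascii.
        return media_type, "us-ascii"

    # application/*: no general default; the registration decides, and
    # without one the body is just octets.
    return media_type, REGISTERED_DEFAULTS.get(media_type)
```

With these assumptions, "text/plain" comes out as us-ascii, "application/n-triples" as utf-8, and an application type with no registered default as plain octets.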
> == Implications
> 
> This works well for us because ASCII is a subset of UTF-8 so existing 
> N-Triples data can be read, as bytes, as both ASCII and UTF-8 without loss.
> 
> If there is no charset on application/n-triples, then the data is passed 
> to the processor, untouched (binary octets) and whatever rules this WG 
> defines apply which go in the MIME type registration.
> 
> == Existing data works in all content types
> 
> Reading UTF-8 or ASCII for existing N-Triples data will yield the same 
> codepoints i.e. set the default to UTF-8 and there is no problem for 
> existing N-Triples data, and what is more, new style data is detectable 
> because it is outside legal US-ASCII (even better, treating as binary 
> octets preserves the data).
> 
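The compatibility argument above can be checked directly. This is only an illustration in Python: the helper name `decode_ntriples` and the example triples are made up for the purpose, not taken from any spec or test suite.

```python
def decode_ntriples(octets):
    """Decode octets as UTF-8 and report whether they were pure US-ASCII."""
    # Any byte >= 0x80 is outside US-ASCII, which is what makes
    # new-style (UTF-8) data detectable.
    is_ascii = all(b < 0x80 for b in octets)
    return octets.decode("utf-8"), is_ascii


# Existing-style data: non-ASCII characters written as \u escapes,
# so every byte on the wire is US-ASCII.
old_style = b'<http://example/s> <http://example/p> "caf\\u00E9" .\n'

# New-style data: the same literal with the character encoded as UTF-8.
new_style = '<http://example/s> <http://example/p> "caf\u00e9" .\n'.encode("utf-8")

# Reading the old data as ASCII or as UTF-8 yields the same codepoints.
same_codepoints = old_style.decode("ascii") == old_style.decode("utf-8")

old_text, old_is_ascii = decode_ntriples(old_style)
new_text, new_is_ascii = decode_ntriples(new_style)
```

Here `same_codepoints` and `old_is_ascii` are true, while `new_is_ascii` is false because the UTF-8 encoding of the accented character uses bytes above 0x7F.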
> == Existing software
> 
> That leaves existing software, new data.
> 
> But that is expecting text/plain.  Adding text/n-triples may help if 
> there is existing use of such a content type.  (I have seen N-Triples 
> served up as all sorts of things.)
> 
> We handle this by noting in the spec that text/plain is also used for 
> N-Triples for compatibility, and that it is required to default to 
> ASCII.
> 
> Existing software that is not MIME-type sensitive is at the mercy of 
> what's fed in regardless of what the working group decides, including 
> Turtle.
> 
> Existing software fed with existing data for any content-type/charset 
> combination described here will work and be correct.
> 
> == Test cases, please!
> 
> Let's move to working on specific examples. If something looks broken, 
> please provide specific test cases.  It's going to be easier to make 
> progress if we deal with concrete examples now.
> 
>  Andy
> 
> 

Received on Thursday, 8 March 2012 13:11:34 UTC