Re: Definitions: RFC 2046 and application/*

+1.  Nicely done, Andy.

Zhe and Souri, it has been suggested that your currently fielded implementations will work with Andy's proposal.  Can you please explicitly confirm or deny that?  Thanks.

Regards,
Dave




On Mar 8, 2012, at 05:36, Andy Seaborne wrote:

> RFC 2046 defines application/* and text/*.
> 
> The default charset rules apply to text/* only, not to application/*.
> 
> ==== Proposal
> 
> We can preserve existing behavior exactly.
> 
> 1/ We state the current position: N-Triples data is served as text/plain, and the default charset in this case is therefore ASCII.
> 
> 2/ We register a new MIME type, application/n-triples, default charset UTF-8 (see below for rationale)
> 
> 3/ (Optional) We could also register text/n-triples (default charset ASCII).
> 
> cf. application/xml and text/xml.
> 
> ==== Rationale
> 
> == RFC 2046
> 
> I went to RFC 2046 which I think defines application/*
> 
> [[RFC 2046
> 4.5.3.  Other Application Subtypes
> 
>   It is expected that many other subtypes of "application" will be
>   defined in the future.  MIME implementations must at a minimum treat
>   any unrecognized subtypes as being equivalent to "application/octet-
>   stream".
> ]]
> 
> that's the nearest I could find to a general statement about application/*.  There isn't anything about a default charset.  If no charset is given, it's octets.
> 
> If it's octets, the interpretation is up to the content-type registration.  The registration can either name a default or require that a charset parameter be present.  A default seems better.
> 
> == text/*
> 
> Only text/* has any special rules, and the defaulting rule is text/* specific.
> [see section 4.1 of RFC 2046]
> 
> 1) the default for text/plain is us-ascii
> 
> 2) other subtypes must default to us-ascii
> 
> 3) Unrecognised text/* subtypes can be treated as text/plain
> 
> 4) Types with unrecognised charsets are treated as
>   application/octet-stream.
> 
> == Implications
> 
> This works well for us because ASCII is a subset of UTF-8 so existing N-Triples data can be read, as bytes, as both ASCII and UTF-8 without loss.
> 
> If there is no charset on application/n-triples, then the data is passed to the processor untouched (binary octets), and whatever rules this WG defines in the MIME type registration apply.
> 
> == Existing data works in all content types
> 
> Reading existing N-Triples data as UTF-8 or as ASCII yields the same codepoints, so setting the default to UTF-8 causes no problem for existing data.  What is more, new-style data is detectable because it contains bytes outside legal US-ASCII (even better, treating the content as binary octets preserves the data).
> 
> == Existing software
> 
> That leaves existing software, new data.
> 
> But that software is expecting text/plain.  Adding text/n-triples may help if there is existing use of such a content type.  (I have seen N-Triples served up as all sorts of things.)
> 
> We handle this by noting in the spec that text/plain is also used for N-Triples for compatibility, and that it is required to default to ASCII.
> 
> Existing software that is not MIME-type sensitive is at the mercy of what's fed in regardless of what the working group decides, including Turtle.
> 
> Existing software fed with existing data for any content-type/charset combination described here will work and be correct.
> 
> == Test cases, please!
> 
> Let's move to working on specific examples. If something looks broken, please provide specific test cases.  It's going to be easier to make progress if we deal with concrete examples now.
> 
> 	Andy
> 
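
For anyone preparing test cases, the defaulting rules in the quoted proposal can be sketched roughly as below. This is only an illustration under stated assumptions: the table of defaults mirrors the proposal (nothing here is a registered media type yet), and the names DEFAULT_CHARSETS, parse_content_type, effective_charset and decode_body are hypothetical, not taken from any spec or library.

```python
# Illustrative sketch only; all names here are hypothetical.
# Defaults mirror the quoted proposal, not an actual IANA registration.
DEFAULT_CHARSETS = {
    "text/plain": "us-ascii",           # current practice for N-Triples
    "text/n-triples": "us-ascii",       # optional registration (item 3)
    "application/n-triples": "utf-8",   # proposed new type (item 2)
}


def parse_content_type(header):
    """Split a Content-Type value into (media type, parameter dict)."""
    parts = [p.strip() for p in header.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip().strip('"').lower()
    return media_type, params


def effective_charset(header):
    """Return the explicit charset, else the assumed default, else None."""
    media_type, params = parse_content_type(header)
    if "charset" in params:
        return params["charset"]
    # None means: no charset known, hand the octets over untouched.
    return DEFAULT_CHARSETS.get(media_type)


def decode_body(header, body):
    """Decode body bytes per the effective charset, or return raw octets."""
    charset = effective_charset(header)
    if charset is None:
        return body
    return body.decode(charset)
```

With this sketch, an ASCII-only N-Triples document decodes to the same string under text/plain and under application/n-triples, while a document containing raw UTF-8 bytes raises UnicodeDecodeError under text/plain, which is the "existing software, new data" case.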

Received on Thursday, 8 March 2012 13:27:21 UTC