- From: Sandro Hawke <sandro@w3.org>
- Date: Thu, 08 Mar 2012 08:11:13 -0500
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: RDF-WG <public-rdf-wg@w3.org>
On Thu, 2012-03-08 at 10:36 +0000, Andy Seaborne wrote:
> RFC 2046 defines application/* and text/*.
>
> The only default charset rules are for text/*, not for application/*.
>
> ==== Proposal
>
> We can preserve existing behavior exactly.

+1

This looks to me like it will work fine. I know I expressed support for
using the text/* tree in the meeting, so that browsers will show the text
when people click on links to it, but I think that may be less important
than allowing publishers to publish UTF-8 without having to figure out how
to specify a charset parameter and remember to do it every time.

      -- Sandro

> 1/ We state the current position that N-Triples data is currently served
> as text/plain and the default charset in this case is therefore ASCII.
>
> 2/ We register a new MIME type, application/n-triples, default charset
> UTF-8 (see below for rationale).
>
> 3/ (Optional) We could also register text/n-triples (default charset ASCII).
>
> c.f. application/xml and text/xml.
>
> ==== Rationale
>
> == RFC 2046
>
> I went to RFC 2046, which I think defines application/*:
>
> [[RFC 2046
> 4.5.3. Other Application Subtypes
>
> It is expected that many other subtypes of "application" will be
> defined in the future. MIME implementations must at a minimum treat
> any unrecognized subtypes as being equivalent to "application/octet-stream".
> ]]
>
> That's the nearest I could find to a general statement about
> application/*. There isn't anything about a default charset. If no
> charset is given, it's octets.
>
> If it's octets, the interpretation is up to the content-type
> registration. That can either name a default or require that a charset
> parameter be present. A default seems better.
>
> == text/*
>
> Only text/* has any special rules, and the defaulting rule is specific
> to text/*.
> [see section 4.1 of RFC 2046]
>
> 1) The default for text/plain is US-ASCII.
>
> 2) Other subtypes must default to US-ASCII.
>
> 3) Unrecognised types can be treated as text/plain.
>
> 4) Types with unrecognised charsets are treated as
>    application/octet-stream.
>
> == Implications
>
> This works well for us because ASCII is a subset of UTF-8, so existing
> N-Triples data can be read, as bytes, as both ASCII and UTF-8 without loss.
>
> If there is no charset on application/n-triples, then the data is passed
> to the processor untouched (binary octets), and whatever rules this WG
> defines in the MIME type registration apply.
>
> == Existing data works in all content types
>
> Reading existing N-Triples data as UTF-8 or as ASCII yields the same
> codepoints, i.e. we can set the default to UTF-8 with no problem for
> existing N-Triples data. What is more, new-style data is detectable
> because it falls outside legal US-ASCII (even better, treating it as
> binary octets preserves the data).
>
> == Existing software
>
> That leaves the case of existing software reading new data.
>
> But that software is expecting text/plain. Adding text/n-triples may
> help if there is existing use of such a content type. (I have seen
> N-Triples served up as all sorts of things.)
>
> We handle this by noting in the spec that text/plain is also used for
> N-Triples for compatibility, and also noting that it is required to
> default to ASCII.
>
> Existing software that is not MIME-type sensitive is at the mercy of
> what's fed in, regardless of what the working group decides; this
> includes Turtle.
>
> Existing software fed with existing data, for any content-type/charset
> combination described here, will work and be correct.
>
> == Test cases, please!
>
> Let's move to working on specific examples. If something looks broken,
> please provide specific test cases. It's going to be easier to make
> progress if we deal with concrete examples now.
>
> Andy
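A minimal Python sketch of the defaulting rules discussed above, offered as a possible starting point for the concrete test cases Andy asks for. It assumes the proposed, not yet registered, types application/n-triples (default UTF-8) and text/n-triples (default US-ASCII); `resolve_charset` and `DEFAULT_CHARSETS` are hypothetical names, and returning `None` for an unrecognised application/* type stands in for "treat as application/octet-stream".

```python
from email.message import Message
from email.utils import collapse_rfc2231_value
from typing import Optional

# Defaults applied when the Content-Type has no charset parameter.
# text/plain follows RFC 2046; both n-triples entries are ASSUMPTIONS
# taken from the proposal (points 2/ and 3/), not registered rules.
DEFAULT_CHARSETS = {
    "text/plain": "us-ascii",
    "text/n-triples": "us-ascii",      # optional type, point 3/
    "application/n-triples": "utf-8",  # proposed type, point 2/
}


def resolve_charset(content_type: str) -> Optional[str]:
    """Return the charset to decode with, or None for opaque octets."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()  # lower-cased "type/subtype"
    charset = msg.get_param("charset")
    if charset is not None:
        # An explicit charset parameter always wins over any default.
        return collapse_rfc2231_value(charset).lower()
    if media_type in DEFAULT_CHARSETS:
        return DEFAULT_CHARSETS[media_type]
    if media_type.startswith("text/"):
        return "us-ascii"  # RFC 2046: other text/* subtypes default to US-ASCII
    return None  # unrecognised application/*: treat as application/octet-stream


# Existing ASCII-only N-Triples (non-ASCII written as \u escapes) versus
# new-style data carrying raw UTF-8 bytes.
legacy = b'<http://example.org/s> <http://example.org/p> "caf\\u00E9" .\n'
modern = '<http://example.org/s> <http://example.org/p> "café" .\n'.encode("utf-8")

# Legacy data decodes identically under either default...
assert legacy.decode(resolve_charset("text/plain")) == legacy.decode(
    resolve_charset("application/n-triples"))
# ...and new-style data is detectable because it leaves US-ASCII.
assert legacy.isascii() and not modern.isascii()
```

The two assertions mirror the "Implications" argument: because ASCII is a subset of UTF-8, choosing UTF-8 as the default for the new type costs nothing for existing data.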
Received on Thursday, 8 March 2012 13:11:34 UTC