- From: Sandro Hawke <sandro@w3.org>
- Date: Thu, 08 Mar 2012 08:11:13 -0500
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: RDF-WG <public-rdf-wg@w3.org>
On Thu, 2012-03-08 at 10:36 +0000, Andy Seaborne wrote:
> RFC 2046 defines application/* and text/*.
>
> The only default charset rules are for text/* only, not for application/*.
>
> ==== Proposal
>
> We can preserve existing behavior exactly.
+1 This looks to me like it will work fine.
I know I expressed support for using the text/* tree in the meeting, so
that browsers will show the text when people click on links to it, but I
think that may be less important than allowing publishers to publish
UTF-8 without having to figure out how to specify a charset parameter and
remember to do it every time.
-- Sandro
> 1/ We state the current position that N-Triples data is currently served
> as text/plain and the default charset in this case is therefore ASCII.
>
> 2/ We register a new MIME type, application/n-triples, default charset
> UTF-8 (see below for rationale)
>
> 3/ (Optional) We could also register text/n-triples (default charset ASCII).
>
> cf. application/xml and text/xml.
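The three items above amount to a small lookup from media type to default
charset. A minimal Python sketch, assuming the defaults exactly as proposed
and assuming that an explicit charset parameter takes precedence over the
default; DEFAULT_CHARSETS and effective_charset are illustrative names, not
part of any specification:

```python
from email.message import Message

# Defaults as proposed above. text/n-triples is the optional item 3.
DEFAULT_CHARSETS = {
    "text/plain": "us-ascii",
    "application/n-triples": "utf-8",
    "text/n-triples": "us-ascii",
}


def effective_charset(content_type_header):
    """Return the charset to use for a Content-Type header value.

    An explicit charset parameter wins (an assumption of this sketch);
    otherwise the per-type default applies. Returns None for media types
    this sketch has no default for.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    explicit = msg.get_param("charset")
    if explicit is not None:
        return str(explicit).lower()
    return DEFAULT_CHARSETS.get(msg.get_content_type())
```

For instance, effective_charset("application/n-triples") gives "utf-8", while
effective_charset("text/plain") gives "us-ascii".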
>
> ==== Rationale
>
> == RFC 2046
>
> I went to RFC 2046 which I think defines application/*
>
> [[RFC 2046
> 4.5.3. Other Application Subtypes
>
> It is expected that many other subtypes of "application" will be
> defined in the future. MIME implementations must at a minimum treat
> any unrecognized subtypes as being equivalent to "application/octet-
> stream".
> ]]
>
> that's the nearest I could find to a general statement about
> application/*. There isn't anything about a default charset. If no
> charset is given, it's octets.
>
> If it's octets, the interpretation is up to the content-type
> registration. That can be to name a default or require a charset
> parameter be present. A default seems better.
>
> == text/*
>
> Only text/* has any special rules, and the defaulting rule is
> text/*-specific.
> [see section 4.1 of RFC 2046]
>
> 1) the default for text/plain is us-ascii
>
> 2) other subtypes must default to us-ascii
>
> 3) Unrecognised text/* subtypes can be treated as text/plain
>
> 4) Unrecognised subtypes with unrecognised charsets are treated as
> application/octet-stream.
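Rules 3 and 4 can be sketched as a fallback function. This is only an
illustration under stated assumptions: KNOWN_TEXT_SUBTYPES is a hypothetical
set standing in for whatever subtypes a given implementation understands, and
"recognised charset" is approximated by whether Python's codec registry knows
the name.

```python
import codecs

# Hypothetical: the text/* subtypes this implementation understands.
KNOWN_TEXT_SUBTYPES = {"plain", "html"}


def fallback_media_type(subtype, charset="us-ascii"):
    """Pick how to treat a text/<subtype> body.

    A known subtype is used as-is. An unknown subtype falls back to
    text/plain if the charset is recognised (rule 3), otherwise to
    application/octet-stream (rule 4).
    """
    if subtype in KNOWN_TEXT_SUBTYPES:
        return "text/" + subtype
    try:
        codecs.lookup(charset)  # raises LookupError for unknown names
    except LookupError:
        return "application/octet-stream"
    return "text/plain"
```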
>
> == Implications
>
> This works well for us because ASCII is a subset of UTF-8 so existing
> N-Triples data can be read, as bytes, as both ASCII and UTF-8 without loss.
>
> If there is no charset on application/n-triples, then the data is passed
> to the processor untouched (binary octets), and whatever rules this WG
> defines in the MIME type registration apply.
>
> == Existing data works in all content types
>
> Reading existing N-Triples data as UTF-8 or as ASCII yields the same
> codepoints, so setting the default to UTF-8 causes no problem for
> existing N-Triples data. What is more, new-style data is detectable
> because it contains octets outside legal US-ASCII (even better, treating
> it as binary octets preserves the data).
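Both claims are easy to check on sample data. A small Python sketch, using
made-up example.org triples; is_ascii_only and decode_n_triples are
illustrative helper names, and the UTF-8 default is the one proposed above:

```python
def is_ascii_only(data: bytes) -> bool:
    """True if every octet is in the US-ASCII range (0x00-0x7F)."""
    return all(octet < 0x80 for octet in data)


def decode_n_triples(data: bytes) -> str:
    """Decode with the proposed application/n-triples default, UTF-8."""
    return data.decode("utf-8")


# Old-style line: the non-ASCII character is written as a backslash-u escape,
# so the octets on the wire are pure ASCII.
old_style = b'<http://example.org/s> <http://example.org/p> "caf\\u00E9" .\n'

# New-style line: the same character is sent as raw UTF-8 octets.
new_style = '<http://example.org/s> <http://example.org/p> "caf\u00e9" .\n'.encode("utf-8")

# Pure-ASCII octets decode to identical codepoints under either charset.
same_codepoints = old_style.decode("ascii") == decode_n_triples(old_style)

# New-style data is detectable: it contains octets above 0x7F.
old_is_ascii = is_ascii_only(old_style)
new_is_ascii = is_ascii_only(new_style)
```

Here same_codepoints and old_is_ascii are true and new_is_ascii is false,
which is the detection property described above.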
>
> == Existing software
>
> That leaves existing software, new data.
>
> But that is expecting text/plain. Adding text/n-triples may help if
> there is existing use of such a content type. (I have seen N-Triples
> served up as all sorts of things.)
>
> We handle this by noting in the spec that text/plain is also used for
> N-Triples for compatibility, and that it is required to default to
> ASCII.
>
> Existing software that is not MIME-type sensitive is at the mercy of
> what's fed in regardless of what the working group decides, including
> Turtle.
>
> Existing software fed with existing data for any content-type/charset
> combination described here will work and be correct.
>
> == Test cases, please!
>
> Let's move to working on specific examples. If something looks broken,
> please provide specific test cases. It's going to be easier to make
> progress if we deal with concrete examples now.
>
> Andy
>
>
Received on Thursday, 8 March 2012 13:11:34 UTC