- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 08 Mar 2012 10:36:52 +0000
- To: RDF-WG <public-rdf-wg@w3.org>
RFC 2046 defines application/* and text/*.
The only default charset rules are for text/*, not for application/*.
==== Proposal
We can preserve existing behavior exactly.
1/ We state the current position: N-Triples data is served as
text/plain, so the default charset in this case is US-ASCII.
2/ We register a new MIME type, application/n-triples, with default
charset UTF-8 (see below for rationale).
3/ (Optional) We could also register text/n-triples (default charset
US-ASCII), cf. application/xml and text/xml.
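As an illustration only, the three points can be read as a table of default charsets plus the usual precedence rule that an explicit charset parameter overrides the default. This is a minimal sketch: the PROPOSED_DEFAULTS table and the effective_charset and decode_ntriples helpers are hypothetical, and text/n-triples is the optional type from point 3.

```python
from typing import Optional

# Hypothetical default-charset table under the proposal.
# text/plain follows RFC 2046; the n-triples types are proposed, not registered.
PROPOSED_DEFAULTS = {
    "text/plain": "us-ascii",
    "text/n-triples": "us-ascii",    # optional registration (point 3)
    "application/n-triples": "utf-8",
}


def effective_charset(media_type: str, charset: Optional[str] = None) -> str:
    """An explicit charset parameter wins; otherwise use the proposed default."""
    if charset:
        return charset.lower()
    return PROPOSED_DEFAULTS[media_type.lower()]


def decode_ntriples(body: bytes, media_type: str,
                    charset: Optional[str] = None) -> str:
    """Decode the received octets using the effective charset."""
    return body.decode(effective_charset(media_type, charset))


if __name__ == "__main__":
    print(effective_charset("application/n-triples"))  # utf-8
    print(effective_charset("text/plain"))             # us-ascii
```

Under this reading, existing text/plain behaviour is unchanged, and only the new application/n-triples type gets the UTF-8 default.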
==== Rationale
== RFC 2046
I went to RFC 2046, which I think defines application/*:
[[RFC 2046
4.5.3. Other Application Subtypes
It is expected that many other subtypes of "application" will be
defined in the future. MIME implementations must at a minimum treat
any unrecognized subtypes as being equivalent to "application/octet-stream".
]]
That's the nearest I could find to a general statement about
application/*. There is nothing about a default charset. If no
charset is given, the content is just octets.
If it's octets, the interpretation is up to the content-type
registration, which can either name a default or require a charset
parameter to be present. A default seems better.
== text/*
Only text/* has any special rules, and the defaulting rule is specific
to text/*.
[see section 4.1 of RFC 2046]
1) the default for text/plain is us-ascii
2) other subtypes must default to us-ascii
3) Unrecognised subtypes can be treated as text/plain
4) Unrecognised subtypes with unrecognised charsets are treated as
application/octet-stream.
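The four text/* rules above can be sketched as a small resolver that returns the media type a receiver would act on and the charset it would decode with. This is a sketch under stated assumptions: the KNOWN_TEXT_SUBTYPES set and the resolve_text and charset_known helpers are hypothetical, "recognised charset" is approximated as "Python has a codec for it", and the sketch also falls back to octets for a recognised subtype whose charset it cannot decode.

```python
import codecs
from typing import Optional, Tuple

# Hypothetical set of text/* subtypes this receiver recognises.
KNOWN_TEXT_SUBTYPES = {"plain", "html", "csv"}


def charset_known(charset: str) -> bool:
    """Approximate 'recognised charset' as 'Python has a codec for it'."""
    try:
        codecs.lookup(charset)
        return True
    except LookupError:
        return False


def resolve_text(subtype: str,
                 charset: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Apply the text/* rules; return (media type to treat as, charset)."""
    subtype = subtype.lower()
    # Rules 1 and 2: with no charset parameter, any text/* defaults to US-ASCII.
    cs = (charset or "us-ascii").lower()
    if not charset_known(cs):
        # Rule 4 (also applied to recognised subtypes here): opaque octets.
        return ("application/octet-stream", None)
    if subtype not in KNOWN_TEXT_SUBTYPES:
        # Rule 3: an unrecognised subtype is handled as text/plain.
        return ("text/plain", cs)
    return ("text/" + subtype, cs)
```

For example, resolve_text("n-triples") gives ("text/plain", "us-ascii") for a receiver that has never heard of text/n-triples, which is why that optional type would keep the ASCII default.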
== Implications
This works well for us because ASCII is a subset of UTF-8 so existing
N-Triples data can be read, as bytes, as both ASCII and UTF-8 without loss.
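A quick check of the subset argument: for bytes that are all below 0x80, decoding as ASCII and decoding as UTF-8 give the same string. In this minimal sketch, decodes_identically is a hypothetical helper, and the sample line writes its non-ASCII character as a \u escape, the way existing ASCII-only N-Triples does.

```python
def decodes_identically(data: bytes) -> bool:
    """True if the bytes are valid ASCII and decode to the same text as UTF-8."""
    try:
        as_ascii = data.decode("ascii")
    except UnicodeDecodeError:
        return False  # not ASCII-only, so the question does not arise
    # Every byte below 0x80 is a one-byte UTF-8 sequence for the same code point.
    return as_ascii == data.decode("utf-8")


# Existing-style data: non-ASCII characters are written as \u escapes.
legacy = b'<http://example/s> <http://example/p> "caf\\u00E9" .\n'
assert decodes_identically(legacy)
```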
If there is no charset on application/n-triples, then the data is passed
to the processor untouched (binary octets), and whatever rules this WG
puts in the MIME type registration apply.
== Existing data works in all content types
Reading existing N-Triples data as UTF-8 or as ASCII yields the same
code points, so setting the default to UTF-8 causes no problem for
existing N-Triples data. What is more, new-style data is detectable
because it falls outside legal US-ASCII (even better, treating it as
binary octets preserves the data).
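The detection point can be sketched as a byte-range test: any byte above 0x7F cannot be legal US-ASCII, so it can only come from new-style UTF-8 data. New-style data that happens to be pure ASCII is not flagged, but it is also read identically either way. In this sketch, is_new_style is a hypothetical helper; it also shows that strict ASCII decoding of such data fails while the untouched octets still decode as UTF-8.

```python
def is_new_style(data: bytes) -> bool:
    """True if any byte lies outside US-ASCII (0x00-0x7F)."""
    return any(b > 0x7F for b in data)


new = '<http://example/s> <http://example/p> "café" .\n'.encode("utf-8")
legacy = b'<http://example/s> <http://example/p> "caf\\u00E9" .\n'

assert is_new_style(new)
assert not is_new_style(legacy)

# Strict ASCII decoding rejects the new-style bytes ...
try:
    new.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
assert not ascii_ok

# ... but the untouched octets still decode correctly as UTF-8.
assert new.decode("utf-8").endswith('"café" .\n')
```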
== Existing software
That leaves the case of existing software with new data.
But such software expects text/plain. Adding text/n-triples may help if
there is existing use of that content type. (I have seen N-Triples
served up as all sorts of things.)
We handle this by noting in the spec that text/plain is also used for
N-Triples, for compatibility, and that text/plain is required to
default to ASCII.
Existing software that is not MIME-type sensitive is at the mercy of
whatever it is fed, regardless of what the working group decides; the
same goes for Turtle.
Existing software fed with existing data, for any content-type/charset
combination described here, will work and be correct.
== Test cases, please!
Let's move to working on specific examples. If something looks broken,
please provide specific test cases. It's going to be easier to make
progress if we deal with concrete examples now.
Andy
Received on Thursday, 8 March 2012 10:37:23 UTC