PROV-N media type registration charset redux from Graham Klyne on 2012-07-10 (public-prov-wg@w3.org from July 2012)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Tue, 10 Jul 2012 09:17:55 +0100
To: W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <4FFBE533.9070500@zoo.ox.ac.uk>

There's a new IETF RFC that has bearing on this: 
http://www.rfc-editor.org/rfc/rfc6657.txt

The rules are changing...

[[
    Section 4.1.2 of [RFC2046] says:

       The default character set, which must be assumed in the absence of
       a charset parameter, is US-ASCII.

    As explained in the Introduction section, this rule is considered
    outdated, so this document replaces it with the following set of
    rules:

    Each subtype of the "text" media type that uses the "charset"
    parameter can define its own default value for the "charset"
    parameter, including the absence of any default.

    In order to improve interoperability with deployed agents, "text/*"
    media type registrations SHOULD either

    a.  specify that the "charset" parameter is not used for the defined
        subtype, because the charset information is transported inside
        the payload (such as in "text/xml"), or

    b.  require explicit unconditional inclusion of the "charset"
        parameter, eliminating the need for a default value.
]]
-- http://www.rfc-editor.org/rfc/rfc6657.txt, section 3

I'm finding the document a little confusing to interpret, as it also says 
that each "text" subtype can define its own default.

What I'm thinking we want to say is that the encoding is always UTF-8, and the 
charset parameter is never used.

Alternatively, we can say the charset parameter is always present, and MUST be 
UTF-8.

I've emailed the IETF-APPS group to request clarification.

For the time being, I think the safe option is to say the charset parameter 
MUST be present and MUST have the value "UTF-8".  The text about US-ASCII can 
be dropped.
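
To make the "safe option" concrete, here is a rough sketch of what a receiver 
might check under that rule.  It is only an illustration: the media type name 
"text/provenance-notation" and the function name are assumptions for this 
example, not something settled in this message, and the case-insensitive 
comparison of the charset value is my reading of how charset names are 
normally matched.

```python
from email.message import Message

# Assumed media type name for PROV-N; not fixed by this message.
PROV_N_TYPE = "text/provenance-notation"


def check_prov_n_content_type(header_value: str) -> bool:
    """Return True if the Content-Type value names the (assumed) PROV-N
    media type and carries an explicit charset parameter equal to UTF-8."""
    msg = Message()
    msg["Content-Type"] = header_value

    # get_content_type() lower-cases the type/subtype for us.
    if msg.get_content_type() != PROV_N_TYPE:
        return False

    # get_param() returns None when the parameter is absent, so a missing
    # charset is rejected rather than given a default.
    charset = msg.get_param("charset")
    return isinstance(charset, str) and charset.lower() == "utf-8"
```

With this, "text/provenance-notation; charset=UTF-8" is accepted, while the 
same type with no charset parameter, or with charset=US-ASCII, is rejected.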

#g
--

Received on Tuesday, 10 July 2012 08:42:28 UTC