Re: p:http-request content-type and encoding from Florent Georges on 2009-02-04 (public-xml-processing-model-comments@w3.org from February 2009)

From: Florent Georges <fgeorges@fgeorges.org>
Date: Wed, 4 Feb 2009 16:13:10 +0100
To: Norman Walsh <ndw@nwalsh.com>
Cc: public-xml-processing-model-comments@w3.org
Message-ID: <ebaca5bf0902040713r52a7be5ax126c5dd39225c7de@mail.gmail.com>

2009/2/4 Norman Walsh wrote:

> Do I do this:

>  <c:multipart ...>
>    <c:body content-type="application/xml" encoding="utf-8">...</c:body>
>    <c:body content-type="application/xml" encoding="iso-8859-1">...</c:body>
>  </c:multipart>

> or do I do this:

>  <c:multipart ...>
>    <c:body content-type="application/xml; charset=utf-8">...</c:body>
>    <c:body content-type="application/xml; charset=iso-8859-1">...</c:body>
>  </c:multipart>

  I think there are several aspects here.  The charset is a
parameter of the content type.  RFC 2616, §3.7.1, p. 27
<http://tools.ietf.org/html/rfc2616#section-3.7.1> says:

    The "charset" parameter is used with some media types to
    define the character set (section 3.4) of the data. When no
    explicit charset parameter is provided by the sender, media
    subtypes of the "text" type are defined to have a default
    charset value of "ISO-8859-1"

  Charsets are defined in RFC 2616, §3.4, p. 21
<http://tools.ietf.org/html/rfc2616#section-3.4>.
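
  To illustrate how the charset can be read out of the content type
value, here is a minimal Python sketch.  The function name
charset_of is made up for the example, and the fallback to
ISO-8859-1 only applies the RFC 2616 default for "text" subtypes
quoted above; it is not what any particular processor does.

```python
from email.message import Message


def charset_of(content_type):
    """Return (media type, charset) for a Content-Type header value.

    charset_of is a hypothetical helper, not part of any XProc API.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    charset = msg.get_content_charset()  # None when no charset parameter
    # RFC 2616, section 3.7.1: "text" subtypes default to ISO-8859-1.
    if charset is None and msg.get_content_maintype() == "text":
        charset = "iso-8859-1"
    return media_type, charset


print(charset_of("application/xml; charset=utf-8"))
print(charset_of("text/plain"))
```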

  The Content-Encoding header defines the body's content coding.
This is used, for example, to compress the content.  Content
codings are defined in RFC 2616, §3.5, p. 23
<http://tools.ietf.org/html/rfc2616#section-3.5>, and the header
itself in RFC 2616, §14.11, p. 118
<http://tools.ietf.org/html/rfc2616#section-14.11>.  The example
given in the RFC is: "Content-Encoding: gzip"

  So if you send a text file in UTF-8, but compress it with gzip,
I think you should have the following headers:

    Content-Type: text/plain; charset=utf-8
    Content-Encoding: gzip

  If I am right, base64 is yet another thing: a content transfer
encoding.  This is defined in RFC 1521, p. 14
<http://tools.ietf.org/html/rfc1521#page-14>:

    encoding := "Content-Transfer-Encoding" ":" mechanism

    mechanism :=     "7bit"  ;  case-insensitive
                   / "quoted-printable"
                   / "base64"
                   / "8bit"
                   / "binary"
                   / x-token

  So taking the previous example, if you send a text file in
UTF-8, compress it with gzip, and transfer-encode it as base64, I
think you should have the following headers:

    Content-Type: text/plain; charset=utf-8
    Content-Encoding: gzip
    Content-Transfer-Encoding: base64
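
  The three layers can be sketched in Python as follows.  This is
only an illustration of the order of the steps (charset, then
content coding, then transfer encoding); encode_body and
decode_body are hypothetical names, not anything from the spec.

```python
import base64
import gzip


def encode_body(text):
    """Apply charset, content coding and transfer encoding, in that order."""
    raw = text.encode("utf-8")                 # charset=utf-8
    compressed = gzip.compress(raw)            # Content-Encoding: gzip
    payload = base64.b64encode(compressed)     # Content-Transfer-Encoding: base64
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Encoding", "gzip"),
        ("Content-Transfer-Encoding", "base64"),
    ]
    return headers, payload


def decode_body(payload):
    """Undo the three layers in the reverse order."""
    return gzip.decompress(base64.b64decode(payload)).decode("utf-8")


headers, payload = encode_body("some text")
for name, value in headers:
    print(f"{name}: {value}")
```

  The receiver has to peel the layers off in the reverse order,
which is what decode_body does.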

  To make things a bit more complex, you can have several content
codings for a single entity.  You then list several codings in
Content-Encoding, in the order in which they were applied.
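
  As a rough sketch of why the order matters, assuming only the
gzip and deflate codings and a comma-separated header value (the
helper names are invented for the example):

```python
import gzip
import zlib

# Only two codings are handled here; this is an assumption of the sketch.
ENCODERS = {"gzip": gzip.compress, "deflate": zlib.compress}
DECODERS = {"gzip": gzip.decompress, "deflate": zlib.decompress}


def apply_codings(data, codings):
    """Apply each content coding in the order given."""
    for name in codings:
        data = ENCODERS[name](data)
    return data


def undo_codings(data, header_value):
    """Undo the codings listed in a Content-Encoding value, last one first."""
    codings = [c.strip().lower() for c in header_value.split(",")]
    for name in reversed(codings):
        data = DECODERS[name](data)
    return data


body = b"some entity body" * 10
encoded = apply_codings(body, ["deflate", "gzip"])
print(undo_codings(encoded, "deflate, gzip") == body)
```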

  So, regarding your question: if base64 is a possible value of
@encoding, I think we should not also use it for charset info:

    <c:body content-type="application/xml; charset=utf-8">
       ...
    </c:body>

  With the above body, should the processor analyse @content-type,
see the charset info, and use it to serialize the text?  I think
so.  And what about the following:

    <c:header name="Content-Type"
              value="application/xml; charset=utf-8"/>
    <c:body content-type="application/xml">
       ...
    </c:body>

?  I think so too.  Could we assume that @encoding can contain
either content coding or content transfer coding?  I think so.  So
we could have encoding="gzip" and encoding="base64", the processor
deciding which header to generate.

  Finally, could we change the definition of @encoding so that it
can contain several encodings?  I think so.  Back again to the
previous example, we could have either:

    <c:header name="Content-Type" value="text/plain; charset=utf-8"/>
    <c:header name="Content-Encoding" value="gzip"/>
    <c:header name="Content-Transfer-Encoding" value="base64"/>
    <c:body content-type="text/plain">
       [ content ]
    </c:body>

or:

    <c:body content-type="text/plain; charset=utf-8"
            encoding="gzip base64">
       [ content ]
    </c:body>
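
  Here is how a processor might decide which header to generate
for each token of such an @encoding value.  This is a sketch only:
headers_for is a hypothetical name, the space-separated syntax is
the one proposed above (not something the spec defines), and the
set of transfer encodings is taken from the RFC 1521 grammar quoted
earlier.  Anything else is assumed to be a content coding.

```python
# Mechanisms from the RFC 1521 grammar (x-tokens are left out).
TRANSFER_ENCODINGS = {"7bit", "quoted-printable", "base64", "8bit", "binary"}


def headers_for(content_type, encoding):
    """Map @content-type and a space-separated @encoding to header pairs."""
    headers = [("Content-Type", content_type)]
    content_codings = []
    for token in encoding.split():
        token = token.lower()
        if token in TRANSFER_ENCODINGS:
            headers.append(("Content-Transfer-Encoding", token))
        else:
            # Assumption: every other token is a content coding (gzip, ...).
            content_codings.append(token)
    if content_codings:
        # Keep the codings in the order they appear in @encoding.
        headers.insert(1, ("Content-Encoding", ", ".join(content_codings)))
    return headers


for name, value in headers_for("text/plain; charset=utf-8", "gzip base64"):
    print(f"{name}: {value}")
```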

  Does that make sense?

  Regards,

-- 
Florent Georges
http://www.fgeorges.org/

Received on Wednesday, 4 February 2009 15:16:59 UTC