- From: Florent Georges <fgeorges@fgeorges.org>
- Date: Wed, 4 Feb 2009 16:13:10 +0100
- To: Norman Walsh <ndw@nwalsh.com>
- Cc: public-xml-processing-model-comments@w3.org

2009/2/4 Norman Walsh wrote:

> Do I do this:
>
>   <c:multipart ...>
>     <c:body content-type="application/xml" encoding="utf-8">...</c:body>
>     <c:body content-type="application/xml" encoding="iso-8859-1">...</c:body>
>   </c:multipart>
>
> or do I do this:
>
>   <c:multipart ...>
>     <c:body content-type="application/xml; charset=utf-8">...</c:body>
>     <c:body content-type="application/xml; charset=iso-8859-1">...</c:body>
>   </c:multipart>

I think there are several aspects here.

The charset is a parameter of the content type. RFC 2616, §3.7.1, p. 27 <http://tools.ietf.org/html/rfc2616#section-3.7.1> says:

    The "charset" parameter is used with some media types to define
    the character set (section 3.4) of the data. When no explicit
    charset parameter is provided by the sender, media subtypes of the
    "text" type are defined to have a default charset value of
    "ISO-8859-1"

Charsets are defined in RFC 2616, §3.4, p. 21 <http://tools.ietf.org/html/rfc2616#section-3.4>.

The Content-Encoding header defines the body's content coding. It is used, for example, to compress the content (see RFC 2616, §3.5, p. 23 <http://tools.ietf.org/html/rfc2616#section-3.5> for the definition of content codings; the header itself is defined in RFC 2616, §14.11, p. 118 <http://tools.ietf.org/html/rfc2616#section-14.11>). The example given in the RFC is:

    Content-Encoding: gzip

So if you send a text file in UTF-8 but compress it with gzip, I think you should have the following headers:

    Content-Type: text/plain; charset=utf-8
    Content-Encoding: gzip

If I am right, base64 is something different again. It is defined in RFC 1521, p. 14 <http://tools.ietf.org/html/rfc1521#page-14>:

    encoding := "Content-Transfer-Encoding" ":" mechanism

    mechanism :=   "7bit"  ;  case-insensitive
                 / "quoted-printable"
                 / "base64"
                 / "8bit"
                 / "binary"
                 / x-token

So, taking the previous example, if you send a text file in UTF-8, compress it with gzip, and transfer-encode it as base64, I think you should have the following headers:

    Content-Type: text/plain; charset=utf-8
    Content-Encoding: gzip
    Content-Transfer-Encoding: base64

To make things a bit more complex, a single entity can have several content codings. You then use several Content-Encoding headers, in the order in which the codings were applied.

So, regarding your question, I think that if base64 is one possible value of @encoding, we should not also use @encoding for charset information, but rather write:

    <c:body content-type="application/xml; charset=utf-8">
      ...
    </c:body>

With the above body, should the processor analyse @content-type, find the charset, and use it to serialize the text? I think so. And what about the following, where the charset appears only in the c:header:

    <c:header name="Content-Type" value="application/xml; charset=utf-8"/>
    <c:body content-type="application/xml">
      ...
    </c:body>

Should the processor use that charset as well? I think so too.

Could we assume that @encoding can contain either a content coding or a content transfer encoding? I think so. We could then have either encoding="gzip" or encoding="base64", with the processor deciding which header to generate.

Finally, could we change the definition of @encoding so that it can contain several encodings? I think so. Going back to the previous example, we could then write either:

    <c:header name="Content-Type" value="text/plain; charset=utf-8"/>
    <c:header name="Content-Encoding" value="gzip"/>
    <c:header name="Content-Transfer-Encoding" value="base64"/>
    <c:body content-type="text/plain">
      [ content ]
    </c:body>

or:

    <c:body content-type="text/plain; charset=utf-8" encoding="gzip base64">
      [ content ]
    </c:body>

Does that make sense?

Regards,

-- 
Florent Georges
http://www.fgeorges.org/
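A minimal Python sketch of the layering discussed in this message, written under assumptions that are not part of the proposal: the function names `serialize_body()` and `charset_of()` and their parameters are hypothetical stand-ins for what a processor might do. The sketch also assumes that @encoding is a whitespace-separated list applied left to right, that only `gzip` (a content coding) and `base64` (a transfer encoding) are supported, and that UTF-8 is used when no charset is found anywhere (RFC 2616 would default `text/*` to ISO-8859-1). The charset is read from @content-type first, then from a Content-Type c:header, using the standard library's `email.message.Message.get_content_charset()`.

```python
import base64
import gzip
from email.message import Message
from typing import List, Optional, Tuple


def charset_of(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def serialize_body(
    text: str,
    content_type: str,
    encoding: str = "",
    header_content_type: Optional[str] = None,
    default_charset: str = "utf-8",  # assumption; RFC 2616 uses ISO-8859-1 for text/*
) -> Tuple[List[Tuple[str, str]], bytes]:
    """Hypothetical serializer for <c:body content-type="..." encoding="...">.

    Returns the MIME headers to emit and the encoded payload bytes.
    """
    # 1. Charset: from @content-type first, then from a Content-Type c:header.
    charset = (
        charset_of(content_type)
        or (charset_of(header_content_type) if header_content_type else None)
        or default_charset
    )
    payload = text.encode(charset)  # characters -> bytes
    headers = [("Content-Type", header_content_type or content_type)]

    # 2. Apply each @encoding token in order and emit the matching header.
    for token in encoding.split():
        if token == "gzip":  # content coding (RFC 2616, section 3.5)
            payload = gzip.compress(payload)
            headers.append(("Content-Encoding", token))
        elif token == "base64":  # transfer encoding (RFC 1521)
            payload = base64.b64encode(payload)
            headers.append(("Content-Transfer-Encoding", token))
        else:
            raise ValueError(f"unsupported encoding token: {token!r}")
    return headers, payload


# Usage: the encoding="gzip base64" variant from the message.
headers, payload = serialize_body("h\u00e9llo", "text/plain; charset=utf-8", "gzip base64")
for name, value in headers:
    print(f"{name}: {value}")
# Content-Type: text/plain; charset=utf-8
# Content-Encoding: gzip
# Content-Transfer-Encoding: base64
```

Because the tokens are applied in document order, "gzip base64" produces exactly the header sequence listed in the message, while "base64 gzip" would compress the base64 text instead; a real processor might prefer to reject that ordering rather than honour it.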
Received on Wednesday, 4 February 2009 15:16:59 UTC