- From: Jamie Lokier <jamie@shareable.org>
- Date: Fri, 25 Jun 2004 18:18:07 +0100
- To: Alex Rousskov <rousskov@measurement-factory.com>
- Cc: Asbjørn Ulsberg <asbjorn@tigerstaden.no>, ietf-http-wg@w3.org, Atom Syntax <atom-syntax@imc.org>
Alex Rousskov wrote:
> > Absolutely. That makes the most sense, especially since that's how
> > many (most?) XML libraries already behave.
>
> Have you read the arguments for ascii charset default in RFC 3023?
> If those arguments are not correct, then somebody should consider
> writing an RFC that obsoletes RFC 3023. If those arguments are
> correct, then violating the RFC may not be such a good idea even if it
> seems to solve the Atom problem.

RFC 3023:

    Conformant with [RFC2046], if a text/xml entity is received with
    the charset parameter omitted, MIME processors and XML processors
    MUST use the default charset value of "us-ascii"[ASCII]. In cases
    where the XML MIME entity is transmitted via HTTP, the default
    charset value is still "us-ascii". (Note: There is an
    inconsistency between this specification and HTTP/1.1, which uses
    ISO-8859-1[ISO8859] as the default for a historical reason. Since
    XML is a new format, a new default should be chosen for better
    I18N. US-ASCII was chosen, since it is the intersection of UTF-8
    and ISO-8859-1 and since it is already used by MIME.)

Note the inconsistency mentioned.

There is a second inconsistency, which I don't see mentioned in RFC
3023. From the HTML 4.01 standard:

    The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1
    as a default character encoding when the "charset" parameter is
    absent from the "Content-Type" header field. In practice, this
    recommendation has proved useless because some servers don't allow
    a "charset" parameter to be sent, and others may not be configured
    to send the parameter. Therefore, user agents must not assume any
    default value for the "charset" parameter.

The inconsistency between HTML 4.01 and RFC 3023 is that "text/html"
does _not_ force the document to be interpreted in a particular
charset -- it leaves the client to decide based on the content. In
this regard, HTML 4.01 overrides RFC 2616.

RFC 3023, on the other hand, says that "text/xml" _does_ force the
document to be interpreted in us-ascii. This makes complete sense for
MIME, where it really does have to be text.

I don't have a position either way. I suggest that RFC 3023 should be
obsoleted only if there's an abundance of clients which look at the
<?xml...?> declaration when given "text/xml" -- in effect, giving up a
requirement of RFC 3023, in the same way that HTML 4.01 says to give up
a requirement from RFC 2616.

I don't know if there is an abundance of such clients.

-- Jamie
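
To make the three defaulting rules above concrete, here is a minimal Python sketch of what a receiver would assume when the "charset" parameter is missing. The function name default_charset is hypothetical, and returning None for "no default, inspect the content" is my own convention, not something any of the specs define. It only covers the cases discussed in this message (text/xml per RFC 3023, text/html per HTML 4.01, other text/* per RFC 2616 section 3.7.1).

```python
from email.message import Message
from typing import Optional


def default_charset(content_type: str) -> Optional[str]:
    """Charset a receiver would assume for a Content-Type header value.

    Hypothetical helper: None means "no default, decide from the content".
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins.
    explicit = msg.get_param("charset")
    if isinstance(explicit, str):
        return explicit.lower()

    media_type = msg.get_content_type()  # lowercased "type/subtype"
    if media_type == "text/xml":
        # RFC 3023: MUST default to us-ascii, even over HTTP.
        return "us-ascii"
    if media_type == "text/html":
        # HTML 4.01: user agents must not assume any default.
        return None
    if msg.get_content_maintype() == "text":
        # RFC 2616 section 3.7.1: other text subtypes default to ISO-8859-1.
        return "iso-8859-1"
    # Non-text types are outside this discussion (assumption: no default).
    return None
```

A client that looks at the <?xml...?> declaration for "text/xml" would, in terms of this sketch, be returning None in the text/xml branch instead of "us-ascii".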
Received on Friday, 25 June 2004 13:18:25 UTC