- From: Andreas Maier <MAIERA@de.ibm.com>
- Date: Wed, 23 May 2007 13:41:09 +0200
- To: ietf-http-wg@w3.org
- Cc: Brian Carpenter <brian.e.carpenter@gmail.com>, Larry Masinter <masinter@adobe.com>, David Singer <singer@almaden.ibm.com>
... resending after getting subscribed to the list ...

Hi,

In our group, we encountered a flaw in RFC2616 while investigating an interoperability problem between a client and a server component that use the CIM-XML over HTTP protocol. That protocol is widely used in the industry for systems management and is owned by the DMTF standards org (www.dmtf.org).

I'd like to bring the flaw in RFC2616 to your attention, with the goal of folding it into a possibly upcoming update of that RFC (see Larry's mail below). I am not up to date on where to report such things, so I need a little help getting this directed the right way (e.g. towards the activity Larry was mentioning).

Here is a description of the flaw:

RFC2616 defines the "charset" parameter of the "Content-Type" header partly in BNF and partly in text. Following the BNF dependency tree, the "Content-Type" production (defined in section 14.17) uses the "media-type" production (defined in section 3.7), which uses the "parameter" production (defined in section 3.6), which allows both an unquoted "token" and a "quoted-string" as the value.

So far, this allows both quoted and unquoted forms for the value of the "charset" parameter of "Content-Type", as in the following two examples (taken from how the CIM-XML over HTTP protocol uses Content-Type):

    Content-type: application/xml; charset="utf-8"
    Content-type: application/xml; charset=utf-8

However, there is also the following text in section 3.4:

--- begin of text ---
HTTP character sets are identified by case-insensitive tokens. The complete set of tokens is defined by the IANA Character Set registry [19].

    charset = token

Although HTTP allows an arbitrary token to be used as a charset value, any token that has a predefined value within the IANA Character Set registry [19] MUST represent the character set defined by that registry. Applications SHOULD limit their use of character sets to those defined by the IANA registry.
--- end of text ---

This text adds to the syntax rules defined in BNF by recommending that the IANA-defined character sets be used in unquoted form.

The interoperability problem we encountered was caused by a silent agreement in the CIM-XML community that the quoted form is to be used, while the one CIM server that ran into the interoperability issue had obviously read RFC2616 more carefully than the rest of the CIM-XML community and required the form without quotes. We plan to fix that in our CIM-XML spec by recommending the unquoted form on the sending side and support for both forms on the receiving side.

Back to the flaw in RFC2616. The flaw is that the BNF definition of the "Content-Type" production does not utilize the "charset" production defined in section 3.4. An occasional reader of RFC2616 who follows the BNF dependencies therefore does not necessarily notice section 3.4 and arrives at the conclusion that the quoted and unquoted forms are equally OK. Which is what happened to me ;-)

I suggest fixing this by utilizing the "charset" production somewhere in the "Content-Type" production, maybe at the level of "media-type". In addition, an explicit reference to section 3.4 could be added to the description of Content-Type in section 14.17.

I believe that specs like RFC2616 are not always read top to bottom in one flow, but are often used as a reference to answer particular questions, and this change would make RFC2616 better suited to that kind of use.
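To illustrate the plan described above (send the unquoted form, accept both forms when receiving), here is a minimal Python sketch. It is not taken from the CIM-XML spec or from any existing implementation; the function names charset_of and format_content_type are hypothetical, and the regular expressions are my own reading of the "token" and "quoted-string" productions in RFC2616 section 2.2.

```python
import re

# "token" per RFC2616 section 2.2: any CHAR except CTLs and separators.
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# "quoted-string": a double quote, then qdtext or quoted-pairs, then a double quote.
_QUOTED = r'"(?:[^"\\]|\\.)*"'
# One media-type parameter: ";" attribute "=" ( token | quoted-string )
_PARAM = re.compile(r";\s*(" + _TOKEN + r")=(" + _TOKEN + r"|" + _QUOTED + r")")


def charset_of(content_type):
    """Return the lower-cased charset of a Content-Type value, or None.

    Hypothetical receiving-side helper: accepts charset=utf-8 as well as
    charset="utf-8".
    """
    for name, value in _PARAM.findall(content_type):
        if name.lower() == "charset":
            if value.startswith('"'):
                # Strip the surrounding quotes and undo quoted-pair escapes.
                value = re.sub(r"\\(.)", r"\1", value[1:-1])
            return value.lower()
    return None


def format_content_type(media_type, charset):
    """Hypothetical sending-side helper: always emits the unquoted form."""
    return f"{media_type}; charset={charset}"


if __name__ == "__main__":
    print(charset_of('application/xml; charset="utf-8"'))
    print(charset_of("application/xml; charset=utf-8"))
    print(format_content_type("application/xml", "utf-8"))
```

Both example headers from this mail yield the same charset on the receiving side, while the sending side never produces the quoted form that the strict CIM server rejected.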
Andy

Andreas Maier
IBM Senior Technical Staff Member, Systems Management Architecture & Design
IBM Development Laboratory Boeblingen, Germany
maiera@de.ibm.com, +49-7031-16-3654
________________________________________________________________________________________________
IBM Deutschland Entwicklung GmbH; Geschaeftsfuehrung: Herbert Kircher; Vorsitzender des Aufsichtsrats: Martin Jetter, Sitz der Gesellschaft: Boeblingen, Registergericht: Amtsgericht Stuttgart, HRB 243294

----- Forwarded by Andreas Maier/Germany/IBM on 05/23/2007 11:29 -----

From: Brian Carpenter/Switzerland/IBM@IBMCH
To: Andreas Maier/Germany/IBM@IBMDE
Date: 05/22/2007 17:22
Cc:
Subject: Recondite HTTP question

Andreas,

I think you're correct, and the message below answers "is there any place to report this to ?"

Regards,
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Brian E Carpenter
Distinguished Engineer, Internet Standards & Technology, IBM STG
Based in Switzerland, mobile phone +41 79 302 3262
<bcar@ch.ibm.com> for IBM business
<brian.e.carpenter@gmail.com> for IETF business

----- Forwarded by Brian Carpenter/Switzerland/IBM on 2007-05-22 17:20 -----

-------- Original Message --------
Subject: RE: Recondite HTTP question
Date: Tue, 22 May 2007 06:18:12 -0700
From: Larry Masinter <masinter@adobe.com>
To: Brian E Carpenter <brian.e.carpenter@gmail.com>

There's a live effort to update the http spec and deal with issues like this. How about bringing this up on ietf-http-wg@w3.org?

-----Original Message-----
From: Brian E Carpenter [mailto:brian.e.carpenter@gmail.com]
Sent: Tuesday, May 22, 2007 12:14 AM Pacific Standard Time
To: Larry Masinter
Subject: Recondite HTTP question

Larry,

Question from a colleague: RFC 2616 seems to allow for both of these:

    Content-type=application/xml; charset="utf-8"
    Content-type=application/xml; charset=utf-8

In your view, are both valid? A narrow interpretation suggests that only the second one is formally bound to the IANA charset registry.

Thanks
Brian
--
NEW: Preferred email for non-IBM matters: brian.e.carpenter@gmail.com
Received on Wednesday, 23 May 2007 11:41:20 UTC