Re: charset / Content-Type BNF flaw in RFC2616

http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i67


On 23/05/2007, at 9:41 PM, Andreas Maier wrote:

>
>
> ... resending after getting subscribed to the list ...
>
> Hi,
> In our group, we encountered a flaw in RFC2616 as part of investigating an
> interoperability problem between a client and a server component that use
> the CIM-XML over HTTP protocol. That protocol is used widely in the
> industry for systems management and is owned by the DMTF standards org
> (www.dmtf.org).
>
> I'd like to bring the flaw in RFC2616 to your attention, with the goal of
> folding this into a possible upcoming update to that RFC (see Larry's mail
> below). I am not up to date as to where to report such things, so you need
> to help me a little to get this directed the right way (e.g. towards the
> activity Larry was mentioning).
>
> Here is a description of the flaw:
>
> RFC2616 defines the "charset" parameter of the "Content-Type" header partly
> using BNF, and partly in text. Following the BNF dependency tree, the
> "Content-Type" production (defined in section 14.17) uses the
> "media-type" production (defined in section 3.7), which uses the
> "parameter" production (defined in section 3.6), which allows both an
> unquoted "token" and a "quoted-string". So far, this allows both quoted and
> unquoted forms for the value of the "charset" parameter of "Content-Type",
> as in the following two examples (taken from how the CIM-XML over HTTP
> protocol uses Content-Type):
>
>   Content-type: application/xml; charset="utf-8"
>   Content-type: application/xml; charset=utf-8
>
> However, there is also the following text in section 3.4:
>
>    --- begin of text ---
>    HTTP character sets are identified by case-insensitive tokens. The
>    complete set of tokens is defined by the IANA Character Set registry
>    [19].
>
>        charset = token
>
>    Although HTTP allows an arbitrary token to be used as a charset
>    value, any token that has a predefined value within the IANA
>    Character Set registry [19] MUST represent the character set defined
>    by that registry. Applications SHOULD limit their use of character
>    sets to those defined by the IANA registry
>    --- end of text ---
>
> This text adds to the BNF-defined syntax rules by recommending that the
> IANA-defined character sets be used in an unquoted form.
>
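
A minimal Python sketch of that mismatch. The regular expressions are only
an approximation of the RFC 2616 "token" and "quoted-string" productions
(the quoted-string pattern in particular is simplified), and the function
names are hypothetical, chosen just for this illustration:

```python
import re

# token = 1*<any CHAR except CTLs or separators> (RFC 2616, section 2.2)
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# Simplified quoted-string: a double-quoted run with backslash escapes.
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'

# Section 3.6: parameter value = token | quoted-string
PARAMETER_VALUE = re.compile(rf"(?:{TOKEN}|{QUOTED_STRING})")
# Section 3.4: charset = token
CHARSET = re.compile(TOKEN)


def allowed_by_parameter(value: str) -> bool:
    """True if value fits the generic 'parameter' value syntax."""
    return PARAMETER_VALUE.fullmatch(value) is not None


def allowed_by_charset(value: str) -> bool:
    """True if value fits the 'charset' production, i.e. is a bare token."""
    return CHARSET.fullmatch(value) is not None


for candidate in ('utf-8', '"utf-8"'):
    print(candidate, allowed_by_parameter(candidate), allowed_by_charset(candidate))
```

Both forms pass the generic parameter check, but only the unquoted one
matches "charset = token".
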
> The interoperability problem we encountered was caused by a tacit
> agreement in the CIM-XML community that the quoted form is to be used,
> while the one CIM server that ran into the interoperability issue had
> evidently read RFC2616 more carefully than the rest of the CIM-XML
> community and required the form without quotes. We plan to fix that in
> our CIM-XML spec by recommending the unquoted form on the sending side,
> and support for both forms on the receiving side.
>
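
The "unquoted when sending, both forms when receiving" approach described
above could look roughly like this in Python. It is a sketch, not taken
from any CIM-XML implementation: the function names are hypothetical, the
input is assumed to be the header field value only, and splitting on ";"
assumes no parameter value contains a semicolon inside quotes.

```python
import re
from typing import Optional


def parse_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset from a Content-Type value, or None.

    Accepts both charset=utf-8 and charset="utf-8".
    """
    _, *params = content_type.split(";")  # naive split, see assumptions above
    for param in params:
        name, sep, value = param.partition("=")
        if not sep or name.strip().lower() != "charset":
            continue
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            # Strip the quotes and undo backslash escapes (quoted-pair).
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        return value.lower()  # charset names are case-insensitive
    return None


def format_content_type(media_type: str, charset: str) -> str:
    """Build a Content-Type value using the unquoted charset form."""
    return f"{media_type}; charset={charset}"


print(parse_charset('application/xml; charset="utf-8"'))
print(parse_charset('application/xml; charset=utf-8'))
print(format_content_type("application/xml", "utf-8"))
```
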
> Back to the flaw in RFC2616. The flaw is that the BNF definition of the
> "Content-Type" production does not utilize the "charset" production
> defined in section 3.4. An occasional reader of RFC2616 who follows the
> BNF dependencies therefore does not necessarily notice section 3.4, and
> concludes that the quoted and unquoted forms are both equally OK. Which is
> what happened to me ;-)
>
> I suggest fixing this by utilizing the "charset" production somewhere in
> the "Content-Type" production, maybe at the level of "media-type". In
> addition, an explicit reference to section 3.4 could be added to the
> description of Content-Type in section 14.17.
>
> I believe that specs like RFC2616 are not always read top to bottom in one
> go, but are often used as a reference to answer particular questions, and
> this change would make RFC2616 better suited to that kind of use.
>
>
> Andy
>
> Andreas Maier
> IBM Senior Technical Staff Member, Systems Management Architecture & Design
> IBM Development Laboratory Boeblingen, Germany
> maiera@de.ibm.com, +49-7031-16-3654
> ______________________________________________________________________
>
> IBM Deutschland Entwicklung GmbH; Geschaeftsfuehrung: Herbert Kircher;
> Vorsitzender des Aufsichtsrats: Martin Jetter, Sitz der Gesellschaft:
> Boeblingen, Registergericht: Amtsgericht Stuttgart, HRB 243294
> ----- Forwarded by Andreas Maier/Germany/IBM on 05/23/2007 11:29 -----
>
> From:    Brian Carpenter/Switzerland/IBM@IBMCH
> To:      Andreas Maier/Germany/IBM@IBMDE
> Date:    05/22/2007 17:22
> cc:
> Subject: Recondite HTTP question
>
> Andreas,
>
> I think you're correct, and the message below answers "is there any place
> to report this to?"
>
> Regards,
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> Brian E Carpenter
> Distinguished Engineer, Internet Standards & Technology, IBM STG
> Based in Switzerland,  mobile phone +41 79 302 3262
>
> <bcar@ch.ibm.com> for IBM business
> <brian.e.carpenter@gmail.com> for IETF business
> ----- Forwarded by Brian Carpenter/Switzerland/IBM on 2007-05-22 17:20 -----
>
> -------- Original Message --------
> Subject: RE: Recondite HTTP question
> Date: Tue, 22 May 2007 06:18:12 -0700
> From: Larry Masinter <masinter@adobe.com>
> To: Brian E Carpenter <brian.e.carpenter@gmail.com>
>
> There's a live effort to update the HTTP spec and deal with issues like
> this.  How about bringing this up on ietf-http-wg@w3.org?
>
>
>
>
>
>   -----Original Message-----
> From:    Brian E Carpenter [mailto:brian.e.carpenter@gmail.com]
> Sent:    Tuesday, May 22, 2007 12:14 AM Pacific Standard Time
> To:      Larry Masinter
> Subject: Recondite HTTP question
>
> Larry,
>
> Question from a colleague:
>
> RFC 2616 seems to allow for both of these:
>     Content-type: application/xml; charset="utf-8"
>     Content-type: application/xml; charset=utf-8
> In your view, are both valid? A narrow interpretation suggests
> that only the second one is formally bound to the IANA charset
> registry.
>
> Thanks
>
>        Brian
> --
> NEW: Preferred email for non-IBM matters: brian.e.carpenter@gmail.com
>
>
>


--
Mark Nottingham     http://www.mnot.net/

Received on Tuesday, 12 June 2007 12:43:35 UTC