
charset / Content-Type BNF flaw in RFC2616

From: Andreas Maier <MAIERA@de.ibm.com>
Date: Wed, 23 May 2007 13:41:09 +0200
To: ietf-http-wg@w3.org
Cc: Brian Carpenter <brian.e.carpenter@gmail.com>, Larry Masinter <masinter@adobe.com>, David Singer <singer@almaden.ibm.com>
Message-ID: <OF0F79F4C4.48F60CCC-ONC12572E4.004020CE-C12572E4.00403138@de.ibm.com>

... resending after getting subscribed to the list ...

In our group, we encountered a flaw in RFC2616 as part of investigating an
interoperability problem between a client and a server component that use
the CIM-XML over HTTP protocol. That protocol is used widely in the
industry for systems management and is owned by the DMTF standards org.

I'd like to bring the flaw in RFC2616 to your attention, with the goal of
folding this into a possibly upcoming update on that RFC (see Larry's mail
below). I am not up to date as to where to report such things, so you need
to help me a little to get this directed the right way (e.g. towards the
activity Larry was mentioning).

Here is a description of the flaw:

RFC2616 defines the "charset" parameter of the "Content-Type" header partly
in BNF and partly in text. Following the BNF dependency tree, the
"Content-Type" production (defined in section 14.17) uses the "media-type"
production (defined in section 3.7), which uses the "parameter" production
(defined in section 3.6), whose value may be either an unquoted "token" or
a "quoted-string". Read on its own, the BNF therefore allows both quoted
and unquoted forms for the value of the "charset" parameter of
"Content-Type", as in the following two examples (taken from how the
CIM-XML over HTTP protocol uses Content-Type):

  Content-type: application/xml; charset="utf-8"
  Content-type: application/xml; charset=utf-8
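
To make the BNF reading above concrete, here is a minimal sketch in Python
of the "parameter" production from section 3.6. The regular expressions and
the name matches_parameter are illustrative assumptions, not text from
RFC2616, and the quoted-string pattern is simplified (it does not exclude
control characters).

```python
import re

# token = 1*<any CHAR except CTLs or separators> (RFC2616, section 2.2)
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z|~]+"

# quoted-string, simplified: qdtext or quoted-pair between double quotes
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'

# parameter = attribute "=" value, where value = token | quoted-string
PARAMETER = re.compile(rf"({TOKEN})=({TOKEN}|{QUOTED_STRING})")


def matches_parameter(text):
    """Return True if text is a complete 'parameter' per the sketch above."""
    return PARAMETER.fullmatch(text) is not None


# Both forms from the examples satisfy the generic production.
print(matches_parameter('charset="utf-8"'))
print(matches_parameter("charset=utf-8"))
```

Under this reading alone, nothing distinguishes the two forms.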

However, there is also the following text in section 3.4:

   --- begin of text ---
   HTTP character sets are identified by case-insensitive tokens. The
   complete set of tokens is defined by the IANA Character Set registry.

       charset = token

   Although HTTP allows an arbitrary token to be used as a charset
   value, any token that has a predefined value within the IANA
   Character Set registry [19] MUST represent the character set defined
   by that registry. Applications SHOULD limit their use of character
   sets to those defined by the IANA registry.
   --- end of text ---

This text adds to the BNF-defined syntax rules: it defines "charset" as a
token, and recommends using the IANA-defined character sets, which implies
the unquoted form.
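
As a rough illustration of why "charset = token" rules out the quoted form,
the sketch below checks a value against the token definition in section
2.2, where the double quote is listed as a separator. The function name
is_charset_token is hypothetical.

```python
import re

# token = 1*<any CHAR except CTLs or separators> (RFC2616, section 2.2).
# The double quote is a separator, so it cannot appear in a token.
TOKEN_RE = re.compile(r"[!#$%&'*+\-.0-9A-Z^_`a-z|~]+")


def is_charset_token(value):
    """Return True if value satisfies 'charset = token' (section 3.4)."""
    return TOKEN_RE.fullmatch(value) is not None


print(is_charset_token("utf-8"))    # unquoted form
print(is_charset_token('"utf-8"'))  # quoted form, quotes included
```

The unquoted value is a token; the quoted value, taken literally with its
quotes, is not.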

The interoperability problem we encountered was caused by a tacit
agreement in the CIM-XML community to use the quoted form, while the one
CIM server that ran into the interoperability issue had evidently read
RFC2616 more carefully than the rest of the CIM-XML community and required
the form without quotes. We plan to fix that in our CIM-XML spec by
recommending that senders use the unquoted form and that receivers support
both forms.
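
The planned behaviour could look roughly like the following Python sketch:
strict when sending, lenient when receiving. The function names
read_charset and format_content_type are hypothetical and not part of any
CIM-XML specification; the parsing is deliberately minimal and only looks
for the "charset" parameter.

```python
import re

_TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z|~]+"

# Accept charset=token or charset="quoted-string" after a ";".
_CHARSET = re.compile(
    rf';\s*charset=(?:({_TOKEN})|"((?:[^"\\]|\\.)*)")',
    re.IGNORECASE | re.ASCII,
)


def read_charset(content_type):
    """Receiving side: accept both forms; return the lower-cased name."""
    match = _CHARSET.search(content_type)
    if match is None:
        return None
    if match.group(1) is not None:
        value = match.group(1)
    else:
        # Undo quoted-pair escapes inside the quoted-string.
        value = re.sub(r"\\(.)", r"\1", match.group(2))
    # Charset names are case-insensitive (section 3.4).
    return value.lower()


def format_content_type(media_type, charset):
    """Sending side: emit only the unquoted (token) form."""
    if re.fullmatch(_TOKEN, charset) is None:
        raise ValueError("charset must be a token: %r" % (charset,))
    return f"{media_type}; charset={charset}"


print(read_charset('application/xml; charset="utf-8"'))
print(read_charset("application/xml; charset=utf-8"))
print(format_content_type("application/xml", "utf-8"))
```

A receiver written this way interoperates with senders of either form,
while a sender written this way satisfies receivers that require a token.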

Back to the flaw in RFC2616. The flaw is that the BNF definition of the
"Content-Type" production does not use the "charset" production defined in
section 3.4. An occasional reader of RFC2616 who follows the BNF
dependencies therefore does not necessarily notice section 3.4, and
arrives at the conclusion that the quoted and unquoted forms are both
equally OK. Which is what happened to me ;-)

I suggest fixing this by using the "charset" production somewhere in the
"Content-Type" production, perhaps at the level of "media-type". In
addition, an explicit reference to section 3.4 could be added to the
description of Content-Type in section 14.17.

I believe that specs like RFC2616 are not always read top to bottom in one
go, but are often used as a reference to answer particular questions, and
this change would make RFC2616 better suited to that kind of use.


Andreas Maier
IBM Senior Technical Staff Member, Systems Management Architecture & Design
IBM Development Laboratory Boeblingen, Germany
maiera@de.ibm.com, +49-7031-16-3654

IBM Deutschland Entwicklung GmbH; Management: Herbert Kircher;
Chairman of the Supervisory Board: Martin Jetter; Registered office:
Boeblingen; Registration court: Amtsgericht Stuttgart, HRB 243294
----- Forwarded by Andreas Maier/Germany/IBM on 05/23/2007 11:29 -----
From:    land/IBM@IBMCH
Date:    05/22/2007 17:22
To:      Andreas Maier/Germany/IBM@IBMDE
cc:
Subject: Recondite HTTP question


I think you're correct, and the message below answers "is there any place
to report this to?"


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Brian E Carpenter
Distinguished Engineer, Internet Standards & Technology, IBM STG
Based in Switzerland,  mobile phone +41 79 302 3262

<bcar@ch.ibm.com> for IBM business
<brian.e.carpenter@gmail.com> for IETF business
----- Forwarded by Brian Carpenter/Switzerland/IBM on 2007-05-22 17:20

-------- Original Message --------
Subject: RE: Recondite HTTP question
Date: Tue, 22 May 2007 06:18:12 -0700
From: Larry Masinter <masinter@adobe.com>
To: Brian E Carpenter <brian.e.carpenter@gmail.com>

There's a live effort to update the HTTP spec and deal with issues like
this.  How about bringing this up on ietf-http-wg@w3.org?

-----Original Message-----
From:    Brian E Carpenter [mailto:brian.e.carpenter@gmail.com]
Sent:    Tuesday, May 22, 2007 12:14 AM Pacific Standard Time
To:      Larry Masinter
Subject: Recondite HTTP question


Question from a colleague:

RFC 2616 seems to allow for both of these:
    Content-type: application/xml; charset="utf-8"
    Content-type: application/xml; charset=utf-8
In your view, are both valid? A narrow interpretation suggests
that only the second one is formally bound to the IANA charset


NEW: Preferred email for non-IBM matters: brian.e.carpenter@gmail.com
Received on Wednesday, 23 May 2007 11:41:20 GMT
