Thought on ABNF upgrade


Here are some thoughts on Issue 36, the upgrade of the ABNF syntax.

1) I think there is agreement that the ABNF syntax needs to be upgraded 
to the syntax defined in RFC4234.

2) What's not so clear is what exact format we want to use. Currently, 
RFC2616 uses a format based on RFC822, but augments it with a special 
"list" syntax, which makes the notation for many headers much easier to 
read. We need to decide whether we want to keep that extension in 
RFC2616bis, or just live with the RFC4234 syntax.
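
To make the trade-off concrete, here is a minimal Python sketch that 
rewrites the simplest uses of the "#" list rule into plain repetition, 
following the equivalence RFC2616 gives in Section 2.1, where 
( *LWS element *( *LWS "," *LWS element )) can be written as 1#element. 
The name expand_list_rule is made up for this illustration, and the 
sketch only handles "#name" and "1#name" with a bare rule name; it 
ignores parenthesized groups and the general <n>#<m> form.

```python
import re

# Matches "#name" or "1#name" where name is a bare rule name.
# Hypothetical simplification: no groups, no general <n>#<m> bounds.
_LIST_RULE = re.compile(r'(?<![\w"])(1?)#([A-Za-z][A-Za-z0-9_-]*)')


def expand_list_rule(rhs: str) -> str:
    """Rewrite '#name' / '1#name' into explicit comma-separated repetition."""
    def repl(m):
        minimum, name = m.group(1), m.group(2)
        expanded = f'( *LWS {name} *( *LWS "," *LWS {name} ) )'
        # '#name' permits an empty list, so wrap it as optional.
        return expanded if minimum == "1" else f"[ {expanded} ]"
    return _LIST_RULE.sub(repl, rhs)


print(expand_list_rule('"Allow" ":" #Method'))
print(expand_list_rule('"Trailer" ":" 1#field-name'))
```

Even for these one-line headers the expanded form is noticeably harder 
to read, which is the argument for keeping the extension.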

3) Related to that, RFC2616 defines special treatment of Linear White 
Space (LWS). It's up for discussion whether we want to keep that 
aspect; Roy Fielding has made some comments about that subject.

So how do we proceed?

I think it would be extremely useful to be able to extract & parse the 
embedded ABNF productions, generating somewhat normalized output. That 
would allow us to ensure that while we are updating the spec, we don't 
break the grammar (or actually, that we do not introduce *unintended* 
changes).

Extraction is no problem. The XML source already has the ABNF fragments 
marked up accordingly. A tool for extraction is available in my 
rfc2629.xslt package -- I'm attaching the output.
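
For those who would rather not run XSLT, the same idea can be sketched 
in a few lines of Python. This is not the rfc2629.xslt tool, just a 
stand-in; it assumes the ABNF fragments are <artwork> elements carrying 
a type attribute, and the value "abnf2616" as well as the function name 
extract_abnf are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET


def extract_abnf(xml_text: str, artwork_type: str = "abnf2616") -> list:
    """Return the text of every <artwork> element with the given type."""
    root = ET.fromstring(xml_text)
    return [
        el.text or ""
        for el in root.iter("artwork")
        if el.get("type") == artwork_type  # assumed markup convention
    ]


# Tiny made-up document: one ABNF fragment, one non-ABNF example.
sample = """<rfc><section>
<figure><artwork type="abnf2616">CRLF = CR LF</artwork></figure>
<figure><artwork type="example">GET / HTTP/1.1</artwork></figure>
</section></rfc>"""

print(extract_abnf(sample))
```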

The next step would be to run this through a parser. Here's where we 
have the first problems:

- we'd need a parser for the RFC2616 variant of the ABNF syntax (does 
anybody have that?), and

- the ABNF itself must parse. Currently there's at least one problem with

     qdtext         = <any TEXT except <">>

which IMHO needs to be fixed not to contain the literals "<" and ">". I 
would like to make that change with the next draft we submit.
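
To show why this production is likely to trip a parser, here is a small 
Python check against the prose-val rule as RFC4234 defines it, 
"<" *(%x20-3D / %x3F-7E) ">", which does not allow ">" inside the 
brackets. The helper name is_single_prose_val is made up; an 
RFC822-style parser may behave differently, so treat this as one 
plausible failure mode rather than a description of any actual tool.

```python
import re

# prose-val per RFC4234: "<" *(%x20-3D / %x3F-7E) ">"
# Note that %x3E (">") is excluded from the content.
PROSE_VAL = re.compile(r'<[\x20-\x3D\x3F-\x7E]*>')


def is_single_prose_val(rhs: str) -> bool:
    """True if the right-hand side is exactly one RFC4234 prose-val."""
    return PROSE_VAL.fullmatch(rhs.strip()) is not None


# ctext parses as a single prose-val ...
print(is_single_prose_val('<any TEXT excluding "(" and ")">'))
# ... but qdtext does not, because of the nested <">.
print(is_single_prose_val('<any TEXT except <">>'))
```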

Optimally, we would then find a parser that will accept both 
rfc2616-style ABNF and RFC4234-style ABNF, and which will re-serialize 
into a single RFC4234-based format that we can then use to verify changes.
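
As a rough idea of what the re-serialization step involves, the sketch 
below rewrites two of the notational differences: the "|" alternation 
becomes "/", and the <"> rule becomes the RFC4234 core rule DQUOTE. It 
is a token-level rewrite, not a parser; the name to_rfc4234 is made up, 
and the list rule, implied LWS, and prose descriptions are left 
untouched.

```python
import re

# Tokens: quoted literal, the <"> rule, other <...> prose, "|", anything else.
_TOKEN = re.compile(r'"[^"]*"|<">|<[^>]*>|\||[^"<|]+')


def to_rfc4234(rhs: str) -> str:
    """Rewrite RFC2616-style alternation and <"> into RFC4234 notation."""
    out = []
    for tok in _TOKEN.findall(rhs):
        if tok == "|":
            out.append("/")       # RFC4234 alternation
        elif tok == '<">':
            out.append("DQUOTE")  # RFC4234 core rule for the double quote
        else:
            out.append(tok)       # literals and rule names pass through
    return "".join(out)


print(to_rfc4234('token | quoted-string'))
print(to_rfc4234('( <"> *(qdtext | quoted-pair ) <"> )'))
```

Tokenizing first keeps a "|" inside a quoted literal from being 
rewritten, which a plain search-and-replace would get wrong.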

Best regards, Julian
    OCTET          = <any 8-bit sequence of data>
    CHAR           = <any US-ASCII character (octets 0 - 127)>
    UPALPHA        = <any US-ASCII uppercase letter "A".."Z">
    LOALPHA        = <any US-ASCII lowercase letter "a".."z">
    ALPHA          = UPALPHA | LOALPHA
    DIGIT          = <any US-ASCII digit "0".."9">
    CTL            = <any US-ASCII control character
                     (octets 0 - 31) and DEL (127)>
    CR             = <US-ASCII CR, carriage return (13)>
    LF             = <US-ASCII LF, linefeed (10)>
    SP             = <US-ASCII SP, space (32)>
    HT             = <US-ASCII HT, horizontal-tab (9)>
    <">            = <US-ASCII double-quote mark (34)>

    CRLF           = CR LF

    LWS            = [CRLF] 1*( SP | HT )

    TEXT           = <any OCTET except CTLs,
                     but including LWS>

    HEX            = "A" | "B" | "C" | "D" | "E" | "F"
                   | "a" | "b" | "c" | "d" | "e" | "f" | DIGIT

    token          = 1*<any CHAR except CTLs or separators>
    separators     = "(" | ")" | "<" | ">" | "@"
                   | "," | ";" | ":" | "\" | <">
                   | "/" | "[" | "]" | "?" | "="
                   | "{" | "}" | SP | HT

    comment        = "(" *( ctext | quoted-pair | comment ) ")"
    ctext          = <any TEXT excluding "(" and ")">

    quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
    qdtext         = <any TEXT except <">>

    quoted-pair    = "\" CHAR

       HTTP-Version   = "HTTP" "/" 1*DIGIT "." 1*DIGIT

http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

    HTTP-date    = rfc1123-date | rfc850-date | asctime-date
    rfc1123-date = wkday "," SP date1 SP time SP "GMT"
    rfc850-date  = weekday "," SP date2 SP time SP "GMT"
    asctime-date = wkday SP date3 SP time SP 4DIGIT
    date1        = 2DIGIT SP month SP 4DIGIT
                   ; day month year (e.g., 02 Jun 1982)
    date2        = 2DIGIT "-" month "-" 2DIGIT
                   ; day-month-year (e.g., 02-Jun-82)
    date3        = month SP ( 2DIGIT | ( SP 1DIGIT ))
                   ; month day (e.g., Jun  2)
    time         = 2DIGIT ":" 2DIGIT ":" 2DIGIT
                   ; 00:00:00 - 23:59:59
    wkday        = "Mon" | "Tue" | "Wed"
                 | "Thu" | "Fri" | "Sat" | "Sun"
    weekday      = "Monday" | "Tuesday" | "Wednesday"
                 | "Thursday" | "Friday" | "Saturday" | "Sunday"
    month        = "Jan" | "Feb" | "Mar" | "Apr"
                 | "May" | "Jun" | "Jul" | "Aug"
                 | "Sep" | "Oct" | "Nov" | "Dec"

    delta-seconds  = 1*DIGIT

    charset = token

    content-coding   = token

    transfer-coding         = "chunked" | transfer-extension
    transfer-extension      = token *( ";" parameter )

    parameter               = attribute "=" value
    attribute               = token
    value                   = token | quoted-string

    Chunked-Body   = *chunk
                     last-chunk
                     trailer
                     CRLF

    chunk          = chunk-size [ chunk-extension ] CRLF
                     chunk-data CRLF
    chunk-size     = 1*HEX
    last-chunk     = 1*("0") [ chunk-extension ] CRLF

    chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
    chunk-ext-name = token
    chunk-ext-val  = token | quoted-string
    chunk-data     = chunk-size(OCTET)
    trailer        = *(entity-header CRLF)

    media-type     = type "/" subtype *( ";" parameter )
    type           = token
    subtype        = token

    product         = token ["/" product-version]
    product-version = token

    qvalue         = ( "0" [ "." 0*3DIGIT ] )
                   | ( "1" [ "." 0*3("0") ] )

     language-tag  = primary-tag *( "-" subtag )
     primary-tag   = 1*8ALPHA
     subtag        = 1*8ALPHA

   entity-tag = [ weak ] opaque-tag
   weak       = "W/"
   opaque-tag = quoted-string

   range-unit       = bytes-unit | other-range-unit
   bytes-unit       = "bytes"
   other-range-unit = token

    HTTP-message   = Request | Response     ; HTTP/1.1 messages

     generic-message = start-line
                       *(message-header CRLF)
                       [ message-body ]
     start-line      = Request-Line | Status-Line

    message-header = field-name ":" [ field-value ]
    field-name     = token
    field-value    = *( field-content | LWS )
    field-content  = <the OCTETs making up the field-value
                     and consisting of either *TEXT or combinations
                     of token, separators, and quoted-string>

    message-body = entity-body
                 | <entity-body encoded as per Transfer-Encoding>

    general-header = Cache-Control            ; Section 14.9
                   | Connection               ; Section 14.10
                   | Date                     ; Section 14.18
                   | Pragma                   ; Section 14.32
                   | Trailer                  ; Section 14.40
                   | Transfer-Encoding        ; Section 14.41
                   | Upgrade                  ; Section 14.42
                   | Via                      ; Section 14.45
                   | Warning                  ; Section 14.46

     Request       = Request-Line              ; Section 5.1
                     *(( general-header        ; Section 4.5
                      | request-header         ; Section 5.3
                      | entity-header ) CRLF)  ; Section 7.1
                     [ message-body ]          ; Section 4.3

     Request-Line   = Method SP Request-URI SP HTTP-Version CRLF

    Method         = "OPTIONS"                ; Section 9.2
                   | "GET"                    ; Section 9.3
                   | "HEAD"                   ; Section 9.4
                   | "POST"                   ; Section 9.5
                   | "PUT"                    ; Section 9.6
                   | "DELETE"                 ; Section 9.7
                   | "TRACE"                  ; Section 9.8
                   | "CONNECT"                ; Section 9.9
                   | extension-method
    extension-method = token

    Request-URI    = "*" 
                   | absoluteURI 
                   | abs_path [ "?" query ] 
                   | authority

    request-header = Accept                   ; Section 14.1
                   | Accept-Charset           ; Section 14.2
                   | Accept-Encoding          ; Section 14.3
                   | Accept-Language          ; Section 14.4
                   | Authorization            ; Section 14.8
                   | Expect                   ; Section 14.20
                   | From                     ; Section 14.22
                   | Host                     ; Section 14.23
                   | If-Match                 ; Section 14.24
                   | If-Modified-Since        ; Section 14.25
                   | If-None-Match            ; Section 14.26
                   | If-Range                 ; Section 14.27
                   | If-Unmodified-Since      ; Section 14.28
                   | Max-Forwards             ; Section 14.31
                   | Proxy-Authorization      ; Section 14.34
                   | Range                    ; Section 14.35
                   | Referer                  ; Section 14.36
                   | TE                       ; Section 14.39
                   | User-Agent               ; Section 14.43

    Response      = Status-Line               ; Section 6.1
                    *(( general-header        ; Section 4.5
                     | response-header        ; Section 6.2
                     | entity-header ) CRLF)  ; Section 7.1
                    [ message-body ]          ; Section 7.2

    Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

   Status-Code    =
         "100"  ; Section 10.1.1: Continue
       | "101"  ; Section 10.1.2: Switching Protocols
       | "200"  ; Section 10.2.1: OK
       | "201"  ; Section 10.2.2: Created
       | "202"  ; Section 10.2.3: Accepted
       | "203"  ; Section 10.2.4: Non-Authoritative Information
       | "204"  ; Section 10.2.5: No Content
       | "205"  ; Section 10.2.6: Reset Content
       | "206"  ; Section 10.2.7: Partial Content
       | "300"  ; Section 10.3.1: Multiple Choices
       | "301"  ; Section 10.3.2: Moved Permanently
       | "302"  ; Section 10.3.3: Found
       | "303"  ; Section 10.3.4: See Other
       | "304"  ; Section 10.3.5: Not Modified
       | "305"  ; Section 10.3.6: Use Proxy
       | "307"  ; Section 10.3.8: Temporary Redirect
       | "400"  ; Section 10.4.1: Bad Request
       | "401"  ; Section 10.4.2: Unauthorized
       | "402"  ; Section 10.4.3: Payment Required
       | "403"  ; Section 10.4.4: Forbidden
       | "404"  ; Section 10.4.5: Not Found
       | "405"  ; Section 10.4.6: Method Not Allowed
       | "406"  ; Section 10.4.7: Not Acceptable
       | "407"  ; Section 10.4.8: Proxy Authentication Required
       | "408"  ; Section 10.4.9: Request Time-out
       | "409"  ; Section 10.4.10: Conflict
       | "410"  ; Section 10.4.11: Gone
       | "411"  ; Section 10.4.12: Length Required
       | "412"  ; Section 10.4.13: Precondition Failed
       | "413"  ; Section 10.4.14: Request Entity Too Large
       | "414"  ; Section 10.4.15: Request-URI Too Large
       | "415"  ; Section 10.4.16: Unsupported Media Type
       | "416"  ; Section 10.4.17: Requested range not satisfiable
       | "417"  ; Section 10.4.18: Expectation Failed
       | "500"  ; Section 10.5.1: Internal Server Error
       | "501"  ; Section 10.5.2: Not Implemented
       | "502"  ; Section 10.5.3: Bad Gateway
       | "503"  ; Section 10.5.4: Service Unavailable
       | "504"  ; Section 10.5.5: Gateway Time-out
       | "505"  ; Section 10.5.6: HTTP Version not supported
       | extension-code

   extension-code = 3DIGIT
   Reason-Phrase  = *<TEXT, excluding CR, LF>

    response-header = Accept-Ranges           ; Section 14.5
                    | Age                     ; Section 14.6
                    | ETag                    ; Section 14.19
                    | Location                ; Section 14.30
                    | Proxy-Authenticate      ; Section 14.33
                    | Retry-After             ; Section 14.37
                    | Server                  ; Section 14.38
                    | Vary                    ; Section 14.44
                    | WWW-Authenticate        ; Section 14.47

    entity-header  = Allow                    ; Section 14.7
                   | Content-Encoding         ; Section 14.11
                   | Content-Language         ; Section 14.12
                   | Content-Length           ; Section 14.13
                   | Content-Location         ; Section 14.14
                   | Content-MD5              ; Section 14.15
                   | Content-Range            ; Section 14.16
                   | Content-Type             ; Section 14.17
                   | Expires                  ; Section 14.21
                   | Last-Modified            ; Section 14.29
                   | extension-header

    extension-header = message-header

    entity-body    = *OCTET

    Accept         = "Accept" ":"
                     #( media-range [ accept-params ] )

    media-range    = ( "*/*"
                     | ( type "/" "*" )
                     | ( type "/" subtype )
                     ) *( ";" parameter )
    accept-params  = ";" "q" "=" qvalue *( accept-extension )
    accept-extension = ";" token [ "=" ( token | quoted-string ) ]

   Accept-Charset = "Accept-Charset" ":"
           1#( ( charset | "*" )[ ";" "q" "=" qvalue ] )

    Accept-Encoding  = "Accept-Encoding" ":"
                       1#( codings [ ";" "q" "=" qvalue ] )
    codings          = ( content-coding | "*" )

    Accept-Language = "Accept-Language" ":"
                      1#( language-range [ ";" "q" "=" qvalue ] )
    language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )

       Accept-Ranges     = "Accept-Ranges" ":" acceptable-ranges
       acceptable-ranges = 1#range-unit | "none"

        Age = "Age" ":" age-value
        age-value = delta-seconds

       Allow   = "Allow" ":" #Method

       Authorization  = "Authorization" ":" credentials

   Cache-Control   = "Cache-Control" ":" 1#cache-directive

   cache-directive = cache-request-directive
        | cache-response-directive

   cache-request-directive =
          "no-cache"                          ; Section 14.9.1
        | "no-store"                          ; Section 14.9.2
        | "max-age" "=" delta-seconds         ; Section 14.9.3, 14.9.4
        | "max-stale" [ "=" delta-seconds ]   ; Section 14.9.3
        | "min-fresh" "=" delta-seconds       ; Section 14.9.3
        | "no-transform"                      ; Section 14.9.5
        | "only-if-cached"                    ; Section 14.9.4
        | cache-extension                     ; Section 14.9.6

    cache-response-directive =
          "public"                               ; Section 14.9.1
        | "private" [ "=" <"> 1#field-name <"> ] ; Section 14.9.1
        | "no-cache" [ "=" <"> 1#field-name <"> ]; Section 14.9.1
        | "no-store"                             ; Section 14.9.2
        | "no-transform"                         ; Section 14.9.5
        | "must-revalidate"                      ; Section 14.9.4
        | "proxy-revalidate"                     ; Section 14.9.4
        | "max-age" "=" delta-seconds            ; Section 14.9.3
        | "s-maxage" "=" delta-seconds           ; Section 14.9.3
        | cache-extension                        ; Section 14.9.6

   cache-extension = token [ "=" ( token | quoted-string ) ]

    Connection = "Connection" ":" 1#(connection-token)
    connection-token  = token

    Content-Encoding  = "Content-Encoding" ":" 1#content-coding

    Content-Language  = "Content-Language" ":" 1#language-tag

    Content-Length    = "Content-Length" ":" 1*DIGIT

    Content-Location = "Content-Location" ":"
                      ( absoluteURI | relativeURI )

     Content-MD5   = "Content-MD5" ":" md5-digest
     md5-digest   = <base64 of 128 bit MD5 digest as per [RFC1864]>

    Content-Range = "Content-Range" ":" content-range-spec

    content-range-spec      = byte-content-range-spec
    byte-content-range-spec = bytes-unit SP
                              byte-range-resp-spec "/"
                              ( instance-length | "*" )

    byte-range-resp-spec = (first-byte-pos "-" last-byte-pos)
                                   | "*"
    instance-length           = 1*DIGIT

    Content-Type   = "Content-Type" ":" media-type

    Date  = "Date" ":" HTTP-date

    ETag = "ETag" ":" entity-tag

   Expect       =  "Expect" ":" 1#expectation

   expectation  =  "100-continue" | expectation-extension
   expectation-extension =  token [ "=" ( token | quoted-string )
                            *expect-params ]
   expect-params =  ";" token [ "=" ( token | quoted-string ) ]

   Expires = "Expires" ":" HTTP-date

    From   = "From" ":" mailbox

    Host = "Host" ":" host [ ":" port ] ; Section 3.2.2

    If-Match = "If-Match" ":" ( "*" | 1#entity-tag )

    If-Modified-Since = "If-Modified-Since" ":" HTTP-date

    If-None-Match = "If-None-Match" ":" ( "*" | 1#entity-tag )

     If-Range = "If-Range" ":" ( entity-tag | HTTP-date )

   If-Unmodified-Since = "If-Unmodified-Since" ":" HTTP-date

    Last-Modified  = "Last-Modified" ":" HTTP-date

    Location       = "Location" ":" absoluteURI [ "#" fragment ]

    Max-Forwards   = "Max-Forwards" ":" 1*DIGIT

    Pragma            = "Pragma" ":" 1#pragma-directive
    pragma-directive  = "no-cache" | extension-pragma
    extension-pragma  = token [ "=" ( token | quoted-string ) ]

    Proxy-Authenticate  = "Proxy-Authenticate" ":" 1#challenge

    Proxy-Authorization     = "Proxy-Authorization" ":" credentials

    ranges-specifier = byte-ranges-specifier
    byte-ranges-specifier = bytes-unit "=" byte-range-set
    byte-range-set  = 1#( byte-range-spec | suffix-byte-range-spec )
    byte-range-spec = first-byte-pos "-" [last-byte-pos]
    first-byte-pos  = 1*DIGIT
    last-byte-pos   = 1*DIGIT

    suffix-byte-range-spec = "-" suffix-length
    suffix-length = 1*DIGIT

   Range = "Range" ":" ranges-specifier

    Referer        = "Referer" ":" ( absoluteURI | relativeURI )

    Retry-After  = "Retry-After" ":" ( HTTP-date | delta-seconds )

    Server         = "Server" ":" 1*( product | comment )

    TE        = "TE" ":" #( t-codings )
    t-codings = "trailers" | ( transfer-extension [ accept-params ] )

    Trailer  = "Trailer" ":" 1#field-name

  Transfer-Encoding       = "Transfer-Encoding" ":" 1#transfer-coding

    Upgrade        = "Upgrade" ":" 1#product

    User-Agent     = "User-Agent" ":" 1*( product | comment )

    Vary  = "Vary" ":" ( "*" | 1#field-name )

   Via =  "Via" ":" 1#( received-protocol received-by [ comment ] )
   received-protocol = [ protocol-name "/" ] protocol-version
   protocol-name     = token
   protocol-version  = token
   received-by       = ( host [ ":" port ] ) | pseudonym
   pseudonym         = token

    Warning    = "Warning" ":" 1#warning-value

    warning-value = warn-code SP warn-agent SP warn-text
                                          [SP warn-date]

    warn-code  = 3DIGIT
    warn-agent = ( host [ ":" port ] ) | pseudonym
                    ; the name or pseudonym of the server adding
                    ; the Warning header, for use in debugging
    warn-text  = quoted-string
    warn-date  = <"> HTTP-date <">

    WWW-Authenticate  = "WWW-Authenticate" ":" 1#challenge

    MIME-Version   = "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

     content-disposition = "Content-Disposition" ":"
                           disposition-type *( ";" disposition-parm )
     disposition-type = "attachment" | disp-extension-token
     disposition-parm = filename-parm | disp-extension-parm
     filename-parm = "filename" "=" quoted-string
     disp-extension-token = token
     disp-extension-parm = token "=" ( token | quoted-string )

