Re: ABNF switch: list rules from Julian Reschke on 2008-05-23 (ietf-http-wg@w3.org from April to June 2008)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 23 May 2008 15:19:09 +0200
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <4836C44D.8070802@gmx.de>

Bjoern Hoehrmann wrote:
> * Julian Reschke wrote:
>> Let's take an example, such as Accept-Charset:
>>
>>   Accept-Charset = "Accept-Charset" ":"
>>           1#( ( charset | "*" ) [ ";" "q" "=" qvalue ] )
>>
>> (<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-02.html#rfc.section.6.2>)
>>
>> A mechanical translation would yield:
>>
>>   Accept-Charset = "Accept-Charset" ":"
>>                  ( *LWS ( charset / "*" ) [ ";q=" qvalue ]
>>                 *( *LWS "," *LWS ( charset / "*" ) [ ";q=" qvalue ] ) )
>>
>> (hopefully).
> 
> There are several differences here in what values the two allow; you did
> not call them out so I am not sure whether they are intentional. In
> particular these are valid under the old production but not under yours:
> 
>   Accept-Charset: utf-8,,*

Good catch.

It seems to me that RFC 2616's definition of the list rule (Section 2.1)

"Wherever this construct is used, null elements are allowed, but do not 
contribute to the count of elements present. That is, "(element), , 
(element) " is permitted, but counts as only two elements. Therefore, 
where at least one element is required, at least one non-null element 
MUST be present. Default values are 0 and infinity so that "#element" 
allows any number, including zero; "1#element" requires at least one; 
and "1#2element" allows one or two."

doesn't translate well into ABNF syntax. So even if we said:

    Accept-Charset = "Accept-Charset" ":"
              ( *LWS ( charset / "*" ) [ ";q=" qvalue ]
             *( *LWS "," *LWS [ ( charset / "*" ) [ ";q=" qvalue ] ] ) )

that would allow

   Accept-Charset: utf-8,,*

but not

   Accept-Charset: ,,utf-8

which is valid under RFC 2616. So we need to handle leading ("," *LWS)
separately...
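To make the difference concrete, here is a minimal Python sketch (not taken from any spec or tool) that checks both examples against (a) a hand-coded reading of the RFC 2616 list rule and (b) a regex transliteration of the production above. It assumes charset = token, the RFC 2616 qvalue syntax, LWS restricted to SP/HT (no line folding), and elements that never contain commas themselves; the function and pattern names are made up for illustration.

```python
import math
import re

# Assumptions: charset = token (RFC 2616), qvalue per RFC 2616 Section 3.9,
# LWS simplified to SP/HT only (no CRLF line folding).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QVALUE = r"(?:0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?)"
ELEMENT = rf"(?:{TOKEN}|\*)(?:;q={QVALUE})?"

# The candidate production, field value only (header name and colon stripped):
#   *LWS ( charset / "*" ) [ ";q=" qvalue ]
#   *( *LWS "," *LWS [ ( charset / "*" ) [ ";q=" qvalue ] ] )
CANDIDATE = re.compile(rf"[ \t]*{ELEMENT}(?:[ \t]*,[ \t]*(?:{ELEMENT})?)*")


def rfc2616_list_ok(value: str, min_count: int = 1, max_count: float = math.inf) -> bool:
    """Hypothetical reading of RFC 2616 'm#n element': null elements are
    allowed but not counted. Assumes elements contain no commas."""
    pieces = [p.strip(" \t") for p in value.split(",")]
    elements = [p for p in pieces if p]  # drop null elements
    # Same element syntax as the candidate, so only list handling differs.
    if not all(re.fullmatch(ELEMENT, e) for e in elements):
        return False
    return min_count <= len(elements) <= max_count


def candidate_ok(value: str) -> bool:
    return CANDIDATE.fullmatch(value) is not None


for value in ["utf-8,,*", ",,utf-8"]:
    print(f"{value!r}: RFC 2616 {rfc2616_list_ok(value)}, candidate {candidate_ok(value)}")
```

Both values pass the RFC 2616 check, but only utf-8,,* matches the candidate production, because the candidate insists on a non-null element before the first comma.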

>   Accept-Charset: utf-8 ; q = ...

That's a mistake I made when pasting bap's output back into the mail (bap
doesn't know about implied LWS). So it should have read:

   Accept-Charset = "Accept-Charset" ":"
                ( *LWS ( charset / "*" ) [ ";" "q" "=" qvalue ]
               *( *LWS "," *LWS ( charset / "*" ) [ ";" "q" "=" qvalue ] ) )
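The difference between ";q=" and ";" "q" "=" only shows up once implied LWS is taken into account. A rough Python sketch comparing the two readings on an element like Bjoern's second example, under the same kind of simplifications (charset = token, LWS as SP/HT only, a sample qvalue of 0.5, hypothetical pattern names):

```python
import re

# Assumptions: charset = token (RFC 2616), qvalue per RFC 2616 Section 3.9,
# and *LWS simplified to runs of SP/HT (no CRLF line folding).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QVALUE = r"(?:0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?)"
OWS = r"[ \t]*"

# ";q=" read as one literal: no white space allowed anywhere inside it.
# ABNF literals are case-insensitive, hence [qQ].
AS_LITERAL = re.compile(rf"(?:{TOKEN}|\*)(?:;[qQ]={QVALUE})?")

# ";" "q" "=" read with RFC 2616 implied *LWS between adjacent
# words and separators.
WITH_IMPLIED_LWS = re.compile(
    rf"(?:{TOKEN}|\*)(?:{OWS};{OWS}[qQ]{OWS}={OWS}{QVALUE})?"
)

for element in ["utf-8;q=0.5", "utf-8 ; q = 0.5"]:
    print(repr(element),
          bool(AS_LITERAL.fullmatch(element)),
          bool(WITH_IMPLIED_LWS.fullmatch(element)))
```

Both patterns accept utf-8;q=0.5, but only the implied-LWS reading accepts utf-8 ; q = 0.5.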

> I'm not sure whether your new production should be read assuming implied
> linear white space, if not there are a number of additional differences,
> and if so, then the production is more complex than would be necessary.

I was trying to get the list rule issue resolved first; of course the
implied LWS issue needs to be resolved as well.

> It would certainly be wise to factor repeated productions out into
> separate productions, yes.

So, combining this, but still ignoring implied LWS, we'd get:

   AC-f = ( ( charset / "*" ) [ ";" "q" "=" qvalue ] )
   AC-e = *LWS AC-f

   Accept-Charset = "Accept-Charset" ":" *( *LWS "," ) AC-e
                    *( *LWS "," [ AC-e ] )

Or...

   AC-f = ( ( charset / "*" ) [ ";" "q" "=" qvalue ] )
   AC-e = *LWS AC-f
   COMMA = *LWS ","

   Accept-Charset = "Accept-Charset" ":" *COMMA AC-e *( COMMA [ AC-e ] )
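As a sanity check, the last variant can be transliterated into a Python regex over the field value (everything after the colon). This is a sketch only: charset is taken to be a token, LWS is simplified to SP/HT, implied LWS is still ignored, and the pattern names just mirror the AC-f / AC-e / COMMA rules above.

```python
import re

# Assumptions: charset = token (RFC 2616), qvalue per RFC 2616 Section 3.9,
# *LWS simplified to SP/HT, implied LWS ignored (as in the ABNF above).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QVALUE = r"(?:0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?)"
LWS = r"[ \t]*"

AC_F = rf"(?:{TOKEN}|\*)(?:;[qQ]={QVALUE})?"  # ( charset / "*" ) [ ";" "q" "=" qvalue ]
AC_E = rf"{LWS}{AC_F}"                        # *LWS AC-f
COMMA = rf"{LWS},"                            # *LWS ","

# *COMMA AC-e *( COMMA [ AC-e ] )
ACCEPT_CHARSET_VALUE = re.compile(rf"(?:{COMMA})*{AC_E}(?:{COMMA}(?:{AC_E})?)*")


def matches(value: str) -> bool:
    return ACCEPT_CHARSET_VALUE.fullmatch(value) is not None


for value in [",,utf-8", "utf-8,,*", "utf-8,", ",,", ""]:
    print(repr(value), matches(value))
```

This accepts ,,utf-8, utf-8,,* and a trailing null element (utf-8,), while still rejecting values with no non-null element at all, such as ,, or the empty string.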

The more I look into this, the better the original syntax looks :-)

BR, Julian

Received on Friday, 23 May 2008 13:19:57 UTC