Re: #320: add advice on defining auth scheme parameters from Yutaka OIWA on 2011-10-31 (ietf-http-wg@w3.org from October to December 2011)

From: Yutaka OIWA <yutaka@oiwa.jp>
Date: Tue, 01 Nov 2011 08:25:40 +0900
To: Julian Reschke <julian.reschke@gmx.de>
CC: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <4EAF2E74.5010309@aist.go.jp>
Dear Julian,
Thank you for the detailed and prompt response.
Please allow me to answer in no particular order.

2011/11/1 Julian Reschke <julian.reschke@gmx.de>:

>> No, in the XML case, single quotes and double quotes have been clearly
>> defined to be two representations of the same thing.
>> There are clear semantic definitions, a definition of how to compare
>> the two, and a statement that no one should distinguish between them.
>
> And I believe that we need to make this statement for parameters in HTTP
> header fields as well.

Hmm, this makes sense.  The current "equivalence" rule seems too implicit to me.

> In P2, we say:
>
> "Many header fields use a format including (case-insensitively) named
> parameters (for instance, Content-Type, defined in Section 6.8 of [Part3]).
> Allowing both unquoted (token) and quoted (quoted-string) syntax for the
> parameter value enables recipients to use existing parser components. When
> allowing both forms, the meaning of a parameter value ought to be
> independent of the syntax used for it (for an example, see the notes on
> parameter handling for media types in Section 2.3 of [Part3])."

How about rephrasing this along these lines:
  Many header fields use a format including (case-insensitively) named
  parameters (for instance, Content-Type, defined in Section 6.8 of [Part3]).
  Be aware that many existing parser components
  do not distinguish between unquoted (token) and quoted (quoted-string)
  syntax for parameter values. Therefore,
  whenever a new parameter is defined, the meaning of its value
  SHOULD NOT depend on the syntax used for it
  (for an example, see the notes on parameter handling for media types
  in Section 2.3 of [Part3]).
  << Receivers are RECOMMENDED to tolerate both forms of parameter values
  interchangeably. >> (the sentence in << >> may or may not be included)
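For illustration, the receiver-side behavior proposed above can be sketched in Python. This is a minimal sketch assuming a simplified, comma-separated list of name=value pairs (as in Digest's auth-param lists); the names parse_params and unquote are hypothetical, and the grammar ignores details such as obs-text and the auth-scheme prefix. It only shows that a token and a quoted-string carrying the same characters come out as the same value.

```python
import re

# Hypothetical sketch: HTTP token characters (visible ASCII minus separators).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# Simplified quoted-string: DQUOTE *( qdtext / quoted-pair ) DQUOTE
QUOTED = r'"(?:[^"\\]|\\.)*"'
# One name=value pair, followed by a comma or the end of the input.
PARAM = re.compile(rf"\s*({TOKEN})\s*=\s*({TOKEN}|{QUOTED})\s*(?:,|$)")


def unquote(value: str) -> str:
    """Strip quoted-string syntax; token values are returned unchanged."""
    if value.startswith('"'):
        # Drop the surrounding DQUOTEs and resolve quoted-pairs (\x -> x).
        return re.sub(r"\\(.)", r"\1", value[1:-1])
    return value


def parse_params(header_value: str) -> dict[str, str]:
    """Parse name=value pairs, treating token and quoted-string
    forms of a value as interchangeable."""
    params = {}
    pos = 0
    while pos < len(header_value):
        m = PARAM.match(header_value, pos)
        if m is None:
            raise ValueError(f"malformed parameter at offset {pos}")
        # Parameter names are case-insensitive, so normalize them.
        params[m.group(1).lower()] = unquote(m.group(2))
        pos = m.end()
    return params
```

With a parser like this, `x=1` and `x="1"` produce the same dictionary, which is why giving 1 and "1" different defined meanings would break it.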

Technically speaking,
to enable "recipients to use existing parser components",
the most important thing is the third sentence of the above paragraph.
If we used 1 and "1" with different, defined meanings,
it would break such parsers.
I agree this direction is correct, in line with Postel's principle.

# I assume that a token value for realm is currently "undefined behavior",
# which allows receivers to treat it as if it were a quoted string.

Conversely, the sending side does not need to send both forms arbitrarily,
and specifications do not need to strongly endorse both.
Allowing senders to send strings unquoted is a bad idea:
it always gets complex and bug-prone once special characters
need to be sent.  Any implementation that can correctly quote
arbitrary non-token strings can simply send a quoted string every time.  So
my opinion is still to require "normative forms" on the sender's side,
also per Postel's principle.
# So, if we had the above << >> sentence, the "realm "=" quoted-string"
# rule would not be a bad thing, I think (read as a rule for senders).
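On the sending side, always emitting the quoted-string form avoids deciding, value by value, whether something happens to be a valid token. A minimal Python sketch, assuming values are restricted to printable ASCII plus HTAB; quote and format_param are hypothetical names, not an existing library API.

```python
def quote(value: str) -> str:
    """Always serialize a parameter value as a quoted-string.

    Backslash and DQUOTE are escaped as quoted-pairs. Characters outside
    printable ASCII (other than HTAB) are rejected in this sketch.
    """
    if any(c != "\t" and not (0x20 <= ord(c) <= 0x7E) for c in value):
        raise ValueError("only printable ASCII and HTAB are supported here")
    # Escape backslashes first so the escapes added for DQUOTE stay intact.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_param(name: str, value: str) -> str:
    """Emit name="value" in the single normative (quoted) form."""
    return f"{name}={quote(value)}"
```

Because the sender never emits the token form, there is no separate code path that can go wrong when a value contains spaces, commas, or quotes.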

>> Hmm, I don't actually agree with this idea.
>> Tokens (whose meanings are usually defined per token, except for general
>> integers etc.) have semantics distinct from strings in general.
>
> I disagree. It's just a syntactical difference.

Personally, I still disagree, but it is just a matter of definition.
If we make it explicit for HTTP/1.1bis and beyond,
I will follow it from now on (including in my current drafts).

But one thing needs to be considered: case insensitivity.
Tokens on the right-hand side are often (not always?) case-insensitive,
while strings are mostly (always?) case-sensitive.
If we say, for example, "strings are to be accepted wherever tokens
are required", that is OK for me.  But I still hesitate a bit
about saying "tokens are just special cases of unquoted strings."

For example, in RFC 2617 Digest, the value of the "stale" parameter is
explicitly case-insensitive. "qop" and others are implicitly case-insensitive
(by the text in RFC 2616, Section 2.1), but the name of the LHEX
("lowercase hex"?) rule confuses me regarding nc-value.
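If a token and a quoted-string carrying the same characters parse to the same value, case (in)sensitivity has to be attached to the parameter's definition rather than to the syntax that carried it. A minimal Python sketch under that assumption; the set of case-insensitive parameters below is illustrative and incomplete (only "stale" and "qop", as discussed above), and values_equal is a hypothetical name.

```python
# Illustrative, incomplete set of Digest (RFC 2617) parameters whose values
# are compared case-insensitively: "stale" explicitly, "qop" implicitly via
# RFC 2616 Section 2.1. Everything else (e.g. "realm", "nonce") is compared
# exactly. This table is an assumption for the sketch, not a normative list.
CASE_INSENSITIVE_VALUES = frozenset({"stale", "qop"})


def values_equal(param: str, a: str, b: str) -> bool:
    """Compare two already-unquoted values of the parameter `param`.

    The rule depends only on the parameter name, never on whether the
    value arrived as a token or as a quoted-string.
    """
    if param.lower() in CASE_INSENSITIVE_VALUES:
        return a.lower() == b.lower()
    return a == b
```

Under this approach, stale=TRUE and stale="true" compare equal, while realm="Example" and realm=example do not.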

> Do you have an example where this interpretation breaks current
> implementations?

Personally, YES, but since mine is not deployed in the wild, I can change it now :-)
I briefly checked the source code of a few Digest implementations,
and all of them work with this change.
(But we may need more interop checks for Digest implementations...)

-- 
Yutaka OIWA, Ph.D.                                       Research Scientist
                            Research Center for Information Security (RCIS)
    National Institute of Advanced Industrial Science and Technology (AIST)
                      Mail addresses: <y.oiwa@aist.go.jp>, <yutaka@oiwa.jp>
OpenPGP: id[440546B5] fp[7C9F 723A 7559 3246 229D  3139 8677 9BD2 4405 46B5]
Received on Monday, 31 October 2011 23:33:10 UTC