OWS, line-folding and quoted-string

From: Yutaka OIWA <y.oiwa@aist.go.jp>
Date: Tue, 06 Sep 2011 11:45:53 +0900
To: HTTP Working Group <ietf-http-wg@w3.org>
Dear all,

I have a question about the current definition of quoted-string in p1-16.

In section 1.2.2, OWS is defined as *( [ obs-fold ] WSP ), which also
matches the empty string.  The text in the same section specifies that
non-empty OWS can be replaced by a single SP within field-content.

At the same time, in section 3.2.3, quoted-string is defined as
DQUOTE *( qdtext / quoted-pair ) DQUOTE, where qdtext has OWS as one of
its alternatives.
(Note that quoted-string is used inside field-content.)

This has two unwanted consequences:

1) Putting an OWS that can match the empty string inside the
   unboundedly repeated qdtext means that any quoted-string can be
   parsed in infinitely many ways, needlessly.

2) As a single OWS can consume two or more spaces at once,
   and as a side effect of the OWS canonicalization in Sec. 1.2.2,
   consecutive spaces in a quoted-string may be reduced to
   any non-zero number of spaces not exceeding the original.
   For example, five spaces may be parsed as
   ("  ": OWS) (" ": OWS) ("  ": OWS), which is reduced to
   three spaces once the SHOULD rule of Sec. 1.2.2 is applied.
   This introduces unwanted ambiguity and hurts interoperability,
   especially with hash- or crypto-based authentication.
   It also contradicts what people expect "quoting" to mean.
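
To make consequence 2 concrete, here is a minimal Python sketch (not part
of the draft; the function names are my own) that cuts a run of spaces into
consecutive non-empty OWS matches in every possible way and then replaces
each match with one SP.  Empty OWS matches are ignored here, since they only
multiply the number of parses (consequence 1) without changing the result.

```python
from itertools import combinations

def ows_splits(run):
    """Yield every way to cut `run` into consecutive non-empty OWS matches."""
    n = len(run)
    for k in range(n):                      # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [run[a:b] for a, b in zip(bounds, bounds[1:])]

def reduced_forms(run):
    """Replace each OWS match with one SP and collect the distinct results."""
    return sorted({" " * len(pieces) for pieces in ows_splits(run)}, key=len)

forms = reduced_forms(" " * 5)
print([len(f) for f in forms])   # [1, 2, 3, 4, 5]
```

Five spaces have 16 such splits, and every length from one to five is
reachable, so two recipients may well disagree on the quoted value.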

So, I propose changing the definition of qdtext in section 3.2.3 from
   qdtext = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
to
   qdtext = [ obs-fold ] WSP / %x21 / %x23-5B / %x5D-7E / obs-text
(or add parentheses if clarification is needed).
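
As a rough illustration of why the proposed rule removes the ambiguity, the
Python sketch below tokenizes a quoted-string using the proposed qdtext.
It assumes that obs-fold is a bare CRLF, that obs-text is %x80-FF, and that
quoted-pair is "\" followed by WSP, VCHAR or obs-text; the function name is
hypothetical.  Every alternative consumes at least one octet and the
alternatives start with disjoint octets, so there is exactly one parse.

```python
import re

# Assumptions: obs-fold = CRLF, obs-text = %x80-FF,
# quoted-pair = "\" ( WSP / VCHAR / obs-text ).
TOKEN = re.compile(
    r"(?P<qdtext>(?:\r\n)?[ \t]|[\x21\x23-\x5B\x5D-\x7E\x80-\xFF])"
    r"|(?P<quoted_pair>\\[\t \x21-\x7E\x80-\xFF])"
)

def tokenize_quoted_string(s):
    """Split a quoted-string into its qdtext / quoted-pair items."""
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        raise ValueError("not a quoted-string")
    items, pos, end = [], 1, len(s) - 1
    while pos < end:
        # Match only between the two DQUOTEs.
        m = TOKEN.match(s, pos, end)
        if m is None:
            raise ValueError(f"invalid octet at offset {pos}")
        items.append(m.group())
        pos = m.end()
    return items

print(tokenize_quoted_string('"a     b"'))
# ['a', ' ', ' ', ' ', ' ', ' ', 'b']
```

Each of the five spaces becomes its own qdtext item, so no recipient can
legitimately merge them before reducing.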

As Section 3.2.1 separately specifies reduction of obs-fold to either
one or two SPs, removal of line folding within a quoted-string is still allowed.
I think the interoperability problem with line-folded quoted-strings
is negligible (because a line-folded quoted-string is a very bad thing).

The literal reading of "3.2.1 obs-fold rule" says that a line-folding
"CR LF SP" should be reduced to either "SP SP" or "SP SP SP"
(because CRLF is reduced to single SP).  Is this correct and intended?
I guess the intention of the first alternative is "SP" instead of "SP SP".
As obs-fold is guaranteed to be followed by WSP, it can simply be removed
instead of being replaced with a single SP.
(This is a minor issue, however, because in many places it will be further
reduced by the OWS/RWS reduction rule in Sec 1.2.2.)
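
The difference between the two readings can be sketched in Python as
follows.  This assumes obs-fold is a bare CRLF that is always followed by
WSP; both function names are made up for illustration.

```python
import re

# A CRLF counts as obs-fold only when WSP follows it (assumption).
OBS_FOLD_CRLF = re.compile(r"\r\n(?=[ \t])")

def unfold_replace(value):
    """Literal reading: replace the CRLF of each fold with one SP."""
    return OBS_FOLD_CRLF.sub(" ", value)

def unfold_remove(value):
    """Alternative: drop the CRLF and keep the WSP that follows it."""
    return OBS_FOLD_CRLF.sub("", value)

folded = "a\r\n b"
print(repr(unfold_replace(folded)))  # 'a  b'  (SP SP)
print(repr(unfold_remove(folded)))   # 'a b'   (SP)
```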

# I personally prefer obs-fold to be defined as "CRLF 1*WSP" because
# it clearly says that a continuation line must start with whitespace,
# but changing it at this point seems inappropriate,
# since adopting it would require rewriting many rules that use obs-fold.

The proposed change above affects the handling of HT in a quoted-string.
Is it better to reduce HT to SP, or to keep it as HT?

