OWS, line-folding and quoted-string

From: Yutaka OIWA <y.oiwa@aist.go.jp>
Date: Tue, 06 Sep 2011 11:45:53 +0900
Message-ID: <4E658961.3020004@aist.go.jp>
To: HTTP Working Group <ietf-http-wg@w3.org>
Dear all,

I have a question about the current definition of quoted-string in p1-16.

In Section 1.2.2, OWS is defined as *( [ obs-fold ] WSP ), which also
matches the empty string.  The text in the same section specifies that
a non-empty OWS can be replaced with a single SP within field-content.

At the same time, in Section 3.2.3, quoted-string is defined as
DQUOTE *( qdtext / quoted-pair ) DQUOTE, where one alternative of
qdtext is a single OWS.
(Note that quoted-string is used inside field-content.)

This has two unwanted consequences:

1) Because OWS can match the empty string and qdtext can repeat
   without limit, any quoted-string can be parsed in infinitely
   many ways, needlessly.

2) Because a single OWS can consume two or more spaces at once,
   and as a side effect of the OWS canonicalization in Sec. 1.2.2,
   consecutive spaces in a quoted-string may be reduced to
   any non-zero number of spaces not exceeding the original count.
   For example, five spaces may be parsed as
   ("  ": OWS) (" ": OWS) ("  ": OWS), which may be reduced to
   three spaces once the SHOULD rule of Sec. 1.2.2 is applied.
   This introduces unwanted ambiguity and harms interoperability,
   especially with hash-based or cryptographic authentication.
   It also contradicts what people expect "quoting" to mean.
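
The ambiguity in 2) can be illustrated with a small Python sketch.  It
only enumerates splits of a run of spaces into non-empty OWS pieces
(empty OWS matches add infinitely many more parses but do not change the
result), and it assumes every piece is reduced to one SP.  The function
names compositions and reduced_lengths are made up for this illustration.

```python
from itertools import combinations

def compositions(n):
    """Yield every way to split a run of n spaces into non-empty pieces."""
    for k in range(n):
        # Choose k cut positions strictly inside the run.
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]

def reduced_lengths(n):
    """Space counts reachable if each OWS piece is replaced by one SP."""
    return sorted({len(parts) for parts in compositions(n)})

print(len(list(compositions(5))))   # number of distinct non-empty splits
print(reduced_lengths(5))           # possible lengths after reduction
```

The split [2, 1, 2] is the one used in the example above; it is one of
several that reduce five spaces to three.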

So I propose changing the definition of qdtext in Section 3.2.3 from
   qdtext = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
to
   qdtext = [ obs-fold ] WSP / %x21 / %x23-5B / %x5D-7E / obs-text
(or add parentheses if clarification is needed).
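
As a rough check of the proposed rule, here is a Python sketch that
splits a quoted-string into its qdtext / quoted-pair elements.  With one
WSP per qdtext, each whitespace character becomes its own element, so
the split is unique.  The quoted-pair pattern is my assumption
("\" followed by WSP, VCHAR, or obs-text), and split_quoted_string is a
hypothetical helper name, not something from the draft.

```python
import re

# Proposed qdtext: exactly one WSP (optionally preceded by an obs-fold CRLF),
# or one visible character other than DQUOTE and backslash, or obs-text.
QDTEXT = r'(?:\r\n)?[ \t]|[\x21\x23-\x5B\x5D-\x7E\x80-\xFF]'
# Assumption for this sketch: quoted-pair = "\" ( WSP / VCHAR / obs-text ).
QUOTED_PAIR = r'\\[\t \x21-\x7E\x80-\xFF]'
ELEMENT = re.compile(f'{QDTEXT}|{QUOTED_PAIR}')

def split_quoted_string(s):
    """Return the elements between the DQUOTEs, or None if s does not match."""
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        return None
    body, pos, elements = s[1:-1], 0, []
    while pos < len(body):
        m = ELEMENT.match(body, pos)
        if m is None:
            return None
        elements.append(m.group())
        pos = m.end()
    return elements

print(split_quoted_string('"a  b"'))      # two spaces stay two elements
print(split_quoted_string('"a\r\n b"'))   # fold + WSP is a single element
```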

As Section 3.2.1 separately specifies the reduction of obs-fold to either
one or two SPs, this still allows line folding to be removed within a
quoted-string.  I think the interoperability problem with a line-folded
quoted-string is negligible (because a line-folded quoted-string is a very
bad thing).

The literal reading of the obs-fold rule in Section 3.2.1 says that the
line folding "CR LF SP" should be reduced to either "SP SP" or "SP SP SP"
(because the CRLF is reduced to a single SP).  Is this correct and intended?
I guess the intended first alternative is "SP" rather than "SP SP".
As obs-fold is guaranteed to be followed by WSP, it can simply be removed
instead of being replaced with a single SP.
(This is a minor issue, however, because in many places the result will be
further reduced by the OWS/RWS reduction rule in Sec. 1.2.2.)
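
To make the three outcomes concrete, here is a small Python sketch.  It
follows my reading of the rule as described above (obs-fold is the CRLF
only, and the WSP after it is left in place); unfold is a hypothetical
helper name.

```python
import re

# obs-fold is CRLF; the lookahead only checks that WSP follows and keeps it.
OBS_FOLD = re.compile(r'\r\n(?=[ \t])')

def unfold(value, replacement):
    """Replace every obs-fold in value with the given replacement string."""
    return OBS_FOLD.sub(replacement, value)

folded = 'a\r\n b'                 # "CR LF SP" between a and b
print(repr(unfold(folded, ' ')))   # CRLF -> one SP, literal reading
print(repr(unfold(folded, '  ')))  # CRLF -> two SPs, literal reading
print(repr(unfold(folded, '')))    # CRLF removed, as suggested above
```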

# I personally prefer obs-fold to be defined as "CRLF 1*WSP" because
# it clearly says that a continuation line must start with whitespace,
# but changing it at this point seems inappropriate,
# since adopting it would require rewriting many rules that use obs-fold.

The proposed change above affects the handling of HT in a quoted-string.
Is it better to reduce HT to SP or to keep it as HT?

Yutaka OIWA, Ph.D.                                       Research Scientist
                            Research Center for Information Security (RCIS)
    National Institute of Advanced Industrial Science and Technology (AIST)
                      Mail addresses: <y.oiwa@aist.go.jp>, <yutaka@oiwa.jp>
OpenPGP: id[995DD3E1] fp[3C21 17D0 D953 77D3 02D7 4FEC 4754 40C1 995D D3E1]
