- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Tue, 25 Aug 2009 14:47:12 +0200
- To: Henrik Nordstrom <henrik@henriknordstrom.net>
- CC: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>, Bjoern Hoehrmann <derhoermi@gmx.net>
Henrik Nordstrom wrote:
> Should probably change topic here, but it's still relevant so keeping
> the issue topic. Most of this is taking a more generic view of
> quoted-pair, not isolated to chunk extension values.
>
> On Tue 2009-08-25 at 09:11 +0200, Julian Reschke wrote:
>
>> quoted-pair is also used in comments. Are we ok with restricting the set
>> here as well? And, if yes, shouldn't we then also adjust the allowed set
>> for non-quoted characters in comments?
>
> What? Restricting how? I thought we were talking about restricting the
> use of CTLs?
Yes. I wanted to confirm that we do that for quoted-strings *and*
comments. Do we?
> Now some further rambling on the use of quoted-pair and the difficulties
> this causes for parsers:
>
>
> qdtext is for text within a quoted-string, and MUST NOT include '"' or
> '\'. Those two must be produced as quoted-pair to be used within a
> quoted-string.
>
> qdtext = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
> ; OWS / <VCHAR except DQUOTE and "\"> / obs-text
>
> ctext is the same but for comment, and MUST NOT include '(', ')' or '\'.
> Those three must be produced as quoted-pair to be used within a comment.
>
> ctext = OWS / %x21-27 / %x2A-5B / %x5D-7E / obs-text
> ; OWS / <VCHAR except "(", ")", and "\"> / obs-text
>
> Neither of qdtext or ctext allows for CTLs, except for HT or obsoleted
> CRLF folding (from OWS).
Yes. But quoted-string and comment allow quoted-pair which currently
does allow CTLs.
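
For illustration, a minimal Python sketch of the two character classes
quoted above, working on single octets. The function names are made up
for this example, and OWS is reduced to SP and HT (obsolete line folding
is assumed to have been unfolded already).

```python
def is_obs_text(octet: int) -> bool:
    # obs-text = %x80-FF
    return 0x80 <= octet <= 0xFF


def is_ows_octet(octet: int) -> bool:
    # Assumption: only SP and HT, folding is handled elsewhere.
    return octet in (0x20, 0x09)


def is_qdtext(octet: int) -> bool:
    # qdtext = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
    return (
        is_ows_octet(octet)
        or octet == 0x21
        or 0x23 <= octet <= 0x5B
        or 0x5D <= octet <= 0x7E
        or is_obs_text(octet)
    )


def is_ctext(octet: int) -> bool:
    # ctext = OWS / %x21-27 / %x2A-5B / %x5D-7E / obs-text
    return (
        is_ows_octet(octet)
        or 0x21 <= octet <= 0x27
        or 0x2A <= octet <= 0x5B
        or 0x5D <= octet <= 0x7E
        or is_obs_text(octet)
    )
```

Neither predicate accepts a CTL other than HT; any CTL can only get in
through quoted-pair, which is the point under discussion.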
> The specification (2616) is very strict about where quoted-pair is allowed
> to be used, but at the same time it is very subtle about where those areas
> are, creating a large grey area where parsing is somewhat non-obvious.
>
> It's the same question as has been raised earlier regarding comments. A
> construct looking like a comment is only a comment if the header in
> question is defined to allow comments, if not it's literally part of the
> header value.
>
> Quoted-string is also only quoted-string if the header in question is
> defined to accept quoted-string, if not it may be a literal part of the
> header value even if it may look like a quoted-string (for a header
> defined as taking *TEXT as value, 2616 has no such headers however)
>
> RFC2616 BNF and relevant comments:
>
> generic-message = start-line
> *(message-header CRLF)
> CRLF
> [ message-body ]
> message-header = field-name ":" [ field-value ]
> field-name = token
> field-value = *( field-content | LWS )
> field-content = <the OCTETs making up the field-value
> and consisting of either *TEXT or combinations
> of token, separators, and quoted-string>
>
> TEXT = <any OCTET except CTLs,
> but including LWS>
>
> A CRLF is allowed in the definition of TEXT only as part of a header
> field continuation.
>
> Comments can be included in some HTTP header fields by surrounding
> the comment text with parentheses. Comments are only allowed in
> fields containing "comment" as part of their field value definition.
> In all other fields, parentheses are considered part of the field
> value.
>
> comment = "(" *( ctext | quoted-pair | comment ) ")"
> ctext = <any TEXT excluding "(" and ")">
>
> The allowable characters in *TEXT overlap completely with those of token,
> separators and quoted-string, except that *TEXT does not allow CTLs other
> than LWS (HT), and within *TEXT the '\' character has no special meaning.
>
> Which means that to properly parse '\' quoted constructs one must know
> in detail every header processed in order to know if the '\' is quoting
> the next character or if it's just a literal '\'.
Yes.
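
A small Python sketch of that dependency: the same raw value yields
different results depending on whether the header is defined to take a
quoted-string. The function name and the boolean flag are hypothetical,
standing in for per-header knowledge a real parser would need.

```python
def interpret_value(raw: str, allows_quoted_string: bool) -> str:
    """Return the value as the header's grammar would see it."""
    is_quoted = len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"'
    if not (allows_quoted_string and is_quoted):
        # *TEXT view: '\' and '"' are ordinary characters.
        return raw

    out = []
    chars = iter(raw[1:-1])
    for ch in chars:
        if ch == "\\":
            # quoted-pair: the backslash quotes the next character.
            ch = next(chars, None)
            if ch is None:
                raise ValueError("dangling backslash in quoted-string")
        elif ch == '"':
            raise ValueError("unescaped DQUOTE inside quoted-string")
        out.append(ch)
    return "".join(out)


raw = r'"a\"b"'
as_quoted_string = interpret_value(raw, allows_quoted_string=True)
as_text = interpret_value(raw, allows_quoted_string=False)
```

Here as_quoted_string is the three characters a"b, while as_text is the
raw value unchanged, backslash included.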
> Because of this it's important that the overall message parsing is the
> same regardless of whether quoted-pair is processed or not, only producing
> slightly different results in the raw header value. Or put in other
> words, it needs to be possible to completely defer quoting and comment
> processing until the header value as such is examined in detail, with
> general message parsing using *TEXT for all header values. And for chunk
> headers *TEXT minus folding for the general message format, only needing
> to dive into quoting etc when eventually processing the chunk extension
> values (if at all).
>
>
> Regarding the allowable characters there imho is absolutely no need to
> allow for control characters anywhere in HTTP headers or chunk headers,
> quoted or not, and it's additionally very very likely many parsers will
> fail on such constructs making them quite non-interoperable.
Agreed.
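
As a rough sketch of the deferred approach described above, the generic
message layer could split a header line without looking at quoting or
comments at all, rejecting only CTLs other than HT. This is an
illustration under assumptions, not a complete parser: the function name
is made up, folding is assumed to be already removed, and field-name is
not checked against the token grammar.

```python
from typing import Tuple


def split_header_line(line: bytes) -> Tuple[str, bytes]:
    """Generic pass: no quoted-pair, quoted-string or comment handling."""
    name, sep, value = line.partition(b":")
    if not sep or not name:
        raise ValueError("not a header line")

    value = value.strip(b" \t")
    for octet in value:
        # Reject CTLs (0x00-0x1F and DEL) except HT.
        if octet != 0x09 and (octet < 0x20 or octet == 0x7F):
            raise ValueError("control character in field value")

    # The raw value is returned untouched; header-specific code decides
    # later whether '\', '"' or '(' have any special meaning.
    return name.decode("ascii"), value


name, raw_value = split_header_line(b'X-Example: "a\\"b" (note)')
```

With CTLs excluded everywhere, this first pass gives the same result
whether or not the later, header-specific pass processes quoted-pair.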
> And additionally if restricting the allowed set of quoted characters to
> exclude \x00, NL and CR as already done in HTTPbis then it becomes very
> questionable from a technical point of view (ignoring parsing) to allow
> the use of other CTLs in quoted form. The use of having CTLs in header
> values is very limited to begin with, basically only needed to support
> transmission of (non-UTF8) multibyte character sets or binary non-text
> data, in which case having those three excluded is already a significant
> issue for such use.
Yes.
> So imho quoted-pair should be
>
> quoted-text = %x09 / %x20-7E / obs-text
> ; WSP / VCHAR / obs-text
> quoted-pair = "\" qchar
>
> to match the use of *TEXT in 2616, making comments and quoted strings
> all fit within *TEXT as those constructs are only used in detailed forms
> which should be a subset of the more generic *TEXT.
"qchar" being...?
> This reasoning is also consistent with the current field-content
> definition using VTEXT etc..
>
> field-value = *( field-content / OWS )
> field-content = *( WSP / VCHAR / obs-text )
>
> This field-content definition DOES NOT allow for CTLs other than HT.
> Allowing quoted-pair to include CTLs other than HT is incompatible with
> the above (from latest p1) definition of field-content.
>
> If you look closely you'll notice the quoted-text and field-contents
> definitions above are equal. Perhaps a common term should be defined for
> that similar to the *TEXT element used in 2616. There are probably more
> places where using said term would make sense. And sorry, no I do not
> have a good suggested BNF name for this construct. TEXT would be
> confusing with 2616 and text in lower case too generic to be used in
> describing text. general-text?
> ...
"characters"?
Anyway, my takeaway from your analysis is: "yes, CTLs need to be
disallowed both in comments and quoted-text", right?
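
To make that concrete, here is a minimal Python sketch of the restricted
quoted-pair proposed in the quoted text. It assumes the undefined
"qchar" means the quoted-text set (%x09 / %x20-7E / obs-text); that
reading is a guess, and the function names are invented for the example.

```python
def is_quoted_text_octet(octet: int) -> bool:
    # Assumed qchar set: HT / SP and VCHAR (%x20-7E) / obs-text (%x80-FF)
    return octet == 0x09 or 0x20 <= octet <= 0x7E or 0x80 <= octet <= 0xFF


def quoted_pairs_ok(value: bytes) -> bool:
    """True if every '\\' in value is followed by an allowed octet."""
    i = 0
    while i < len(value):
        if value[i] == 0x5C:  # backslash starts a quoted-pair
            if i + 1 >= len(value):
                return False  # nothing left to quote
            if not is_quoted_text_octet(value[i + 1]):
                return False  # quoted CTL (other than HT)
            i += 2
        else:
            i += 1
    return True
```

Under this rule a quoted DQUOTE or quoted backslash is fine, while a
quoted NUL, CR, LF or any other CTL except HT is rejected.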
BR, julian
Received on Tuesday, 25 August 2009 12:47:56 UTC