- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Tue, 25 Aug 2009 14:47:12 +0200
- To: Henrik Nordstrom <henrik@henriknordstrom.net>
- CC: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>, Bjoern Hoehrmann <derhoermi@gmx.net>
Henrik Nordstrom wrote:
> Should probably change topic here, but it's still relevant so keeping
> the issue topic. Most of this is taking a more generic view of
> quoted-pair, not isolated to chunk extension values.
>
> tis 2009-08-25 klockan 09:11 +0200 skrev Julian Reschke:
>
>> quoted-pair is also used in comments. Are we ok with restricting the set
>> here as well? And, if yes, shouldn't we then also adjust the allowed set
>> for non-quoted characters in comments?
>
> What? Restricting how? I thought we were talking about restricting the
> use of CTLs?

Yes. I wanted to confirm that we do that for quoted-strings *and*
comments. Do we?

> Now some further rambling on the use of quoted-pair and the difficulties
> this causes for parsers:
>
>
> qdtext is for text within a quoted-string, and MUST NOT include '"' or
> '\'. Those two must be produced as quoted-pair to be used within a
> quoted-string.
>
>    qdtext         = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
>                   ; OWS / <VCHAR except DQUOTE and "\"> / obs-text
>
> ctext is the same but for comment, and MUST NOT include '(', ')' or '\'.
> Those three must be produced as quoted-pair to be used within a comment.
>
>    ctext          = OWS / %x21-27 / %x2A-5B / %x5D-7E / obs-text
>                   ; OWS / <VCHAR except "(", ")", and "\"> / obs-text
>
> Neither of qdtext or ctext allows for CTLs, except for HT or obsoleted
> CRLF folding (from OWS).

Yes. But quoted-string and comment allow quoted-pair which currently
does allow CTLs.

> Specifications (2616) is very strict on where quoted-pair is allowed to
> be used, but it's at the same time very subtle where those areas are
> creating a large grey area where parsing is somewhat non-obvious.
>
> It's the same question as been raised earlier regarding comments. A
> construct looking like a comment is only a comment if the header in
> question is defined to allow comments, if not it's literally part of the
> header value.
>
> Quoted-string is also only quoted-string if the header in question is
> defined to accept quoted-string, if not it may be a literal part of the
> header value even if it may look like a quoted-string (for a header
> defined as taking *TEXT as value, 2616 has no such headers however)
>
> RFC2616 BNF and relevant comments:
>
>        generic-message = start-line
>                          *(message-header CRLF)
>                          CRLF
>                          [ message-body ]
>        message-header = field-name ":" [ field-value ]
>        field-name     = token
>        field-value    = *( field-content | LWS )
>        field-content  = <the OCTETs making up the field-value
>                         and consisting of either *TEXT or combinations
>                         of token, separators, and quoted-string>
>
>        TEXT           = <any OCTET except CTLs,
>                         but including LWS>
>
>    A CRLF is allowed in the definition of TEXT only as part of a header
>    field continuation.
>
>    Comments can be included in some HTTP header fields by surrounding
>    the comment text with parentheses. Comments are only allowed in
>    fields containing "comment" as part of their field value definition.
>    In all other fields, parentheses are considered part of the field
>    value.
>
>        comment        = "(" *( ctext | quoted-pair | comment ) ")"
>        ctext          = <any TEXT excluding "(" and ")">
>
> The allowable characters in *TEXT overlaps completely with token,
> separators and quoted-string in the allowable characters except that
> *TEXT do not allow CTLs other than LWS (HT), and within *TEXT the '\'
> character have no special meaning.
>
> Which means that to properly parse '\' quoted constructs one must know
> in detail every header processed in order to know if the '\' is quoting
> the next character or if it's just a literal '\'.

Yes.

> Because of this it's important that the overall message parsing is the
> same regardless if quoted-pair is processed or not, only producing
> slightly different results in the raw header value.
> Or put in other words, it needs to be possible to completely defer
> quoting and comment processing until the header value as such is
> examined in detail, with general message parsing using *TEXT for all
> header values. And for chunk headers *TEXT minus folding for the
> general message format, only needing to dive into quoting etc when
> eventually processing the chunk extension values (if at all).
>
>
> Regarding the allowable characters there imho is absolutely no need to
> allow for control characters anywhere in HTTP headers or chunk headers,
> quoted or not, and it's additionally very very likely many parsers will
> fail on such constructs making them quite non-interoperable.

Agreed.

> And additionally if restricting the allowed set of quoted characters to
> exclude \x00, NL and CR as already done in HTTPbis then it becomes very
> questionable from a technical point of view (ignoring parsing) to allow
> the use of other CTLs in quoted form. The use of having CTLs in header
> values is very limited to begin with, basically only needed to support
> transmission of (non-UTF8) multibyte character sets or binary non-text
> data, in which case having those three excluded is already a significant
> issue for such use.

Yes.

> So imho quoted-pair should be
>
>    quoted-text    = %x09 / %x20-%x7E / obs-text
>                   ; WSP / VCHAR / obs-text
>    quoted-pair    = "\" qchar
>
> to match the use of *TEXT in 2616, making comments and quoted strings
> all fit within *TEXT as those constructs is only used in detailed forms
> which should be a subset of the more generic *TEXT.

"qchar" being...?

> This reasoning is also consistent with the current field-content
> definition using VTEXT etc..
>
>    field-value    = *( field-content / OWS )
>    field-content  = *( WSP / VCHAR / obs-text )
>
> This field-content definition DOES NOT allow for CTLs other than HT.
> Allowing quoted-pair to include CTLs other than HT is incompatible with
> the above (from latest p1) definition of field-content.
>
> If you look closely you'll notice the quoted-text and field-contents
> definitions above are equal. Perhaps a common term should be defined for
> that similar to the *TEXT element used in 2616. There is probably more
> places where using said term would make sense. And sorry, no I do not
> have a good suggested BNF name for this construct.. TEXT would be
> confusing with 2616 and text in lower case too generic to be used in
> describing text. general-text?
> ...

"characters"?

Anyway, my take away from your analysis is: "yes, CTLs need to be
disallowed both in comments and quoted-text", right?

BR, julian
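
To make the proposed restriction concrete, here is a minimal Python sketch
of what a parser might do if quoted-pair were limited to the same set as
field-content (WSP / VCHAR / obs-text). It is only an illustration under
stated assumptions: the names is_text_octet and unquote are hypothetical,
obs-text is assumed to be %x80-FF, and obsolete line folding is not
handled.

```python
def is_text_octet(b: int) -> bool:
    """True for HT, SP, VCHAR, or obs-text (assumed here to be %x80-FF).

    This is the set shared by the proposed quoted-pair target and the
    field-content rule quoted above; all other CTLs (and DEL) fail.
    """
    return b == 0x09 or 0x20 <= b <= 0x7E or b >= 0x80


def unquote(raw: bytes) -> bytes:
    """Strip the surrounding DQUOTEs and resolve quoted-pairs.

    Hypothetical helper: it raises ValueError for any CTL other than HT,
    whether the octet appears bare or after a backslash.
    """
    if len(raw) < 2 or raw[:1] != b'"' or raw[-1:] != b'"':
        raise ValueError("not a quoted-string")
    body = raw[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        b = body[i]
        if b == 0x5C:  # backslash: the next octet is taken literally
            i += 1
            if i >= len(body):
                raise ValueError("dangling backslash")
            b = body[i]
        elif b == 0x22:  # a bare DQUOTE is not valid qdtext
            raise ValueError("unescaped DQUOTE")
        if not is_text_octet(b):
            raise ValueError("control character 0x%02x" % b)
        out.append(b)
        i += 1
    return bytes(out)
```

Because the same octet test applies inside and outside quoting, a generic
message parser could accept the raw field value with is_text_octet alone
and call something like unquote only for headers that are defined to carry
a quoted-string.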
Received on Tuesday, 25 August 2009 12:47:56 UTC