- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Fri, 14 Mar 2008 13:57:38 +0100
- To: Brian Smith <brian@briansmith.org>
- CC: 'HTTP Working Group' <ietf-http-wg@w3.org>
Brian Smith wrote:
>> ???
>>
>> <http://greenbytes.de/tech/webdav/rfc2616.html#basic.rules.quoted-string>:
>>
>>     quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
>>     qdtext        = <any TEXT except <">>
>
> <any TEXT except <">> is not equivalent to *TEXT.

I would argue that the intent of that production is clearly to inherit
the rules for TEXT.

Funny enough, this issue is one of the remaining blockers for the
conversion to ABNF; we really need to clarify TEXT, and all productions
based on TEXT.

>> I think this is the intent.
>
> Then you run into the question "How are media-ranges and media-types
> compared? Are they to be decoded into Unicode and then compared?" When

Yes.

> the specification specifies that ETags must match exactly, is the
> comparison character-by-character or octet-by-octet?

That's really not relevant as long as the producer of ETags always uses
the same representation.

But I do agree that we probably need to look at each case where
quoted-string is used and decide whether it requires I18N or not.

>>> Also, the Reason-phrase of the status line is defined as:
>>>
>>>     *<TEXT, excluding CR, LF>
>>>
>>> But, is the RFC 2047 mechanism allowed in the Reason-phrase?
>> I would think so.
>
> Again, the grammar for reason-phrase is not *TEXT, that is why it isn't
> clear

But this is indeed one of the cases where I18N makes sense. Any chance
that some of the original authors can explain the history here?

>>> And, if it is read liberally, then it is
>> I disagree.
>>
>>> allowed in way too many places. And, if it is allowed
>>> anywhere, there should be some advice as to what
>>> encodings should be supported.
>> From the headers above, where do you think it shouldn't be allowed?
>
> Consider:
>
>     Content-Type: text/plain;charset="=?utf-8?q?utf-8?="
>     (how do you compare this against 'text/plain;charset="utf-8"'?)

I would have hoped that RFC2045 answers this, but that doesn't seem to
include a definition of quoted-string.
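To make the comparison question concrete, here is a minimal sketch in Python. It is not something RFC 2616 specifies: it assumes a recipient chooses to decode RFC 2047 encoded-words inside a parameter value before comparing, and it uses the standard library's `email.header.decode_header` for the decoding. The helper names `decode_encoded_words` and `charsets_match` are hypothetical, and the surrounding quotes are assumed to have been stripped already.

```python
from email.header import decode_header


def decode_encoded_words(value: str) -> str:
    """Decode any RFC 2047 encoded-words in value; plain text passes through."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their declared charset.
            pieces.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # Input without any encoded-word is returned unchanged as str.
            pieces.append(chunk)
    return "".join(pieces)


def charsets_match(a: str, b: str) -> bool:
    """Compare two charset parameter values (assumption: quotes stripped)."""
    # HTTP charset names are case-insensitive, so fold case after decoding.
    return decode_encoded_words(a).lower() == decode_encoded_words(b).lower()


print(charsets_match("=?utf-8?q?utf-8?=", "utf-8"))
```

Under that assumption the two Content-Type values above compare equal; a recipient that compares the raw octets instead would treat them as different, which is exactly the ambiguity in question.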
>     ETag: "=?utf-8?q?asdf?="
>     (how do you compare this against "asdf"?)
>
>     ETag: "=?"
>     (Is this a lexical error?)

For ETag, I'd say it's not a problem. If the server producing the ETags
wants to cause problems, let it do so.

>> I do agree that if we rely on RFC2047, we may also have to
>> spend some time improving that document.
>
> Keep in mind that RFC2047 has a limit of 75 characters per encoded-word.
> And, the grammar seems to allow encoded-words to be mixed with unencoded
> words. And, Base-64 encoding to be mixed with quoted-printable. And,
> multiple encodings (e.g. UTF-8 and UTF-7) to be mixed. All in the same
> *TEXT segment. There is definitely a lot to be improved, but each
> improvement would be an incompatible change.

It seems the only way to improve RFC-2047 would be by introducing a new
encoding that is sane. Such as: "Any octet sequence starting with EF BB
BF (the UTF-8 BOM) is to be interpreted as Unicode, encoded in UTF-8."

BR, Julian
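A rough sketch of what the BOM rule suggested above could look like on the receiving side, in Python. This only illustrates the idea and is not a worked-out proposal: the function name `interpret_text_octets` is hypothetical, and falling back to ISO-8859-1 when no BOM is present is an assumption based on RFC 2616's default for TEXT.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the octets EF BB BF


def interpret_text_octets(raw: bytes) -> str:
    """Interpret the octets of a TEXT value under the sketched BOM rule."""
    if raw.startswith(UTF8_BOM):
        # BOM present: the remaining octets are UTF-8 encoded Unicode.
        return raw[len(UTF8_BOM):].decode("utf-8")
    # Assumption: without a BOM, keep the ISO-8859-1 default of RFC 2616.
    return raw.decode("iso-8859-1")


print(interpret_text_octets(b"\xef\xbb\xbfna\xc3\xafve"))
print(interpret_text_octets(b"na\xefve"))
```

Both calls yield the same string, once via UTF-8 and once via the ISO-8859-1 fallback. Note that a legacy ISO-8859-1 value that happens to begin with the octets EF BB BF would be reinterpreted, so the rule would not be fully compatible either.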
Received on Friday, 14 March 2008 12:58:22 UTC