- From: Terje Bless <link@pobox.com>
- Date: Mon, 18 Nov 2002 11:14:37 +0100
- To: "Roy T. Fielding" <fielding@apache.org>
- cc: URI List <uri@w3.org>
Roy T. Fielding <fielding@apache.org> wrote:

>The specification requires things that are necessary for
>interoperability, and suggests things that are desirable for good
>practice/robustness.

Yes, I agree that this is the role of a Standards Track RFC. But I was not able to understand what requirements this RFC imposes or what suggestions it makes on the matter of when to escape reserved characters.

>None of the issues you mentioned are necessary for
>interoperability, and in fact you will find that implementations
>don't care one way or the other. That is why "?" in query is
>reserved instead of disallowed, since it is more robust for parsers
>to expect it to be present (and treat it as data) rather than not
>expect it and think of it as an error.

The "Be liberal in what you accept." principle restated?

>URI generators, OTOH, are encouraged to use such characters only
>when they are used according to a reserved purpose, which in the
>case of "?" inside a query means never.

And instead they are encouraged, but not required, to escape the characters; correct?

>That is not a contradiction, and certainly isn't ambiguous, since
>the specification both defines parsing and makes recommendations to
>URI generators.

It leaves the matter ambiguous, subjectively, in the sense that I have been unable, after reading the document, to determine the (presumably simple) case of a "?" appearing inside the query component. Perhaps "Does not explain it in a manner I was able to understand" is a better phrasing (but it doesn't sound so catchy as a message subject ;D).

RFC1738 contains the text:

    If the character corresponding to an octet is reserved in a
    scheme, the octet must be encoded.

This is very clear and leaves no doubt about the expected behaviour of a URI generator. A similar statement does not appear in RFC2396, and neither do any statements making recommendations about "best practice" in this area.
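For illustration, the parser behaviour described in the quoted text (a "?" inside the query treated as data rather than as an error) can be observed with Python's standard urllib.parse module. This is only a sketch of one implementation's behaviour, not a statement of what RFC2396 requires; the URL is a made-up example and does not come from the thread.

```python
from urllib.parse import urlsplit

# Hypothetical URL (an assumption for illustration) whose query
# component itself contains an unescaped "?".
url = "http://example.com/search?q=what?"

parts = urlsplit(url)

# The split happens at the first "?"; the later "?" stays in the
# query string as ordinary data.
print(parts.path)
print(parts.query)
```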
I take your statement on Issue #13[0] -- "That is why they are listed as reserved in that context (i.e., should not be used unencoded except when the reserved meaning is intended)." -- to mean that characters marked as "reserved" in any particular context are intended to be escaped/encoded regardless of whether they would "conflict" with its reserved purpose. Is that a correct assumption?

Can I generalize from this and assume that, if I were the agent responsible for generating the URL

    http://example.com/redir?uri=http://example.com/elsewhere/

(the example Jim gave), I would be expected to encode/escape it as

    http://example.com/redir?uri=http%3A%2F%2Fexample.com%2Felsewhere%2F

even if URI parsers are expected to accept the former as well as the latter?

And that a similar generalization will apply to an example such as "http://example.com/cgi.pl?uri=http://example.com/cgi-2.pl?para=meter", such that the second "?" and "=", and all ":" and "/" characters following the left-most "?", are expected/recommended to be encoded by URI generators?

Would it be correct to say that the characters found in the "unwise" and "delims" productions MUST always be encoded, and any character in the "reserved" set in any given context SHOULD be encoded within that context?

Would it be correct to say that any character in the "reserved" production -- which differs from the set labelled as "reserved" for any particular component -- SHOULD always be encoded unless used for its reserved purpose?

It is one thing that I do not understand this, but I had the help of several very smart people in attempting to determine what RFC2396 is actually saying. If an update to RFC2396 is in progress, would it be possible to take this into account and perhaps spell out requirements and recommendations for various agents (generators vs. parsers) in greater detail?
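As a concrete reading of the generator-side question above, here is a minimal Python sketch, assuming the generator chooses to escape every reserved character in the embedded URI. The helper name build_redirect is hypothetical, and this shows one possible generator policy rather than anything RFC2396 is confirmed to mandate.

```python
from urllib.parse import quote, parse_qs, urlsplit

def build_redirect(base, target):
    """Hypothetical helper: embed `target` as the "uri" query parameter."""
    # safe="" tells quote() to escape every reserved character,
    # including "/", which it leaves alone by default.
    return base + "?uri=" + quote(target, safe="")

encoded = build_redirect("http://example.com/redir",
                         "http://example.com/elsewhere/")
nested = build_redirect("http://example.com/cgi.pl",
                        "http://example.com/cgi-2.pl?para=meter")
print(encoded)
print(nested)

# Parser side: this particular parser recovers the same value from
# the unencoded form and from the encoded form.
unencoded = "http://example.com/redir?uri=http://example.com/elsewhere/"
print(parse_qs(urlsplit(unencoded).query)["uri"])
print(parse_qs(urlsplit(encoded).query)["uri"])
```

With safe="" the second "?" and "=", and every ":" and "/" to the right of the left-most "?", come out percent-encoded, which is the generalization asked about above.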
As an example, if my speculation above is correct, it might say something along the lines of (paraphrasing RFC1738 and using RFC2119 conventions):

    If the character is reserved in a component, the character
    SHOULD be encoded by URI generators, but MUST be accepted
    unencoded by URI parsers.

Or similar language designed to make abundantly clear what the intended behaviour of various agents is in the different contexts.

-- 
Interviewer: "In what language do you write your algorithms?"
    Abigail: English.
Interviewer: "What would you do if, say, Telnet didn't work?"
    Abigail: Look at the error message.
Received on Monday, 18 November 2002 05:14:52 UTC