- From: Jonathan Marsh <jonathan@wso2.com>
- Date: Thu, 22 Feb 2007 16:33:21 -0800
- To: "'Jonathan Marsh'" <jonathan@wso2.com>, "'Youenn Fablet'" <youenn.fablet@crf.canon.fr>, "'keith chapman'" <keithgchapman@gmail.com>
- Cc: "'www-ws-desc'" <www-ws-desc@w3.org>
Summary:

- Add "&" to the pre-"?" encoding rule exclusion set.
- There are lots of esoteric ways to abuse templates to create malformed URIs. I think we should avoid that slippery slope.

Analysis:

Looking again at RFC 3986 [1], a path segment is defined as:

      segment       = *pchar
      segment-nz    = 1*pchar
      segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                    ; non-zero-length segment without any colon ":"

      pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

      pct-encoded   = "%" HEXDIG HEXDIG
      unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
      sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                    / "*" / "+" / "," / ";" / "="

That differs from our spec only in that "&" is missing from the spec. I think this is an omission, and "&" should therefore be added to the pre-"?" encoding exclusion list. That ensures that any character disallowed in a path by the above BNF is properly escaped. Certain forms (path-noscheme) restrict a colon, but I don't believe that generates an error; it just changes the form.

There are other possible uses for templates than just path segments, though:

- If one were to use it for the scheme, one would have to be careful not to let characters other than ALPHA / DIGIT / "+" / "-" / "." appear in the XML data, or the scheme could be malformed. Note that any character that resulted in %-encoding would be problematic, as %-encoding doesn't seem to be allowed in the scheme production either!
- If one were to use it for the authority, one would be unable to specify userinfo, which disallows "@" in order to disambiguate the "@" separator between the userinfo and the host.
- If one were to use it for the port, one would be restricted to digits only (again with no accommodation for %-escaping).
- Edge cases all the way down here.

I am inclined to ignore this: if you're doing fine-grained templating of the parts before the path for some reason, you just have to be careful.
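As a minimal sketch of the proposed pre-"?" rule, assume the exclusion set is exactly the set of characters the BNF above allows literally in pchar (unreserved, all sub-delims including "&", plus ":" and "@"), and use Python's `urllib.parse.quote` for the encoding. The names `encode_segment_value`, `is_segment`, and `is_scheme` are hypothetical and are not taken from the spec or the test suite:

```python
import re
from urllib.parse import quote

# Literal pchar characters beyond ALPHA / DIGIT / "-" / "." / "_" / "~"
# (quote() never encodes those in Python 3.7+): the sub-delims, now
# including "&", plus ":" and "@".
SEGMENT_SAFE = "!$&'()*+,;=" + ":@"

# segment = *pchar ; pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
SEGMENT_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*\Z")

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) -- no pct-encoded allowed
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*\Z")


def encode_segment_value(value: str) -> str:
    """Percent-encode a value substituted into the path (before the "?").

    Non-ASCII text is encoded as UTF-8, and every byte outside the
    unreserved set and SEGMENT_SAFE is percent-encoded, including "%".
    """
    return quote(value, safe=SEGMENT_SAFE)


def is_segment(text: str) -> bool:
    """True if text matches the RFC 3986 `segment` production."""
    return SEGMENT_RE.match(text) is not None


def is_scheme(text: str) -> bool:
    """True if text matches the RFC 3986 `scheme` production."""
    return SCHEME_RE.match(text) is not None
```

For example, `encode_segment_value("Tom & Jerry")` yields `Tom%20&%20Jerry`, which `is_segment` accepts. The scheme check illustrates the first bullet above: `encode_segment_value("my scheme")` is a perfectly good segment, but no amount of %-encoding can turn it into a valid scheme.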
I don't think it's practical to try to flag every potential usage that can result in a malformed URI.

The other half of the question, then, is whether any of the allowed characters should be escaped even though they don't interfere with the well-formedness of the path segment. The most relevant text is the last paragraph of section 3.3:

   Aside from dot-segments in hierarchical paths, a path segment is
   considered opaque by the generic syntax. URI producing applications
   often use the reserved characters allowed in a segment to delimit
   scheme-specific or dereference-handler-specific subcomponents. For
   example, the semicolon (";") and equals ("=") reserved characters are
   often used to delimit parameters and parameter values applicable to
   that segment. The comma (",") reserved character is often used for
   similar purposes. For example, one URI producer might use a segment
   such as "name;v=1.1" to indicate a reference to version 1.1 of "name",
   whereas another might use a segment such as "name,1.1" to indicate the
   same. Parameter types may be defined by scheme-specific semantics, but
   in most cases the syntax of a parameter is specific to the
   implementation of the URI's dereferencing algorithm.

It is indeed true that a template like "name;v={version}", where version contains ";" or "=", could be difficult to work with. But since a path segment is "considered opaque" by the generic syntax, this level of checking seems overkill. And to the extent we restrict it, we'd simply force people to turn to raw mode to do things like "{segment}" where segment is "name;v=1.1". I therefore don't see a compelling advantage in restricting characters that don't break the URI syntax.
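To make the trade-off concrete, here is a hypothetical template expander. The `{name}` placeholder handling, the `raw` flag, and the `safe` parameter are illustrative assumptions, not the WSDL 2.0 template syntax. Under the proposed rule, ";" and "=" in a value pass through unencoded, so the result remains a well-formed segment even when it confuses a parameter-aware dereferencer; a stricter rule would instead force the "name;v=1.1" case into raw mode:

```python
import re
from urllib.parse import quote

# Proposed pre-"?" exclusion set: sub-delims (including "&") plus ":" and "@".
PROPOSED_SAFE = "!$&'()*+,;=:@"
# Hypothetical stricter alternative that would also encode ";" and "=".
STRICT_SAFE = "!$&'()*+,:@"

# Hypothetical placeholder syntax for this sketch: {name}
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand(template, values, raw=False, safe=PROPOSED_SAFE):
    """Fill {name} placeholders in the path part of a template.

    raw=False percent-encodes each value, keeping only unreserved
    characters and the characters in `safe`; raw=True inserts values
    verbatim.
    """
    def substitute(match):
        value = values[match.group(1)]
        return value if raw else quote(value, safe=safe)

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # Intended use: the value is a plain version number.
    print(expand("name;v={version}", {"version": "1.1"}))  # name;v=1.1
    # Still syntactically valid, but a parameter-aware server may see two parameters.
    print(expand("name;v={version}", {"version": "1;x=2"}))  # name;v=1;x=2
    # Under the stricter rule the whole segment gets escaped...
    print(expand("{segment}", {"segment": "name;v=1.1"}, safe=STRICT_SAFE))  # name%3Bv%3D1.1
    # ...so the template author would have to fall back to raw mode.
    print(expand("{segment}", {"segment": "name;v=1.1"}, raw=True))  # name;v=1.1
```

Note that "/" is never in either safe set, so a value such as "a/b" cannot accidentally introduce an extra path segment unless raw mode is used.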
[1] http://www.ietf.org/rfc/rfc3986.txt

Jonathan Marsh - http://www.wso2.com - http://auburnmarshes.spaces.live.com

> -----Original Message-----
> From: www-ws-desc-request@w3.org [mailto:www-ws-desc-request@w3.org] On
> Behalf Of Jonathan Marsh
> Sent: Thursday, February 22, 2007 2:01 PM
> To: 'Youenn Fablet'; 'keith chapman'
> Cc: 'www-ws-desc'
> Subject: RE: LocationTemplate-1G test
>
> Summarizing this thread, this morning's discussion, and the related
> issues:
>
> - [FIXED] * was improperly encoded in the baseline.
>
> - [QUESTION 1] The spec says what characters MUST be encoded, but there
> are also characters that MAY be encoded such as * (and pretty much any
> other character except %). Our test suite assumes only the characters
> that MUST be are. Should we change this? (I think we should do this
> opportunistically, that is, if a testcase is proven to be correct, we
> simply add an alternative that matches that implementation's encoding
> strategy. I don't think we have any failures because of this at
> present.)
>
> - [AGREED] Per the last paragraph of 6.8.1, referencing section 3.1 of
> RFC 3987, some further encoding is performed after the http location
> templates are resolved and combined with the {address} property.
>
> - [QUESTION 2] Is this sufficiently clear in the spec? (I think so.)
>
> - [AGREED] Besides the extended characters encoded above, the spec says
> implementations SHOULD also encode "<", ">", '"', space, "{", "}", "|",
> "\", "^", and "`". Our test suite will currently assume this SHOULD has
> been followed.
>
> - [FIXED] There are other editorial improvements such as removing the
> double negative, reordering bullets, removing query parameter separator
> from consideration before the "?".
>
> - [QUESTION 3] Are there additional editorial improvements possible? (I
> think so, as reported in
> http://lists.w3.org/Archives/Public/www-ws-desc/2007Feb/0193.html).
>
> - [QUESTION 4] Is "&" a harmful character before the "?". If not, we
> should add it to the excluded list.
>
> - [QUESTION 5] Are ";" and "=" harmful characters before the "?". If so,
> we should remove them from the excluded list.
>
> I'll research proposals for 4 and 5 per my AI, but if there are any other
> questions I didn't capture here, let us know!
>
> Jonathan Marsh - http://www.wso2.com -
> http://auburnmarshes.spaces.live.com
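The [AGREED] post-processing step in the quoted summary can be sketched as follows. This assumes the step amounts to the RFC 3987 section 3.1 mapping (each non-ASCII character converted to UTF-8 and its bytes percent-encoded) plus the SHOULD-encode characters listed in the summary; `finish_uri` is a hypothetical name, and existing %XX escapes are deliberately left untouched:

```python
# Characters the quoted summary says implementations SHOULD also encode:
# "<", ">", '"', space, "{", "}", "|", "\", "^", and "`".
SHOULD_ENCODE = set('<>" {}|\\^`')


def finish_uri(combined: str) -> str:
    """Percent-encode a resolved template already combined with {address}.

    Non-ASCII characters are converted to UTF-8 and each byte is
    percent-encoded (the RFC 3987 section 3.1 style mapping); the
    SHOULD-encode characters are escaped as well. Everything else,
    including existing %XX sequences, is copied through unchanged.
    """
    out = []
    for ch in combined:
        if ord(ch) > 0x7F or ch in SHOULD_ENCODE:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)
```

For instance, `finish_uri("http://example.org/café/a b")` gives `http://example.org/caf%C3%A9/a%20b`. Because this runs after substitution, any "{" or "}" still present is literal data rather than template syntax, which is why it can be escaped unconditionally here.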
Received on Friday, 23 February 2007 00:33:25 UTC