- From: Jonathan Marsh <jonathan@wso2.com>
- Date: Thu, 22 Feb 2007 16:44:03 -0800
- To: "'Jonathan Marsh'" <jonathan@wso2.com>, "'Youenn Fablet'" <youenn.fablet@crf.canon.fr>, "'keith chapman'" <keithgchapman@gmail.com>
- Cc: "'www-ws-desc'" <www-ws-desc@w3.org>
BTW, this includes an answer to QUESTION 4, and is recorded as issue CR157 [1].

[1] http://www.w3.org/2002/ws/desc/5/cr-issues/#CR157.

Jonathan Marsh - http://www.wso2.com - http://auburnmarshes.spaces.live.com

> -----Original Message-----
> From: Jonathan Marsh [mailto:jonathan@wso2.com]
> Sent: Thursday, February 22, 2007 4:33 PM
> To: 'Jonathan Marsh'; 'Youenn Fablet'; 'keith chapman'
> Cc: 'www-ws-desc'
> Subject: [QUESTION 5] Are ";" and "=" harmful characters before the "?"
> (was: RE: LocationTemplate-1G test)
>
> Summary:
> - Add "&" to the pre-? encoding rule exclusion set.
> - There are lots of esoteric ways to abuse templates to create
> malformed URIs. I think we should avoid that slippery slope.
>
> Analysis:
>
> Looking again at RFC 3986 [1], a path segment is defined as:
>
>    segment       = *pchar
>    segment-nz    = 1*pchar
>    segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
>                  ; non-zero-length segment without any colon ":"
>
>    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
>
>    pct-encoded   = "%" HEXDIG HEXDIG
>
>    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
>
>    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
>                  / "*" / "+" / "," / ";" / "="
>
> That differs from the spec that we have only in that "&" is missing in the
> spec. I think this is an omission, and that "&" should therefore be added
> into the pre-"?" encoding list. That takes care of making sure any
> character disallowed in a path by the above BNF is properly escaped.
> Certain forms (path-noscheme) restrict a colon, but I don't believe that
> generates an error, just changes the form.
>
> There are other possibilities for templates than just path segments
> though:
>
> - If one were to use it for the scheme one would have to be careful not to
> have characters other than ALPHA / DIGIT / "+" / "-" / "." appear in the XML
> data, or the scheme could be malformed.
> Note that any character that resulted in %-encoding would be problematic -
> as %-encoding doesn't seem to be allowed in the scheme production either!
>
> - If one were to use it for the authority one would be unable to specify
> userinfo, which disallows "@" in order to disambiguate the @ separator
> between the authority and the host.
>
> - If one were to use it for the port one would be restricted to digits only
> (again no %-escaping accommodation).
>
> - Edge cases all the way down here.
>
> I am inclined to ignore this - if you're doing fine-grained templating of
> parts prior to the path for some reason you just have to be careful. I
> don't think it's practical to try and flag every potential usage that can
> result in a mal-formed URI.
>
>
> The other half of the question then is whether any of the allowed characters
> should be escaped even though they don't interfere with the well-formedness
> of the path segment.
>
> The most relevant text is the last paragraph of section 3.3:
>
>    Aside from dot-segments in hierarchical paths, a path segment is
>    considered opaque by the generic syntax. URI producing applications
>    often use the reserved characters allowed in a segment to delimit
>    scheme-specific or dereference-handler-specific subcomponents. For
>    example, the semicolon (";") and equals ("=") reserved characters are
>    often used to delimit parameters and parameter values applicable to
>    that segment. The comma (",") reserved character is often used for
>    similar purposes. For example, one URI producer might use a segment
>    such as "name;v=1.1" to indicate a reference to version 1.1 of
>    "name", whereas another might use a segment such as "name,1.1" to
>    indicate the same. Parameter types may be defined by scheme-specific
>    semantics, but in most cases the syntax of a parameter is specific to
>    the implementation of the URI's dereferencing algorithm.
>
> It is indeed true that a template like "name;v={version}" where version
> contained ";" or "=" could be difficult to work with. But since a path
> segment is "considered opaque" by the generic syntax this level of checking
> seems overkill. And to the extent we restrict it we'd simply force people
> to turn to raw mode to do things like "{segment}" where segment is
> "name;v=1.1".
>
> I don't therefore see compelling advantage in restricting characters that
> don't break the URI syntax.
>
> [1] http://www.ietf.org/rfc/rfc3986.txt
>
>
> Jonathan Marsh - http://www.wso2.com -
> http://auburnmarshes.spaces.live.com
>
>
> > -----Original Message-----
> > From: www-ws-desc-request@w3.org [mailto:www-ws-desc-request@w3.org] On
> > Behalf Of Jonathan Marsh
> > Sent: Thursday, February 22, 2007 2:01 PM
> > To: 'Youenn Fablet'; 'keith chapman'
> > Cc: 'www-ws-desc'
> > Subject: RE: LocationTemplate-1G test
> >
> >
> > Summarizing this thread, this morning's discussion, and the related
> > issues:
> >
> > - [FIXED] * was improperly encoded in the baseline.
> >
> > - [QUESTION 1] The spec says what characters MUST be encoded, but there are
> > also characters that MAY be encoded such as * (and pretty much any other
> > character except %). Our test suite assumes only the characters that MUST
> > be are. Should we change this? (I think we should do this
> > opportunistically, that is, if a testcase is proven to be correct, we simply
> > add an alternative that matches that implementation's encoding strategy. I
> > don't think we have any failures because of this at present.)
> >
> > - [AGREED] Per the last paragraph of 6.8.1, referencing section 3.1 of RFC
> > 3987, some further encoding is performed after the http location templates
> > are resolved and combined with the {address} property.
> >
> > - [QUESTION 2] Is this sufficiently clear in the spec? (I think so.)
> >
> > - [AGREED] Besides the extended characters encoded above, the spec says
> > implementations SHOULD also encode "<", ">", '"', space, "{", "}", "|", "\",
> > "^", and "`". Our test suite will currently assume this SHOULD has been
> > followed.
> >
> > - [FIXED] There were other editorial improvements such as removing the
> > double negative, reordering bullets, removing query parameter separator
> > from consideration before the "?".
> >
> > - [QUESTION 3] Are there additional editorial improvements possible? (I
> > think so, as reported in
> > http://lists.w3.org/Archives/Public/www-ws-desc/2007Feb/0193.html).
> >
> > - [QUESTION 4] Is "&" a harmful character before the "?". If not, we should
> > add it to the excluded list.
> >
> > - [QUESTION 5] Are ";" and "=" harmful characters before the "?". If so, we
> > should remove them from the excluded list.
> >
> > I'll research proposals for 4 and 5 per my AI, but if there are any other
> > questions I didn't capture here, let us know!
> >
> > Jonathan Marsh - http://www.wso2.com -
> > http://auburnmarshes.spaces.live.com
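
The pre-"?" encoding rule discussed in the quoted analysis can be illustrated with a minimal Python sketch. It assumes the exclusion set is exactly the literal pchar characters from RFC 3986 (unreserved, sub-delims including "&", plus ":" and "@"), and that every other character, including "%", is percent-encoded as UTF-8 bytes. The names PCHAR_LITERALS and encode_path_value are hypothetical and are not taken from the WSDL 2.0 spec, whose exact rule may differ in detail.

```python
import string

# Character classes from the RFC 3986 ABNF quoted in the message.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")  # includes "&", the proposed addition

# Assumption: the characters left unencoded before the "?" are exactly
# the literal pchar characters (pct-encoded is handled by encoding "%").
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | {":", "@"}


def encode_path_value(value: str) -> str:
    """Percent-encode every character that is not a literal pchar.

    Hypothetical helper for illustration only. "%" is encoded as well,
    so a value can never inject a malformed escape sequence.
    """
    out = []
    for ch in value:
        if ch in PCHAR_LITERALS:
            out.append(ch)
        else:
            # Encode each UTF-8 byte of the character as %XX.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    # ";" and "=" pass through untouched, as the analysis recommends.
    print(encode_path_value("name;v=1.1"))
    # "/" and "?" would change the URI structure, so they are escaped.
    print(encode_path_value("a/b?c"))
```

Under these assumptions a value such as "name;v=1.1" survives unchanged, which matches the conclusion that ";" and "=" need not be restricted, while "/" and "?" are escaped because they would otherwise alter the structure of the resulting URI.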
Received on Friday, 23 February 2007 00:44:01 UTC