- From: Jonathan Marsh <jonathan@wso2.com>
- Date: Thu, 22 Feb 2007 16:44:03 -0800
- To: "'Jonathan Marsh'" <jonathan@wso2.com>, "'Youenn Fablet'" <youenn.fablet@crf.canon.fr>, "'keith chapman'" <keithgchapman@gmail.com>
- Cc: "'www-ws-desc'" <www-ws-desc@w3.org>
BTW, this includes an answer to QUESTION 4, and is recorded as issue CR157
[1].
[1] http://www.w3.org/2002/ws/desc/5/cr-issues/#CR157.
Jonathan Marsh - http://www.wso2.com - http://auburnmarshes.spaces.live.com
> -----Original Message-----
> From: Jonathan Marsh [mailto:jonathan@wso2.com]
> Sent: Thursday, February 22, 2007 4:33 PM
> To: 'Jonathan Marsh'; 'Youenn Fablet'; 'keith chapman'
> Cc: 'www-ws-desc'
> Subject: [QUESTION 5] Are ";" and "=" harmful characters before the "?"
> (was: RE: LocationTemplate-1G test)
>
> Summary:
> - Add "&" to the pre-? encoding rule exclusion set.
> - There are lots of esoteric ways to abuse templates to create
> malformed URIs. I think we should avoid that slippery slope.
>
> Analysis:
>
> Looking again at RFC 3986 [1], a path segment is defined as:
>
> segment = *pchar
> segment-nz = 1*pchar
> segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
> ; non-zero-length segment without any colon ":"
>
> pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
>
>
> pct-encoded = "%" HEXDIG HEXDIG
>
> unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
>
> sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
> / "*" / "+" / "," / ";" / "="
>
>
> That differs from the set in our spec only in that the spec omits "&". I
> think this is an omission, and that "&" should therefore be added to the
> pre-"?" encoding exclusion list. That ensures any character disallowed in
> a path by the above BNF is properly escaped. Certain forms (path-noscheme)
> restrict a colon in the first segment, but I don't believe that generates
> an error; it just changes the form.
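
As a rough illustration of the rule above, here is a minimal Python sketch
of pre-"?" encoding with "&" added to the exclusion set. The names
encode_path_value and PCHAR_SAFE are hypothetical and not taken from the
spec. The sketch assumes the exclusion set is exactly the RFC 3986 pchar
set, and it relies on urllib.parse.quote, which always leaves letters,
digits, "-", ".", "_" and "~" alone (Python 3.7 or later for "~").

```python
from urllib.parse import quote

# Assumption: the exclusion set is sub-delims plus ":" and "@", i.e. the
# non-alphanumeric part of the pchar production quoted above.
PCHAR_SAFE = "!$&'()*+,;=" + ":@"


def encode_path_value(value: str) -> str:
    """Percent-encode every character that is not a pchar (hypothetical helper)."""
    return quote(value, safe=PCHAR_SAFE)


print(encode_path_value("a&b c"))       # a&b%20c
print(encode_path_value("a/b?c"))       # a%2Fb%3Fc
print(encode_path_value("name;v=1.1"))  # name;v=1.1
```

Note that "/" and "?" are escaped because neither is a pchar, while "&",
";" and "=" pass through untouched.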
>
> There are other possibilities for templates than just path segments
> though:
>
>
> - If one were to use it for the scheme, one would have to be careful not
> to have characters other than ALPHA / DIGIT / "+" / "-" / "." appear in
> the XML data, or the scheme could be malformed. Note that any character
> that resulted in %-encoding would be problematic, as %-encoding doesn't
> seem to be allowed in the scheme production either!
>
> - If one were to use it for the authority, one would be unable to specify
> userinfo, which disallows "@" in order to disambiguate the "@" separator
> between the userinfo and the host.
>
> - If one were to use it for the port, one would be restricted to digits
> only (again, no %-escaping accommodation).
>
> - Edge cases all the way down here.
>
> I am inclined to ignore this: if you're doing fine-grained templating of
> parts prior to the path for some reason, you just have to be careful. I
> don't think it's practical to try to flag every potential usage that can
> result in a malformed URI.
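
For anyone who does template the scheme or port, "being careful" could
look something like the sketch below. This is only an illustration, not a
proposed spec requirement; the function names are hypothetical, and the
patterns are transcribed from the RFC 3986 scheme and port productions
(scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), port = *DIGIT).

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")
# RFC 3986: port = *DIGIT (an empty port is syntactically allowed)
PORT_RE = re.compile(r"[0-9]*")


def is_valid_scheme(text: str) -> bool:
    """True if text could be substituted as a whole scheme (hypothetical helper)."""
    return SCHEME_RE.fullmatch(text) is not None


def is_valid_port(text: str) -> bool:
    """True if text could be substituted as a whole port (hypothetical helper)."""
    return PORT_RE.fullmatch(text) is not None


print(is_valid_scheme("http"))    # True
print(is_valid_scheme("ht%74p"))  # False: no %-encoding in a scheme
print(is_valid_port("8080"))      # True
print(is_valid_port("80%30"))     # False: no %-encoding in a port
```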
>
>
> The other half of the question, then, is whether any of the allowed
> characters should be escaped even though they don't interfere with the
> well-formedness of the path segment.
>
> The most relevant text is the last paragraph of section 3.3:
>
> Aside from dot-segments in hierarchical paths, a path segment is
> considered opaque by the generic syntax. URI producing applications
> often use the reserved characters allowed in a segment to delimit
> scheme-specific or dereference-handler-specific subcomponents. For
> example, the semicolon (";") and equals ("=") reserved characters are
> often used to delimit parameters and parameter values applicable to
> that segment. The comma (",") reserved character is often used for
> similar purposes. For example, one URI producer might use a segment
> such as "name;v=1.1" to indicate a reference to version 1.1 of
> "name", whereas another might use a segment such as "name,1.1" to
> indicate the same. Parameter types may be defined by scheme-specific
> semantics, but in most cases the syntax of a parameter is specific to
> the implementation of the URI's dereferencing algorithm.
>
> It is indeed true that a template like "name;v={version}", where version
> contained ";" or "=", could be difficult to work with. But since a path
> segment is "considered opaque" by the generic syntax, this level of
> checking seems overkill. And to the extent we restrict it, we'd simply
> force people to turn to raw mode to do things like "{segment}" where
> segment is "name;v=1.1".
>
> I therefore don't see a compelling advantage in restricting characters
> that don't break the URI syntax.
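
To make the trade-off concrete, here is a deliberately simplified Python
sketch of template expansion that leaves ";" and "=" unescaped. The expand
function and PCHAR_SAFE constant are hypothetical, and the sketch ignores
the rest of the template syntax (escaped braces, raw mode); it only shows
how the two example templates above would behave if the exclusion set is
taken to be the RFC 3986 pchar set.

```python
import re
from urllib.parse import quote

# Assumption: characters left unescaped before the "?" are the pchar set.
PCHAR_SAFE = "!$&'()*+,;=:@"


def expand(template: str, values: dict) -> str:
    """Replace each {name} with its value, escaping only non-pchar characters."""
    def substitute(match):
        return quote(values[match.group(1)], safe=PCHAR_SAFE)
    return re.sub(r"\{(\w+)\}", substitute, template)


# A well-behaved value yields the segment from the RFC 3986 example.
print(expand("name;v={version}", {"version": "1.1"}))    # name;v=1.1
# A value containing ";" or "=" yields a segment that is still well-formed
# but is harder for the receiver to interpret.
print(expand("name;v={version}", {"version": "1;x=2"}))  # name;v=1;x=2
# Without escaping, a whole parameterized segment can be passed as one value.
print(expand("{segment}", {"segment": "name;v=1.1"}))    # name;v=1.1
```

In all three cases the result is a syntactically valid path segment; only
the second is ambiguous, and only to an application that assigns meaning
to ";" and "=".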
>
> [1] http://www.ietf.org/rfc/rfc3986.txt
>
>
> Jonathan Marsh - http://www.wso2.com -
> http://auburnmarshes.spaces.live.com
>
>
> > -----Original Message-----
> > From: www-ws-desc-request@w3.org [mailto:www-ws-desc-request@w3.org] On
> > Behalf Of Jonathan Marsh
> > Sent: Thursday, February 22, 2007 2:01 PM
> > To: 'Youenn Fablet'; 'keith chapman'
> > Cc: 'www-ws-desc'
> > Subject: RE: LocationTemplate-1G test
> >
> >
> > Summarizing this thread, this morning's discussion, and the related
> > issues:
> >
> > - [FIXED] * was improperly encoded in the baseline.
> >
> > - [QUESTION 1] The spec says what characters MUST be encoded, but there
> > are also characters that MAY be encoded, such as * (and pretty much any
> > other character except %). Our test suite assumes that only the
> > characters that MUST be encoded are encoded. Should we change this? (I
> > think we should do this opportunistically; that is, if a testcase is
> > proven to be correct, we simply add an alternative that matches that
> > implementation's encoding strategy. I don't think we have any failures
> > because of this at present.)
> >
> > - [AGREED] Per the last paragraph of 6.8.1, referencing section 3.1 of
> > RFC 3987, some further encoding is performed after the http location
> > templates are resolved and combined with the {address} property.
> >
> > - [QUESTION 2] Is this sufficiently clear in the spec? (I think so.)
> >
> > - [AGREED] Besides the extended characters encoded above, the spec says
> > implementations SHOULD also encode "<", ">", '"', space, "{", "}", "|",
> > "\", "^", and "`". Our test suite will currently assume this SHOULD has
> > been followed.
> > - [FIXED] There were other editorial improvements, such as removing the
> > double negative, reordering bullets, and removing the query parameter
> > separator from consideration before the "?".
> >
> > - [QUESTION 3] Are there additional editorial improvements possible? (I
> > think so, as reported in
> > http://lists.w3.org/Archives/Public/www-ws-desc/2007Feb/0193.html).
> >
> > - [QUESTION 4] Is "&" a harmful character before the "?"? If not, we
> > should add it to the excluded list.
> >
> > - [QUESTION 5] Are ";" and "=" harmful characters before the "?"? If so,
> > we should remove them from the excluded list.
> >
> > I'll research proposals for 4 and 5 per my action item, but if there are
> > any other questions I didn't capture here, let us know!
> >
> > Jonathan Marsh - http://www.wso2.com -
> > http://auburnmarshes.spaces.live.com
> >
> >
Received on Friday, 23 February 2007 00:44:01 UTC