- From: Benjamin Carlyle <benjamincarlyle@optusnet.com.au>
- Date: Tue, 02 Jan 2007 08:45:36 +1000
- To: uri@w3.org
Sorry about coming into this late, but...

Joe Gregorio wrote:
> 1. Escape all 'reserved' characters except @, :, and /
> across every component, realizing
> we may not end up with a valid URI.
> 2. Escape all 'reserved' characters except @, and :,
> realizing that our 'path' example
> will then break since '/' will get escaped.
> 3. Escape all 'reserved' characters except @, :, and /,
> but only allow template variables in path, query and
> fragment components.

4. Require/allow the context to perform any necessary escaping, e.g. by requiring appropriate javascript functions to have been called on the parameter values
5. Require/allow the template to specify any necessary escaping

This specification is at an interesting point in the uri construction chain. Normally a url is either captured whole or built up from parts. The part of the uri into which a parameter is inserted defines the escaping that needs to occur.

http://example.com/query?a={b} where b="d&e=f" should escape "&" and "=" if b is going to be used as the value of a. If b is just a regular part of the query component, however, escaping these characters may be inappropriate. For example, http://example.com/query?{b} might be used to substitute a whole query component, and http://example.com{b} might be used to substitute a path and query component.

The problem, of course, is that the client does not know the intent of the template producer. It is probably not a good idea for the client to guess, which leaves either explicit direction as part of the template or a general rule that covers 80% of useful cases.

As mentioned by Joe earlier in the thread, the server could specify the character encoding style using a limited vocabulary. It might otherwise be possible to list either an explicit set of characters to encode or an explicit set of characters that are safe.
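
To make the dependence on position concrete, here is a minimal sketch in Python of the b="d&e=f" case above. The helper name escape_for and the choice of "safe" sets are assumptions made for illustration only; nothing here is proposed template syntax.

```python
from urllib.parse import quote

# Hypothetical helper (not part of any template proposal): percent-encode
# `value`, leaving intact only unreserved characters and those in `safe`.
def escape_for(value, safe):
    return quote(value, safe=safe)

b = "d&e=f"

# b is the value of parameter "a": "&" and "=" would change the meaning
# of the query, so they are escaped.
as_value = "http://example.com/query?a=" + escape_for(b, safe="")

# b is the whole query component: "&" and "=" are structural, so they
# are listed as safe and left alone.
as_query = "http://example.com/query?" + escape_for(b, safe="&=")

print(as_value)  # http://example.com/query?a=d%26e%3Df
print(as_query)  # http://example.com/query?d&e=f
```

The same value produces two different urls, and only the template producer knows which one was intended.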

A template such as http://example.com/query?a={b:&=#} might specify that "&" and "=" need escaping in addition to the normal escaping for characters in a query + fragment component. http://example.com/query?a={b:query:&=} might make the "this is a substitution for the query component" intent clearer while still specifying the additional characters.

What you would essentially be looking for is a language or a vocabulary to indicate which part of the url is being substituted by this variable in the template. This would be straightforward for components such as "scheme", "path", or "query", and it may be possible to imply it from context. However, uris may have forms of domain-specific construction that cannot easily be expressed in a single vocabulary. This would require a mechanism for specifying additional constraints once blanket rules and vocabulary run out.

Starting out with a blanket rule and seeing whether problems emerge in practice is probably the best idea. If problems do emerge, however, it may be worth keeping in the back of your mind a language for identifying the part of the uri being substituted. I suspect that a blanket rule won't cover all of my own use cases.

Mark Nottingham wrote:
> Your proposal puts the encoding information into the variable name.
> That's one option, but I'm reluctant to encourage putting this kind
> of thing in there, as it encourages URI Templates to become URI
> Schemas, and they'll quickly become unreadable. Encoding is by no
> means the last thing we'll want to associate with a particular variable.

Are you talking about a separate document associated with a template that fills out additional information that might be of use?

Jerome Louvel wrote:
> 5. Don't escape any character, leaving this task to the
> application converting the template to valid URIs.
> My preference goes for #5.

Leaving all substitution to the client is a tempting alternative. It was at the top of my list on my first pass at this.
A client context such as a javascript environment is likely to already have appropriate capabilities to perform any encoding. However, the client won't know which part of the url it is filling out. It can't make sound judgements without additional information. If the client knew enough to judge soundly, it wouldn't need a uri template in the first place.

> As an aside, it turns out that the regular expression given in
> Appendix B of RFC 3986 is completely capable of
> parsing up URI Templates, but only if the characters
> allowed in template variable names are restricted, and
> only if template variables are not allowed to span
> components.

I'm in two minds about this. Being able to do generic uri parsing on a template is a potentially useful feature. However, I don't think it is important enough to justify constraining the design to keep it. It could be used to identify the characters to be escaped when a template parameter is found within a particular component, but the same result could be had by running the regex on a version of the url with the identifiers stripped out.

In general, I think that parsing will happen on actual urls rather than url templates. As such, it will probably be more useful to allow a fair range of expressiveness in the identifier, so that little javascript invocations and the like can be included. On the other hand, just recognising a URL in context may be sufficient reason to require url-like content in the identifier. For example, if the identifier contained a space character it may be difficult to pick where the template ended. This suggests to me that at least some restrictions should be applied.

Benjamin
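
As a footnote to the regex point above, this is a rough sketch in Python of running the RFC 3986 Appendix B expression on a template with the identifiers stripped out. The function name components_of_variables, the assumption that a variable is anything between "{" and "}", and the numbered placeholder tokens are all illustrative choices, not anything agreed in this thread.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Capture groups of URI_RE that hold each component.
GROUPS = {"scheme": 2, "authority": 4, "path": 5, "query": 7, "fragment": 9}

# Assumed variable syntax: anything between "{" and "}".
VAR_RE = re.compile(r"\{([^{}]*)\}")

def components_of_variables(template):
    """Map each template variable to the uri component it appears in."""
    names = VAR_RE.findall(template)
    # Swap the n-th identifier for a neutral token "{n}" so that characters
    # such as ":" or "#" inside an identifier cannot confuse the uri regex.
    counter = iter(range(len(names)))
    stripped = VAR_RE.sub(lambda m: "{%d}" % next(counter), template)
    match = URI_RE.match(stripped)  # every part is optional, so this always matches
    result = {}
    for index, name in enumerate(names):
        token = "{%d}" % index
        for component, group in GROUPS.items():
            text = match.group(group)
            if text is not None and token in text:
                result[name] = component
    return result

print(components_of_variables("http://example.com/query?a={b:&=#}"))
# {'b:&=#': 'query'}
```

Note that http://example.com{b} is reported as an authority substitution, because nothing separates the variable from the host name. That is the "span components" restriction showing up in practice.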
Received on Tuesday, 2 January 2007 14:42:13 UTC