- From: Joe Gregorio <joe@bitworking.org>
- Date: Fri, 12 Oct 2007 22:22:51 -0400
- To: uri@w3.org
Warning: LONG, but by the end I get to real life code, so stick with me.

The last draft for URI Templates was published back in July and to be honest I haven't been very happy with it. We've struggled with percent-encoding of reserved characters for a very long time and the latest draft doesn't really resolve the problem.

    During substitution, the string value of a template variable MUST have any characters that do not match the reserved or unreserved rules (i.e., those characters not legal in URIs without percent encoding) percent-encoded, as per [RFC3986], section 2.1. Specific applications of URI Templates MAY specify additional constraints and encoding rules in addition to this.

This is unsatisfactory for a lot of reasons, mostly related to how functional the spec actually is. There is a large set of cases I think URI Templates can be used for and I don't think the very simple templating mechanism defined covers nearly enough cases. I also think that the story around percent-encoding is hopelessly mired.

Here are some examples that I hope show that the current {var} system is inadequate.

Here is a URI Template for my own blog:

    URI Template          http://bitworking.org/news/{entry}
    Template Variable(s)  entry := 'RESTLog_Specification'
    URI                   http://bitworking.org/news/RESTLog_Specification

The problem is that a year ago I changed the URI structure for new posts going forward, while old posts keep the same structure. Here is an example from a newer entry:

    URI Template          http://bitworking.org/news/{entry}
    Template Variable(s)  entry := '240/Newsqueak'
    URI                   http://bitworking.org/news/240/Newsqueak

On the other hand, if I wanted to search for this post on Technorati I would want:

    URI Template          http://technorati.com/search/{term}
    Template Variable(s)  term := '240/Newsqueak'
    URI                   http://technorati.com/search/240%2FNewsqueak

Right off the bat we have a problem with percent-encoding and reserved characters.
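A minimal Python sketch of the dilemma above, assuming the standard library's urllib.parse.quote stands in for the substitution step (it is not part of any URI Template draft). The same value needs two different encodings depending on where it lands, and plain {var} gives the template author no way to say which one is wanted:

```python
from urllib.parse import quote

value = "240/Newsqueak"

# Blog case: the "/" inside the value is meant as a real path separator,
# so it has to survive substitution untouched.
blog = "http://bitworking.org/news/" + quote(value, safe="/")

# Search case: the "/" is data, so it has to be percent-encoded.
search = "http://technorati.com/search/" + quote(value, safe="")

print(blog)    # http://bitworking.org/news/240/Newsqueak
print(search)  # http://technorati.com/search/240%2FNewsqueak
```

The only difference between the two calls is the `safe` argument, which is exactly the piece of information the current draft has nowhere to express.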
Here are some more examples with '&' and '=' characters:

    URI Template          http://www.google.com/search?q={term}
    Template Variable(s)  term := ben&jerrys
    URI                   http://www.google.com/search?q=ben&jerrys

Failing to percent-encode will get you the wrong results.

    URI Template          http://www.google.com/search?q={term}
    Template Variable(s)  term := 2+2=5
    URI                   http://www.google.com/search?q=2%2B2%3D5

    URI Template          http://www.google.com/base/feeds/snippets/?{author}
    Template Variables    author := author=joe.gregorio@gmail.com
    URI                   http://www.google.com/base/feeds/snippets/?author%3djoe.gregorio%40gmail.com

From the above you can see how not percent-encoding or over percent-encoding can change the meaning of a URI. I'm sure we could all construct pathological cases where any reserved character should and should not be percent-encoded.

Not only is the percent-encoding not a solved problem, there are moderately complex cases that we can't approach at all. For example, in this fairly common search:

    http://www.google.com/search?q={term}&num={n}

the query parameter num is optional, how do we show that? In addition, what about this combination:

    http://www.google.com/search?q={term}
    term := Îñţérñåţîöñåļîžåţîöñ

What character encoding do we use?

And here is an even more complex example from GData. Paths of some URIs are actually boolean logic filters on the categories of elements returned in the feed:

    URI Template    /-/{category}/
    Category Rules  OR   fred|barney
                    AND  fred/barney
                    NOT  -fred
    Example         /-/A%7C-B/-C  means (A OR (NOT B)) AND (NOT C)

So I hope I've given enough examples to show that the cause is completely hopeless at least for the simple {var} expansion.

Roy suggested a bash-inspired substitution rule set:

    {=default:variable}
        If variable is defined and non-empty, then substitute the value of variable.
        Otherwise, substitute with the default value. E.g.,

            {=red:favoritecolor} = "value" or "red"

    {?prefix:variable}
        If variable is defined and non-empty, then substitute the string of non-colon characters between the '?' and ':', if any, followed by the value of variable. Otherwise, substitute with the empty string. E.g.,

            {?/:variable} = "/value" or ""
            {?;name=:variable} = ";name=value" or ""
            {?#:variable} = "#value" or ""

Now notice how this expansion adds reserved characters into the URI. So I took Roy's expansions and tried an experiment: What if the *only* way to get a reserved character into the final URI was through templating expansions?

This points out the crux of the problem and the solution. Einstein said "Make everything as simple as possible, but not simpler", and the current templating spec is a case of trying to make something too simple. The percent-encoding is a complexity, and like a lump in the rug, you push it down in one place and it will pop up in another. The simple {var} syntax is too simple to handle what we are trying to do. So let's expand the power of these expansions and see if that makes the problems go away.

So what does this experiment look like? Let's be as draconian as possible:

    1. Convert Unicode template values to UTF-8.
    2. Percent-encode all characters outside unreserved.

We can keep our simple {var} expansion, but let's add in a default value:

    {var=default}
        Simple substitution

Example:

    URI Template  http://example.org/{fruit=orange}/
    Template Var  fruit = "apple"
    URI           http://example.org/apple/

    URI Template  http://example.org/{fruit=orange}/
    Template Var  fruit is undefined
    URI           http://example.org/orange/

The rest of the expansions I'll define follow this form:

    {<op><arg>|<variable(s)>}

    <op>         A single character not in unreserved.
    <arg>        Any legal URI character.
    <variables>  May be more than one, may have defaults. Default values must already be percent-encoded. Variable names are from unreserved.
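The two draconian rules and the {var=default} form above can be sketched in a few lines of Python. This is only an illustration: encode_value and expand_simple are hypothetical names, urllib.parse.quote is assumed to be an acceptable stand-in for "percent-encode everything outside unreserved" (on Python 3.7 and later its always-safe set is letters, digits and "-._~"), and the choice to fall back to the default only when the variable is undefined is also an assumption.

```python
from urllib.parse import quote


def encode_value(value):
    # Rule 1: encode the text as UTF-8. Rule 2: percent-encode every
    # character outside unreserved (safe="" leaves nothing extra unescaped).
    return quote(value, safe="", encoding="utf-8")


def expand_simple(name, variables, default=""):
    # {var=default}: use the encoded value if the variable is defined,
    # otherwise emit the default, which must already be percent-encoded.
    if name in variables:
        return encode_value(variables[name])
    return default


print(encode_value("ben&jerrys"))  # ben%26jerrys
print(encode_value("2+2=5"))       # 2%2B2%3D5
print("http://example.org/" + expand_simple("fruit", {}, "orange") + "/")
```

With these rules no variable value can ever introduce a reserved character, which is the point of the experiment: reserved characters then have to come from the template itself.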
{<prefix|var[=default]}
    Prefix var with prefix, emit empty string if var is empty or undefined.

    URI Template  bar{</|var}/
    Template Var  var := foo
    URI           bar/foo/

{>postfix|var[=default]}
    Append postfix to var, emit empty string if var is empty or undefined.

    URI Template  bar/{>#home|var}
    Template Var  var := foo
    URI           bar/foo#home

{,sep|var1=def1, var2=def2, ...}
    Substitute the concatenation of variable name, "=", variable value. Join more than one var by the value of 'sep'.

    URI Template  {,&|name,location,age}
    Template Var  name := joe
                  location := NYC
    URI           name=joe&location=NYC

{&sep|var}
    Treat var as a list and join the values in the list with the given separator. Emit empty string if var is empty or undefined.

    URI Template  {&/|segments}
    Template Var  segments := ["a", "b", "c"]
    URI           a/b/c

{?opt|var}
    Inserts opt if var is a string or non-zero length list.

    URI Template  {?/|segments}
    Template Var  segments := ["a", "b", "c"]
    URI           /

{!opt|var}
    Inserts opt if var is undefined or a zero length list.

    URI Template  {!/|segments}
    Template Var  segments := ["a", "b", "c"]
    URI           ""

Does it work? Let's try all our previous examples.

My blog template works out fine:

    http://bitworking.org/news/{entry}
    entry := '240/Newsqueak'
    http://bitworking.org/news/240/Newsqueak

All the searches do too:

    http://www.google.com/search?q={term}
    term := ben&jerrys
    http://www.google.com/search?q=ben%26jerrys

    http://www.google.com/search?q={term}
    term := 2+2=5
    http://www.google.com/search?q=2%2B2%3D5

    http://www.google.com/base/feeds/snippets/?{,&|author}
    author := joe.gregorio@gmail.com
    http://www.google.com/base/feeds/snippets/?author=joe.gregorio%40gmail.com

In the example from Google search, all variable names in the {,} expansion are optional, i.e. none of those variables need be defined.
    http://www.google.com/search?q={,&|term,num}

Internationalization is also covered:

    http://www.google.com/search?q={term}
    term := Îñţérñåţîöñåļîžåţîöñ
    http://www.google.com/search?q=%C3%8E%C3%B1%C5%A3%C3%A9r%C3%B1%C3%A5%C5%A3%C3%AE%C3%B6%C3%B1%C3%A5%C4%BC%C3%AE%C5%BE%C3%A5%C5%A3%C3%AE%C3%B6%C3%B1

Note that to handle the complex category query from GData we need to use two expansions: '?' and '&'.

    URI Template   {?/-/|categories}{&/|categories}
    Template Vars  categories = ["A|-B", "-C"]
    URI            /-/A%7C-B/-C

To see multiple expansions working together at the same time let's look at a URI from the Google Notebook GData service <http://code.google.com/apis/notebook/reference.html>:

    http://www.google.com/notebook/feeds/
      {userID}{</notebooks/|notebookID}
      {?/-/|categories}{&/|categories}?
      {,&|updated-min,updated-max,alt,
      start-index,max-results,entryID,orderby}

So this system is pretty capable without crossing over into the realm of Turing Complete. On the other hand, it is not without fault:

    1. Doesn't handle repeated query parameters.
    2. Doesn't specify if variables are mandatory or optional.
    3. Doesn't handle encodings besides UTF-8.
    4. Template language is complex, cryptic.
    5. No handling of input validation, enums, ranges, etc.
    6. Possible to define a self-inconsistent URI Template:
        1. {&|fred}{<#|fred}
    7. Prefixes and suffixes are redundant, as they could be handled by using the '?' expansion.
    8. Comma expansions could have two strings, one to separate name-value pairs (as now), the other to separate names from values (now hard-coded to "=").
    9. Sensible defaults need to be invented to deal with parameter values that are lists when not expected to be (or are not lists when expected to be) (see #6).
    10. No specification for how to handle IRIs beyond "Turn an IRI Template into a URI Template and then proceed."
    11. Need way to say "Insert this if some/none of these variables exist" to strip trailing "?" from URIs with no parameters.
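For readers who want to poke at the six expansions before reading the grammar, here is a compact Python sketch of them. It is an independent illustration written against the prose definitions earlier in this message, not the implementation linked further down: the names enc, is_set, resolve, expand and sub are hypothetical, a string handed to the '&' expansion is treated as a one-item list (an assumption), and a default is used whenever the variable is empty or undefined (also an assumption).

```python
import re
from urllib.parse import quote


def enc(value):
    # Percent-encode everything outside unreserved, as UTF-8.
    return quote(value, safe="")


def is_set(value):
    # Defined and non-empty; works for both strings and lists.
    return value is not None and len(value) > 0


def resolve(spec, variables):
    # "name" or "name=default" -> (name, encoded value or default).
    name, _, default = spec.strip().partition("=")
    value = variables.get(name)
    return name, (enc(value) if is_set(value) else default)


def expand(expression, variables):
    # Expand the text found between one pair of braces.
    if "|" not in expression:                  # {var} or {var=default}
        return resolve(expression, variables)[1]
    op = expression[0]
    arg, _, spec = expression[1:].partition("|")
    if op in "<>":                             # prefix / postfix
        text = resolve(spec, variables)[1]
        if not text:
            return ""
        return arg + text if op == "<" else text + arg
    if op == ",":                              # name=value pairs joined by arg
        pairs = (resolve(s, variables) for s in spec.split(","))
        return arg.join(f"{name}={text}" for name, text in pairs if text)
    value = variables.get(spec.strip())
    if op == "&":                              # join list items with arg
        if not is_set(value):
            return ""
        items = [value] if isinstance(value, str) else value
        return arg.join(enc(item) for item in items)
    if op == "?":                              # arg only if var is set
        return arg if is_set(value) else ""
    if op == "!":                              # arg only if var is not set
        return "" if is_set(value) else arg
    raise ValueError(f"unknown operator: {op!r}")


def sub(template, variables):
    # Replace every {...} in the template with its expansion.
    return re.sub(r"\{([^{}]*)\}", lambda m: expand(m.group(1), variables), template)


print(sub("{?/-/|categories}{&/|categories}", {"categories": ["A|-B", "-C"]}))
# /-/A%7C-B/-C
```

The sketch deliberately does nothing about the faults listed above; for instance, passing a list to a plain {var} is simply not handled.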
To those of you thinking to yourself, "I'd understand that much better as a BNF", here you go:

    token arg '[^\|]*';
    token varname '[\w\-\.~]+' ;
    token vardefault '[\w\-\.\~\%]+' ;

    START -> Template;
    Template -> IdentityOperatorTemplate | OperatorTemplate ;
    IdentityOperatorTemplate/ -> Var
    OperatorTemplate/o ->
          '>' arg '\|' Var
        | '<' arg '\|' Var
        | ',' arg '\|' Vars
        | '\&' arg '\|' VarNoDefault
        | '\?' arg '\|' VarNoDefault
        | '!' arg '\|' VarNoDefault
        ;
    Vars -> Var ( ',' Var ) * ;
    Var -> varname ( '=' vardefault ) ? ;
    VarNoDefault -> varname

And finally to those of you thinking to yourself, "that would be so much better as working code", I present:

    http://code.google.com/p/uri-templates/

A Python implementation, with unit tests, requires 'tpg', the Toy Parser Generator.

    >>> import template_parser
    >>> t = template_parser.URITemplate("http://www.google.com/notebook/feeds/{userID}{</notebooks/|notebookID}{?/-/|categories}{&/|categories}?{,&|updated-min,updated-max,alt,start-index,max-results,entryID,orderby}")
    >>> t.sub({})
    'http://www.google.com/notebook/feeds/?'
    >>> t.sub({"userID": "joe.gregorio"})
    'http://www.google.com/notebook/feeds/joe.gregorio?'
    >>> t.sub({"userID": "joe.gregorio", "notebookID": "foo"})
    'http://www.google.com/notebook/feeds/joe.gregorio/notebooks/foo?'
    >>> t.sub({"userID": "joe.gregorio", "notebookID": "foo", "start-index": "20"})
    'http://www.google.com/notebook/feeds/joe.gregorio/notebooks/foo?start-index=20'
    >>>

So it's clear, I don't believe this is a final or complete solution, but I think it's a good start and at least proves that expansions are a viable solution to the percent-encoding issue.

Thanks,
-joe

--
Joe Gregorio
http://bitworking.org
Received on Saturday, 13 October 2007 02:23:01 UTC