- From: Joe Gregorio <joe@bitworking.org>
- Date: Fri, 12 Oct 2007 22:22:51 -0400
- To: uri@w3.org
Warning: LONG, but by the end I get to real life code, so stick with me.
The last draft for URI Templates was published back in July,
and to be honest I haven't been very happy with it. We've struggled with
percent-encoding of reserved characters for a very long time,
and the latest draft doesn't really resolve the problem. The draft says:
During substitution, the string value of a template variable MUST
have any characters that do not match the reserved or unreserved
rules (i.e., those characters not legal in URIs without percent
encoding) percent-encoded, as per [RFC3986], section 2.1. Specific
applications of URI Templates MAY specify additional constraints and
encoding rules in addition to this.
This is unsatisfactory for a lot of reasons, mostly related to how functional
the spec actually is. There is a large set of cases I think URI Templates can
be used for, and I don't think the very simple templating mechanism defined
covers nearly enough of them. I also think that the story around
percent-encoding is hopelessly mired. Here are some examples
that I hope show that the current {var} system is inadequate.
Here is a URI Template for my own blog:
URI Template
http://bitworking.org/news/{entry}
Template Variable(s)
entry := 'RESTLog_Specification'
URI
http://bitworking.org/news/RESTLog_Specification
The problem is that a year ago I changed the URI structure for new
posts going forward, while old posts keep the same structure. Here
is an example from a newer entry:
URI Template
http://bitworking.org/news/{entry}
Template Variable(s)
entry := '240/Newsqueak'
URI
http://bitworking.org/news/240/Newsqueak
On the other hand, if I wanted to search for this post on
Technorati I would want:
URI Template
http://technorati.com/search/{term}
Template Variable(s)
term := '240/Newsqueak'
URI
http://technorati.com/search/240%2FNewsqueak
Right off the bat we have a problem with percent-encoding
and reserved characters. Here are some more examples
with '&' and '=' characters:
URI Template
http://www.google.com/search?q={term}
Template Variable(s)
term := ben&jerrys
URI
http://www.google.com/search?q=ben&jerrys
Failing to percent-encode will get you the wrong results.
URI Template
http://www.google.com/search?q={term}
Template Variable(s)
term := 2+2=5
URI
http://www.google.com/search?q=2%2B2%3D5
URI Template
http://www.google.com/base/feeds/snippets/?{author}
Template Variables
author := author=joe.gregorio@gmail.com
URI
http://www.google.com/base/feeds/snippets/?author%3djoe.gregorio%40gmail.com
From the above you can see how not percent-encoding or over
percent-encoding can change the meaning of a URI. I'm sure we could
all construct pathological cases where any reserved character
should and should not be percent-encoded.
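To make the ben&jerrys case concrete, here is a small check using Python's standard urllib.parse. It is only an illustration of how a server-side query parser would see the two URIs; the variable names are mine.

```python
from urllib.parse import parse_qs, quote

term = "ben&jerrys"

# Substituting the raw value lets '&' act as a parameter separator,
# so the parser sees q=ben plus a stray, valueless "jerrys".
raw = parse_qs("q=" + term)

# Percent-encoding the value first keeps it inside the q parameter.
encoded = parse_qs("q=" + quote(term, safe=""))

print(raw)      # {'q': ['ben']}
print(encoded)  # {'q': ['ben&jerrys']}
```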
Not only is the percent-encoding not a solved problem, there
are moderately complex cases that we can't approach at all.
For example, in this fairly common search:
http://www.google.com/search?q={term}&num={n}
the query parameter num is optional. How do we show that?
In addition, what about this combination:
http://www.google.com/search?q={term}
term := Îñţérñåţîöñåļîžåţîöñ
What character encoding do we use?
And here is an even more complex example from
GData. Paths of some URIs are actually boolean
logic filters on the categories of elements
returned in the feed:
URI Template
/-/{category}/
Category Rules
OR fred|barney
AND fred/barney
NOT -fred
Example
/-/A%7C-B/-C means (A OR (NOT B)) AND (NOT C)
So I hope I've given enough examples to show that the
cause is completely hopeless at least for the simple
{var} expansion.
Roy suggested a bash-inspired substitution rule set:
{=default:variable}
If variable is defined and non-empty, then substitute the
value of variable. Otherwise, substitute with the default value:
E.g., {=red:favoritecolor} = "value" or "red"
{?prefix:variable}
If variable is defined and non-empty, then substitute the
string of non-colon characters between the '?' and ':', if
any, followed by the value of variable. Otherwise,
substitute with the empty string.
E.g.,
{?/:variable} = "/value" or ""
{?;name=:variable} = ";name=value" or ""
{?#:variable} = "#value" or ""
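As I read them, Roy's two rules could be sketched in Python like this. The function names and the dict of variables are made up for illustration, and percent-encoding is deliberately left out since the rules above don't address it.

```python
def roy_default(default, name, values):
    """{=default:variable}: the value if defined and non-empty, else default."""
    return values.get(name) or default


def roy_prefix(prefix, name, values):
    """{?prefix:variable}: prefix + value if defined and non-empty, else ''."""
    value = values.get(name)
    return prefix + value if value else ""


print(roy_default("red", "favoritecolor", {}))               # red
print(roy_prefix(";name=", "variable", {"variable": "value"}))  # ;name=value
```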
Now notice how this expansion adds reserved characters
into the URI. So I took Roy's expansions and tried an experiment:
What if the *only* way to get a reserved character into
the final URI was through templating expansions?
This points out the crux of the problem and the
solution. Einstein said, "Make everything as simple as possible, but
not simpler", and the current templating spec is a case of trying to
make something too simple.
The percent-encoding is a complexity, and like a lump in
the rug, you push it down in one place and it will pop up
in another. The simple {var} syntax is too simple to handle
what we are trying to do. So let's expand the power
of these expansions and see if that makes the problems go away.
So what does this experiment look like? Let's be as draconian as possible:
1. Convert Unicode template values to UTF-8.
2. Percent-encode all characters outside unreserved.
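In Python the two draconian steps can be approximated with the standard library's urllib.parse.quote. This is a minimal sketch: the helper name encode is mine, and it assumes Python 3.7 or later, where quote() treats '~' as unreserved.

```python
from urllib.parse import quote


def encode(value):
    """Encode to UTF-8, then percent-encode everything outside unreserved."""
    # safe="" removes quote()'s default exemption for "/".
    return quote(value, safe="", encoding="utf-8")


print(encode("240/Newsqueak"))  # 240%2FNewsqueak
print(encode("2+2=5"))          # 2%2B2%3D5
```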
We can keep our simple {var} expansion, but let's add in
a default value:
{var=default}
Simple substitution
Example:
URI Template
http://example.org/{fruit=orange}/
Template Var
fruit = "apple"
URI
http://example.org/apple/
URI Template
http://example.org/{fruit=orange}/
Template Var
fruit is undefined
URI
http://example.org/orange/
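A rough sketch of that rule in Python, assuming the variables arrive as a dict and that the default is already percent-encoded; expand_var is a hypothetical name:

```python
from urllib.parse import quote


def expand_var(name, default, values):
    """{var=default}: encoded value if defined, else the pre-encoded default."""
    if name in values:
        return quote(values[name], safe="")
    # Defaults are taken as-is; a missing default expands to nothing.
    return default or ""


print(expand_var("fruit", "orange", {"fruit": "apple"}))  # apple
print(expand_var("fruit", "orange", {}))                  # orange
```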
The rest of the expansions I'll define follow this form:
{<op><arg>|<variable(s)>}
<op> - A single character not in unreserved.
<arg> - Any legal URI character.
<variables> - May be more than one, may have defaults.
Default values must already be percent-encoded.
Variable names are from unreserved.
{<prefix|var[=default]}
Prefix var with prefix, emit empty string if
var is empty or undefined.
URI Template
bar{</|var}/
Template Var
var := foo
URI
bar/foo/
{>postfix|var[=default]}
Append postfix to var, emit empty string if
var is empty or undefined.
URI Template
bar/{>#home|var}
Template Var
var := foo
URI
bar/foo#home
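The prefix and postfix expansions are close enough that one helper can cover both. Here is a sketch under the same assumptions as before (variables in a dict, defaults already encoded); expand_affix is an invented name.

```python
from urllib.parse import quote


def expand_affix(op, arg, name, values, default=None):
    """'<' puts arg before the value, '>' puts it after; '' if no value."""
    if op not in ("<", ">"):
        raise ValueError("expected '<' or '>', got %r" % op)
    value = values.get(name)
    # Only the variable's value is encoded; arg comes from the template.
    value = quote(value, safe="") if value else default
    if not value:
        return ""
    return arg + value if op == "<" else value + arg


print("bar" + expand_affix("<", "/", "var", {"var": "foo"}) + "/")  # bar/foo/
print("bar/" + expand_affix(">", "#home", "var", {"var": "foo"}))   # bar/foo#home
```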
{,sep|var1=def1, var2=def2, ...}
Substitute the concatenation of variable name,
"=", variable value. Join more than one var by the value
of 'sep'.
URI Template
{,&|name,location,age}
Template Var
name := joe
location := NYC
URI
name=joe&location=NYC
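Sketched in Python, the comma expansion might look like the following. The name expand_pairs and the defaults dict are assumptions of mine; undefined variables without a default are simply skipped.

```python
from urllib.parse import quote


def expand_pairs(sep, names, values, defaults=None):
    """{,sep|var1,var2,...}: join name=value pairs for the defined vars."""
    defaults = defaults or {}
    pairs = []
    for name in names:
        if name in values:
            value = quote(values[name], safe="")
        elif name in defaults:
            value = defaults[name]  # assumed to be percent-encoded already
        else:
            continue
        # The "=" and sep come from the template, so they stay unencoded.
        pairs.append(name + "=" + value)
    return sep.join(pairs)


print(expand_pairs("&", ["name", "location", "age"],
                   {"name": "joe", "location": "NYC"}))  # name=joe&location=NYC
```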
{&sep|var}
Treat var as a list and join the values in the list
with the given separator. Emit empty string if var
is empty or undefined
URI Template
{&/|segments}
Template Var
segments := ["a", "b", "c"]
URI
a/b/c
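The list join is nearly a one-liner in Python. In this sketch (expand_list is a hypothetical name) each item is encoded on its own, so a '/' inside an item cannot be confused with the separator.

```python
from urllib.parse import quote


def expand_list(sep, name, values):
    """{&sep|var}: encode each list item, then join with sep."""
    items = values.get(name) or []
    return sep.join(quote(item, safe="") for item in items)


print(expand_list("/", "segments", {"segments": ["a", "b", "c"]}))  # a/b/c
print(expand_list("/", "segments", {"segments": ["a/b", "c"]}))     # a%2Fb/c
```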
{?opt|var}
Inserts opt if var is a string or non-zero length list.
URI Template
{?/|segments}
Template Var
segments := ["a", "b", "c"]
URI
/
{!opt|var}
Inserts opt if var is undefined or a zero length list.
URI Template
{!/|segments}
Template Var
segments := ["a", "b", "c"]
URI
""
Does it work? Let's try all our previous examples:
My blog template works out fine:
http://bitworking.org/news/{entry}
entry := '240/Newsqueak'
http://bitworking.org/news/240/Newsqueak
All the searches do too:
http://www.google.com/search?q={term}
term := ben&jerrys
http://www.google.com/search?q=ben%26jerrys
http://www.google.com/search?q={term}
term := 2+2=5
http://www.google.com/search?q=2%2B2%3D5
http://www.google.com/base/feeds/snippets/?{,&|author}
author := joe.gregorio@gmail.com
http://www.google.com/base/feeds/snippets/?author=joe.gregorio%40gmail.com
In the example from Google search, all variable names in the {,}
expansion are optional, i.e. none of those variables need be defined.
http://www.google.com/search?{,&|q,num}
Internationalization is also covered:
http://www.google.com/search?q={term}
term := Îñţérñåţîöñåļîžåţîöñ
http://www.google.com/search?q=%C3%8E%C3%B1%C5%A3%C3%A9r%C3%B1%C3%A5%C5%A3%C3%AE%C3%B6%C3%B1%C3%A5%C4%BC%C3%AE%C5%BE%C3%A5%C5%A3%C3%AE%C3%B6%C3%B1
Note that to handle the complex category query from GData we
need to use two expansions: '?' and '&'.
URI Template
{?/-/|categories}{&/|categories}
Template Vars
categories := ["A|-B", "-C"]
URI
/-/A%7C-B/-C
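Here is how the conditional expansions and the list join might compose for the category path, as a Python sketch. All the function names are invented, and I'm taking "is a string or non-zero length list" literally, so any string counts as set.

```python
from urllib.parse import quote


def is_set(value):
    """A string, or a list with at least one item, counts as set."""
    if value is None:
        return False
    if isinstance(value, list):
        return len(value) > 0
    return True


def if_set(opt, value):      # {?opt|var}
    return opt if is_set(value) else ""


def if_unset(opt, value):    # {!opt|var}
    return "" if is_set(value) else opt


def join_list(sep, value):   # {&sep|var}
    return sep.join(quote(item, safe="") for item in (value or []))


def category_path(categories):
    # The "/-/" marker and the "/" separators come only from the template.
    return if_set("/-/", categories) + join_list("/", categories)


print(category_path(["A|-B", "-C"]))  # /-/A%7C-B/-C
print(repr(category_path([])))        # ''
```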
To see multiple expansions working together at
the same time let's look at a URI from the Google
Notebook GData service <http://code.google.com/apis/notebook/reference.html>:
http://www.google.com/notebook/feeds/
{userID}{</notebooks/|notebookID}
{?/-/|categories}{&/|categories}?
{,&|updated-min,updated-max,alt,
start-index,max-results,entryID,orderby}
So this system is pretty capable without crossing over into
the realm of Turing Complete. On the other hand, it is not
without fault:
1. Doesn't handle repeated query parameters.
2. Doesn't specify if variables are mandatory or optional.
3. Doesn't handle encodings besides UTF-8.
4. Template language is complex, cryptic.
5. No handling of input validation, enums, ranges, etc.
6. Possible to define a self-inconsistent URI Template:
1. {&|fred}{<#|fred}
7. Prefixes and suffixes are redundant, as
they could be handled by using the '?' expansion.
8. Comma expansions could have two strings, one to separate
name-value pairs (as now), the other to separate names from
values (now hard-coded to "=").
9. Sensible defaults need to be invented to deal with parameter values
that are lists when not expected to be (or are not lists when
expected to be) (see #6).
10. No specification for how to handle IRIs beyond "Turn an IRI Template
into a URI Template and then proceed."
11. Need way to say "Insert this if some/none of these variables exist"
to strip trailing "?" from URIs with no parameters.
To those of you thinking to yourself, "I'd understand that much
better as a BNF", here you go:
token arg '[^\|]*';
token varname '[\w\-\.~]+' ;
token vardefault '[\w\-\.\~\%]+' ;
START -> Template;
Template ->
IdentityOperatorTemplate
| OperatorTemplate
;
IdentityOperatorTemplate -> Var ;
OperatorTemplate ->
'>' arg '\|' Var
| '<' arg '\|' Var
| ',' arg '\|' Vars
| '\&' arg '\|' VarNoDefault
| '\?' arg '\|' VarNoDefault
| '!' arg '\|' VarNoDefault
;
Vars -> Var ( ',' Var ) * ;
Var -> varname ( '=' vardefault ) ? ;
VarNoDefault -> varname ;
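If you don't have a parser generator handy, the grammar is small enough to approximate with a regular expression. This is a sketch that parses only the text between '{' and '}'; parse_expansion and the tuple it returns are my own invention, not part of any implementation.

```python
import re

VARNAME = r"[\w\-.~]+"
VARDEFAULT = r"[\w\-.~%]+"
VAR = rf"{VARNAME}(?:={VARDEFAULT})?"
# Optional "<op><arg>|" head, then one or more comma-separated vars.
EXPANSION = re.compile(
    rf"(?:(?P<op>[<>,&?!])(?P<arg>[^|]*)\|)?(?P<vars>{VAR}(?:,{VAR})*)"
)


def parse_expansion(body):
    """Return (op, arg, [(name, default), ...]) for one expansion body."""
    match = EXPANSION.fullmatch(body)
    if match is None:
        raise ValueError("not a valid expansion: %r" % body)
    op, arg = match.group("op"), match.group("arg")
    variables = []
    for item in match.group("vars").split(","):
        name, _, default = item.partition("=")
        variables.append((name, default or None))
    # Only ',' takes several vars; '&', '?' and '!' take no defaults.
    if op != "," and len(variables) > 1:
        raise ValueError("only ',' accepts more than one variable")
    if op in ("&", "?", "!") and variables[0][1] is not None:
        raise ValueError("%r does not accept a default" % op)
    return op, arg, variables


print(parse_expansion("fruit=orange"))
# (None, None, [('fruit', 'orange')])
print(parse_expansion("</notebooks/|notebookID"))
# ('<', '/notebooks/', [('notebookID', None)])
```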
And finally to those of you thinking to yourself, "that would be so
much better as working code", I present:
http://code.google.com/p/uri-templates/
A Python implementation with unit tests. It requires 'tpg', the Toy
Parser Generator.
>>> import template_parser
>>> t = template_parser.URITemplate("http://www.google.com/notebook/feeds/{userID}{</notebooks/|notebookID}{?/-/|categories}{&/|categories}?{,&|updated-min,updated-max,alt,start-index,max-results,entryID,orderby}")
>>> t.sub({})
'http://www.google.com/notebook/feeds/?'
>>> t.sub({"userID": "joe.gregorio"})
'http://www.google.com/notebook/feeds/joe.gregorio?'
>>> t.sub({"userID": "joe.gregorio", "notebookID": "foo"})
'http://www.google.com/notebook/feeds/joe.gregorio/notebooks/foo?'
>>> t.sub({"userID": "joe.gregorio", "notebookID": "foo", "start-index": "20"})
'http://www.google.com/notebook/feeds/joe.gregorio/notebooks/foo?start-index=20'
>>>
To be clear, I don't believe this is a final or complete solution, but I think
it's a good start and at least proves that expansions are a viable
solution to the percent-encoding issue.
Thanks,
-joe
--
Joe Gregorio http://bitworking.org
Received on Saturday, 13 October 2007 02:23:01 UTC