
Re: URI Templates: done or dead?

From: Roy T. Fielding <fielding@gbiv.com>
Date: Mon, 15 Sep 2008 19:28:57 -0700
Message-Id: <07109D44-233D-42F3-ACB0-56B4A6562903@gbiv.com>
Cc: URI <uri@w3.org>, Joe Gregorio <joe@bitworking.org>, David Orchard <orchard@pacificspirit.com>, Marc Hadley <Marc.Hadley@Sun.COM>
To: Mark Nottingham <mnot@mnot.net>

On Sep 15, 2008, at 4:57 AM, Mark Nottingham wrote:
> There hasn't been a lot of discussion or activity on URI Templates  
> recently, which either means it's very stable, or very nearly dead.

We should just remind the authors that they have several outstanding
comments on the spec and see if they are still interested in editing.

> If it's very stable, we should ship it and be done with it. If it's  
> nearly dead (and I do get a whiff of that; while I continuously  
> hear people clamouring for it to be finished, not many seem to be  
> willing to use it in its current state; YMMV), we should at least  
> try to revive it.

I won't use it in its current state because it isn't finished yet.
The prose is, at best, an outline.  The operators aren't even defined
in words -- the reader has to guess why they exist.  The examples seem
to be obsessed with the most irrelevant corner cases instead of teaching
the common cases first.  And it is far too focused on the Python
language as a means of definition.  None of these are technical issues.
I am not griping about the lack of completion, because I have a similar
list of issues with the HTTPbis spec that I haven't addressed yet either.

Technically, the mechanism is caught halfway between being concise and
being human friendly, which means it is currently neither.  My opinion
is that we have IETF specs for the purpose of defining interoperable
protocols, not to define user interfaces, and so the argument that these
things should be end-user readable is unfounded and potentially very
costly.  They only need to be readable by the folks who are defining
applications.

> My continuing concerns with the -03 draft are that it's too
> complex, not human-friendly, and it makes the common, simple use
> cases hard. The first example in the spec
> ( http://www.example.com/users/{userid} ) holds up well, but it goes
> quickly downhill from there;
> ( http://www.example.com/?{-join|&|query,number} ) looks like line
> noise, IMHO.

Then please let's drop the idea of using English words as function
names and go back to the use cases that really matter.  I need a
way to describe substitutions for variable values, value prefixes,
URI inserts, ordered value lists, and unordered substitutions
within path segments (path ;param=value) and queries (form &var=value).
Only one of those (URI inserts) needs a raw substitution.

I could use some other tricks as well, but the above is what I know
is needed.  Joe had a lot more use cases that I have probably forgotten.

> I believe there are a few things we can do to make URI Template  
> more broadly useful and useable, without sacrificing too much  
> functionality (at least in the 80% case).
>
> 1. Reduce or drop operators.
>
> As mentioned above, they don't read well; they're obviously  
> intended for machines, not people. The expansion for a template  
> should be blindingly obvious, but the operator syntax seems to want  
> to get in the way rather than help. Furthermore, the vast majority  
> of use cases for templates are for simple template substitution,  
> not operations like 'neg' and 'opt'.

Actually, the most common use case by far is unordered form key=value
substitution.  Complete path segment replacement is second, followed
by URI inserts ("insert this value without further encoding").

> 2. Drop list values.
>
> Again, the majority of use cases out there have no need for list  
> values in template variables, and including them in the spec  
> significantly complicates things.

I think it is complicated because the introduction of list-only
operators (typed functions) is unnecessary.  Complex values can be
addressed in an orthogonal manner when the value is substituted,
mainly by defaulting to the most common form, and more complex
behavior can be defined only when applicable (i.e., a prefix on
the variable name can indicate how to translate a list into
numbered parameters or even associative array key=value sets).
The important thing to note is that compound values are only
interesting when templates are embedded within computer language
processing, so we could easily allow such things to be language
specific by reserving non-alphanumeric prefixes on variable names
for that purpose.

> 3. Make percent-encoding context-sensitive.
>
> There are just too many cases where the 'escape everything but
> unreserved' rule gets in the way; for example, if my template is
> "http://example.com/user/{email}", I'm going to have
> percent-encoded @ signs in my URIs whether I like it or not -- even
> though they're not required to be percent-encoded there. This is a
> relatively simple thing to do, as long as we also...

URI inserts could do that.  E.g., use {+email} instead of {email}.
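
To make the difference concrete, here is a minimal Python sketch of the
two behaviours. The function names encode_default and insert_raw are
hypothetical, chosen only for illustration; the sketch assumes that the
default substitution percent-encodes everything outside the RFC 3986
unreserved set and that a URI insert copies the value as-is.

```python
from urllib.parse import quote


def encode_default(value: str) -> str:
    # Default substitution: percent-encode every character except the
    # unreserved ones (letters, digits, "-", ".", "_", "~").
    return quote(value, safe="")


def insert_raw(value: str) -> str:
    # URI insert: the value is substituted without further encoding.
    return value


email = "joe@example.com"
print("http://example.com/user/" + encode_default(email))  # {email}
print("http://example.com/user/" + insert_raw(email))      # {+email}
```

With {email} the "@" comes out as %40; with {+email} it is left alone.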

> 4. Allow exceptions to percent-encoding.
>
> We need a syntax that allows characters to be excepted from  
> encoding, even in context. As a straw-man, I suggest preceding the  
> expression with the characters that are excepted, like:
>
>    http://example.com/{/path}
>    http://example.com/thing{?&=query_args}
>
> and so forth.

That is much more complex.  Dynamically changing the transcoding
algorithm is far more expensive than just using a different operator
for non-encoded insertion.

> 5. If we keep operators at all, mint special ones for the common  
> cases.
>
> E.g., something to handle encoded form query values "out of the box":
>   http://example.com/thing{-?a=foo&b=bar&c=baz}
> and likewise with matrix parameters.

Something like

     var   = "value";
     undef = null;
     empty = "";
     list  = [ "val1", "val2", "val3" ];
     keys  = [ "key1", "val1", "key2", "val2", "key3", "val3" ];
     path  = "/foo/bar";
     x     = "1024";
     y     = "768";

{var}                     value
{var=default}             value
{undef=default}           default
{var:3}                   val
{x,y}                     1024,768
{?x,y}                    ?x=1024&y=768
{?x,y,empty}              ?x=1024&y=768&empty=
{?x,y,undef}              ?x=1024&y=768
{;x,y}                    ;x=1024;y=768
{;x,y,empty}              ;x=1024;y=768;empty
{;x,y,undef}              ;x=1024;y=768
{/list,x}                 /val1/val2/val3/1024
{+path}/here              /foo/bar/here
{+path,x}/here            /foo/bar,1024/here
{+path}{x}/here           /foo/bar1024/here
{+empty}/here             /here

I think the above covers all of the common cases without making
the uncommon cases impossible.  The common case is that the delimiters
(";", "?", and "/") are omitted when none of the listed variables are
defined, which matches good URI practice.  Likewise, the substitution
handler for ";" (path parameters) will omit the "=" when its value is
empty, whereas the handler for "?" (form queries) will not omit the "=".
Multiple variables and list values have their values joined with ","
if there is no predefined joining mechanism for the operator.
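
The rules above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions, not a reference
implementation: the names expand and expand_expression are hypothetical,
a ":n" prefix is applied to whatever value is resolved (including a
default), and defaults are assumed not to contain ",".

```python
import re
from urllib.parse import quote

# varname, optional ":prefix-len", optional "=default"
VARSPEC = re.compile(r"([A-Za-z][A-Za-z0-9_]*)(?::(\d+))?(?:=(.*))?")


def expand_expression(expr, variables):
    """Expand the inside of one {...} expression, e.g. "?x,y"."""
    op = expr[0] if expr[:1] in ("/", "+", ";", "?") else ""
    # Only "+" inserts raw values; every other form encodes all
    # characters other than unreserved.
    encode = (lambda s: s) if op == "+" else (lambda s: quote(s, safe=""))
    parts = []
    for spec in expr[len(op):].split(","):
        m = VARSPEC.fullmatch(spec)
        if m is None:
            raise ValueError("bad varspec: " + repr(spec))
        name, length, default = m.groups()
        value = variables.get(name)
        if value is None:
            value = default
        if value is None:
            continue  # undefined and no default: contributes nothing
        items = value if isinstance(value, list) else [value]
        if length is not None:
            items = [item[:int(length)] for item in items]
        items = [encode(item) for item in items]
        if op == "/":
            parts.extend(items)  # each list item is its own segment
        elif op == ";":
            joined = ",".join(items)
            parts.append(name + "=" + joined if joined else name)
        elif op == "?":
            parts.append(name + "=" + ",".join(items))
        else:
            parts.append(",".join(items))
    if not parts:
        return ""  # delimiter omitted when nothing is defined
    if op == "/":
        return "/" + "/".join(parts)
    if op == ";":
        return ";" + ";".join(parts)
    if op == "?":
        return "?" + "&".join(parts)
    return ",".join(parts)


def expand(template, variables):
    """Replace every {...} expression in the template."""
    return re.sub(r"\{([^{}]*)\}",
                  lambda m: expand_expression(m.group(1), variables),
                  template)


variables = {"var": "value", "undef": None, "empty": "",
             "list": ["val1", "val2", "val3"],
             "path": "/foo/bar", "x": "1024", "y": "768"}
print(expand("{?x,y,empty}", variables))
print(expand("{;x,y,empty}", variables))
print(expand("{+path}/here", variables))
```

The operator decides only two things: whether values are encoded, and
how names and values are delimited. Everything else is shared.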

I think this mechanism is simple and readable when used with simple
examples because the single-character operators match the URI generic
syntax delimiters.  Only one operator inserts unencoded values; all
of the others encode any characters other than unreserved.

The mechanism does become harder to read when we do very unusual
things and add all the bells and whistles, like

{var,undef,empty,list}    value,,val1,val2,val3
{/var:3,undef,list,empty} /val/val1/val2/val3/
{;var,undef,empty,list}   ;var=value;empty;list=val1,val2,val3
{?var,undef,empty,list}   ?var=value&empty=&list=val1,val2,val3
{?var,undef,empty,@list}  ?var=value&empty=&list1=val1&list2=val2&list3=val3
{?var,undef,empty,%keys}  ?var=value&empty=&key1=val1&key2=val2&key3=val3

but we don't need to care if complex cases are hard to read.
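
For the two prefixed forms, a rough Python sketch of how a list might be
translated into form parameters follows. The helper names explode and
form_query are hypothetical, and the sketch assumes "@" means numbered
parameters and "%" means a flat list read as alternating keys and
values, as the two examples above suggest.

```python
from urllib.parse import quote


def explode(var_type, name, value):
    """Return (key, value) pairs for one prefixed list variable."""
    if var_type == "@":
        # Numbered parameters: name1, name2, ...
        return [(name + str(i), v) for i, v in enumerate(value, start=1)]
    if var_type == "%":
        # Flat [key, value, key, value, ...] list read as pairs.
        return list(zip(value[0::2], value[1::2]))
    raise ValueError("unknown var-type: " + repr(var_type))


def form_query(pairs):
    """Join (key, value) pairs as an encoded form query."""
    enc = lambda s: quote(s, safe="")
    return "?" + "&".join(enc(k) + "=" + enc(v) for k, v in pairs)


values = ["val1", "val2", "val3"]
keys = ["key1", "val1", "key2", "val2", "key3", "val3"]
base = [("var", "value"), ("empty", "")]

print(form_query(base + explode("@", "list", values)))
print(form_query(base + explode("%", "keys", keys)))
```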

The mechanism is extremely simple to implement.  There is always a
variable list (one variable need not be special-cased).
Any of the variables can be prefixed.  Any of the substitutions
can have a default when undefined.

The ABNF is something like

  instruction   = "{" [ operator ] variable-list "}"
  operator      = "/" / "+" / ";" / "?" / op-reserve
  variable-list =  varspec *( "," varspec )
  varspec       =  [ var-type ] varname [ ":" prefix-len ] [ "=" default ]
  var-type      = "@" / "%" / type-reserve
  varname       = ALPHA *( ALPHA / DIGIT / "_" )
  prefix-len    = 1*DIGIT
  default       = *( unreserved / reserved )
  op-reserve    = <anything else that isn't ALPHA or operator>
  type-reserve  = <anything else that isn't ALPHA, ",", or operator>

as a quick pass (I haven't checked it).  It is extremely easy to
parse and perform the substitutions within a single pass loop.
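
As a rough check of that claim, here is a Python sketch of a parser for
the grammar above. The names VarSpec, Instruction and parse_instruction
are hypothetical. The sketch ignores op-reserve and type-reserve, and it
assumes a default never contains ",", since the grammar as written
allows reserved characters there and would otherwise be ambiguous.

```python
import re
from typing import List, NamedTuple, Optional


class VarSpec(NamedTuple):
    var_type: str               # "", "@" or "%"
    name: str
    prefix_len: Optional[int]
    default: Optional[str]


class Instruction(NamedTuple):
    operator: str               # "", "/", "+", ";" or "?"
    variables: List[VarSpec]


# [ var-type ] varname [ ":" prefix-len ] [ "=" default ]
_VARSPEC = re.compile(
    r"([@%]?)([A-Za-z][A-Za-z0-9_]*)(?::([0-9]+))?(?:=(.*))?")


def parse_instruction(text: str) -> Instruction:
    if len(text) < 2 or text[0] != "{" or text[-1] != "}":
        raise ValueError("not an instruction: " + repr(text))
    inner = text[1:-1]
    operator = inner[0] if inner[:1] in ("/", "+", ";", "?") else ""
    specs = []
    # Always a variable list; a single variable is not special-cased.
    for raw in inner[len(operator):].split(","):
        m = _VARSPEC.fullmatch(raw)
        if m is None:
            raise ValueError("bad varspec: " + repr(raw))
        var_type, name, length, default = m.groups()
        specs.append(VarSpec(var_type, name,
                             int(length) if length else None, default))
    return Instruction(operator, specs)


print(parse_instruction("{?var:3,undef=default,@list}"))
```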

....Roy
Received on Tuesday, 16 September 2008 02:29:42 GMT
