RE: URI Templates: done or dead? from Phillips, Addison on 2008-09-16 (uri@w3.org from September 2008)

From: Phillips, Addison <addison@amazon.com>
Date: Mon, 15 Sep 2008 20:21:59 -0700
To: "Roy T. Fielding" <fielding@gbiv.com>, Mark Nottingham <mnot@mnot.net>
CC: URI <uri@w3.org>, Joe Gregorio <joe@bitworking.org>, David Orchard <orchard@pacificspirit.com>, Marc Hadley <Marc.Hadley@Sun.COM>
Message-ID: <4D25F22093241741BC1D0EEBC2DBB1DA014BD0DECC@EX-SEA5-D.ant.amazon.com>
>   varname       = ALPHA *( ALPHA | DIGIT | "_" )

We have pretty good knowledge of what makes a good Unicode identifier. If we're going to assign variable names in a new pattern language, why are we limiting it to alphanum? The software we are linking to (the part generating the variables that get substituted in) may not--indeed probably does not--have that same limitation.
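
To make the contrast concrete, here is a minimal sketch comparing the ASCII-only varname rule quoted above with a Unicode-aware check. It assumes Python's built-in str.isidentifier(), which follows the Unicode identifier rules (XID_Start/XID_Continue), is an acceptable stand-in for "a good Unicode identifier"; the function names are hypothetical. Note that isidentifier() also accepts a leading underscore, which the quoted ABNF does not.

```python
import re

# ASCII-only rule from the quoted ABNF: ALPHA *( ALPHA | DIGIT | "_" )
ASCII_VARNAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_ascii_varname(name: str) -> bool:
    """True if the name satisfies the ASCII-only varname rule."""
    return ASCII_VARNAME.fullmatch(name) is not None


def is_unicode_varname(name: str) -> bool:
    """True if the name is a Unicode identifier (assumed stand-in rule)."""
    return name.isidentifier()


for candidate in ["user_id", "\u540d\u524d", "3d"]:
    print(candidate, is_ascii_varname(candidate), is_unicode_varname(candidate))
```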

While the result needs to be a valid URI, there doesn't seem to be a reason for the pattern language itself to be limited in this way. Just because your personal examples are all ASCII doesn't make that the right solution for the world. For that matter, path values and so forth are plain-text Unicode, and they get encoded as appropriate when the URI is assembled from the template. The replacement syntax should probably consider the character vs. bytes problem, especially in the query part, since the template syntax is heavily character oriented.
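
A small sketch of the character vs. bytes problem, assuming values are percent-encoded from their UTF-8 bytes (the function names are hypothetical, and the prefix example borrows the {var:3} idea from the quoted proposal). The same text yields different octets under different encodings, and truncating by bytes rather than characters can split a character in half.

```python
from urllib.parse import quote


def encode_value(text: str, encoding: str = "utf-8") -> str:
    """Percent-encode everything except URI unreserved characters."""
    return quote(text, safe="", encoding=encoding)


def prefix_by_chars(text: str, n: int) -> str:
    """Take the first n characters, then encode."""
    return encode_value(text[:n])


def prefix_by_bytes(text: str, n: int) -> str:
    """Take the first n UTF-8 bytes, then encode (may split a character)."""
    return quote(text.encode("utf-8")[:n], safe="")


name = "caf\u00e9"
print(encode_value(name))             # caf%C3%A9
print(encode_value(name, "latin-1"))  # caf%E9

word = "\u65e5\u672c\u8a9e"
print(prefix_by_chars(word, 1))       # %E6%97%A5
print(prefix_by_bytes(word, 1))       # %E6
```

The last line shows the hazard: "%E6" alone is not a complete UTF-8 sequence, so a byte-oriented prefix can produce a URI that no longer decodes to text.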

But you knew I was going to say that :-).

Addison

Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization WG

Internationalization is not a feature.
It is an architecture.


> -----Original Message-----
> From: uri-request@w3.org [mailto:uri-request@w3.org] On Behalf Of
> Roy T. Fielding
> Sent: Monday, September 15, 2008 7:29 PM
> To: Mark Nottingham
> Cc: URI; Joe Gregorio; David Orchard; Marc Hadley
> Subject: Re: URI Templates: done or dead?
> 
> 
> On Sep 15, 2008, at 4:57 AM, Mark Nottingham wrote:
> > There hasn't been a lot of discussion or activity on URI Templates
> > recently, which either means it's very stable, or very nearly dead.
> 
> We should just remind the authors that they have several outstanding
> comments on the spec and see if they are still interested in editing.
> 
> > If it's very stable, we should ship it and be done with it. If it's
> > nearly dead (and I do get a whiff of that; while I continuously
> > hear people clamouring for it to be finished, not many seem to be
> > willing to use it in its current state; YMMV), we should at least
> > try to revive it.
> 
> I won't use it in its current state because it isn't finished yet.
> The prose is, at best, an outline.  The operators aren't even defined
> in words -- the reader has to guess why they exist.  The examples seem
> to be obsessed with the most irrelevant corner cases instead of
> teaching the common cases first.  And it is far too focused on python
> language as a means of definition.  None of these are technical issues.
> I am not griping about the lack of completion because I have a similar
> list of issues with the HTTPbis spec that I haven't done yet either.
> 
> Technically, the mechanism is caught halfway between being concise and
> being human friendly, which means it is currently neither.  My opinion
> is that we have IETF specs for the purpose of defining interoperable
> protocols, not to define user interfaces, and so the argument that
> these things should be end-user readable is unfounded and potentially
> very costly.  They only need to be readable by the folks who are
> defining applications.
> 
> > My continuing concerns with the -03 draft are that it's too
> > complex, not human-friendly, and it makes the common, simple use
> > cases hard. The first example in the spec
> > ( http://www.example.com/users/{userid} ) holds up well, but it goes
> > quickly downhill from there;
> > ( http://www.example.com/?{-join|&|query,number} ) looks like line
> > noise, IMHO.
> 
> Then please let's drop the idea of using english words as function
> names and go back to the use cases that really matter.  I need a
> way to describe substitutions for variable values, value prefixes,
> URI inserts, ordered value lists, and unordered substitutions
> within path segments (path ;param=value) and queries (form
> &var=value).
> Only one of those (URI inserts) needs a raw substitution.
> 
> I could use some other tricks as well, but the above is what I know
> is needed.  Joe had a lot more use cases that I have probably
> forgotten.
> 
> > I believe there are a few things we can do to make URI Template
> > more broadly useful and useable, without sacrificing too much
> > functionality (at least in the 80% case).
> >
> > 1. Reduce or drop operators.
> >
> > As mentioned above, they don't read well; they're obviously
> > intended for machines, not people. The expansion for a template
> > should be blindingly obvious, but the operator syntax seems to want
> > to get in the way rather than help. Furthermore, the vast majority
> > of use cases for templates are for simple template substitution,
> > not operations like 'neg' and 'opt'.
> 
> Actually, the vast majority use case is unordered form key=value
> substitution.  Complete path segment replacement is second, followed
> by URI inserts ("insert this value without further encoding").
> 
> > 2. Drop list values.
> >
> > Again, the majority of use cases out there have no need for list
> > values in template variables, and including them in the spec
> > significantly complicates things.
> 
> I think it is complicated because the introduction of list-only
> operators (typed functions) is unnecessary.  Complex values can be
> addressed in an orthogonal manner when the value is substituted,
> mainly by defaulting to the most common form, and more complex
> behavior can be defined only when applicable (i.e., a prefix on
> the variable name can indicate how to translate a list into
> numbered parameters or even associative array key=value sets).
> The important thing to note is that compound values are only
> interesting when templates are embedded within computer language
> processing, so we could easily allow such things to be language
> specific by reserving non-alphanumeric prefixes on variable names
> for that purpose.
> 
> > 3. Make percent-encoding context-sensitive.
> >
> > There are just too many cases where the 'escape everything but
> > unreserved' rule gets in the way; for example, if my template is
> > "http://example.com/user/{email}", I'm going to have percent-encoded
> > @ signs in my URIs whether I like it or not -- even though
> > they're not required to be percent-encoded there. This is a
> > relatively simple thing to do, as long as we also...
> 
> URI inserts could do that.  E.g., use {+email} instead of {email}.
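
A minimal sketch of that distinction, assuming {email} percent-encodes everything outside the unreserved set and {+email} inserts the value untouched (Roy's "insert this value without further encoding"). The function names are hypothetical.

```python
from urllib.parse import quote


def expand_encoded(value: str) -> str:
    """{var}: encode every character that is not URI-unreserved."""
    return quote(value, safe="")


def expand_insert(value: str) -> str:
    """{+var}: raw URI insert, no further encoding (assumed behaviour)."""
    return value


email = "joe@example.com"
print("http://example.com/user/" + expand_encoded(email))
# http://example.com/user/joe%40example.com
print("http://example.com/user/" + expand_insert(email))
# http://example.com/user/joe@example.com
```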
> 
> > 4. Allow exceptions to percent-encoding.
> >
> > We need a syntax that allows characters to be excepted from
> > encoding, even in context. As a straw-man, I suggest preceding the
> > expression with the characters that are excepted, like:
> >
> >    http://example.com/{/path}
> >    http://example.com/thing{?&=query_args}
> >
> > and so forth.
> 
> That is much more complex.  Dynamically changing the transcoding
> algorithm is far more expensive than just using a different operator
> for non-encoded insertion.
> 
> > 5. If we keep operators at all, mint special ones for the common
> > cases.
> >
> > E.g., something to handle encoded form query values "out of the box":
> >   http://example.com/thing{-?a=foo&b=bar&c=baz}
> > and likewise with matrix parameters.
> 
> Something like
> 
>      var   = "value";
>      undef = null;
>      empty = "";
>      list  = [ "val1", "val2", "val3" ];
>      keys  = [ "key1", "val1", "key2", "val2", "key3", "val3" ];
>      path  = "/foo/bar";
>      x     = "1024";
>      y     = "768";
> 
> {var}                     value
> {var=default}             value
> {undef=default}           default
> {var:3}                   val
> {x,y}                     1024,768
> {?x,y}                    ?x=1024&y=768
> {?x,y,empty}              ?x=1024&y=768&empty=
> {?x,y,undef}              ?x=1024&y=768
> {;x,y}                    ;x=1024;y=768
> {;x,y,empty}              ;x=1024;y=768;empty
> {;x,y,undef}              ;x=1024;y=768
> {/list,x}                 /val1/val2/val3/1024
> {+path}/here              /foo/bar/here
> {+path,x}/here            /foo/bar,1024/here
> {+path}{x}/here           /foo/bar1024/here
> {+empty}/here             /here
> 
> I think the above covers all of the common cases without making
> the uncommon cases impossible.  The common case is that the delimiters
> (";", "?", and "/") are omitted when none of the listed variables are
> defined, which matches good URI practice.  Likewise, the substitution
> handler for ";" (path parameters) will omit the "=" when its value is
> empty, whereas the handler for "?" (form queries) will not omit the "=".
> Multiple variables and list values have their values joined with ","
> if there is no predefined joining mechanism for the operator.
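
The rules in that paragraph can be sketched as a small expander. This is one reading of the proposal, not a reference implementation: the names expand, VARS and VARSPEC are hypothetical, the "@" and "%" type prefixes and the reserved operators are not handled, and defaults containing "," are not supported because the variable list is split on commas.

```python
import re
from urllib.parse import quote

# varname [ ":" prefix-len ] [ "=" default ]
VARSPEC = re.compile(r"([A-Za-z][A-Za-z0-9_]*)(?::(\d+))?(?:=(.*))?$")

VARS = {
    "var": "value", "undef": None, "empty": "",
    "list": ["val1", "val2", "val3"],
    "path": "/foo/bar", "x": "1024", "y": "768",
}


def expand(template, variables):
    def instruction(match):
        body = match.group(1)
        op = body[0] if body[0] in "/+;?" else ""
        items = []  # (name, [values]) for each defined variable
        for spec in body[len(op):].split(","):
            name, length, default = VARSPEC.match(spec).groups()
            value = variables.get(name)
            if value is None:
                value = default
            if value is None:
                continue  # undefined and no default: contributes nothing
            values = value if isinstance(value, list) else [value]
            if length:
                values = [v[:int(length)] for v in values]
            if op != "+":  # only "+" inserts without encoding
                values = [quote(v, safe="") for v in values]
            items.append((name, values))
        if not items:
            return ""  # delimiter omitted when nothing is defined
        if op == "/":
            return "".join("/" + v for _, vals in items for v in vals)
        if op == "?":
            return "?" + "&".join(n + "=" + ",".join(vals) for n, vals in items)
        if op == ";":
            parts = []
            for n, vals in items:
                joined = ",".join(vals)
                parts.append(";" + n + ("=" + joined if joined else ""))
            return "".join(parts)
        return ",".join(v for _, vals in items for v in vals)

    return re.sub(r"\{([^{}]+)\}", instruction, template)


print(expand("{?x,y,empty}", VARS))   # ?x=1024&y=768&empty=
print(expand("{;x,y,empty}", VARS))   # ;x=1024;y=768;empty
print(expand("{+path,x}/here", VARS)) # /foo/bar,1024/here
```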
> 
> I think this mechanism is simple and readable when used with simple
> examples because the single-character operators match the URI generic
> syntax delimiters.  Only one operator inserts unencoded values; all
> of the others encode any characters other than unreserved.
> 
> The mechanism does become harder to read when we do very unusual
> things and add all the bells and whistles, like
> 
> {var,undef,empty,list}    value,,val1,val2,val3
> {/var:3,undef,list,empty} /val/val1/val2/val3/
> {;var,undef,empty,list}   ;var=value;empty;list=val1,val2,val3
> {?var,undef,empty,list}   ?var=value&empty=&list=val1,val2,val3
> {?var,undef,empty,@list}  ?var=value&empty=&list1=val1&list2=val2&list3=val3
> {?var,undef,empty,%keys}  ?var=value&empty=&key1=val1&key2=val2&key3=val3
> 
> but we don't need to care if complex cases are hard to read.
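
For the "@" and "%" prefixes in the last two examples, a minimal sketch of one possible reading: "@" numbers each list item under the variable name, and "%" treats a flat list as alternating keys and values. The helper names are hypothetical, and only the "?" query form is shown.

```python
from urllib.parse import quote


def explode_list(name, values):
    """@name: number each item, giving name1=..., name2=..., and so on."""
    return [f"{name}{i}={quote(v, safe='')}"
            for i, v in enumerate(values, start=1)]


def explode_keys(flat):
    """%name: read a flat [key, value, key, value, ...] list as pairs."""
    pairs = zip(flat[0::2], flat[1::2])
    return [f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in pairs]


items = ["val1", "val2", "val3"]
keys = ["key1", "val1", "key2", "val2", "key3", "val3"]

print("?" + "&".join(["var=value", "empty="] + explode_list("list", items)))
# ?var=value&empty=&list1=val1&list2=val2&list3=val3
print("?" + "&".join(["var=value", "empty="] + explode_keys(keys)))
# ?var=value&empty=&key1=val1&key2=val2&key3=val3
```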
> 
> The mechanism is extremely simple to implement.  There is always a
> variable list (one variable need not be special-cased).
> Any of the variables can be prefixed.  Any of the substitutions
> can have a default when undefined.
> 
> The ABNF is something like
> 
>   instruction   = "{" [ operator ] variable-list "}"
>   operator      = "/" / "+" / ";" / "?" / op-reserve
>   variable-list =  varspec *( "," varspec )
>   varspec       =  [ var-type ] varname [ ":" prefix-len ] [ "=" default ]
>   var-type      = "@" / "%" / type-reserve
>   varname       = ALPHA *( ALPHA | DIGIT | "_" )
>   prefix-len    = 1*DIGIT
>   default       = *( unreserved / reserved )
>   op-reserve    = <anything else that isn't ALPHA or operator>
>   type-reserve  = <anything else that isn't ALPHA, ",", or operator>
> 
> as a quick pass (I haven't checked it).  It is extremely easy to
> parse and perform the substitutions within a single pass loop.
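
To illustrate the "easy to parse" claim, here is a rough sketch of a parser for a single instruction following the quoted ABNF. It is an approximation under stated assumptions: the names VarSpec and parse_instruction are hypothetical, the op-reserve and type-reserve characters are rejected rather than reserved, and a default may not contain "," because the variable list is split on commas.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class VarSpec(NamedTuple):
    var_type: str              # "@", "%" or "" when absent
    name: str
    prefix_len: Optional[int]  # from ":" prefix-len
    default: Optional[str]     # from "=" default


# instruction = "{" [ operator ] variable-list "}"
INSTRUCTION = re.compile(r"\{([/+;?]?)([^{}]*)\}")
# varspec = [ var-type ] varname [ ":" prefix-len ] [ "=" default ]
VARSPEC = re.compile(r"([@%]?)([A-Za-z][A-Za-z0-9_]*)(?::([0-9]+))?(?:=(.*))?")


def parse_instruction(text: str) -> Tuple[str, List[VarSpec]]:
    """Split one "{...}" instruction into its operator and varspecs."""
    match = INSTRUCTION.fullmatch(text)
    if match is None:
        raise ValueError(f"not an instruction: {text!r}")
    operator, body = match.groups()
    specs = []
    for raw in body.split(","):
        spec = VARSPEC.fullmatch(raw)
        if spec is None:
            raise ValueError(f"bad varspec: {raw!r}")
        var_type, name, length, default = spec.groups()
        specs.append(
            VarSpec(var_type, name, int(length) if length else None, default)
        )
    return operator, specs


operator, specs = parse_instruction("{?var:3,undef=none,@list}")
print(operator)
for spec in specs:
    print(spec)
```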
> 
> ....Roy
Received on Tuesday, 16 September 2008 03:22:43 UTC