URI Templates: { ^ prefix ^ variable [] separator | default } from Manger, James H on 2007-10-23 (uri@w3.org from October 2007)

From: Manger, James H <James.H.Manger@team.telstra.com>
Date: Tue, 23 Oct 2007 15:14:07 +1000
To: <uri@w3.org>
Message-ID: <6215401E01247448A306C54F499111F2035E23E2@WSMSG2103V.srv.dir.telstra.com>
Next cut of my syntax suggestion: SYNTAX, EXAMPLES and DISCUSSION (with faults compared to Joe Gregorio’s (et al) syntax).

SYNTAX
A template consists of {…} segments and other text. A URI is built from a template by substituting each segment, from left to right, with replacement text. Each segment has the following syntax:

 { ^ prefix ^ variable [] separator | default }

The spaces above are just for display, to separate the tokens.
<variable> is mandatory. All other tokens (including ^ ^ [] and |) are optional.

<variable> (the name, not the value) MUST start with an alphabetic character and MUST only contain characters from <unreserved>, or it can be the empty string.
 variable = "" / ALPHA *unreserved
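
A minimal sketch of the variable-name rule as a regular expression. It assumes <unreserved> means the RFC 3986 set (ALPHA / DIGIT / "-" / "." / "_" / "~"); the names VARIABLE_RE and is_valid_variable are hypothetical and not part of the proposal.

```python
import re

# variable = "" / ALPHA *unreserved
# Assumption: <unreserved> is the RFC 3986 set: ALPHA / DIGIT / "-" / "." / "_" / "~".
VARIABLE_RE = re.compile(r"(?:[A-Za-z][A-Za-z0-9\-._~]*)?")


def is_valid_variable(name: str) -> bool:
    """Return True if name is empty or is ALPHA followed by unreserved characters."""
    return VARIABLE_RE.fullmatch(name) is not None
```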

<separator> and <default> can include any characters legal in URIs.
<prefix> can include any characters legal in URIs, except the sequence [].
Note: ^ | { and } are not legal in URIs, while [ and ] are legal.
<separator> is present if, and only if, [] is present.
<default> is present if, and only if, | is present.

The ^ delimiting <prefix> from <variable> MAY be omitted if no suffix of <prefix> could be a <variable>. This is the case, for instance, when <prefix> ends with a <reserved> character or contains no ALPHA characters, which will often be so.
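
One possible way to split the text between { and } into its tokens is sketched below. It is an interpretation, not something the rules above spell out: it assumes a leading ^ is always the "keep reserved characters" marker, that the first | starts <default>, that the first [] starts <separator>, and that when the delimiting ^ is omitted the variable is the longest trailing run matching ALPHA *unreserved. The function name parse_segment and the tuple layout are hypothetical.

```python
import re

# Longest suffix that looks like a variable name (ALPHA *unreserved).
SUFFIX_VARIABLE_RE = re.compile(r"[A-Za-z][A-Za-z0-9\-._~]*$")


def parse_segment(body: str):
    """Split a segment body (the text between { and }) into its tokens.

    Returns (keep_reserved, prefix, variable, separator, default), where
    separator is None when [] is absent and default is "" when | is absent.
    """
    head, _, default = body.partition("|")        # "|" is not legal in URIs
    front, marker, separator = head.partition("[]")
    keep_reserved = front.startswith("^")         # assumed: a leading ^ is the initial ^
    if keep_reserved:
        front = front[1:]
    if "^" in front:                              # explicit prefix/variable delimiter
        prefix, _, variable = front.partition("^")
    else:                                         # delimiter omitted: take the longest name-like suffix
        match = SUFFIX_VARIABLE_RE.search(front)
        variable = match.group(0) if match else ""
        prefix = front[: len(front) - len(variable)]
    return keep_reserved, prefix, variable, (separator if marker else None), default
```

For example, parse_segment("?x=delta[]&x=") yields the prefix "?x=", the variable "delta" and the separator "&x=".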

The variable value, if it is defined, is either a string or, if [] is present, an array of strings. Each string is encoded during substitution. If an initial ^ is absent, each character not in <unreserved> is %-encoded. If an initial ^ is present, each character not in <unreserved> or <reserved> is %-encoded.
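
The two encoding modes can be sketched with Python's urllib.parse.quote, which always leaves letters, digits and "-", ".", "_", "~" (Python 3.7 and later) unescaped. The sketch assumes <reserved> is the RFC 3986 set and that non-ASCII characters are %-encoded as UTF-8; encode_value and its keep_reserved flag are hypothetical names.

```python
from urllib.parse import quote

# Assumption: <reserved> is the RFC 3986 set (gen-delims plus sub-delims).
RESERVED = ":/?#[]@!$&'()*+,;="


def encode_value(value: str, keep_reserved: bool = False) -> str:
    """%-encode one string; keep_reserved=True models a segment with an initial ^."""
    # quote() never escapes unreserved characters; `safe` lists extra characters to keep.
    return quote(value, safe=RESERVED if keep_reserved else "")
```

With gamma from the examples further down, encode_value("A|-B/-C") gives "A%7C-B%2F-C", while encode_value("A|-B/-C", keep_reserved=True) gives "A%7C-B/-C".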

The replacement is <default> (and does not include <prefix>, <separator> or <name>) if:
* the variable value is undefined;
* [] is absent and the variable value is an empty string; or
* [] is present and the variable value is an array with no items.
The default value for <default> is an empty string.

When [] is absent and the variable value is a non-empty string, the replacement is <prefix> followed by the encoded variable value.

When [] is present and the variable value is a non-empty array of strings, the replacement is <prefix> followed by each encoded array string separated by <separator>. There is no <separator> after the last string in the array.
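
The three replacement cases above can be sketched as a single function over already-parsed tokens. This is an illustration under assumptions: separator=None stands for "[] is absent", None stands for an undefined value, and the caller passes a string when [] is absent and a list when it is present (the rules above do not say what happens otherwise). The name replace_segment is hypothetical.

```python
from urllib.parse import quote

# Assumption: <reserved> is the RFC 3986 set.
RESERVED = ":/?#[]@!$&'()*+,;="


def replace_segment(value, prefix="", separator=None, default="", keep_reserved=False):
    """Compute the replacement text for one segment.

    value: None (undefined), a str when [] is absent, or a list of str when [] is present.
    separator: None when [] is absent, otherwise the (possibly empty) separator.
    """
    # Undefined value, empty string, or empty array: the replacement is <default> alone.
    if value is None or len(value) == 0:
        return default
    safe = RESERVED if keep_reserved else ""
    if separator is None:
        return prefix + quote(value, safe=safe)
    # Array case: encoded items joined by <separator>, none after the last item.
    return prefix + separator.join(quote(item, safe=safe) for item in value)
```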

The following special rules apply when <prefix> matches <queryprefix>.
 queryprefix = ( "?" / "&" ) name "="
 name = *<any valid URI character excluding "&", "#" and "=">
If <variable> is an empty string it becomes <name>.
If an unencoded "?" appears in the portion of the URI that has already been built (after replacing earlier segments) then change the first character of <prefix> to "&". Otherwise change it to "?".
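
A hedged sketch of the <queryprefix> rules follows. The regular expression approximates <name> as "anything except &, # and =" without checking that each character is URI-legal, and the function name adjust_query_prefix is hypothetical.

```python
import re

# queryprefix = ( "?" / "&" ) name "="
# Simplification: <name> is matched as any run of characters other than "&", "#" and "=".
QUERY_PREFIX_RE = re.compile(r"[?&]([^&#=]*)=")


def adjust_query_prefix(prefix: str, variable: str, built_so_far: str):
    """Apply the special query rules; returns the (prefix, variable) pair to use."""
    match = QUERY_PREFIX_RE.fullmatch(prefix)
    if match is None:
        return prefix, variable            # not a query prefix: nothing changes
    if variable == "":
        variable = match.group(1)          # an empty variable takes the query name
    # A literal "?" already in the output means a query has started, so continue it with "&".
    lead = "&" if "?" in built_so_far else "?"
    return lead + prefix[1:], variable
```

For instance, adjust_query_prefix("&x=", "", "/home") returns ("?x=", "x"), because no "?" has been emitted yet.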

EXAMPLES
Variables:
 alpha=“Hi”
 beta=“0.9”
 gamma=“A|-B/-C”
 delta=[ “a”, “b”, “ ”, “d” ]
 epsilon is undefined

/news{/beta}/{alpha} -> /news/0.9/Hi
/search?q={gamma}{&epsilon=}{&order=beta} -> /search?q=A%7C-B%2F-C&order=0.9
/-/{^gamma} -> /-/A%7C-B/-C
/login/{user|../enrol} -> /login/../enrol -> /enrol
/answers{;delta[]} -> /answers;ab%20d
/stuff/{delta[]/} -> /stuff/a/b/%20/d
/goo{?x=delta[]&x=} -> /goo?x=a&x=b&x=%20&x=d
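
The examples above can be checked end to end with a small expander. This is only a sketch of one reading of the rules, not a reference implementation: it assumes RFC 3986 character sets, treats a leading ^ as the initial ^, takes the longest name-like suffix as the variable when the delimiting ^ is omitted, and assumes the third item of delta is a single space (as the %20 in the outputs implies). It does not perform the final /login/../enrol -> /enrol dot-segment removal. All names in the code are hypothetical.

```python
import re
from urllib.parse import quote

RESERVED = ":/?#[]@!$&'()*+,;="                      # assumed: RFC 3986 <reserved>
SEGMENT_RE = re.compile(r"\{([^{}]*)\}")
SUFFIX_VARIABLE_RE = re.compile(r"[A-Za-z][A-Za-z0-9\-._~]*$")
QUERY_PREFIX_RE = re.compile(r"[?&]([^&#=]*)=")


def expand(template: str, variables: dict) -> str:
    """Expand every {...} segment of template from left to right."""
    out, pos = [], 0
    for seg in SEGMENT_RE.finditer(template):
        out.append(template[pos:seg.start()])        # literal text before the segment
        pos = seg.end()
        head, _, default = seg.group(1).partition("|")
        front, marker, separator = head.partition("[]")
        keep_reserved = front.startswith("^")
        if keep_reserved:
            front = front[1:]
        if "^" in front:
            prefix, _, name = front.partition("^")
        else:
            found = SUFFIX_VARIABLE_RE.search(front)
            name = found.group(0) if found else ""
            prefix = front[: len(front) - len(name)]
        query = QUERY_PREFIX_RE.fullmatch(prefix)
        if query:
            name = name or query.group(1)            # empty variable takes the query name
            prefix = ("&" if "?" in "".join(out) else "?") + prefix[1:]
        value = variables.get(name)
        if value is None or len(value) == 0:         # undefined, "" or []
            out.append(default)
            continue
        safe = RESERVED if keep_reserved else ""
        items = value if marker else [value]
        out.append(prefix + separator.join(quote(item, safe=safe) for item in items))
    out.append(template[pos:])                       # trailing literal text
    return "".join(out)


EXAMPLE_VARS = {"alpha": "Hi", "beta": "0.9", "gamma": "A|-B/-C",
                "delta": ["a", "b", " ", "d"]}       # epsilon and user stay undefined
```

Under these assumptions, expand("/goo{?x=delta[]&x=}", EXAMPLE_VARS) returns "/goo?x=a&x=b&x=%20&x=d".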

DISCUSSION

I changed from ! to ^ to indicate when reserved characters in the variable value are NOT escaped. I like ! as a warning to be careful, but ! is allowed in URIs and could be useful as a prefix. ^ is not allowed in URIs so can be used as a template syntax character more easily.

The syntax is XML-friendly compared to Joe’s. Joe’s uses < > and &, which require escaping in XML -- making templates more awkward to read and write. < and > are already used as delimiters in HTTP headers, particularly the proposed Link-Template header. There may not be a clash if < and > only appear inside {…} within a template, but it adds some confusion.

Not requiring an explicit separator between <prefix> and <variable> means templates are quite lean in common cases. I expect { and } will often be the only syntax characters required. For instance, {/foo}/home{.lang}{?age=bar}{&x=}{&y=}.

Variable names cannot have arbitrary URI characters in this syntax. This prevents “namespaced names” and “URLs as names”. However, I think supporting simple templates cleanly is more valuable. I guess variable names could be optionally double quoted to allow any URI-legal character, as Mark Nottingham suggested [http://lists.w3.org/Archives/Public/uri/2007Oct/0075.html].

Here is a comparison against Joe’s list of faults with his syntax [http://lists.w3.org/Archives/Public/uri/2007Oct/0015.html]. The markers +, - and ? indicate whether my syntax is better, worse, or I’m not sure.
+  1. DOES handle repeated query parameters.
   2. Doesn't specify if variables are mandatory or optional.
   3. Doesn't handle encodings besides UTF-8.
+  4. Template language IS NOT TOO complex, cryptic.
   5. No handling of input validation, enums, ranges, etc.
   6. Possible to define a self-inconsistent URI Template:
         1. {fred[]}{fred}
-  7. NO SUFFIXES: Prefixes and suffixes are redundant, as
        they could be handled by using the '?' expansion.
+  8. ARRAY expansions DO HAVE two strings, SO one to separate
        name-value pairs, AND ANOTHER to separate names from
        values CAN BE SPECIFIED (IT IS NOT hard-coded to "=").
        The two strings are <prefix> and <separator>, which are slightly
        different from <between pairs> and <between name and value>,
        but are basically equivalent. {x,delta[];x,} -> x,a;x,b;x,%20;x,d
   9. Sensible defaults need to be invented to deal with parameter values
       that are lists when not expected to be (or are not lists when
       expected to be) (see #6).
  10. No specification for how to handle IRIs beyond "Turn an IRI Template
       into a URI Template and then proceed."
+ 11. Trailing "?" CAN BE STRIPPED from URIs with no parameters. {?foo=}
- 12. Potential danger from inserting reserved characters in a value, via {^…}

James Manger

P.S. Would I be right to assume people prefer plain text email on this list, despite the fact that it would be easier to distinguish protocol characters, optional tokens etc in HTML or rich text?
Received on Tuesday, 23 October 2007 05:14:52 UTC