
Re: draft-gregorio-uritemplate-02.txt

From: John Cowan <cowan@ccil.org>
Date: Sun, 2 Dec 2007 23:18:40 -0500
To: URI <uri@w3.org>
Message-ID: <20071203041840.GH8528@mercury.ccil.org>

Manger, James H scripsit:

> §3, 1st sentence: Allow a template to have no expansions (i.e., just be a URI).


> §3.1: Values of variables should not be restricted. They should just
> be escaped when building the URI. 

This is a fundamental issue that needs to be resolved.  My original
understanding of -00 and -01 was that the characters in variable values
were plain Unicode, and they would be escaped as needed.  Thus % would
become %25, / would become %2F, and a non-breaking space (U+00A0) would
become %C2%A0.  I still think this is the Right Thing.
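A minimal Python sketch of the plain-Unicode reading described above, assuming Python 3.7 or later (where urllib.parse.quote treats "~" as unreserved). The name escape_value is hypothetical and not part of any draft:

```python
from urllib.parse import quote


def escape_value(value: str) -> str:
    # Treat the value as plain Unicode: encode it as UTF-8 and percent-encode
    # every octet outside the unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~").
    # safe="" stops quote() from leaving "/" unescaped, which is its default.
    return quote(value, safe="")


print(escape_value("%"))       # %25
print(escape_value("/"))       # %2F
print(escape_value("\u00a0"))  # %C2%A0
```

Under this model the caller never escapes anything; the processor does all of it.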

However, -02 says that the values of variables can only be unreserved
(letters, digits, hyphen, dot, underscore, tilde) or %xx substrings.
I think this completely undermines the original design of templates,
which was to provide escaping for what gets inserted into the template,
whereas reserved characters had to appear literally in the template.
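For contrast, the -02 restriction can be expressed as a check on the value. This is a sketch based only on the summary above; is_valid_draft02_value is a hypothetical helper, not something the draft defines:

```python
import re

# Per the summary of -02 above: a value may contain only unreserved
# characters or pct-encoded triplets ("%" HEXDIG HEXDIG).
_DRAFT02_VALUE = re.compile(r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})*")


def is_valid_draft02_value(value: str) -> bool:
    # fullmatch: the whole value must consist of allowed units
    return _DRAFT02_VALUE.fullmatch(value) is not None


print(is_valid_draft02_value("caf%C3%A9"))  # True
print(is_valid_draft02_value("café"))       # False
print(is_valid_draft02_value("a/b"))        # False
```

Under reading (a) discussed below, it would be the caller's job to make this check pass before the value ever reaches the template processor.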

It's true that section 3.1 of -02 says:

	For variable values that are strings that have characters outside
	that range, the entire string must be converted into UTF-8
	[RFC3629], and then every octet of the UTF-8 string that falls
	outside of ( unreserved / pct-encoded ) MUST be percent-encoded,
	as per [RFC3986], section 2.1.

but that does not make clear which component MUST do the encoding.
I take it to mean (a) that the outlawed characters MUST be encoded
*before* being passed to the URI Template processor, rather than (b)
that the URI Template processor MUST encode them itself.

If (b) is in fact intended, we wind up with something very useful,
I think: the template processor will properly encode every Unicode
character except % when it is followed by two hex digits.  In that case,
the %xx is placed into the result URI unchanged.  So a value of "20%10"
produces the URI text "20%10" (as does "20" followed by a Ctrl+P, which
is encoded as %10); if you want the URI to carry the literal string
"20%10", the value of the variable must be "20%2510".

If that's what you want, some such wording as this is appropriate:

	The value of every non-list variable, and each individual value
	in a list variable, is a sequence of Unicode characters.  Any
	character in the unreserved set is included in the URI unchanged;
	any other character is converted to a UTF-8 representation,
	and the bytes are percent-encoded as per [RFC3986], section 2.1.
	As a special exception, if the value contains a "%" character
	followed by two hex digits, the three-character sequence is
	included in the URI unchanged.
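The proposed rule can be sketched in Python as follows. This is an illustration of the wording above, not a reference implementation; expand_value is a hypothetical name, and only the URI (unreserved) variant is shown, not the IRI one:

```python
import re
from urllib.parse import quote

# A "%" followed by two hex digits is copied through unchanged.
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def expand_value(value: str) -> str:
    out = []
    pos = 0
    for m in _PCT_TRIPLET.finditer(value):
        # Text before the triplet: UTF-8 encode and escape everything
        # outside the unreserved set (safe="" also escapes "/").
        out.append(quote(value[pos:m.start()], safe=""))
        out.append(m.group())  # the special exception: keep %xx as-is
        pos = m.end()
    out.append(quote(value[pos:], safe=""))
    return "".join(out)


print(expand_value("20%2510"))  # 20%2510
print(expand_value("20\x10"))   # 20%10
print(expand_value("50%"))      # 50%25
```

"20%2510" passes through unchanged because "%25" is kept as a triplet, while a lone "%" not followed by two hex digits (as in "50%") is escaped to "%25".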

The IRI text would be similar, replacing unreserved with iunreserved.

> The simplest model is to say each
> variable value is a list of (Unicode) strings. An empty list is treated
> as if the variable was undefined.

I think undefined should be just that, undefined.  Your arguments against treating the
empty string as undefined apply with equal strength to the empty list.

> §3.3.6: Move the last paragraph (“the result of substitution
> MUST match…”) to §3.3 (before §3.3.1) as it
> applies to all the rules, not just 'listjoin'.
> §3.4: foo is used as a variable value so don’t reuse it as an
> undefined variable name. It has unnecessary potential to confuse
> the reader.

+1 to both

> §3.4: A zero-length string should be a legitimate value. It should not
> be treated as undefined. It is not necessary to make a zero-length
> string a special case. Perhaps a user interface might translate
> an empty text input box into an undefined variable, instead of a
> zero-length string, but the URI template processor API should not
> force that arrangement.


> More importantly, we need easy support for query parameters where the
> query name does not match the variable name. We need easy support for
> the leading '?' or '&'.

I don't think this is a big issue: a stray trailing "?" with no parameters
shouldn't be a problem.
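A sketch of the query-parameter case, assuming a hypothetical join_query helper rather than any operator defined in -02. The mapping lets a query name differ from its variable name, a missing variable is treated as undefined and omitted, and a zero-length string still produces "name=". With every variable undefined, the literal "?" written in the template is left dangling:

```python
from urllib.parse import quote, urlsplit


def join_query(mapping, variables):
    """Join name=value pairs with '&'.

    mapping   -- list of (query_name, variable_name) pairs, so the query
                 parameter name need not match the variable name
    variables -- dict of variable values; a missing key means undefined
    """
    parts = []
    for query_name, var_name in mapping:
        if var_name not in variables:
            continue  # undefined: omit the parameter entirely
        value = variables[var_name]  # "" is a legitimate value and gives "name="
        parts.append(quote(query_name, safe="") + "=" + quote(value, safe=""))
    return "&".join(parts)


# The "?" is written literally in the template, before the expansion.
prefix = "http://example.com/search?"
mapping = [("q", "term"), ("hl", "language")]

print(prefix + join_query(mapping, {"term": "uri templates", "language": ""}))
# http://example.com/search?q=uri%20templates&hl=
print(prefix + join_query(mapping, {}))
# http://example.com/search?
print(repr(urlsplit(prefix + join_query(mapping, {})).query))
# ''
```

Python's urlsplit reports an empty query string for the dangling-"?" URI, just as it does for a URI with no "?" at all.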

John Cowan   cowan@ccil.org    http://ccil.org/~cowan
Original line from The Warrior's Apprentice by Lois McMaster Bujold:
"Only on Barrayar would pulling a loaded needler start a stampede toward one."
English-to-Russian-to-English mangling thereof: "Only on Barrayar you risk to
lose support instead of finding it when you threat with the charged weapon."
Received on Monday, 3 December 2007 04:18:49 UTC
