- From: Mark Nottingham <mnot@mnot.net>
- Date: Tue, 16 Jan 2007 14:17:43 +1100
- To: Marc Hadley <Marc.Hadley@Sun.COM>, Joe Gregorio <joe@bitworking.org>, James M Snell <jasnell@gmail.com>
- Cc: uri@w3.org
I agree that embedding encoding information isn't desirable, but using:
> Characters outside ( iprivate | iunreserved | '@' | ':' | '/' )
> are % encoded.
as the default encoding rule means that sub-delims ("!" / "$" / "&" /
"'" / "(" / ")" / "*" / "+" / "," / ";" / "=") will *always* be
encoded when expanded from templates; there won't be any way to have
these perfectly legal characters appear in template-generated URIs.
Also, "?" will always be percent-encoded in query and fragment
components, even though it's allowed.
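To make the effect concrete, here is a minimal Python sketch of the quoted default rule. It is an approximation rather than a reference implementation: it handles ASCII only (so iprivate and the non-ASCII part of iunreserved are ignored), and expand_default is a hypothetical helper name, not something from the draft.

```python
from urllib.parse import quote

def expand_default(value: str) -> str:
    """Percent-encode everything except unreserved characters, '@', ':' and '/'."""
    # quote() never encodes ASCII letters, digits, or "_.-~" (the RFC 3986
    # unreserved set); `safe` adds the three extra characters from the rule.
    # Assumption: ASCII-only stand-in for (iprivate | iunreserved).
    return quote(value, safe="@:/")

print(expand_default("razr+v3*"))  # sub-delims "+" and "*" get encoded
print(expand_default("a,b;c=d"))   # so do ",", ";" and "="
print(expand_default("what?"))     # and "?", although query and fragment allow it
```

Under these assumptions no value can put a literal sub-delim or "?" into the expanded URI.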
After just a quick browse around the Web, that's too restrictive;
most URI schemes do not use all of the sub-delims, and leave them for
use by specific formats and applications. If they're always percent-
encoded, these use cases won't be allowed.
For example,
* EBay uses "*" and "+" as part of their search URIs; <http://attr-
search.ebay.com/search/search.dll?
sofocus=so&sbrftog=1&catref=C6&from=R10&fccl=1&fcl=4&satitle=razr
+v3*&sacat=146487%26catref%3DC6%26curcat%
3Dtrue&a25664=25635&a25662=-24&a35=-24&a33112=-24&a26093=-24&a10244=-24&
alist=a25664%2Ca25662%2Ca35%2Ca10244%2Ca33112%2Ca3801%
2Ca26093&pfmode=1&reqtype=1&gcs=1440&pfid=1720&pf_query=razr
+v3*&sargn=-1%26saslc%
3D2&sadis=200&fpos=95125&sappl=1&ft=1&ftrt=1&ftrv=7&saprclo=&saprchi=&fs
op=1%26fsoo%3D1&fgtp=> (found on the EBay front page)
* Amazon uses "=" in path parameters; <http://www.amazon.com/Concise-
History-Cambridge-Histories-Updated/dp/0521408482/ref=wl_gtwy_ty/
102-8627554-9105730?%
5Fencoding=UTF8&coliid=IIU7YZ0J27A5W&colid=2ATYUNX0NQEE> (from
Amazon's front page). I guess they could do </ref={ref}/>, but what
if they want to allow several different parameters there?
* Yahoo uses "*" for special purposes when redirecting, and encodes
the ":";
<http://rds.yahoo.com/_ylt=A0oGkmY8N6xFlMQAGxyl87UF/SIG=15lsevai8/
EXP=1169000636/**http%3a//search.yahoo.com/preferences/preferences%
3fpref_done=http%253A%252F%252Fsearch.yahoo.com%252Fweb%
26.bcrumb=05bd394bd778e0e7bc8c4452c66d9f4e%252C1168914236>. The
closest they could get would be <.../**{uri}>, along with manual
instructions to percent-escape the colon and some other characters in
the uri.
* The Flickr API uses comma-delimited lists in parameters; see
<http://www.flickr.com/services/api/
flickr.favorites.getPublicList.html>.
* The IETF datatracker uses "+" as a delimiter for spaces, as per
HTML form encoding (old-style).
<https://datatracker.ietf.org/public/idindex.cgi?
command=do_search_id&filename=mark
+nottingham&id_tracker_state_id=-1&wg_id=0&other_group=&status_id=0&last
_name=&first_name=>
* It's a common convention for e-mail addresses to have "+" signs as
well; e.g., <mailto:mnot+home@example.org>.
* It also won't be possible to specify a template like <http://{host}/
foo/bar> and make "host" able to be either an IPv4 address or an
IPv6, because the brackets around the IPv6 address will be escaped.
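The last two items in the list are easy to reproduce. The sketch below assumes an ASCII-only approximation of the default rule (Python's urllib.parse.quote with "@", ":" and "/" marked safe) and a hypothetical expand helper that does naive {name} substitution; neither is taken from the template draft.

```python
import re
from urllib.parse import quote

def expand(template: str, variables: dict) -> str:
    """Naive expansion: replace each {name} with its default-encoded value."""
    def substitute(match):
        # Assumed default rule: keep unreserved characters plus "@", ":", "/".
        return quote(variables[match.group(1)], safe="@:/")
    return re.sub(r"\{(\w+)\}", substitute, template)

ipv4 = expand("http://{host}/foo/bar", {"host": "192.0.2.1"})
ipv6 = expand("http://{host}/foo/bar", {"host": "[2001:db8::1]"})
mail = expand("mailto:{addr}", {"addr": "mnot+home@example.org"})
print(ipv4)  # host comes through unchanged
print(ipv6)  # brackets become %5B and %5D, so this is no longer an IP literal
print(mail)  # "+" becomes %2B
```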
I didn't have to go out of my way to find any of these examples (it took
about five minutes in total), and they're all legal, widely-used URIs.
My point here is that a) a default encoding rule needs to be
conservative, and b) it's going to be necessary for templates to
specify application/format specific encoding rules no matter what we
do (see the IETF, Amazon and Yahoo examples in particular).
Proposal: only escape things outside of ( iprivate / iunreserved /
ireserved ) -- i.e., characters not allowed in URIs. It's up to the
definitions of specific template variables to determine how to
percent-encode beyond that.
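As a rough illustration of the proposed rule, the following Python sketch encodes only characters that fall outside the unreserved and reserved sets of RFC 3986. It is ASCII-only, so it does not model iprivate or the non-ASCII part of iunreserved, and expand_conservative is a hypothetical name.

```python
from urllib.parse import quote

GEN_DELIMS = ":/?#[]@"       # gen-delims from RFC 3986
SUB_DELIMS = "!$&'()*+,;="   # sub-delims from RFC 3986

def expand_conservative(value: str) -> str:
    """Encode only characters outside unreserved and reserved (ASCII approximation)."""
    # quote() already leaves unreserved characters alone; `safe` adds reserved.
    return quote(value, safe=GEN_DELIMS + SUB_DELIMS)

print(expand_conservative("razr+v3*"))       # sub-delims are left alone
print(expand_conservative("[2001:db8::1]"))  # so are the IPv6 brackets
print(expand_conservative("two words"))      # a space is not a URI character
```

Note that a literal "%" in a value is also outside both sets, so this sketch encodes it as %25; whether already-encoded input should be passed through is a question the per-variable rules would have to answer.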
E.g.,
---8<---
* The Foo URI Template
Foo is a URI template [RFCxxxx] that allows two (2) variables, "bar"
and "baz". For example;
<http://www.example.com/{bar}?arg={baz}>
The "bar" template variable should have any characters from the
sub-delims rule in [RFC3986] percent-encoded before template expansion.
The "baz" template variable should percent-encode the "&" and "#"
characters before template expansion.
--->8---
Optionally, we can provide a "library" of percent-encoding rules
(likely to be specific to particular URI schemes and/or components)
for template definitions to leverage.
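A sketch of how the Foo definition and such a rule library might look in Python follows. Everything here is illustrative: the rule names, RULES, encode_except and expand_foo are hypothetical, the sets are ASCII-only, and the per-variable encoding and the default rule are folded into a single quote() call for brevity.

```python
from urllib.parse import quote

GEN_DELIMS = ":/?#[]@"       # gen-delims from RFC 3986
SUB_DELIMS = "!$&'()*+,;="   # sub-delims from RFC 3986
RESERVED = GEN_DELIMS + SUB_DELIMS

def encode_except(safe: str):
    """Build an encoder that leaves unreserved characters and `safe` alone."""
    return lambda value: quote(value, safe=safe)

# Hypothetical "library" of named rules that template definitions could cite.
RULES = {
    # For "bar": percent-encode every sub-delim, keep gen-delims.
    "no-sub-delims": encode_except(GEN_DELIMS),
    # For "baz": of the reserved characters, percent-encode only "&" and "#".
    "no-amp-hash": encode_except(RESERVED.replace("&", "").replace("#", "")),
}

def expand_foo(bar: str, baz: str) -> str:
    """Expand <http://www.example.com/{bar}?arg={baz}> with per-variable rules."""
    return ("http://www.example.com/" + RULES["no-sub-delims"](bar)
            + "?arg=" + RULES["no-amp-hash"](baz))

print(expand_foo("a+b", "x&y=z#frag"))
```

The point of the table is that the template definition, not the generic expansion algorithm, decides which reserved characters are data and which are delimiters.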
Cheers,
On 2007/01/03, at 6:23 AM, Marc Hadley wrote:
> Good analysis Joe, thanks.
>
> On Dec 27, 2006, at 2:49 PM, James M Snell wrote:
>>
>> Ugh. I'd rather we not go down the path of embedding encoding
>> information into the template. Let's just pick a reasonable
>> default and
>> leave it at that.
>>
> +1, Joe's "default" below looks good to me.
>
> Marc.
>
>> Extensions that affect the selection and validation of the
>> replacement
>> value are fine.
>>
>> - James
>>
>> Joe Gregorio wrote:
>>> [snip]
>>> Allow a ':' at the end of a variable name to separate out
>>> options, and then
>>> add an option 'enc=<enc>' where
>>> 'enc' could be:
>>>
>>> enc="strict"
>>> All characters outside (iprivate | iunreserved) are % encoded
>>>
>>> enc="sub"
>>> Characters outside (iprivate | iunreserved | sub-delims) are %
>>> encoded
>>>
>>> enc="none"
>>> No characters are % encoded
>>>
>>> enc="default"
>>> Or if '=<enc>' isn't provided then the default encoding is used:
>>>
>>> Characters outside ( iprivate | iunreserved | '@' | ':' | '/' )
>>> are
>>> % encoded.
>>>
>>> So back to the example, if we have:
>>>
>>> http://bitworking.org/{path:enc=strict}
>>>
>>> and
>>>
>>> path = "projects/httplib2/"
>>>
>>> then that gets interpreted as:
>>>
>>> http://bitworking.org/projects%2Fhttplib2%2F
>>>
>>> and
>>>
>>> http://bitworking.org/{path:enc=default}
>>>
>>> gets interpreted as:
>>>
>>> http://bitworking.org/projects/httplib2/
>>>
>>> Note that
>>>
>>> http://bitworking.org/{path:enc=default}
>>>
>>> and
>>>
>>> http://bitworking.org/{path}
>>>
>>> will give equivalent values.
>>>
>>> Again, with this I worry about complexity and surprising behavior:
>>>
>>> http://example.org?a={b:enc=strict}
>>> b = "a=test"
>>>
>>> gives:
>>>
>>> http://example.org?a=a%3Dtest
>>>
>>> while
>>>
>>> http://example.org{b:enc=none}
>>> b = "?a=test"
>>>
>>> gives:
>>>
>>> http://example.org?a=test
>>>
>>> -joe
>>>
>>
>
> ---
> Marc Hadley <marc.hadley at sun.com>
> CTO Office, Sun Microsystems.
>
>
--
Mark Nottingham http://www.mnot.net/
Received on Tuesday, 16 January 2007 03:17:37 UTC