Re: URI and IRI Templating - encoding defaults

From: Mark Nottingham <mnot@mnot.net>
Date: Tue, 16 Jan 2007 14:17:43 +1100
Message-Id: <4D68F9CA-F11A-45B1-B0A1-7B5EA4FB5192@mnot.net>
Cc: uri@w3.org
To: Marc Hadley <Marc.Hadley@Sun.COM>, Joe Gregorio <joe@bitworking.org>, James M Snell <jasnell@gmail.com>

I agree that embedding encoding information isn't desirable, but using:

>   Characters outside ( iprivate | iunreserved | '@' | ':' | '/' )  
> are % encoded.

as the default encoding rule means that sub-delims ("!" / "$" / "&" /  
"'" / "(" / ")" / "*" / "+" / "," / ";" / "=") will *always* be  
encoded when expanded from templates; there won't be any way to have  
these perfectly legal characters appear in template-generated URIs.  
Also, "?" will always be percent-encoded in query and fragment  
components, even though it's allowed.

After just a quick browse around the Web, that's too restrictive;  
most URI schemes do not use all of the sub-delims, and leave them for  
use by specific formats and applications. If they're always percent- 
encoded, these use cases won't be allowed.

For example,
* EBay uses "*" and "+" as part of their search URIs; <http://attr- 
search.ebay.com/search/search.dll? 
sofocus=so&sbrftog=1&catref=C6&from=R10&fccl=1&fcl=4&satitle=razr 
+v3*&sacat=146487%26catref%3DC6%26curcat% 
3Dtrue&a25664=25635&a25662=-24&a35=-24&a33112=-24&a26093=-24&a10244=-24& 
alist=a25664%2Ca25662%2Ca35%2Ca10244%2Ca33112%2Ca3801% 
2Ca26093&pfmode=1&reqtype=1&gcs=1440&pfid=1720&pf_query=razr 
+v3*&sargn=-1%26saslc% 
3D2&sadis=200&fpos=95125&sappl=1&ft=1&ftrt=1&ftrv=7&saprclo=&saprchi=&fs 
op=1%26fsoo%3D1&fgtp=> (found on the EBay front page)

* Amazon uses "=" in path parameters; <http://www.amazon.com/Concise- 
History-Cambridge-Histories-Updated/dp/0521408482/ref=wl_gtwy_ty/ 
102-8627554-9105730?% 
5Fencoding=UTF8&coliid=IIU7YZ0J27A5W&colid=2ATYUNX0NQEE> (from  
Amazon's front page). I guess they could do </ref={ref}/>, but what  
if they want to allow several different parameters there?

* Yahoo uses "*" for special purposes when redirecting, and encodes  
the ":";
<http://rds.yahoo.com/_ylt=A0oGkmY8N6xFlMQAGxyl87UF/SIG=15lsevai8/ 
EXP=1169000636/**http%3a//search.yahoo.com/preferences/preferences% 
3fpref_done=http%253A%252F%252Fsearch.yahoo.com%252Fweb% 
26.bcrumb=05bd394bd778e0e7bc8c4452c66d9f4e%252C1168914236>. The  
closest they could get would be <.../**{uri}> along with manual  
instructions to percent-escape the colon and some other characters in  
the uri.

* The Flickr API uses comma-delimited lists in parameters; see  
<http://www.flickr.com/services/api/ 
flickr.favorites.getPublicList.html>.

* The IETF datatracker uses "+" as a delimiter for spaces, as per  
HTML form encoding (old-style).
<https://datatracker.ietf.org/public/idindex.cgi? 
command=do_search_id&filename=mark 
+nottingham&id_tracker_state_id=-1&wg_id=0&other_group=&status_id=0&last 
_name=&first_name=>

* It's a common convention for e-mail addresses to have "+" signs as  
well; e.g., <mailto:mnot+home@example.org>.

* It also won't be possible to specify a template like  
<http://{host}/foo/bar> and make "host" able to be either an IPv4 or an  
IPv6 address, because the brackets around the IPv6 address will be escaped.
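
To make the effect on a few of the values above concrete, here is a rough
Python sketch of the quoted default rule. It is not any agreed
implementation: it assumes ASCII-only input (so iprivate and the non-ASCII
part of iunreserved never come into play), the function name expand_default
is made up for illustration, and 2001:db8::1 is just a documentation
address standing in for a real host.

```python
from urllib.parse import quote

def expand_default(value: str) -> str:
    """Hypothetical helper: encode everything outside
    ( unreserved | '@' | ':' | '/' ), as in the quoted default rule.
    ASCII-only approximation; quote() already leaves unreserved alone."""
    return quote(value, safe="@:/")

# Search terms using "+" and "*" lose both characters.
print(expand_default("razr+v3*"))               # razr%2Bv3%2A
# A path parameter using "=" is escaped.
print(expand_default("ref=wl_gtwy_ty"))         # ref%3Dwl_gtwy_ty
# The "+" in an e-mail local part is escaped.
print(expand_default("mnot+home@example.org"))  # mnot%2Bhome@example.org
# IPv6 literal brackets are escaped, so {host} cannot carry one.
print(expand_default("[2001:db8::1]"))          # %5B2001:db8::1%5D
```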

I didn't have to go out of my way to find any of these examples (it took  
about five minutes in total), and they're all legal, widely-used URIs.

My point here is that a) a default encoding rule needs to be  
conservative, and b) it's going to be necessary for templates to  
specify application/format specific encoding rules no matter what we  
do (see the IETF, Amazon and Yahoo examples in particular).


Proposal: only escape things outside of ( iprivate / iunreserved /  
ireserved ) -- i.e., characters not allowed in URIs. It's up to the  
definitions of specific template variables to determine how to  
percent-encode beyond that.
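
As a minimal sketch of what that rule would do, again in Python and again
ASCII-only: it reads "ireserved" as the reserved set (gen-delims plus
sub-delims) from RFC 3986, which is an assumption on my part, and
expand_proposed is a hypothetical name. Note that a literal "%" in a value
falls outside these sets and so is escaped as %25.

```python
from urllib.parse import quote

# Reserved characters as defined in RFC 3986, section 2.2.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="

def expand_proposed(value: str) -> str:
    """Hypothetical helper: escape only characters outside
    ( unreserved | reserved ). ASCII-only approximation."""
    return quote(value, safe=GEN_DELIMS + SUB_DELIMS)

# Reserved characters now pass through untouched.
print(expand_proposed("razr+v3*"))       # razr+v3*
print(expand_proposed("[2001:db8::1]"))  # [2001:db8::1]
# Characters that are not allowed in URIs are still escaped.
print(expand_proposed('say "hi"'))       # say%20%22hi%22
```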

E.g.,

---8<---
* The Foo URI Template

Foo is a URI template [RFCxxxx] that allows two (2) variables, "bar"  
and "baz". For example;

<http://www.example.com/{bar}?arg={baz}>

The "bar" template variable should have any characters from the sub- 
delim rule in [RFC3986] percent-encoded before template expansion.

The "baz" template variable should percent-encode the "&" and "#"  
characters before template expansion.
--->8---
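
One way the Foo definition above could be processed, sketched in Python.
Everything named here (FOO_RULES, expand_foo) is hypothetical. The sketch
also makes one design choice the text does not specify: rather than
percent-encoding the value first and then running the default rule over it
(which would turn the new "%" into "%25"), it does a single pass in which
each variable's extra characters are simply removed from the safe set.
ASCII-only input is assumed.

```python
from urllib.parse import quote

# Reserved characters as defined in RFC 3986, section 2.2.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS

# Extra characters each Foo variable's definition says to percent-encode.
FOO_RULES = {"bar": SUB_DELIMS, "baz": "&#"}

def expand_foo(bar: str, baz: str) -> str:
    """Expand the Foo template, applying per-variable encoding rules."""
    encoded = {}
    for name, value in {"bar": bar, "baz": baz}.items():
        # Safe set = reserved characters minus the ones this variable escapes.
        safe = "".join(c for c in RESERVED if c not in FOO_RULES[name])
        encoded[name] = quote(value, safe=safe)
    return "http://www.example.com/{bar}?arg={baz}".format(**encoded)

print(expand_foo("a+b", "x&y=z#1"))
# http://www.example.com/a%2Bb?arg=x%26y=z%231
```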

Optionally, we can provide a "library" of percent-encoding rules  
(likely to be specific to particular URI schemes and/or components)  
for template definitions to leverage.


Cheers,



On 2007/01/03, at 6:23 AM, Marc Hadley wrote:

> Good analysis Joe, thanks.
>
> On Dec 27, 2006, at 2:49 PM, James M Snell wrote:
>>
>> Ugh. I'd rather we not go down the path of embedding encoding
>> information into the template.  Let's just pick a reasonable  
>> default and
>> leave it at that.
>>
> +1, Joe's "default" below looks good to me.
>
> Marc.
>
>> Extensions that affect the selection and validation of the  
>> replacement
>> value are fine.
>>
>> - James
>>
>> Joe Gregorio wrote:
>>> [snip]
>>> Allow a ':' at the end of a variable name to separate out  
>>> options, and then
>>> add an option 'enc=<enc>'  where
>>> 'enc' could be:
>>>
>>> enc="strict"
>>>   All characters outside (iprivate | iunreserved) are % encoded
>>>
>>> enc="sub"
>>>   Characters outside (iprivate | iunreserved | sub-delims) are %  
>>> encoded
>>>
>>> enc="none"
>>>   No characters are % encoded
>>>
>>> enc="default"
>>>   Or if '=<enc>' isn't provided then the default encoding is used:
>>>
>>>   Characters outside ( iprivate | iunreserved | '@' | ':' | '/' )  
>>> are
>>> % encoded.
>>>
>>> So back to the example, if we have:
>>>
>>>    http://bitworking.org/{path:enc=strict}
>>>
>>> and
>>>
>>>  path = "projects/httplib2/"
>>>
>>> then that gets interpreted as:
>>>
>>>  http://bitworking.org/projects%2Fhttplib2%2F
>>>
>>> and
>>>
>>>    http://bitworking.org/{path:enc=default}
>>>
>>> gets interpreted as:
>>>
>>>  http://bitworking.org/projects/httplib2/
>>>
>>> Note that
>>>
>>>    http://bitworking.org/{path:enc=default}
>>>
>>> and
>>>
>>>    http://bitworking.org/{path}
>>>
>>> will give equivalent values.
>>>
>>> Again, with this I worry about complexity and surprising behavior:
>>>
>>>    http://example.org?a={b:enc=strict}
>>>    b = "a=test"
>>>
>>> gives:
>>>
>>>   http://example.org?a=a%3Dtest
>>>
>>> while
>>>
>>>   http://example.org{b:enc=none}
>>>   b = "?a=test"
>>>
>>> gives:
>>>
>>>   http://example.org?a=test
>>>
>>>   -joe
>>>
>>
>
> ---
> Marc Hadley <marc.hadley at sun.com>
> CTO Office, Sun Microsystems.
>
>


--
Mark Nottingham     http://www.mnot.net/
Received on Tuesday, 16 January 2007 03:17:37 UTC
