- From: Mark Nottingham <mnot@mnot.net>
- Date: Tue, 16 Jan 2007 14:17:43 +1100
- To: Marc Hadley <Marc.Hadley@Sun.COM>, Joe Gregorio <joe@bitworking.org>, James M Snell <jasnell@gmail.com>
- Cc: uri@w3.org
I agree that embedding encoding information isn't desirable, but using:

> Characters outside ( iprivate | iunreserved | '@' | ':' | '/' )
> are % encoded.

as the default encoding rule means that sub-delims ("!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=") will *always* be encoded when expanded from templates; there won't be any way to have these perfectly legal characters appear in template-generated URIs. Also, "?" will always be percent-encoded in query and fragment components, even though it's allowed.

After just a quick browse around the Web, that's too restrictive; most URI schemes do not use all of the sub-delims, and leave them for use by specific formats and applications. If they're always percent-encoded, these use cases won't be allowed. For example,

* EBay uses "*" and "+" as part of their search URIs; <http://attr-search.ebay.com/search/search.dll?sofocus=so&sbrftog=1&catref=C6&from=R10&fccl=1&fcl=4&satitle=razr+v3*&sacat=146487%26catref%3DC6%26curcat%3Dtrue&a25664=25635&a25662=-24&a35=-24&a33112=-24&a26093=-24&a10244=-24&alist=a25664%2Ca25662%2Ca35%2Ca10244%2Ca33112%2Ca3801%2Ca26093&pfmode=1&reqtype=1&gcs=1440&pfid=1720&pf_query=razr+v3*&sargn=-1%26saslc%3D2&sadis=200&fpos=95125&sappl=1&ft=1&ftrt=1&ftrv=7&saprclo=&saprchi=&fsop=1%26fsoo%3D1&fgtp=> (found on the EBay front page)

* Amazon uses "=" in path parameters; <http://www.amazon.com/Concise-History-Cambridge-Histories-Updated/dp/0521408482/ref=wl_gtwy_ty/102-8627554-9105730?%5Fencoding=UTF8&coliid=IIU7YZ0J27A5W&colid=2ATYUNX0NQEE> (from Amazon's front page). I guess they could do </ref={ref}/>, but what if they want to allow several different parameters there?

* Yahoo uses "*" for special purposes when redirecting, and encodes the ":"; <http://rds.yahoo.com/_ylt=A0oGkmY8N6xFlMQAGxyl87UF/SIG=15lsevai8/EXP=1169000636/**http%3a//search.yahoo.com/preferences/preferences%3fpref_done=http%253A%252F%252Fsearch.yahoo.com%252Fweb%26.bcrumb=05bd394bd778e0e7bc8c4452c66d9f4e%252C1168914236>. The closest they could get would be <.../**{uri}>, along with manual instructions to percent-escape the colon and some other characters in the uri.

* The Flickr API uses comma-delimited lists in parameters; see <http://www.flickr.com/services/api/flickr.favorites.getPublicList.html>.

* The IETF datatracker uses "+" as a delimiter for spaces, as per HTML form encoding (old-style). <https://datatracker.ietf.org/public/idindex.cgi?command=do_search_id&filename=mark+nottingham&id_tracker_state_id=-1&wg_id=0&other_group=&status_id=0&last_name=&first_name=>

* It's a common convention for e-mail addresses to have "+" signs as well; e.g., <mailto:mnot+home@example.org>.

* It also won't be possible to specify a template like <http://{host}/foo/bar> and make "host" able to be either an IPv4 address or an IPv6 address, because the brackets around the IPv6 address will be escaped.

I didn't have to go out of my way to find any of these examples (it took about five minutes in total), and they're all legal, widely-used URIs.

My point here is that a) a default encoding rule needs to be conservative, and b) it's going to be necessary for templates to specify application/format specific encoding rules no matter what we do (see the IETF, Amazon and Yahoo examples in particular).

Proposal: only escape things outside of ( iprivate / iunreserved / ireserved ) -- i.e., characters not allowed in URIs. It's up to the definitions of specific template variables to determine how to percent-encode beyond that. E.g.,

---8<---
* The Foo URI Template

Foo is a URI template [RFCxxxx] that allows two (2) variables, "bar" and "baz".

For example;

  <http://www.example.com/{bar}?arg={baz}>

The "bar" template variable should have any characters from the sub-delim rule in [RFC3986] percent-encoded before template expansion.

The "baz" template variable should percent-encode the "&" and "#" characters before template expansion.
--->8---

Optionally, we can provide a "library" of percent-encoding rules (likely to be specific to particular URI schemes and/or components) for template definitions to leverage.

Cheers,


On 2007/01/03, at 6:23 AM, Marc Hadley wrote:

> Good analysis Joe, thanks.
>
> On Dec 27, 2006, at 2:49 PM, James M Snell wrote:
>>
>> Ugh. I'd rather we not go down the path of embedding encoding
>> information into the template. Let's just pick a reasonable
>> default and leave it at that.
>>
> +1, Joe's "default" below looks good to me.
>
> Marc.
>
>> Extensions that affect the selection and validation of the
>> replacement value are fine.
>>
>> - James
>>
>> Joe Gregorio wrote:
>>> [snip]
>>> Allow a ':' at the end of a variable name to separate out
>>> options, and then add an option 'enc=<enc>' where
>>> 'enc' could be:
>>>
>>> enc="strict"
>>> All characters outside (iprivate | iunreserved) are % encoded
>>>
>>> enc="sub"
>>> Characters outside (iprivate | iunreserved | sub-delims) are %
>>> encoded
>>>
>>> enc="none"
>>> No characters are % encoded
>>>
>>> enc="default"
>>> Or if '=<enc>' isn't provided then the default encoding is used:
>>>
>>> Characters outside ( iprivate | iunreserved | '@' | ':' | '/' )
>>> are % encoded.
>>>
>>> So back to the example, if we have:
>>>
>>> http://bitworking.org/{path:enc=strict}
>>>
>>> and
>>>
>>> path = "projects/httplib2/"
>>>
>>> then that gets interpreted as:
>>>
>>> http://bitworking.org/projects%2Fhttplib2%2F
>>>
>>> and
>>>
>>> http://bitworking.org/{path:enc=default}
>>>
>>> gets interpreted as:
>>>
>>> http://bitworking.org/projects/httplib2/
>>>
>>> Note that
>>>
>>> http://bitworking.org/{path:enc=default}
>>>
>>> and
>>>
>>> http://bitworking.org/{path}
>>>
>>> will give equivalent values.
>>>
>>> Again, with this I worry about complexity and surprising behavior:
>>>
>>> http://example.org?a={b:enc=strict}
>>> b = "a=test"
>>>
>>> gives:
>>>
>>> http://example.org?a=a%3Dtest
>>>
>>> while
>>>
>>> http://example.org{b:enc=none}
>>> b = "?a=test"
>>>
>>> gives:
>>>
>>> http://example.org?a=test
>>>
>>> -joe
>>>
>>
>
> ---
> Marc Hadley <marc.hadley at sun.com>
> CTO Office, Sun Microsystems.
>
>

--
Mark Nottingham http://www.mnot.net/
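
To make the difference between the encoding rules discussed in this thread concrete, here is a minimal Python sketch. It is not taken from any URI Template draft or library. The names pct_encode, expand, STRICT_SAFE, DEFAULT_SAFE and PROPOSED_SAFE are hypothetical, the IPv6 address is an illustrative value, and the sketch simplifies by looking only at ASCII: it ignores the iprivate and non-ASCII iunreserved ranges and percent-encodes every non-ASCII character as UTF-8 bytes.

```python
import string

# ASCII character classes from RFC 3986.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
GEN_DELIMS = set(":/?#[]@")

# enc="strict" as quoted in the thread (ASCII part only; an assumption).
STRICT_SAFE = UNRESERVED
# The quoted "default" rule: unreserved plus '@', ':' and '/'.
DEFAULT_SAFE = UNRESERVED | set("@:/")
# The rule proposed in this message: unreserved plus all reserved characters.
PROPOSED_SAFE = UNRESERVED | SUB_DELIMS | GEN_DELIMS


def pct_encode(value, safe):
    """Percent-encode every character of value that is not in safe."""
    out = []
    for ch in value:
        if ch in safe:
            out.append(ch)
        else:
            # Simplification: non-ASCII is always encoded as UTF-8 bytes.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


def expand(template, variables, safe):
    """Replace each {name} in template with its encoded value."""
    result = template
    for name, value in variables.items():
        result = result.replace("{" + name + "}", pct_encode(value, safe))
    return result


if __name__ == "__main__":
    for label, safe in (("strict", STRICT_SAFE),
                        ("default", DEFAULT_SAFE),
                        ("proposed", PROPOSED_SAFE)):
        print(label)
        print(" ", expand("http://bitworking.org/{path}",
                          {"path": "projects/httplib2/"}, safe))
        print(" ", expand("http://{host}/foo/bar",
                          {"host": "[2001:db8::1]"}, safe))
        print(" ", pct_encode("razr+v3*", safe))
```

Under these assumptions, the strict set reproduces the quoted result http://bitworking.org/projects%2Fhttplib2%2F, the quoted default escapes the IPv6 brackets and turns "razr+v3*" into "razr%2Bv3%2A", and the proposed set leaves both untouched. Any per-variable encoding beyond that would still have to come from the template's own definition.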
Received on Tuesday, 16 January 2007 03:17:37 UTC