- From: James M Snell <jasnell@gmail.com>
- Date: Fri, 23 Nov 2007 12:46:00 -0800
- To: Joe Gregorio <joe@bitworking.org>
- CC: URI <uri@w3.org>
Well, I'm not absolutely convinced it's required either but I can
definitely imagine scenarios where it would be useful. One possible
approach would be to have sub work against unreserved and pct-encoded
characters, e.g.

  template-char = unreserved / pct-encoded

sub would operate on template-char

  {-sub|0-1|foo=%FF%FF%FF} == %FF
  {-sub|0-2|foo=f%FFf%FF} == f%FF
  {-sub|0-3|foo=f%FFf%FF} == f%FFf
  {-sub|1-2|foo=f%FFf%FF} == %FFf

- James

Joe Gregorio wrote:
> On Nov 5, 2007 1:36 PM, James M Snell <jasnell@gmail.com> wrote:
>> Joe Gregorio wrote:
>>> 2. The 'sub' operator could either be defined to operate on
>>> the octets of the variables value, or on the unicode character points
>>> of the equivalent utf-8 decoded string. Both have their pros and cons.
>>>
>> I would think that unicode codepoints would be what folks would
>> typically expect. If we need to support both, different op codes can be
>> used...
>>
>> octets = {-sub|0-1|username}
>> codepoints = {-subc|0-1|username}
>
> In updating the specification and associated code and
> examples I've come to believe that you can't do it
> by unicode codepoint, simply because you can't be
> certain that the source data was a unicode string.
> That is, I was going to suggest that '-sub' work by:
>
> 1. percent-decode the variables value
> 2. convert it from UTF-8 to unicode
> 3. do the sub-string selection on the codepoints.
> 4. substitute the substring of codepoints after they are
>    converted back to UTF-8 and percent-encode all octets
>    that fall outside ( unreserved / pct-encoded ).
>
> That won't work because the value might be a percent-encoded binary
> blob. Here is a concrete example, the following substring operator will
> fail using the above algorithm:
>
> Vars:
>   foo := %FF%FF%FF
> Template:
>   {-sub|0-1|foo}
>
>
> I see several different solutions:
>
>
> 1. Keep '-sub' but only have it act on the variable
>    value w/o doing any decoding back to codepoints.
>
>    I.e.
>      {-sub|0-1|foo=%FF%FF%FF}
>    becomes:
>      "%F"
>
>    Of limited use.
>
> 2. Keep '-sub' and define the algorithm to decode
>    back to codepoints but put large warnings in the spec
>    not to design URI Templates that would apply a '-sub'
>    expansion on a non-unicode string variable.
>
>    In this case the above expansion would fail.
>
> 3. Drop '-sub'.
>
>    At this point this is probably my favorite option. I'm not sure
>    how useful '-sub' would be and that the functionality it offers
>    can't be done using the other operators. For example, the motivating
>    example was:
>
>    Vars:
>      username := jcgregorio
>    Template:
>      {-sub|0-0|username}/{username}
>    URI:
>      j/jcgregorio
>
>    But couldn't that be defined as:
>
>    Vars:
>      username := jcgregorio
>      firstinitial := j
>    Template:
>      {firstinitial}/{username}
>    URI:
>      j/jcgregorio
>
>
> Thanks,
> -joe
>
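For readers who want to experiment, the two behaviours discussed in this message can be sketched roughly as follows. This is only an illustration, not text from any draft: the function names (`template_chars`, `sub_template_chars`, `sub_by_codepoint`) are hypothetical, the unreserved set is taken from RFC 3986, and the two numbers in `{-sub|a-b|...}` are read as an offset and a count, because that is the reading under which all four template-char examples at the top of this message come out as written. The quoted `{-sub|0-0|username}` example appears to read the second number as an inclusive end index instead, so the range semantics here are an assumption.

```python
import re
from urllib.parse import quote, unquote_to_bytes

# template-char = unreserved / pct-encoded
# (unreserved per RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~")
TEMPLATE_CHAR = re.compile(r"%[0-9A-Fa-f]{2}|[A-Za-z0-9\-._~]")


def template_chars(value):
    """Split a variable value into template-chars; reject anything else."""
    tokens = []
    pos = 0
    while pos < len(value):
        match = TEMPLATE_CHAR.match(value, pos)
        if match is None:
            raise ValueError("not a template-char at index %d" % pos)
        tokens.append(match.group())
        pos = match.end()
    return tokens


def sub_template_chars(value, offset, count):
    """Hypothetical '-sub' that selects whole template-chars.

    Assumption: the range "a-b" is read as offset a, count b.
    A pct-encoded triplet such as %FF is never split.
    """
    return "".join(template_chars(value)[offset:offset + count])


def sub_by_codepoint(value, offset, count):
    """The four-step codepoint algorithm from the quoted message.

    Raises UnicodeDecodeError when the decoded octets are not valid
    UTF-8, which is the failure case described for foo := %FF%FF%FF.
    """
    text = unquote_to_bytes(value).decode("utf-8")   # steps 1 and 2
    selected = text[offset:offset + count]           # step 3
    return quote(selected, safe="")                  # step 4


print(sub_template_chars("%FF%FF%FF", 0, 1))
print(sub_template_chars("f%FFf%FF", 1, 2))
```

Operating on template-chars never decodes the value, so it does not matter whether the octets behind a pct-encoded triplet are UTF-8 text or a binary blob; the codepoint variant has to decode first and therefore fails on the blob.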
Received on Friday, 23 November 2007 20:46:24 UTC