- From: Joe Gregorio <joe@bitworking.org>
- Date: Fri, 23 Nov 2007 04:11:20 -0500
- To: "James M Snell" <jasnell@gmail.com>
- Cc: URI <uri@w3.org>
On Nov 5, 2007 1:36 PM, James M Snell <jasnell@gmail.com> wrote:
> Joe Gregorio wrote:
> > 2. The 'sub' operator could either be defined to operate on
> > the octets of the variables value, or on the unicode character points
> > of the equivalent utf-8 decoded string. Both have their pros and cons.
> >
>
> I would think that unicode codepoints would be what folks would
> typically expect. If we need to support both, different op codes can be
> used...
>
> octets = {-sub|0-1|username}
> codepoints = {-subc|0-1|username}

In updating the specification and associated code and examples I've
come to believe that you can't do it by Unicode codepoint, simply
because you can't be certain that the source data was a Unicode string.
That is, I was going to suggest that '-sub' work by:

1. percent-decode the variable's value
2. convert it from UTF-8 to Unicode
3. do the sub-string selection on the codepoints
4. substitute the substring of codepoints after they are converted
   back to UTF-8, and percent-encode all octets that fall outside
   ( unreserved / pct-encoded ).

That won't work because the value might be a percent-encoded binary
blob. Here is a concrete example; the following substring operator
will fail using the above algorithm, because the octets FF FF FF are
not valid UTF-8 and step 2 cannot decode them:

  Vars:
    foo := %FF%FF%FF

  Template:
    {-sub|0-1|foo}

I see several different solutions:

1. Keep '-sub' but only have it act on the variable value w/o doing
   any decoding back to codepoints. I.e.

     {-sub|0-1|foo=%FF%FF%FF}

   becomes:

     "%F"

   Of limited use.

2. Keep '-sub' and define the algorithm to decode back to codepoints,
   but put large warnings in the spec not to design URI Templates that
   would apply a '-sub' expansion to a non-Unicode string variable.
   In this case the above expansion would fail.

3. Drop '-sub'. At this point this is probably my favorite option.
   I'm not sure how useful '-sub' would be, and I'm not convinced that
   the functionality it offers can't be achieved using the other
   operators. For example, the motivating example was:

     Vars:
       username := jcgregorio

     Template:
       {-sub|0-0|username}/{username}

     URI:
       j/jcgregorio

   But couldn't that be defined as:

     Vars:
       username := jcgregorio
       firstinitial := j

     Template:
       {firstinitial}/{username}

     URI:
       j/jcgregorio

   Thanks,
   -joe

-- 
Joe Gregorio        http://bitworking.org
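
For anyone who wants to see the failure directly, here is a rough
Python sketch of the four-step codepoint algorithm and of the raw
"option 1" behaviour described above. The function names
(sub_codepoints, sub_raw) are made up for illustration and are not
part of any draft, and the inclusive reading of the range (0-0 selects
one character) is an assumption inferred from the examples in this
message.

```python
from urllib.parse import quote, unquote_to_bytes


def sub_codepoints(value: str, start: int, end: int) -> str:
    """Hypothetical '-sub' over codepoints; range assumed inclusive."""
    octets = unquote_to_bytes(value)   # 1. percent-decode the value
    text = octets.decode("utf-8")      # 2. UTF-8 -> Unicode (may raise)
    piece = text[start:end + 1]        # 3. select codepoints
    # 4. back to UTF-8, percent-encoding everything outside 'unreserved'
    return quote(piece, safe="")


def sub_raw(value: str, start: int, end: int) -> str:
    """Hypothetical option 1: slice the value as-is, with no decoding."""
    return value[start:end + 1]


if __name__ == "__main__":
    print(sub_codepoints("jcgregorio", 0, 0))   # j
    print(sub_raw("%FF%FF%FF", 0, 1))           # %F
    try:
        sub_codepoints("%FF%FF%FF", 0, 1)
    except UnicodeDecodeError as exc:
        # The octets FF FF FF are not valid UTF-8, so step 2 fails.
        print("expansion failed:", exc.reason)
```

Note that sub_raw can cut a percent-encoded triplet in half, as in the
"%F" result, which is one reason option 1 is of limited use.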
Received on Friday, 23 November 2007 09:18:26 UTC