- From: Stefan Eissing <stefan.eissing@greenbytes.de>
- Date: Mon, 26 Nov 2007 08:44:51 +0100
- To: James M Snell <jasnell@gmail.com>
- Cc: Joe Gregorio <joe@bitworking.org>, URI <uri@w3.org>
-1. I think this can get very messy, very quickly. Once we start down the string-operator slope, we would soon need to define string.length the same way, and afterwards possibly invent our own regular-expression sub-language (as if the world needed another one).

On 23.11.2007 at 21:46, James M Snell wrote:

>
> Well, I'm not absolutely convinced it's required either, but I can
> definitely imagine scenarios where it would be useful. One possible
> approach would be to have sub work against unreserved and pct-encoded
> characters, e.g.
>
>   template-char = unreserved / pct-encoded
>
> sub would operate on template-char:
>
>   {-sub|0-1|foo=%FF%FF%FF} == %FF
>
>   {-sub|0-2|foo=f%FFf%FF} == f%FF
>
>   {-sub|0-3|foo=f%FFf%FF} == f%FFf
>
>   {-sub|1-2|foo=f%FFf%FF} == %FFf
>
> - James
>
> Joe Gregorio wrote:
>> On Nov 5, 2007 1:36 PM, James M Snell <jasnell@gmail.com> wrote:
>>> Joe Gregorio wrote:
>>>> 2. The 'sub' operator could either be defined to operate on
>>>> the octets of the variable's value, or on the Unicode code points
>>>> of the equivalent UTF-8 decoded string. Both have their pros
>>>> and cons.
>>>>
>>> I would think that Unicode code points would be what folks would
>>> typically expect. If we need to support both, different op codes
>>> can be used...
>>>
>>> octets = {-sub|0-1|username}
>>> codepoints = {-subc|0-1|username}
>>
>> In updating the specification and associated code and
>> examples I've come to believe that you can't do it
>> by Unicode code point, simply because you can't be
>> certain that the source data was a Unicode string.
>> That is, I was going to suggest that '-sub' work by:
>>
>> 1. percent-decode the variable's value
>> 2. convert it from UTF-8 to Unicode
>> 3. do the substring selection on the code points
>> 4. substitute the substring of code points after they are
>>    converted back to UTF-8, percent-encoding all octets
>>    that fall outside ( unreserved / pct-encoded ).
>>
>> That won't work because the value might be a percent-encoded binary
>> blob. Here is a concrete example; the following substring operator
>> will fail using the above algorithm:
>>
>>   Vars:
>>     foo := %FF%FF%FF
>>   Template:
>>     {-sub|0-1|foo}
>>
>> I see several different solutions:
>>
>> 1. Keep '-sub' but only have it act on the variable
>>    value w/o doing any decoding back to code points.
>>
>>    I.e.
>>      {-sub|0-1|foo=%FF%FF%FF}
>>    becomes:
>>      "%F"
>>
>>    Of limited use.
>>
>> 2. Keep '-sub' and define the algorithm to decode
>>    back to code points, but put large warnings in the spec
>>    not to design URI Templates that would apply a '-sub'
>>    expansion to a non-Unicode string variable.
>>
>>    In this case the above expansion would fail.
>>
>> 3. Drop '-sub'.
>>
>> At this point this is probably my favorite option. I'm not sure
>> how useful '-sub' would be, or that the functionality it offers
>> can't be achieved with the other operators. For example, the
>> motivating example was:
>>
>>   Vars:
>>     username := jcgregorio
>>   Template:
>>     {-sub|0-0|username}/{username}
>>   URI:
>>     j/jcgregorio
>>
>> But couldn't that be defined as:
>>
>>   Vars:
>>     username := jcgregorio
>>     firstinitial := j
>>   Template:
>>     {firstinitial}/{username}
>>   URI:
>>     j/jcgregorio
>>
>> Thanks,
>> -joe
>>
>

--
<green/>bytes GmbH, Hafenweg 16, D-48155 Münster, Germany
Amtsgericht Münster: HRB5782
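To make Joe's objection concrete, here is a minimal Python sketch of the four-step code point algorithm quoted in the thread (his option 2), next to option 1's plain slicing of the still-encoded string. The function names `sub_codepoints` and `sub_raw` are hypothetical, and `m-n` is read as an inclusive range, which is how the thread's `{-sub|0-0|username}` (yields `j`) and `{-sub|0-1|foo=%FF%FF%FF}` (yields `%F`) examples behave. This is an illustration under those assumptions, not a specified algorithm.

```python
from urllib.parse import quote, unquote_to_bytes


def sub_codepoints(value: str, first: int, last: int) -> str:
    """Hypothetical '-sub' over Unicode code points (steps 1-4 in the thread).

    Assumption: the range first..last is inclusive.
    """
    raw = unquote_to_bytes(value)   # 1. percent-decode to octets
    text = raw.decode("utf-8")      # 2. UTF-8 -> code points; raises on non-UTF-8 octets
    piece = text[first:last + 1]    # 3. select the substring of code points
    # 4. re-encode as UTF-8 and percent-encode every octet outside unreserved;
    #    with safe="" quote() leaves only ALPHA, DIGIT and "-._~" unescaped
    #    (Python 3.7+).
    return quote(piece, safe="")


def sub_raw(value: str, first: int, last: int) -> str:
    """Option 1: slice the still-encoded string itself, with no decoding."""
    return value[first:last + 1]


if __name__ == "__main__":
    print(sub_codepoints("jcgregorio", 0, 0))  # j
    print(sub_raw("%FF%FF%FF", 0, 1))         # %F
    try:
        sub_codepoints("%FF%FF%FF", 0, 1)
    except UnicodeDecodeError as err:
        print("option 2 fails:", err.reason)
```

With these definitions the `username` example works, but `%FF%FF%FF` raises `UnicodeDecodeError` at step 2, because the octet 0xFF can never appear in valid UTF-8. Option 1 avoids the error only by cutting the `%FF` triplet in half, which is why Joe calls it "of limited use".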
Received on Monday, 26 November 2007 07:45:09 UTC
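James's counter-proposal, where `-sub` counts template-char units (`unreserved / pct-encoded`) instead of octets or code points, can be sketched the same way. The value is tokenized while still encoded, so a slice can never split a `%XX` triplet. The names below are hypothetical. Reading `m-n` as a start offset plus a count is also an assumption: it is the reading that reproduces all four of his examples (for instance `{-sub|1-2|foo=f%FFf%FF}` gives `%FFf`), whereas Joe's `{-sub|0-0|username}` example treats the range as inclusive, so the thread itself leaves the range semantics open.

```python
import re

# One template-char: a pct-encoded triplet, or a single RFC 3986
# unreserved character (ALPHA / DIGIT / "-" / "." / "_" / "~").
TEMPLATE_CHAR = re.compile(r"%[0-9A-Fa-f]{2}|[A-Za-z0-9._~-]")


def template_chars(value: str) -> list:
    """Split an already-encoded value into template-char units."""
    units = []
    pos = 0
    while pos < len(value):
        match = TEMPLATE_CHAR.match(value, pos)
        if match is None:
            raise ValueError(f"no template-char at offset {pos} in {value!r}")
        units.append(match.group())
        pos = match.end()
    return units


def sub_template_chars(value: str, start: int, length: int) -> str:
    """Hypothetical '-sub' counting template-chars; m-n read as (start, length)."""
    units = template_chars(value)
    return "".join(units[start:start + length])


if __name__ == "__main__":
    examples = [("%FF%FF%FF", 0, 1), ("f%FFf%FF", 0, 2),
                ("f%FFf%FF", 0, 3), ("f%FFf%FF", 1, 2)]
    for value, start, length in examples:
        print(value, start, length, sub_template_chars(value, start, length))
```

Because nothing is decoded, `%FF%FF%FF` is handled without error, which is exactly the case the code point algorithm could not cover. The trade-off is that a multi-octet UTF-8 character such as `%C3%A9` counts as two units, so a slice can still split a character, just never a triplet.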