- From: James M Snell <jasnell@gmail.com>
- Date: Fri, 23 Nov 2007 12:46:00 -0800
- To: Joe Gregorio <joe@bitworking.org>
- CC: URI <uri@w3.org>
Well, I'm not absolutely convinced it's required either, but I can
definitely imagine scenarios where it would be useful. One possible
approach would be to have '-sub' work on unreserved and pct-encoded
characters, e.g.:
template-char = unreserved / pct-encoded
'-sub' would then count template-chars, never splitting a pct-encoded triplet:
{-sub|0-1|foo=%FF%FF%FF} == %FF
{-sub|0-2|foo=f%FFf%FF} == f%FF
{-sub|0-3|foo=f%FFf%FF} == f%FFf
{-sub|1-2|foo=f%FFf%FF} == %FFf
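A minimal Python sketch of this template-char idea follows. It is an illustration, not spec text: `tokenize` and `sub_template_chars` are hypothetical names. The four examples above are all consistent with reading `M-N` as a start offset M and a count of N template-chars. Under a half-open range, `1-2` would give `%FF` rather than `%FFf`, and under the inclusive reading used in the quoted message below, `0-1` would give two characters. The sketch therefore assumes the offset/count reading.

```python
import re

# RFC 3986: unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#           pct-encoded = "%" HEXDIG HEXDIG
TEMPLATE_CHAR = re.compile(r"%[0-9A-Fa-f]{2}|[A-Za-z0-9\-._~]")


def tokenize(value: str) -> list[str]:
    """Split a value into template-chars (unreserved / pct-encoded)."""
    tokens = []
    pos = 0
    while pos < len(value):
        m = TEMPLATE_CHAR.match(value, pos)
        if m is None:
            raise ValueError(f"not a template-char at offset {pos}: {value[pos:]!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


def sub_template_chars(value: str, offset: int, count: int) -> str:
    """Hypothetical '-sub': take `count` template-chars starting at `offset`.

    A pct-encoded triplet such as %FF counts as one unit, so it is never split.
    """
    tokens = tokenize(value)
    return "".join(tokens[offset:offset + count])


print(sub_template_chars("%FF%FF%FF", 0, 1))  # %FF
print(sub_template_chars("f%FFf%FF", 1, 2))   # %FFf
```

Because nothing is ever decoded, this works the same for text and for binary blobs. The trade-off is that a multi-octet UTF-8 character still counts as several template-chars.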
- James
Joe Gregorio wrote:
> On Nov 5, 2007 1:36 PM, James M Snell <jasnell@gmail.com> wrote:
>> Joe Gregorio wrote:
>>> 2. The 'sub' operator could either be defined to operate on
>>> the octets of the variable's value, or on the Unicode code points
>>> of the equivalent UTF-8-decoded string. Both have their pros and cons.
>>>
>> I would think that Unicode code points are what folks would
>> typically expect. If we need to support both, different opcodes can be
>> used...
>>
>> octets = {-sub|0-1|username}
>> codepoints = {-subc|0-1|username}
>
> In updating the specification and associated code and
> examples I've come to believe that you can't do it
> by Unicode code point, simply because you can't be
> certain that the source data was a Unicode string.
> That is, I was going to suggest that '-sub' work by:
>
> 1. percent-decode the variable's value
> 2. convert it from UTF-8 to Unicode
> 3. do the substring selection on the code points
> 4. convert the selected code points back to UTF-8,
> percent-encode every octet that falls outside
> ( unreserved / pct-encoded ), and substitute the result.
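The four steps above might be sketched in Python as follows. This is a minimal illustration, not spec text. `sub_codepoints` is a hypothetical name, and the sketch assumes an inclusive `start-end` range, so that `{-sub|0-0|username}` yields `j` as in the motivating example below. Step 2 uses a strict UTF-8 decoder, and that is exactly where a value such as `%FF%FF%FF` fails: the octet 0xFF never occurs in well-formed UTF-8.

```python
from urllib.parse import quote, unquote_to_bytes


def sub_codepoints(value: str, start: int, end: int) -> str:
    """Hypothetical code-point '-sub' with an inclusive start-end range."""
    raw = unquote_to_bytes(value)    # 1. percent-decode to octets
    text = raw.decode("utf-8")       # 2. UTF-8 -> Unicode (strict; may raise)
    selected = text[start:end + 1]   # 3. select code points, end inclusive
    # 4. quote() re-encodes as UTF-8 and, with safe="", escapes every octet
    #    except ALPHA, DIGIT and "_.-~" (the RFC 3986 unreserved set).
    return quote(selected, safe="")


print(sub_codepoints("jcgregorio", 0, 0))     # j
print(sub_codepoints("%C3%A9t%C3%A9", 0, 1))  # %C3%A9t  ("ét")

try:
    sub_codepoints("%FF%FF%FF", 0, 1)
except UnicodeDecodeError:
    print("step 2 fails: the octets are not UTF-8")
```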
>
> That won't work because the value might be a percent-encoded binary
> blob. Here is a concrete example; the following substring operation
> fails under the above algorithm:
>
> Vars:
> foo := %FF%FF%FF
> Template:
> {-sub|0-1|foo}
>
>
> I see several different solutions:
>
>
> 1. Keep '-sub' but only have it act on the variable
> value w/o doing any decoding back to codepoints.
>
> I.e.
> {-sub|0-1|foo=%FF%FF%FF}
> becomes:
> "%F"
>
> Of limited use.
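Option 1 amounts to plain string slicing on the still-encoded value. The sketch below uses hypothetical names and the same inclusive range as the example. It shows one reason the result is of limited use: the slice can cut a pct-encoded triplet in half, leaving a '%' that is not followed by two hex digits.

```python
import re


def sub_raw(value: str, start: int, end: int) -> str:
    """Hypothetical option-1 '-sub': slice the encoded string, end inclusive."""
    return value[start:end + 1]


# A '%' not followed by two hex digits is not valid pct-encoding (RFC 3986).
BROKEN_PCT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def splits_triplet(s: str) -> bool:
    """True if s contains a truncated pct-encoded triplet."""
    return BROKEN_PCT.search(s) is not None


result = sub_raw("%FF%FF%FF", 0, 1)
print(result)                  # %F
print(splits_triplet(result))  # True
```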
>
> 2. Keep '-sub' and define the algorithm to decode
> back to code points, but put prominent warnings in the spec
> against designing URI Templates that apply a '-sub'
> expansion to a non-Unicode string variable.
>
> In this case the above expansion would fail.
>
> 3. Drop '-sub'.
>
> At this point this is probably my favorite option. I'm not sure
> how useful '-sub' would be, nor that the functionality it offers
> can't be achieved with the other operators. For instance, the motivating
> example was:
>
> Vars:
> username := jcgregorio
> Template:
> {-sub|0-0|username}/{username}
> URI:
> j/jcgregorio
>
> But couldn't that be defined as:
>
> Vars:
> username := jcgregorio
> firstinitial := j
> Template:
> {firstinitial}/{username}
> URI:
> j/jcgregorio
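Under option 3, the caller derives `firstinitial` before expansion. The Python sketch below illustrates this with a hypothetical `expand_simple` function. It handles only plain `{name}` expansions and does no percent-encoding, which is fine here because both values are already unreserved. The design point is that the party that knows whether `username` is text (here a Python `str`, so the slice is by code point) does the slicing, so the template language never has to guess.

```python
import re


def expand_simple(template: str, variables: dict[str, str]) -> str:
    """Hypothetical minimal expander for plain {name} expressions only.

    Undefined variables expand to the empty string; values are inserted
    as-is, without percent-encoding.
    """
    return re.sub(r"\{([A-Za-z0-9_.\-]+)\}",
                  lambda m: variables.get(m.group(1), ""),
                  template)


username = "jcgregorio"
variables = {
    "username": username,
    "firstinitial": username[:1],  # computed by the caller, not the template
}
print(expand_simple("{firstinitial}/{username}", variables))  # j/jcgregorio
```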
>
>
> Thanks,
> -joe
>
Received on Friday, 23 November 2007 20:46:24 UTC