Re: Updated URI Template proposal from James M Snell on 2007-11-23 (uri@w3.org from November 2007)

From: James M Snell <jasnell@gmail.com>
Date: Fri, 23 Nov 2007 12:46:00 -0800
To: Joe Gregorio <joe@bitworking.org>
CC: URI <uri@w3.org>
Message-ID: <47473C08.7030902@gmail.com>

Well, I'm not absolutely convinced it's required either, but I can
definitely imagine scenarios where it would be useful.  One possible
approach would be to have '-sub' work against unreserved and
pct-encoded characters, treating each percent-encoded triplet as a
single character, e.g.

  template-char = unreserved / pct-encoded

'-sub' would then operate on template-char units rather than on
individual characters of the raw value:

  {-sub|0-1|foo=%FF%FF%FF}  == %FF

  {-sub|0-2|foo=f%FFf%FF}   == f%FF

  {-sub|0-3|foo=f%FFf%FF}   == f%FFf

  {-sub|1-2|foo=f%FFf%FF}   == %FFf
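
A minimal Python sketch of this idea follows.  It is only an
illustration, not a proposed reference implementation.  The helper
names (template_chars, sub) are hypothetical, and it assumes the two
numbers in the operator are read as an offset and a length, which is
the one reading that reproduces all four expansions above.

```python
import re

# template-char = unreserved / pct-encoded (unreserved as in RFC 3986)
TEMPLATE_CHAR = re.compile(r"%[0-9A-Fa-f]{2}|[A-Za-z0-9._~-]")


def template_chars(value):
    """Split a variable value into template-char units."""
    units = []
    pos = 0
    while pos < len(value):
        match = TEMPLATE_CHAR.match(value, pos)
        if match is None:
            raise ValueError(f"not a template-char at index {pos}: {value!r}")
        units.append(match.group())
        pos = match.end()
    return units


def sub(value, offset, length):
    """Select `length` template-chars starting at `offset`.

    Assumption: the 'a-b' argument means offset a, length b.
    """
    return "".join(template_chars(value)[offset:offset + length])


print(sub("%FF%FF%FF", 0, 1))
print(sub("f%FFf%FF", 1, 2))
```

Because a percent-encoded triplet is never split, the result is
always a well-formed sequence of template-chars, whatever octets the
triplets happen to encode.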

- James

Joe Gregorio wrote:
> On Nov 5, 2007 1:36 PM, James M Snell <jasnell@gmail.com> wrote:
>> Joe Gregorio wrote:
>>> 2. The 'sub' operator could either be defined to operate on
>>>     the octets of the variable's value, or on the Unicode code points
>>>     of the equivalent UTF-8 decoded string. Both have their pros and cons.
>>>
>> I would think that Unicode code points would be what folks would
>> typically expect.  If we need to support both, different op codes can be
>> used...
>>
>>   octets     = {-sub|0-1|username}
>>   codepoints = {-subc|0-1|username}
> 
> In updating the specification and associated code and
> examples I've come to believe that you can't do it
> by Unicode code point, simply because you can't be
> certain that the source data was a Unicode string.
> That is, I was going to suggest that '-sub' work by:
> 
>  1. percent-decode the variable's value
>  2. decode the resulting octets from UTF-8 to Unicode
>  3. do the sub-string selection on the code points
>  4. convert the selected code points back to UTF-8,
>     percent-encode all octets that fall outside unreserved,
>     and substitute the result.
> 
> That won't work because the value might be a percent-encoded binary
> blob.  Here is a concrete example: the following substring expansion
> fails under the above algorithm, because the octets FF FF FF are not
> valid UTF-8 and step 2 cannot decode them:
> 
>    Vars:
>        foo := %FF%FF%FF
>    Template:
>        {-sub|0-1|foo}
> 
> 
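
The four steps quoted above, and the way they break on this input,
can be sketched in Python roughly as follows.  This is an
illustration under assumptions, not code from the specification: the
function name sub_codepoints is hypothetical, and the range is read
as inclusive first-last indices, matching the {-sub|0-0|username}
example later in the message.

```python
from urllib.parse import quote, unquote_to_bytes


def sub_codepoints(value, first, last):
    """Code-point based '-sub' (hypothetical helper, inclusive range)."""
    octets = unquote_to_bytes(value)   # 1. percent-decode to octets
    text = octets.decode("utf-8")      # 2. UTF-8 -> Unicode (may raise)
    piece = text[first:last + 1]       # 3. select code points
    # 4. back to UTF-8, percent-encoding everything outside unreserved
    return quote(piece.encode("utf-8"), safe="")


print(sub_codepoints("jcgregorio", 0, 0))

try:
    sub_codepoints("%FF%FF%FF", 0, 1)
except UnicodeDecodeError as err:
    # The binary blob is not valid UTF-8, so step 2 fails.
    print("expansion failed:", err.reason)
```

The failure happens before any substring is taken, so no choice of
range makes the expansion succeed for such a value.
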
> I see several different solutions:
> 
> 
> 1. Keep '-sub' but only have it act on the characters of the
>     variable value, without doing any decoding back to code points.
>
>     I.e.
>        {-sub|0-1|foo=%FF%FF%FF}
>     becomes:
>        "%F"
>
>     Of limited use, since the result can cut a percent-encoded
>     triplet in half.
> 
> 2. Keep '-sub' and define the algorithm to decode
>     back to code points, but put large warnings in the spec
>     not to design URI Templates that would apply a '-sub'
>     expansion to a non-Unicode string variable.
> 
>     In this case the above expansion would fail.
> 
> 3. Drop '-sub'.
> 
>    At this point this is probably my favorite option. I'm not sure
>    how useful '-sub' would be, and I'm not convinced that what it
>    offers can't be done with the other operators. For example, the
>    motivating example was:
> 
>    Vars:
>        username := jcgregorio
>    Template:
>        {-sub|0-0|username}/{username}
>    URI:
>        j/jcgregorio
> 
>   But couldn't that be defined as:
> 
>    Vars:
>        username := jcgregorio
>        firstinitial := j
>    Template:
>        {firstinitial}/{username}
>    URI:
>        j/jcgregorio
> 
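
For comparison, option 1 amounts to plain slicing of the raw value.
A short Python sketch, assuming an inclusive first-last range as in
the quoted examples (sub_raw is a hypothetical name):

```python
def sub_raw(value, first, last):
    """Option 1: slice the raw characters, with no decoding at all."""
    return value[first:last + 1]


print(sub_raw("jcgregorio", 0, 0))   # fine for plain unreserved text
print(sub_raw("%FF%FF%FF", 0, 1))    # cuts a percent-encoded triplet
```

The first call gives the initial used in the motivating example; the
second shows why the result is of limited use once the value contains
percent-encoded octets.
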
> 
>    Thanks,
>    -joe
>
Received on Friday, 23 November 2007 20:46:24 UTC