Re: Updated URI Template proposal

From: Joe Gregorio <joe@bitworking.org>
Date: Fri, 23 Nov 2007 04:11:20 -0500
Message-ID: <3f1451f50711230111l4d4afe49g99737ac55eaf0f17@mail.gmail.com>
To: "James M Snell" <jasnell@gmail.com>
Cc: URI <uri@w3.org>

On Nov 5, 2007 1:36 PM, James M Snell <jasnell@gmail.com> wrote:
> Joe Gregorio wrote:
> > 2. The 'sub' operator could either be defined to operate on
> >     the octets of the variables value, or on the unicode character points
> >     of the equivalent utf-8 decoded string. Both have their pros and cons.
> >
>
> I would think that unicode codepoints would be what folks would
> typically expect.  If we need to support both, different op codes can be
> used...
>
>   octets     = {-sub|0-1|username}
>   codepoints = {-subc|0-1|username}

In updating the specification and the associated code and
examples, I've come to believe that you can't do it
by Unicode codepoint, simply because you can't be
certain that the source data was a Unicode string.
That is, I was going to suggest that '-sub' work as follows:

 1. percent-decode the variable's value.
 2. convert it from UTF-8 to Unicode.
 3. do the sub-string selection on the codepoints.
 4. convert the selected codepoints back to UTF-8 and
    substitute them, percent-encoding all octets
    that fall outside ( unreserved / pct-encoded ).
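
As a rough sketch of those four steps in Python (the function name
sub_by_codepoints and the inclusive reading of the 'start-end' range are
assumptions for illustration, the latter inferred from the examples in
this thread rather than taken from the spec):

```python
from urllib.parse import quote, unquote_to_bytes


def sub_by_codepoints(value: str, start: int, end: int) -> str:
    """Hypothetical '-sub' over codepoints; start and end are inclusive."""
    # Step 1: percent-decode the variable's value into raw octets.
    octets = unquote_to_bytes(value)
    # Step 2: interpret the octets as UTF-8.
    # Raises UnicodeDecodeError if they are not valid UTF-8.
    text = octets.decode("utf-8")
    # Step 3: select the substring by codepoint index.
    piece = text[start:end + 1]
    # Step 4: back to UTF-8, percent-encoding everything but unreserved.
    return quote(piece.encode("utf-8"), safe="")


print(sub_by_codepoints("jcgregorio", 0, 0))
print(sub_by_codepoints("%C3%A9t%C3%A9", 0, 1))
```

Step 2 is the only place this can fail, and it fails exactly when the
value was never a Unicode string to begin with.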

That won't work because the value might be a percent-encoded binary
blob.  Here is a concrete example: the following substring expansion will
fail under the above algorithm, because the octets FF FF FF are not
valid UTF-8:

   Vars:
       foo := %FF%FF%FF
   Template:
       {-sub|0-1|foo}
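
A quick Python check, offered only as an illustration, of where that
input breaks: the percent-decoding succeeds, but the resulting octets
cannot be decoded as UTF-8 (the variable names here are made up for the
example):

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding works on any value.
octets = unquote_to_bytes("%FF%FF%FF")

# Converting to Unicode does not: 0xFF never appears in valid UTF-8.
try:
    octets.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(octets, decoded_ok)
```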


I see several different solutions:


1. Keep '-sub' but have it act only on the characters of the
    variable's value, without doing any decoding back to codepoints.

    I.e.
       {-sub|0-1|foo=%FF%FF%FF}
    becomes:
       "%F"

    Of limited use, since the result can cut a percent-encoded
    triplet in half, as it does here.

2. Keep '-sub' and define the algorithm to decode
    back to codepoints, but put large warnings in the spec
    not to design URI Templates that would apply a '-sub'
    expansion to a non-Unicode string variable.

    In this case the above expansion would fail.

3. Drop '-sub'.

   At this point this is probably my favorite option. I'm not sure
   how useful '-sub' would be, or that the functionality it offers
   can't be achieved using the other operators. For example, the
   motivating example was:

   Vars:
       username := jcgregorio
   Template:
       {-sub|0-0|username}/{username}
   URI:
       j/jcgregorio

  But couldn't that be defined as:

   Vars:
       username := jcgregorio
       firstinitial   := j
   Template:
       {firstinitial}/{username}
   URI:
       j/jcgregorio
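
To make the comparison concrete, here is a minimal Python sketch, not a
real template processor. The helpers sub_raw and expand_simple are
hypothetical: sub_raw mimics option 1 (an inclusive range over the
undecoded value) and expand_simple handles only plain {name}
expansions.

```python
import re


def sub_raw(value: str, start: int, end: int) -> str:
    """Option 1: slice the undecoded value; start and end are inclusive."""
    return value[start:end + 1]


def expand_simple(template: str, variables: dict) -> str:
    """Replace each plain {name} with its value; no operators supported."""
    return re.sub(r"\{(\w+)\}", lambda m: variables[m.group(1)], template)


username = "jcgregorio"

# With '-sub': the template processor derives the first initial.
with_sub = sub_raw(username, 0, 0) + "/" + username

# Without '-sub': the caller supplies the initial as its own variable.
without_sub = expand_simple(
    "{firstinitial}/{username}",
    {"username": username, "firstinitial": username[0]},
)

print(with_sub, without_sub)

# The same raw slice on the binary example splits a pct-encoded triplet.
print(sub_raw("%FF%FF%FF", 0, 1))
```

Both forms give the same URI for the motivating example; the difference
is only in who computes the substring.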


   Thanks,
   -joe

-- 
Joe Gregorio        http://bitworking.org
Received on Friday, 23 November 2007 09:18:26 GMT
