RE: Clarifications on CR117

I'm returning to this topic based on my AI to look at CR117 further.  Youenn
does a good job below of pointing out more of the potential issues.  IMO
this can be boiled down to two questions:

1) Do we give the user the power to create URLs from data that results in
malformed URIs, non-reversible data, or both?
2) If we do, can we advise the user, or the WSDL processor, on how to bind
the data safely?

Youenn's suggestion below of providing a "safe mode" in which %-encoding is
applied to the data before inserting it into a template is interesting.  I
can imagine this being exposed as a feature of the templating language
directly:

	whttp:location="{raw}?more={%encoded}"

Where the % directs the WSDL processor to encode the data (otherwise it's
stuffed in raw).  Actually, the reverse would probably be better - encode
unless the user makes an effort to ask for the raw mode:

	whttp:location="{#raw}?more={encoded}"

or something like that even though it's not as self-explanatory.
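
To make the "encode unless raw is requested" idea concrete, here is a
minimal sketch in Python.  The {#name} syntax is only the strawman from
this message, and expand() and its arguments are hypothetical names, not
anything defined by WSDL:

```python
import re
from urllib.parse import quote

# Strawman syntax from this message: {name} is %-encoded on insertion,
# {#name} is inserted raw. Neither form is defined by the WSDL spec.
_TEMPLATE = re.compile(r"\{(#?)([^{}]+)\}")

def expand(location, values):
    """Replace each template in `location` with its value from `values`."""
    def substitute(match):
        is_raw = match.group(1) == "#"
        value = values[match.group(2)]
        # safe="" makes quote() escape "/" too; only unreserved
        # characters are left untouched.
        return value if is_raw else quote(value, safe="")
    return _TEMPLATE.sub(substitute, location)

print(expand("{#raw}?more={encoded}",
             {"raw": "Send/What", "encoded": "Unknown&Co"}))
# Send/What?more=Unknown%26Co
```

The raw value goes in untouched, so a '?' or '&' inside it would still
change the structure of the resulting URI.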

We'd have to accompany this with some warnings to users that the raw mode
must be used carefully, e.g. with appropriate schema types restricting the
potential for malformed URIs, and with the caveat that XML-centric
implementations may be unable to generate server stubs.

All that smells a bit too much like new features at the last minute to me!
And it only gets us part way, as it doesn't solve the problem of creating
non-reversible templates like {x}{y}.  That one's much harder.  Simply
preventing adjoining templates won't work - how does one deconstruct
{first}-{last} if first='Jean-Jacques' and last='Moreau'?  There needs to be
a delimiter between each template that cannot appear in the data.
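
A small Python sketch of the ambiguity (possible_splits is a hypothetical
helper, and it only considers non-empty values):

```python
def possible_splits(joined, delimiter="-"):
    """Return every (first, last) pair that "{first}-{last}" could have
    produced for the string `joined`, splitting at each delimiter."""
    pieces = joined.split(delimiter)
    return [
        (delimiter.join(pieces[:i]), delimiter.join(pieces[i:]))
        for i in range(1, len(pieces))
    ]

print(possible_splits("Jean-Jacques-Moreau"))
# [('Jean', 'Jacques-Moreau'), ('Jean-Jacques', 'Moreau')]
```

With more than one candidate, a server has no way to tell which pair the
client meant.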

One can get pretty fancy and context-sensitive in figuring out which
characters to escape, but a lowest-common-denominator approach seems
workable: it allows any data to be encoded without harm, and IMO the loss
of functionality it represents is smaller than the potential for simple
mistakes:

1) %-encode each character in the XML data except a-z, A-Z, 0-9, "-", ".",
"_", "~".  Per RFC3986 sec 2.4 this escaping is performed prior to insertion
into the URL in place of the template.
2) Force templates to be separated by a character sequence containing at
least one unescaped character not in the above set.  A BNF for this seems
possible though I failed in my simple attempt to create it...
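
Rule 1 could be sketched as follows in Python.  The function name is
hypothetical, and encoding non-ASCII characters as UTF-8 octets is my
assumption; the rule above doesn't say which character encoding to use:

```python
# The characters that rule 1 leaves unescaped.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-._~"
)

def encode_value(value):
    """%-encode every character of `value` outside the unreserved set.

    Intended to run on the data *before* it replaces the template.
    Assumption: non-ASCII characters are escaped as their UTF-8 octets.
    """
    out = []
    for ch in value:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)

print(encode_value("What?ok"))       # What%3Fok
print(encode_value("Unknown&Co"))    # Unknown%26Co
print(encode_value("Jean-Jacques"))  # Jean-Jacques
```

Because '%' itself is escaped (to %25), data that already looks
%-encoded survives a round trip unchanged.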

Thus these would be disallowed:
  {foo}{bar}
  {foo}-{bar}
  {foo}%20{bar}

And these would be allowed:
  {foo}.xml
  {foo}/{bar}
  {foo}?{bar}
  /{foo}+{bar}/baz
  ?{foo},{bar}
  ?{foo}={bar}
  ?foo={foo}&bar={bar}
  ?foo={foo}-and-then-some&bar=more-than-{bar}
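
In lieu of the BNF, here is a rough Python check of rule 2 that gives the
same answers as the two lists above.  The function name is hypothetical,
and treating a stray '%' as "not a delimiter" is a conservative assumption
on my part:

```python
import re

_TEMPLATE = re.compile(r"\{[^{}]*\}")
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")
_UNRESERVED = re.compile(r"[A-Za-z0-9\-._~]")

def templates_are_separated(location):
    """True if every pair of adjacent templates has, between them, at
    least one unescaped character outside the unreserved set."""
    # Literal text strictly between templates; text before the first
    # and after the last template does not matter here.
    separators = _TEMPLATE.split(location)[1:-1]
    for separator in separators:
        leftover = _ESCAPE.sub("", separator)      # drop %XX escapes
        leftover = _UNRESERVED.sub("", leftover)   # drop unreserved chars
        leftover = leftover.replace("%", "")       # stray '%': assumption
        if not leftover:
            return False
    return True

for location in ["{foo}{bar}", "{foo}-{bar}", "{foo}%20{bar}"]:
    assert not templates_are_separated(location)

for location in ["{foo}.xml", "{foo}/{bar}", "/{foo}+{bar}/baz",
                 "?foo={foo}-and-then-some&bar=more-than-{bar}"]:
    assert templates_are_separated(location)
```

This only checks the template itself; it relies on the data having been
encoded per rule 1 so the delimiter can never show up inside a value.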

Jonathan Marsh - http://www.wso2.com - http://auburnmarshes.spaces.live.com
 

> -----Original Message-----
> From: www-ws-desc-request@w3.org [mailto:www-ws-desc-request@w3.org] On
> Behalf Of Youenn Fablet
> Sent: Friday, January 05, 2007 3:11 PM
> To: www-ws-desc
> Subject: Clarifications on CR117
> 
> 
> After yesterday's discussion about CR117, I have the following
> comments/precisions/questions.
> I hope this helps clarifying the issue(s).
> 
> 1) Question mark
> a) Having a '?' in the values of the parameter may lead to issues: the
> query string may begin in advance:
> examples:
>     whttp:location="Send/{title}/index?" with two parameters (title and
> author) may lead to something like: Send/What?/index?author="unknown".
> There might be applications that will be able to handle that but others
> may not be able to correctly handle this...
> 
> b) To be noted that client applications will need to check at runtime
> whether the location and parameter values have a '?' in order to
> correctly build the query string
> Let's have whttp:location="/Send/{title}"
> if title is "What" and author is "Unknown&Co", we would have:
>     /Send/What?author=Unknown&Co
> if title is "What?ok" and author is "Unknown&Co", we would have:
>     /Send/What?ok&author=Unknown&Co
> This might need to be clarified in the specification (cf. phillipe AI).
> 
> Please note also the use of the "&" in this example. Other reserved
> characters (#) may also have some impact. Hence the proposal at the end
> of this message.
> 
> 2) URI escaping
> 
> Characters from @address, @whttp:location or from parameter values may
> need to be escaped before being put in the HTTP request.
> Characters from @address and @whttp:location are escaped as there is a
> mapping defined by their type xs:anyURI.
> What should be done with characters from parameter values is not
> specified IIRC.
> We might want to clarify whether the escaping happens before or after
> the replacement of the parameter name by its value.
> If we have @whttp:location="Send%{int}" and int is "20", what do we have
> is either Send%20 or Send%2520.
> Am I missing something?
> 
> 3) Reversibility
> 
> In some cases, the templating mechanism may be ambiguous.
> This may be due to the templates: whttp:location="{country}{zipcode}"
> may be ambiguous or not depending on the types of country and zipcode.
> This may also be due to the use of special characters within parameter
> values: whttp:location="" may be ambiguous if some parameter values use
> '&' for instance.
> 
> It makes perfect sense to allow the description of such non-reversible
> URI construction.
> It also makes sense IMHO to ensure the reversibility of the URI
> construction, especially for SOAP.
> While this is feasible to do it by constraining the type of the
> parameters as arthur suggested, I think it would be better to have a
> more lightweight and general solution for wsdl users : binding simple
> IRI-style-compliant structures to either SOAP-response or SOAP
> request-response would be quite useful.
> 
> One potential solution is to have two parameter value serialization modes:
>     - one straightforward that simply copies the parameter values
>     - another one that url encode all URL reserved/special characters
> (/,?,$,&,=,.)
> In the SOAP case, the second serialization mode might be the preferred
> one.
> We could then add within the WSDL component model a property that tells
> the WSDL processor how parameter values are handled for a particular
> binding component.
> The reversibility would then be ensured by the use of both the second
> serialization mode and simple templating rules like:
>     - have an empty location value: all parameter values are encoded as
> query parameters
>     - always put a '/' between parameter values
> 
> What do you think?
>     Youenn

Received on Monday, 22 January 2007 13:18:10 UTC