
Re: Clarifications on CR117

From: Youenn Fablet <youenn.fablet@crf.canon.fr>
Date: Mon, 22 Jan 2007 16:10:50 +0100
To: Jonathan Marsh <jonathan@wso2.com>
Cc: "'www-ws-desc'" <www-ws-desc@w3.org>
Message-id: <45B4D3FA.3000302@crf.canon.fr>

Jonathan Marsh wrote:
> I'm returning to this topic based on my AI to look at CR117 further.  Youenn
> does a good job below of pointing out more of the potential issues.  IMO
> this can be boiled down to two questions:
> 1) Do we allow the user the power to create URLs from data that either
> result in malformed URIs, non-reversible data, or both?
> 2) If we do, can we advise the user, or the WSDL processor, on how to bind
> the data safely?
> Youenn's suggestion below of providing a "safe mode" in which %-encoding is
> applied to the data before inserting it into a template is interesting.  I
> can imagine this being exposed as a feature of the templating language
> directly:
> 	whttp:location="{raw}?more={%encoded}"
> Where the % directs the WSDL processor to encode the data (otherwise it's
> stuffed in raw).  Actually, the reverse would probably be better - encode
> unless the user makes an effort to ask for the raw mode:
> 	whttp:location="{#raw}?more={encoded}"
> or something like that even though it's not as self-explanatory.
> We'd have to accompany this with some warnings to users that the raw mode
> must be used carefully, e.g. with appropriate schema types restricting the
> power for malformed URIs and the inability to generate server stubs in
> XML-centric implementations.
IMHO, the raw mode is a powerful feature that should be reserved for advanced users.
Having a simpler encoded mode makes a lot of sense to me, especially if 
it makes SOAP-Response usable.
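
To make the two modes concrete, here is a minimal Python sketch of the "encode unless raw is requested" variant. The {#name} marker is only the syntax floated above, not anything in the spec, and the function name expand is made up for illustration.

```python
import re
from urllib.parse import quote

# Hypothetical syntax from this thread: {name} is %-encoded, {#name} is raw.
PLACEHOLDER = re.compile(r"\{(#?)([^{}#]+)\}")

def expand(template, values):
    """Expand a whttp:location-style template (illustrative only)."""
    def substitute(match):
        raw_flag, name = match.group(1), match.group(2)
        value = str(values[name])
        if raw_flag:
            return value  # raw mode: copied as-is, may produce a malformed URI
        # encoded mode: escape everything except a-z, A-Z, 0-9, "-", ".", "_", "~"
        return quote(value, safe="")
    return PLACEHOLDER.sub(substitute, template)

print(expand("{#raw}?more={encoded}",
             {"raw": "Send/What", "encoded": "Unknown&Co"}))
# prints: Send/What?more=Unknown%26Co
```

With the default being the encoded mode, a value such as "Unknown&Co" can no longer introduce a stray "&" into the query string.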

> All that smells a bit too much like new features at the last minute to me!

It might be new features, although if we take the 80/20 bar, the encoded 
mode might be more appealing than the raw mode,
at least in the SOAP world and even for simple HTTP services.

> And it only gets us part way, as it doesn't solve the problem of creating
> non-reversible templates like {x}{y}.  That one's much harder.  Simply
> preventing adjoining templates won't work - how does one deconstruct
> {first}-{last} if first='Jean-Jacques' and last='Moreau'?  There needs to be
> a delimiter between each template that cannot appear in the data.
Exactly, and if we encode data, these delimiters are easy to select and 
non-ambiguous locations are easy to assert.
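
As a rough illustration of why encoding helps, the Python sketch below builds and then parses a two-parameter location. The helper names build and parse are hypothetical, and the encoding assumed is the one that escapes everything outside a-z, A-Z, 0-9, "-", ".", "_", "~".

```python
from urllib.parse import quote, unquote

def build(first, last, delimiter):
    # Encoded mode: both values are %-encoded before insertion.
    return quote(first, safe="") + delimiter + quote(last, safe="")

def parse(location, delimiter):
    # Reverse the template {first}<delimiter>{last}.
    parts = location.split(delimiter)
    if len(parts) != 2:
        raise ValueError(f"cannot split {location!r} unambiguously on {delimiter!r}")
    return tuple(unquote(part) for part in parts)

# "/" is always escaped inside encoded values, so it is a safe delimiter.
print(parse(build("Jean-Jacques", "Moreau", "/"), "/"))
# prints: ('Jean-Jacques', 'Moreau')

# "-" survives encoding, so the same data cannot be split reliably.
try:
    parse(build("Jean-Jacques", "Moreau", "-"), "-")
except ValueError as error:
    print(error)
```

Any delimiter that the encoding always escapes works the same way as "/" here.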
> One can get pretty fancy and context-sensitive in figuring out which
> characters to escape, but a lowest-common-denominator approach seems
> workable: it allows any data to be encoded without harm, and IMO the loss
> of functionality it represents is smaller than the potential for simple mistakes:
> 1) %-encoding each character in the XML except a-z, A-Z, 0-9, "-", ".", "_",
> "~".  Per RFC3986 sec 2.4 this escaping is performed prior to insertion into
> the URL in place of the template. 
> 2) Force templates to be separated by a character sequence containing at
> least one unescaped character not in the above set.  A BNF for this seems
> possible though I failed in my simple attempt to create it...
I would not go as far as disallowing these ambiguous templates.
Triggering a warning would be sufficient.
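
A warning-only check for rule 2 quoted above could look roughly like the Python sketch below. It is one reading of that rule, not a BNF: a separator between two templates is flagged when, after dropping %XX escapes, it contains nothing outside the unreserved set. All names are hypothetical.

```python
import re
import string
import warnings

# Characters left unescaped by the proposed encoding (rule 1 quoted above).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
TEMPLATE = re.compile(r"\{[^{}]*\}")
PCT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def ambiguous_separators(location):
    """Return the separators between adjacent templates that fail the rule."""
    matches = list(TEMPLATE.finditer(location))
    bad = []
    for left, right in zip(matches, matches[1:]):
        separator = location[left.end():right.start()]
        unescaped = PCT_ESCAPE.sub("", separator)  # ignore %XX sequences
        if all(ch in UNRESERVED for ch in unescaped):
            bad.append(separator)
    return bad

def check_location(location):
    """Warn instead of rejecting; returns True when no ambiguity was found."""
    bad = ambiguous_separators(location)
    for separator in bad:
        warnings.warn(f"{location!r}: separator {separator!r} may also appear in data")
    return not bad

print(ambiguous_separators("{foo}%20{bar}"))  # prints: ['%20']
print(ambiguous_separators("{foo}/{bar}"))    # prints: []
```

A processor could call check_location at WSDL load time and still accept the description.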

> Thus these would be disallowed:
>   {foo}{bar}
>   {foo}-{bar}
>   {foo}%20{bar}
> And these would be allowed:
>   {foo}.xml
>   {foo}/{bar}
>   {foo}?{bar}
>   /{foo}+{bar}/baz
>   ?{foo},{bar}
>   ?{foo}={bar}
>   ?foo={foo}&bar={bar}
>   ?foo={foo}-and-then-some&bar=more-than-{bar}
> Jonathan Marsh - http://www.wso2.com - http://auburnmarshes.spaces.live.com
>> -----Original Message-----
>> From: www-ws-desc-request@w3.org [mailto:www-ws-desc-request@w3.org] On
>> Behalf Of Youenn Fablet
>> Sent: Friday, January 05, 2007 3:11 PM
>> To: www-ws-desc
>> Subject: Clarifications on CR117
>> After yesterday's discussion about CR117, I have the following
>> comments/precisions/questions.
>> I hope this helps clarify the issue(s).
>> 1) Question mark
>> a) Having a '?' in the value of a parameter may lead to issues: the
>> query string may begin earlier than intended:
>> examples:
>>     whttp:location="Send/{title}/index?" with two parameters (title and
>> author) may lead to something like: Send/What?/index?author="unknown".
>> There might be applications that will be able to handle that but others
>> may not be able to correctly handle this...
>> b) Note that client applications will need to check at runtime
>> whether the location and parameter values have a '?' in order to
>> correctly build the query string
>> Let's have whttp:location="/Send/{title}"
>> if title is "What" and author is "Unknown&Co", we would have:
>>     /Send/What?author=Unknown&Co
>> if title is "What?ok" and author is "Unknown&Co", we would have:
>>     /Send/What?ok&author=Unknown&Co
>> This might need to be clarified in the specification (cf. phillipe AI).
>> Please note also the use of the "&" in this example. Other reserved
>> characters (#) may also have some impact. Hence the proposal at the end
>> of this message.
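
The runtime check described in 1b can be sketched in Python as follows. This is only an illustration of the raw (unencoded) behaviour discussed in this thread, with a made-up function name; it is not the serialization algorithm from the specification.

```python
def build_request_uri(location, values):
    """Fill templates with raw values, then append leftovers as query parameters."""
    remaining = dict(values)
    uri = location
    for name in list(remaining):
        placeholder = "{" + name + "}"
        if placeholder in uri:
            uri = uri.replace(placeholder, remaining.pop(name))
    for name, value in remaining.items():
        if uri.endswith(("?", "&")):
            separator = ""   # the location already opened the query string
        elif "?" in uri:
            separator = "&"  # a '?' came from the location or from a value
        else:
            separator = "?"
        uri += f"{separator}{name}={value}"
    return uri

print(build_request_uri("/Send/{title}", {"title": "What", "author": "Unknown&Co"}))
# prints: /Send/What?author=Unknown&Co
print(build_request_uri("/Send/{title}", {"title": "What?ok", "author": "Unknown&Co"}))
# prints: /Send/What?ok&author=Unknown&Co
```

The second result shows the problem: a receiver cannot tell that "?ok" belonged to the title.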
>> 2) URI escaping
>> Characters from @address, @whttp:location or from parameter values may
>> need to be escaped before being put in the HTTP request.
>> Characters from @address and @whttp:location are escaped as there is a
>> mapping defined by their type xs:anyURI.
>> What should be done with characters from parameter values is not
>> specified IIRC.
>> We might want to clarify whether the escaping happens before or after
>> the replacement of the parameter name by its value.
>> If we have @whttp:location="Send%{int}" and int is "20", what we get
>> is either Send%20 or Send%2520.
>> Am I missing something?
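
The two outcomes can be reproduced with the small Python sketch below. Both functions are hypothetical readings of the ordering question; which one (if either) the specification intends is exactly what is being asked here.

```python
def insert_verbatim(template, name, value):
    # Reading A: the value is substituted and the literal '%' is left alone,
    # so '%' plus the value is later read as a %-escape.
    return template.replace("{" + name + "}", value)

def escape_percent_then_insert(template, name, value):
    # Reading B: the literal '%' in the template is escaped to %25 first,
    # and only then is the value substituted.
    escaped_template = template.replace("%", "%25")
    return escaped_template.replace("{" + name + "}", value)

print(insert_verbatim("Send%{int}", "int", "20"))             # prints: Send%20
print(escape_percent_then_insert("Send%{int}", "int", "20"))  # prints: Send%2520
```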
>> 3) Reversibility
>> In some cases, the templating mechanism may be ambiguous.
>> This may be due to the templates: whttp:location="{country}{zipcode}"
>> may be ambiguous or not depending on the types of country and zipcode.
>> This may also be due to the use of special characters within parameter
>> values: whttp:location="" may be ambiguous if some parameter values use
>> '&' for instance.
>> It makes perfect sense to allow the description of such non-reversible
>> URI construction.
>> It also makes sense IMHO to ensure the reversibility of the URI
>> construction, especially for SOAP.
>> While it is feasible to do this by constraining the type of the
>> parameters as Arthur suggested, I think it would be better to have a
>> more lightweight and general solution for wsdl users : binding simple
>> IRI-style-compliant structures to either SOAP-response or SOAP
>> request-response would be quite useful.
>> One potential solution is to have two parameter value serialization modes:
>>     - one straightforward that simply copies the parameter values
>>     - another one that URL-encodes all URL reserved/special characters
>> (/,?,$,&,=,.)
>> In the SOAP case, the second serialization mode might be the preferred
>> one.
>> We could then add within the WSDL component model a property that tells
>> the WSDL processor how parameter values are handled for a particular
>> binding component.
>> The reversibility would then be ensured by the use of both the second
>> serialization mode and simple templating rules like:
>>     - have an empty location value: all parameter values are encoded as
>> query parameters
>>     - always put a '/' between parameter values
>> What do you think?
>>     Youenn
Received on Monday, 22 January 2007 15:11:10 UTC
