Re: Some protocol & service description issues from Seaborne, Andy on 2005-02-15 (public-rdf-dawg@w3.org from January to March 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 15 Feb 2005 17:36:39 +0000
To: "Eric Prud'hommeaux" <eric@w3.org>
CC: Kendall Clark <kendall@monkeyfist.com>, public-rdf-dawg@w3.org
Message-ID: <42123327.1020402@hp.com>
Eric Prud'hommeaux wrote:
> This mail summarizes four options for getting a URL from a service URL
> and a query. I can see three strong motivations for using a subset of
> the semantics defined by HTML form submission combined with the
> "application/x-www-form-urlencoded" mime type.
>   1. use browsers and forms to generate valid queries.
>   2. use existing code to construct and parse constructed URLs.
>   3. very familiar to people.
> 
> I'm keen on the fourth alternative, but would like folks to consider
> the downsides of it. I may have missed some.
> 
> On Tue, Jan 25, 2005 at 09:40:59AM -0500, Kendall Clark wrote:
> 
>>Folks,
>>
>>We kicked this issue around in Espoo, especially related to our
>>service description language modeling efforts. I'm not sure I
>>understand what the best opportunity for consensus is, but presently
>>in the protocol document we've published, the design looks something
>>like this (in the HTTP binding):
>>
>>1. queries are named by a "query" parameter
>>2. the type of query is named by a "query-lang" parameter, the value
>>   of which is a URI that identifies the query language; there is no
>>   list of such URIs nor short names in the document presently
>>3. if "query" is present, "query-lang" *must* be present too
>>
>>One of the designs I can remember being proposed or discussed in Espoo
>>was to "overload" (for lack of a better term) a single parameter, such
>>that it conveyed the semantics of the present "query" and
>>"query-lang". In other words, this design proposes a single parameter,
>>the name of which indicates the query language type, and the value of
>>which is (presumably) a legal sentence of that query language. For
>>example, "sparql" or "rdql".
> 
> 
> 
> Base Case:
> Given a service name
>   <http://q.example/queries>
> we can add
>   "?q=" + <form-url-encoded query>
> to pass the query to the server.
> 
> Depending on how we define that operation we have different
> possibilities for using web forms. We can define the operation as
> 
> 1. simple concatenation, regardless of the characters in the service name.
>    ex "http://q.example/queries" + "?q=" + "SELECT%20%3"
>    - only services SPARQL queries.
>    + only requires strcat() and urlEncode().
>    - service names with other parms can't be synthesised in HTML forms.
>      ex "http://q.example/queries?a=b&c=d" + "?q=" + "SELECT%20%3"
>      This is legal under RFC3986 but feels a bit antisocial in the
>      face of deployed CGI parsing code.
> 
> 2. add a "lang" parameter and define the order of the parms.
>      <service name> + "?q=" + <form-url-encoded query> + 
>   "&lang=" + <form-url-encoded language>
>    + services a larger set of languages.
>    + only requires strcat() and urlEncode().
>    - service names with other parms can't be synthesised in HTML forms.
>      same problem as before
> 
> 3. attempt to emulate HTML4.01 CGI behavior by looking back into the
>    service URL.
>    <service name> =~ m/[\?&][^\?&]*$/
>    $1 eq '?'
>      <service name> + "&q=" + <form-url-encoded query> + 
>   "&lang=" + <form-url-encoded language>
>    else
>      <service name> + "?q=" + <form-url-encoded query> + 
>   "&lang=" + <form-url-encoded language>
>    + service name may include arbitrary parms, eg lang or inference
>    - requires strcat(), urlEncode(), strrchr()
>    - doesn't match behavior of form actions containing a '?'
>    - complicated
> 
> 4. append the encoded query directly
>      <service name> + <form-url-encoded query>
>    + dead simple
>    + most flexible for service names with parms
>    + only requires strcat() and urlEncode()
>    - funny looking service names
>      <http://q.example/queries?q=>
>      <http://q.example/queries/owl-full/>
>    - would still have to readdress if we added extra parms to the protocol.
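
For illustration only, the four constructions quoted above might be sketched in
Python as follows. The function names are made up for this note, quote_plus()
stands in for urlEncode(), and the rule used for option 3 (use '&' when the
service name already contains a '?') is one reading of the pseudo-code rather
than something the proposal spells out.

```python
from urllib.parse import quote_plus


def option1(service, query):
    # 1. simple concatenation, whatever characters the service name holds
    return service + "?q=" + quote_plus(query)


def option2(service, query, lang):
    # 2. fixed order: "q" first, then "lang"
    return service + "?q=" + quote_plus(query) + "&lang=" + quote_plus(lang)


def option3(service, query, lang):
    # 3. look back into the service name (assumed rule: an existing '?'
    #    means the new parameters are joined with '&')
    sep = "&" if "?" in service else "?"
    return service + sep + "q=" + quote_plus(query) + "&lang=" + quote_plus(lang)


def option4(service, query):
    # 4. append the encoded query directly; the service name carries "?q="
    return service + quote_plus(query)
```

With a service name that already has parameters, option1 yields a second '?'
in the URL while option3 yields "http://q.example/queries?a=b&q=...&lang=...".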

How about stepping up a level: rather than talking about the concrete
construction of URLs, talk about parameters and leave the construction to the
"usual way of doing it" (e.g. forms.html#h-17.13.3), which might even leave a
route to SOAP.

Service URL: http://q.example/queries
parameter="q" and suppose the service takes an optional parameter "style" for
adding a style sheet to XML results:

Constructed URL is http://q.example/queries?q=...&style=...

but if the service URL is http://q.example/queries?a=b the
constructed URL is http://q.example/queries?a=b&q=...&style=...

This seems to be the underlying form of (1), (2) and (3); case (4) is then the
one where the application sets the form parameters correctly.

If I read it correctly, this seems to me to be an application of
http://www.w3.org/TR/html4/interact/forms.html#h-17.13.3.4
from step 2 with "control name" being "parameter".

Issues about requirements for strcat(), urlEncode(), ... are then not part of
the external conditions but localised to the application or form.
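
A minimal sketch of that parameter-level view, in Python: the caller supplies a
service URL plus named parameters and a standard library does the construction.
build_request_url is a hypothetical helper name, and "style" / "result.xsl" are
only example values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_request_url(service_url, params):
    """Merge named parameters into whatever query the service URL already has."""
    parts = urlsplit(service_url)
    # Keep parameters that are already part of the service URL, in order.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Add the protocol parameters after them.
    pairs.extend(params.items())
    return urlunsplit(parts._replace(query=urlencode(pairs)))


plain = build_request_url("http://q.example/queries",
                          {"q": "SELECT ?x", "style": "result.xsl"})
with_parms = build_request_url("http://q.example/queries?a=b",
                               {"q": "SELECT ?x"})
```

Here plain comes out as http://q.example/queries?q=SELECT+%3Fx&style=result.xsl
and with_parms as http://q.example/queries?a=b&q=SELECT+%3Fx, so the choice of
'?' versus '&' never appears in the protocol description itself.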

 Andy


> 
> 
> RELEVANT SPECS:
> 
> RFC3986 Appendix A:
> [[
> URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
> ...
> query         = *( pchar / "/" / "?" )
> ]]
> 
> URLENCODE <http://www.w3.org/TR/html4/interact/forms.html#form-content-type>
> [[
> application/x-www-form-urlencoded
> 
> This is the default content type. Forms submitted with this content
> type must be encoded as follows:
> 
>    1. Control names and values are escaped. Space characters are
> replaced by `+', and then reserved characters are escaped as described
> in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by
> `%HH', a percent sign and two hexadecimal digits representing the
> ASCII code of the character. Line breaks are represented as "CR LF"
> pairs (i.e., `%0D%0A').
> 
>    2. The control names/values are listed in the order they appear in
> the document. The name is separated from the value by `=' and
> name/value pairs are separated from each other by `&'.
> ]]
> 
> 
> HTML4.01 <http://www.w3.org/TR/html4/interact/forms.html#h-17.13.3.4>
> [[
> # If the method is "get" and the action is an HTTP URI, the user agent
> takes the value of action, appends a `?' to it, then appends the form
> data set, encoded using the "application/x-www-form-urlencoded"
> content type. The user agent then traverses the link to this URI. In
> this scenario, form data are restricted to ASCII codes.
> ]]
> 
> 
> Surprising effect of the above:
> 
> <form action="http://q.example/what?">
>   <input type="hidden" name="inference" value="owl-lite"/>
>   <input name="q" default="SELECT%20%3"/>
> </form>
> 
> ==> <http://q.example/what??inference=owl-lite&q=SELECT%20%3>
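
The doubled '?' can be reproduced with a literal reading of the HTML 4.01 rule
quoted above. A small Python sketch, in which html4_get_url is a hypothetical
helper and "SELECT ?x" stands in for the truncated query of the example:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def html4_get_url(action, form_data):
    # Literal reading of 17.13.3.4: take the action, append '?',
    # then append the form-urlencoded data set.
    return action + "?" + urlencode(form_data)


url = html4_get_url("http://q.example/what?",
                    [("inference", "owl-lite"), ("q", "SELECT ?x")])

# What a generic query-string parser on the server side would then see.
seen = parse_qsl(urlsplit(url).query)
```

url is "http://q.example/what??inference=owl-lite&q=SELECT+%3Fx", and seen
starts with the pair ("?inference", "owl-lite"): the second '?' ends up inside
the first parameter name.
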
> 
>>Further, the identity of this string would be encoded in the service
>>description and would have to be discovered (at least once) by
>>requesters wishing to convey queries to that service; presumably,
>>though we cannot enforce this, it would be a good practice for that
>>parameter name to change as infrequently as possible. Further, again
>>presumably, the parameter name could be cached by requesters, subject
>>to the ordinary caching and resource-freshness issues vis-a-vis HTTP.
>>
>>(Putting my WG member hat back on for a second, I'm not sure why
>>discovering this *one* parameter name is good, but discovering other
>>parameter names is somehow a bad design. Will have to chew on that one
>>further, I guess.)
>>
>>I believe the other designs discussed in Espoo are variations --
>>different parameter names, as I recall -- of the design in the present
>>protocol document.
>>
>>Finally, various WG members took ACTIONs to work on (parts of) an RDF
>>vocabulary for describing provisioning and service details of SPARQL
>>query processors and queryable RDF graphs. I believe there was
>>consensus that such a language should be included as part of the
>>protocol document. 
>>
>>Accordingly, and in order to provide a *bit* of coordination, I'm
>>adding a section to my private draft of that document, which I'll
>>update publicly by Friday, called "SPARQL Protocol Description Language"
>>and using the acronym "SPDL" for that vocabulary. YMMV, of course, and
>>I welcome other suggestions and feedback.
>>
>>Best,
>>Kendall Clark
> 
>
Received on Tuesday, 15 February 2005 17:39:39 UTC