Re: Some protocol & service description issues from Eric Prud'hommeaux on 2005-02-15 (public-rdf-dawg@w3.org from January to March 2005)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 15 Feb 2005 05:15:19 -0500
To: Kendall Clark <kendall@monkeyfist.com>
Cc: public-rdf-dawg@w3.org
Message-ID: <20050215101519.GR14150@w3.org>
This mail summarizes four options for getting a URL from a service URL
and a query. I can see three strong motivations for using a subset of
the semantics defined by HTML form submission combined with the
"application/x-www-form-urlencoded" MIME type:
  1. use browsers and forms to generate valid queries.
  2. use existing code to construct and parse constructed URLs.
  3. very familiar to people.

I'm keen on the fourth alternative, but would like folks to consider
the downsides of it. I may have missed some.

On Tue, Jan 25, 2005 at 09:40:59AM -0500, Kendall Clark wrote:
> 
> Folks,
> 
> We kicked this issue around in Espoo, especially related to our
> service description language modeling efforts. I'm not sure I
> understand what the best opportunity for consensus is, but presently
> in the protocol document we've published, the design looks something
> like this (in the HTTP binding):
> 
> 1. queries are named by a "query" parameter
> 2. the type of query is named by a "query-lang" parameter, the value
>    of which is a URI that identifies the query language; there is no
>    list of such URIs nor short names in the document presently
> 3. if "query" is present, "query-lang" *must* be present too
> 
> One of the designs I can remember being proposed or discussed in Espoo
> was to "overload" (for lack of a better term) a single parameter, such
> that it conveyed the semantics of the present "query" and
> "query-lang". In other words, this design proposes a single parameter,
> the name of which indicates the query language type, and the value of
> which is (presumably) a legal sentence of that query language. For
> example, "sparql" or "rdql".


Base Case:
Given a service name
  <http://q.example/queries>
we can add
  "?q=" <form-url-encoded query>
to pass the query to the server.

Depending on how we define that operation we have different
possibilities for using web forms. We can define the operation as

1. simple concatenation, regardless of the characters in the service name.
   ex "http://q.example/queries" + "?q=" + "SELECT%20%3"
   - only services SPARQL queries.
   + only requires strcat() and urlEncode().
   - service names with other parms can't be synthesised in HTML forms.
     ex "http://q.example/queries?a=b&c=d" + "?q=" + "SELECT%20%3"
     This is legal under RFC3986 but feels a bit antisocial in the
     face of deployed CGI parsing code.

2. add a "lang" parameter and define the order of the parms.
     <service name> + "?q=" + <form-url-encoded query> + 
		"&lang=" + <form-url-encoded language>
   + services a larger set of languages.
   + only requires strcat() and urlEncode().
   - service names with other parms can't be synthesised in HTML forms.
     same problem as before

3. attempt to emulate HTML4.01 CGI behavior by looking back into the
   service URL.
   if <service name> =~ m/[\?&][^\?&]*$/
     (i.e. the service name already contains a '?' or '&')
     <service name> + "&q=" + <form-url-encoded query> + 
		"&lang=" + <form-url-encoded language>
   else
     <service name> + "?q=" + <form-url-encoded query> + 
		"&lang=" + <form-url-encoded language>
   + service name may include arbitrary parms, eg lang or inference
   - requires strcat(), urlEncode(), strrchr()
   - doesn't match behavior of form actions containing a '?'
   - complicated

4. append the encoded query directly
     <service name> + <form-url-encoded query>
   + dead simple
   + most flexible for service names with parms
   + only requires strcat() and urlEncode()
   - funny looking service names
     <http://q.example/queries?q=>
     <http://q.example/queries/owl-full/>
   - would still have to revisit this if we added extra parms to the protocol.
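
To make the comparison concrete, here is a rough Python sketch of the
four constructions. The function names are mine, not anything from the
protocol draft, and option 3 is only one reading of the "look back into
the service URL" idea (use "&" if the service name already has a "?",
else "?"). It assumes Python's urllib.parse.quote_plus is an acceptable
stand-in for urlEncode().

```python
from urllib.parse import quote_plus


def option1(service: str, query: str) -> str:
    # Option 1: plain concatenation, ignoring what the service name contains.
    return service + "?q=" + quote_plus(query)


def option2(service: str, query: str, lang: str) -> str:
    # Option 2: as option 1, plus a "lang" parameter in a fixed order.
    return service + "?q=" + quote_plus(query) + "&lang=" + quote_plus(lang)


def option3(service: str, query: str, lang: str) -> str:
    # Option 3 (assumed reading): pick the separator by looking for an
    # existing "?" in the service name.
    sep = "&" if "?" in service else "?"
    return service + sep + "q=" + quote_plus(query) + "&lang=" + quote_plus(lang)


def option4(service: str, query: str) -> str:
    # Option 4: the service name already ends where the query should start.
    return service + quote_plus(query)


if __name__ == "__main__":
    print(option1("http://q.example/queries", "SELECT ?x"))
    print(option2("http://q.example/queries", "SELECT ?x", "sparql"))
    print(option3("http://q.example/queries?a=b", "SELECT ?x", "sparql"))
    print(option4("http://q.example/queries?q=", "SELECT ?x"))
```

Note that options 1 and 2 produce a second "?" when handed a service
name like "http://q.example/queries?a=b", which is the antisocial case
mentioned under option 1.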


RELEVANT SPECS:

RFC3986 Appendix A:
[[
URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
...
query         = *( pchar / "/" / "?" )
]]

URLENCODE <http://www.w3.org/TR/html4/interact/forms.html#form-content-type>
[[
application/x-www-form-urlencoded

This is the default content type. Forms submitted with this content
type must be encoded as follows:

   1. Control names and values are escaped. Space characters are
replaced by `+', and then reserved characters are escaped as described
in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by
`%HH', a percent sign and two hexadecimal digits representing the
ASCII code of the character. Line breaks are represented as "CR LF"
pairs (i.e., `%0D%0A').

   2. The control names/values are listed in the order they appear in
the document. The name is separated from the value by `=' and
name/value pairs are separated from each other by `&'.
]]
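
As an illustration of the "existing code" motivation, the encoding
described above can be produced and parsed with Python's standard
library. This is just a sketch: the parameter names "q" and "lang" and
the sample query are placeholders, not something the protocol document
has settled on.

```python
from urllib.parse import urlencode, parse_qs


def encode_form(pairs):
    # urlencode() escapes names and values (space -> "+", reserved
    # characters -> %HH) and joins the pairs with "=" and "&", in order.
    return urlencode(pairs)


def decode_form(data):
    # parse_qs() reverses the encoding; each name maps to a list of values.
    return parse_qs(data)


if __name__ == "__main__":
    encoded = encode_form([("q", "SELECT ?x"), ("lang", "sparql")])
    print(encoded)               # q=SELECT+%3Fx&lang=sparql
    print(decode_form(encoded))  # {'q': ['SELECT ?x'], 'lang': ['sparql']}
    # Line breaks come out as the CR LF pair described in the spec text.
    print(encode_form([("q", "a\r\nb")]))  # q=a%0D%0Ab
```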


HTML4.01 <http://www.w3.org/TR/html4/interact/forms.html#h-17.13.3.4>
[[
# If the method is "get" and the action is an HTTP URI, the user agent
takes the value of action, appends a `?' to it, then appends the form
data set, encoded using the "application/x-www-form-urlencoded"
content type. The user agent then traverses the link to this URI. In
this scenario, form data are restricted to ASCII codes.
]]


Surprising effect of the above:

<form action="http://q.example/what?">
  <input type="hidden" name="inference" value="owl-lite"/>
  <input name="q" value="SELECT%20%3"/>
</form>

==> <http://q.example/what??inference=owl-lite&q=SELECT%20%3>
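
The doubled "?" falls straight out of a literal reading of the HTML4.01
text quoted above: take the action, append "?", append the encoded form
data set. A small Python sketch of that reading follows. The function
name is hypothetical, and deployed browsers may well behave differently
from the letter of the spec.

```python
from urllib.parse import urlencode


def html401_get_url(action, fields):
    # Literal HTML 4.01 "get" behaviour: action + "?" + form data set,
    # with no inspection of what the action already contains.
    return action + "?" + urlencode(fields)


if __name__ == "__main__":
    fields = [("inference", "owl-lite"), ("q", "SELECT ?x")]
    # An action that already ends in "?" yields "??" in the result.
    print(html401_get_url("http://q.example/what?", fields))
    # An action without "?" yields the familiar single "?".
    print(html401_get_url("http://q.example/what", fields))
```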

> Further, the identity of this string would be encoded in the service
> description and would have to be discovered (at least once) by
> requesters wishing to convey queries to that service; presumably,
> though we cannot enforce this, it would be a good practice for that
> parameter name to change as infrequently as possible. Further, again
> presumably, the parameter name could be cached by requesters, subject
> to the ordinary caching and resource-freshness issues vis-a-vis HTTP.
> 
> (Putting my WG member hat back on for a second, I'm not sure why
> discovering this *one* parameter name is good, but discovering other
> parameter names is somehow a bad design. Will have to chew on that one
> further, I guess.)
> 
> I believe the other designs discussed in Espoo are variations --
> different parameter names, as I recall -- of the design in the present
> protocol document.
> 
> Finally, various WG members took ACTIONs to work on (parts of) an RDF
> vocabulary for describing provisioning and service details of SPARQL
> query processors and queryable RDF graphs. I believe there was
> consensus that such a language should be included as part of the
> protocol document. 
> 
> Accordingly, and in order to provide a *bit* of coordination, I'm
> adding a section to my private draft of that document, which I'll
> update publicly by Friday, called "SPARQL Protocol Description Language"
> and using the acronym "SPDL" for that vocabulary. YMMV, of course, and
> I welcome other suggestions and feedback.
> 
> Best,
> Kendall Clark

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Tuesday, 15 February 2005 10:15:20 UTC