[LC304] Proposal for specifying serializations using two properties: serialization format, and content type

I took an action to start discussion about a proposal for specifying
serializations in the HTTP binding using 2 properties: the
content-type to use, and the serialization rules.

I am doing so in this message, and also proposing a simpler solution,
which has some limitations, but seems like the best way forward to me.

We currently define 3 serialization formats, which define both how the
instance data (a piece of XML) is going to be serialized in an HTTP
message, and what content type to use. This leads to some issues for

Sanjiva thought that it may make sense to have 2 properties instead.
Here is my thinking about what this could look like, and my POV about
this approach.

For the HTTP binding, we need to define such rules for input, output,
and fault messages.

So we would need 2*3 = 6 properties. The serialization format
properties would be responsible only for serializing the instance data
in the URI or the message body, and new content type properties would
define the value of the Content-Type header:
- {http input serialization} & {http input content type} for input
  messages
- {http output serialization} & {http output content type} for output
  messages
- {http fault serialization} & {http fault content type} for fault
  messages

The content type value could allow for a choice between several values.

The 3 serialization formats we define, instead of being
application/x-www-form-urlencoded, application/xml, and
multipart/form-data, would be the following:
- http://www.w3.org/YYYY/MM/wsdl/http/urlencoded: serialization of the
  instance data in the request URI, or in the message body as a query
  parameter string (to be used with the
  application/x-www-form-urlencoded content type)
- http://www.w3.org/YYYY/MM/wsdl/http/xml: serialization of the
  instance data as XML in the body
- http://www.w3.org/YYYY/MM/wsdl/http/form-data: serialization as
  MIME multipart/form-data

For all of them, we would define the whttp:location micro-syntax, as
Charlton proposed for LC345.

Note that {http input content type} will only be needed for HTTP
request methods with a message body, i.e. not for GET or DELETE.

The default for input would be:
- GET & DELETE: http://www.w3.org/YYYY/MM/wsdl/http/urlencoded for the
  serialization format
- POST & PUT: http://www.w3.org/YYYY/MM/wsdl/http/xml for the
  serialization format and application/xml for the content type

For output messages: http://www.w3.org/YYYY/MM/wsdl/http/xml for the
serialization format and application/xml for the content type.
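
To make these defaults concrete, here is a rough sketch in Python. It
is not part of the proposal: the function and constant names are made
up for illustration, and the YYYY/MM placeholders in the URIs are kept
as they are above.

```python
# Illustrative only: all names here are hypothetical.
URLENCODED = "http://www.w3.org/YYYY/MM/wsdl/http/urlencoded"
XML = "http://www.w3.org/YYYY/MM/wsdl/http/xml"


def default_input(method):
    """Return (serialization format, content type) defaults for an input message."""
    method = method.upper()
    if method in ("GET", "DELETE"):
        # The proposal states no content type default for these methods.
        return (URLENCODED, None)
    if method in ("POST", "PUT"):
        return (XML, "application/xml")
    # Defaults are only stated for the four methods above.
    raise ValueError("no default stated for method %r" % method)


def default_output():
    """Return (serialization format, content type) defaults for an output message."""
    return (XML, "application/xml")
```

A binding that sets neither property on a POST operation would then
get default_input("POST"), i.e. the xml serialization format with the
application/xml content type.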

Then, one could use those as follows:

  <binding type="…/http">
    <operation whttp:method="POST"
      whttp:inputSerialization="http://www.w3.org/YYYY/MM/wsdl/http/form-data"
      whttp:inputContentType="multipart/form-data">
      <input …/>
      <output …/>
    </operation>
  </binding>

As another example, for SPARQL Protocol, they could do:

  <binding name="queryHttp" interface="tns:SparqlQuery" 
     type="http://www.w3.org/2005/08/wsdl/http"
     whttp:version="1.1">

    <fault name="MalformedQuery" whttp:code="400"/>
    <fault name="QueryRequestRefused" whttp:code="500"/>

    <!-- the GET binding for query operation -->
    <operation ref="tns:query" whttp:method="GET"
               whttp:outputContentType="application/xml, application/xml+rdf"/>

    <!-- the POST binding for query operation -->
    <operation ref="tns:query" whttp:method="POST" 
        whttp:inputContentType="application/x-www-form-urlencoded"
        whttp:inputSerialization="http://www.w3.org/YYYY/MM/wsdl/http/urlencoded"
               whttp:outputContentType="application/xml, application/xml+rdf"/>

  </binding>

This is to be compared to the URI-based solution that I proposed at [1]:

  <binding name="queryHttp" interface="tns:SparqlQuery"
            type="http://www.w3.org/2005/08/wsdl/http"
            whttp:version="1.1">

    <fault name="MalformedQuery" whttp:code="400"/>
    <fault name="QueryRequestRefused" whttp:code="500"/>

    <!-- the GET binding for query operation -->
    <operation ref="tns:query" whttp:method="GET"
               whttp:outputSerialization="http://www.w3.org/YYYY/MM/wsdl/http/application/xml http://dawg.example/wsdl/http/application/xml+rdf"/>

    <!-- the POST binding for query operation -->
    <operation ref="tns:query" whttp:method="POST"
               whttp:inputSerialization="http://www.w3.org/YYYY/MM/wsdl/http/application/x-www-form-urlencoded"
               whttp:outputSerialization="http://www.w3.org/YYYY/MM/wsdl/http/application/xml http://dawg.example/wsdl/http/application/xml+rdf"/>

  </binding>

Basically, the URI-based solution forces the DAWG to define a
http://dawg.example/wsdl/http/application/xml+rdf serialization
format, for which they define that the instance data is serialized as
XML in the body, and that the content type is application/xml+rdf.

With the 2-property solution, the serialization format they're using
is http://www.w3.org/YYYY/MM/wsdl/http/xml that we would have already
defined, and they're just saying that the content type used is either
application/xml or application/xml+rdf. They don't need to define any
WSDL extension in their spec to describe their service with WSDL,
which is nice, and probably more scalable.

The 2-property solution therefore is attractive, but I'm worried about
designing something new in a hurry. I think that it will force us to
do another LC, as it's a fairly important change, and we may have bad
surprises, discover some flaws, etc.

If we adopt the URI-based solution, we will need to check with the
DAWG in any case, as having to define the
http://dawg.example/wsdl/http/application/xml+rdf serialization format
is definitely something which may change their review. Which may mean
that we have to do another LC anyway, sigh.

So maybe there's a third option, as I'd really like to find a simple
fix.

As we are defining messages based on an Infoset data model, our
instance data is an XML blob.

We could keep the input serialization property as a list of
media type tokens, and define its content as indicating the content
type to use on the wire.

If the content type used is application/x-www-form-urlencoded, then
our rules for the application/x-www-form-urlencoded serialization
format are used.

If the content type used is multipart/form-data, then our rules for
the multipart/form-data serialization format are used.

If the content type used is application/xml, then our rules for the
application/xml serialization format are used.

If another content type is used, then we use the rules from the
application/xml serialization format, and set the Content-Type header
to that content type.
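
The four rules above amount to a small lookup. Here is a rough sketch
in Python, under a few assumptions of mine: the property value is a
comma-separated list as in the examples in this message, media types
are compared after lowercasing, and all names are hypothetical.

```python
# Illustrative only: function and constant names are hypothetical.
KNOWN_RULES = (
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "application/xml",
)


def parse_media_types(attribute_value):
    """Split a value such as "application/xml, application/xml+rdf" into tokens."""
    return [t.strip().lower() for t in attribute_value.split(",") if t.strip()]


def serialization_rules_for(content_type):
    """Return which of our serialization rules apply for a content type."""
    if content_type in KNOWN_RULES:
        return content_type
    # Any other content type: serialize as XML, and keep the declared
    # content type on the wire.
    return "application/xml"


def plan(attribute_value):
    """Map each listed media type to (content type on the wire, rules applied)."""
    return [(ct, serialization_rules_for(ct))
            for ct in parse_media_types(attribute_value)]
```

For instance, plan("application/xml, application/xml+rdf") pairs both
media types with the application/xml rules, while each keeps its own
content type on the wire.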

So, taking SPARQL Protocol as an example again, we would have:

  <binding name="queryHttp" interface="tns:SparqlQuery"
            type="http://www.w3.org/2005/08/wsdl/http"
            whttp:version="1.1">

    <fault name="MalformedQuery" whttp:code="400"/>
    <fault name="QueryRequestRefused" whttp:code="500"/>

    <!-- the GET binding for query operation -->
    <operation ref="tns:query" whttp:method="GET"
               whttp:outputSerialization="application/xml, application/xml+rdf"/>

    <!-- the POST binding for query operation -->
    <operation ref="tns:query" whttp:method="POST"
               whttp:inputSerialization="application/x-www-form-urlencoded"
               whttp:outputSerialization="application/xml, application/xml+rdf"/>

  </binding>

The limitation for this third solution would be that one cannot
serialize, say, image/png without a WSDL extension. Also, one could not
come up with, say, another application/x-www-form-urlencoded
serialization format without a WSDL extension or another HTTP binding.
But that would be the simplest fix I see, and I'm all for moving
forward with the smallest set of changes.

Comments?

Regards,

Hugo

  1. http://lists.w3.org/Archives/Public/www-ws-desc/2005Oct/0028.html
-- 
Hugo Haas - W3C
mailto:hugo@w3.org - http://www.w3.org/People/Hugo/

Received on Tuesday, 25 October 2005 15:57:31 UTC