XML and encoding (was Re: summary table of WWW9 agenda proposals)

"Eric Prud'hommeaux" <eric@w3.org> writes:

> I put together a table ('cause that's my way of taking notes) on the
> different things people wanted to talk about in Amsterdam.

> <http://www.w3.org/2000/04/18-WWW9-XML-protocol-agendae>

Eric, thanks for summarizing.  I'd missed Larry's issues.

In <http://lists.w3.org/Archives/Public/xml-dist-app/2000Apr/0012.html>,
Larry Masinter <LM@att.com> writes:
> What problem(s) is XML solving as the protocol element for
> carrier of distributed application semantics? How well does
> it solve them?
> 
> For example, does XML give you simplicity, extensibility,
> integration with other software components, ease of
> implementation or debugging?
> 
> What are the alternatives to XML and why is XML better?

The benefit generally ascribed to XML over other _plain_text_ formats
is the infrastructure (network effect) built up around it as a
standard on-the-wire syntax.  The "standard" plain-text format could
have been any number of contenders (Lisp/Scheme s-expressions being
one of the most commonly proposed), but XML became popular due mostly
to its similarity to HTML and its relationship to SGML.

There are several proposals for "binary XML".  Most binary XML is just
a re-encoding of plain text XML using a binary or more concise syntax.

Another alternative is to take marshaling, encoding, or serialization
rules and produce a binary format specifically tailored to them.  The
binary format would not have all the generality of XML and would
possibly be much simpler.

To my knowledge, most traditional binary alternatives have required
explicit, external interface definitions of the binary format; they
are not "self describing" in the same way that XML is.

As an example, LDO-Binary is a self-describing binary format developed
alongside LDO-XML specifically to share the same encoding rules but
not be simply "binary XML".

  <http://Casbah.org/Scarab/binary-serialization.html>
  <http://Casbah.org/Scarab/xml-serialization.html>

> Are there different ways of using XML in protocols (different
> encodings of data) and what are the pros and cons to be considered?

Yes!  Very much so.  I summarized three styles in
<http://lists.w3.org/Archives/Public/xml-dist-app/2000Apr/0004.html>.
They differ in the "rules" that define the grammar used in the
protocols: whether the grammars are fully custom, automatically
generated, or fixed.  ("Grammar" is used here to refer to any type of
DTD or schema, formal or informal.)

Fully custom grammars produce elegant, often concise, representations
of the application data to be transmitted.  They require hand-coded
conversions from XML into application objects, or else the data is
accessed, somewhat inefficiently, directly in a DOM tree.  Custom grammars
almost always include an explicit DTD or schema.

Automatically generated grammars follow rules for encoding data that
can be coded into de/serializers.  For most common data types they
come very close to the elegance and conciseness of custom grammars.
Certain data types (generic container classes, notably) come out less
elegant or concise than they would in, say, a fixed grammar.  Generated
grammars often can be used without DTDs or schemas, or schemas can be
used to validate data at the XML level.

Fixed grammars use a limited set of XML element types to encode all
data.  These are generally the simplest to implement and the encoding
of complex types is straightforward (if supported).  Some fixed
grammars use element names that correspond to data types (and thus
have a fixed set of data types), others use generic element names for
basic data structures and then use attributes to declare a data type.
Fixed grammars generally require external techniques to validate data.
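
To make the contrast concrete, here is a rough sketch (in Python, using
the standard xml.etree.ElementTree module) of the same data encoded by
an automatically generated grammar and by a fixed grammar.  The element
names ("dictionary", "entry", "list", "atom", "item") and the sample
"order" record are made up for illustration; they are not taken from
LDO, SOAP, XML-RPC, WDDX, or any other protocol mentioned here.

```python
import xml.etree.ElementTree as ET


def to_generated(name, value):
    """Generated style: element names are derived from field names by rule."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, item in value.items():
            elem.append(to_generated(str(key), item))
    elif isinstance(value, (list, tuple)):
        # Generic containers have no field names, so a filler name is needed.
        for item in value:
            elem.append(to_generated("item", item))
    else:
        elem.text = str(value)
    return elem


def to_fixed(value):
    """Fixed style: a small, closed set of element types encodes everything."""
    if isinstance(value, dict):
        elem = ET.Element("dictionary")
        for key, item in value.items():
            entry = ET.SubElement(elem, "entry", {"key": str(key)})
            entry.append(to_fixed(item))
        return elem
    if isinstance(value, (list, tuple)):
        elem = ET.Element("list")
        for item in value:
            elem.append(to_fixed(item))
        return elem
    elem = ET.Element("atom")
    elem.text = str(value)
    return elem


# Hypothetical application data.
order = {"customer": "Ken", "quantities": [1, 2]}

generated_xml = ET.tostring(to_generated("order", order), encoding="unicode")
fixed_xml = ET.tostring(to_fixed(order), encoding="unicode")

print(generated_xml)
print(fixed_xml)
```

The generated form reads much like a hand-written custom grammar until
it reaches the list, where it has to fall back on a filler element
name.  The fixed form is more verbose throughout but treats every
container the same way.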

For automatic and fixed grammars it's generally very easy to store
unrecognized data types because there is an underlying basic data
structure that can be used.  For custom grammars, a DOM tree or
similar is used to hold unrecognized data structures.
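
As a sketch of that fallback for a fixed grammar, assume a generic
"list" element carrying an optional "type" attribute (both names are
hypothetical, as is the KNOWN_TYPES table).  A receiver that does not
recognize the declared type can still keep the data in the underlying
basic structure:

```python
import xml.etree.ElementTree as ET

# Hypothetical table of the container types this receiver understands.
KNOWN_TYPES = {"set": set, "tuple": tuple}


def decode_list(elem):
    """Decode a generic <list>, falling back to a plain list if needed."""
    items = [child.text for child in elem]
    declared = elem.get("type")
    factory = KNOWN_TYPES.get(declared)
    if factory is not None:
        return factory(items)
    if declared is None:
        return items
    # Unrecognized type: keep the basic structure and remember the label,
    # so the data can be stored or passed along without loss.
    return {"type": declared, "items": items}


known = decode_list(ET.fromstring(
    '<list type="set"><atom>a</atom><atom>a</atom></list>'))
unknown = decode_list(ET.fromstring(
    '<list type="priority-queue"><atom>x</atom><atom>y</atom></list>'))

print(known)
print(unknown)
```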

There is another facet (nod to Eric): whether the protocol uses
implicit (or contextual) data typing or explicit data typing.
Implicit or contextual data typing does not carry detailed data types
in the serialization or an external schema.  Explicit data typing
carries data types either on every element or in an external schema or
IDL.  Different programming languages have different levels of support
for exchanging data types: some have and can provide data type
information for all objects; others don't have it or can't provide it.
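
A minimal sketch of the difference on the receiving side, again in
Python.  The "type" attribute, the CONVERTERS table, and the
"quantity" element are assumptions made for the example, not part of
any protocol listed below.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from declared type names to Python converters.
CONVERTERS = {"int": int, "float": float, "string": str}


def decode(elem, context_types=None):
    """Decode one leaf element into a Python value.

    Explicit typing: the element declares its own type.
    Implicit typing: the receiver supplies the type from context, keyed
    here by element name; without that it only has a string.
    """
    text = elem.text or ""
    declared = elem.get("type")
    if declared is not None:
        return CONVERTERS[declared](text)
    if context_types and elem.tag in context_types:
        return context_types[elem.tag](text)
    return text


explicit = ET.fromstring('<quantity type="int">3</quantity>')
implicit = ET.fromstring("<quantity>3</quantity>")

print(repr(decode(explicit)))                     # type carried in the data
print(repr(decode(implicit)))                     # no type information at all
print(repr(decode(implicit, {"quantity": int})))  # type known from context
```

With explicit typing a generic receiver can rebuild the value on its
own; with implicit typing the application has to know what it expects.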

Here's an updated breakdown of the protocols on
<http://www.w3.org/2000/03/29-XML-protocol-matrix> by protocol style
and data typing used in the serialization-only (body content) part of
the protocols:

  Protocol           Style       Data typing
  ------------------ ----------- -----------------------
  BizTalk            automatic   implicit (or explicit?)
  ICE                custom      implicit
  IOTP               custom      implicit
  Jabber             custom      implicit
  LDO                fixed       implicit or explicit
  LOTP               automatic   implicit or explicit
  SOAP               automatic   implicit or explicit
  Userland's XML-RPC fixed       explicit
  WDDX               fixed       explicit

  -- Ken

Received on Tuesday, 18 April 2000 12:05:19 UTC