[Seventh Heaven] Need help describing APPLCORE

Any and all editorial thoughts would be appreciated -- but please
reply to me personally. It seems flabby to me, and I'm too bleary to
tell if it's too lame or too technical at times... RK

This column:  http://www.ics.uci.edu/~rohit/IEEE-L7-applcore.html
First column: http://www.ics.uci.edu/~rohit/IEEE-L7-1.html
General info: http://www.ics.uci.edu/~rohit/

Seventh Heaven appears in each issue of IEEE Internet Computing:
http://computer.org/internet

---
Rohit Khare -- UC Irvine -- 4K Associates -- +1-(626) 806-7574
http://www.ics.uci.edu/~rohit -- http://xent.ics.uci.edu/~FoRK

========================================================================

   Building the Perfect Beast
   
   Dreams of a Grand Unified Protocol
   
   Rohit Khare // March 5, 1999
   IEEE Internet Computing // Seventh Heaven
   
   Seventh Heaven was inaugurated one year ago to chart the evolution of
   Internet application protocols. Perched upon the heights of the
   network stack, we paged back through history to trace the lineage of
   Telnet, FTP, SMTP, NNTP, Gopher, and HTTP. With Jon Postel's
   fingerprints to guide us, the arc of thirty years of Transfer Protocol
   (TP) development brought us to the present day -- where the pages of
   our history books go blank.
   
   Leaving us, gentle reader, with but one option for our second season:
   to turn from documenting the past to divining the future. After all,
   unlike designs past, there is an infinite number of designs yet-to-be:
   XML-RPC, HTTP-NG, DIAMETER, IMPP, SIP, ISEN, SWAP, WAP, SDMI, ...
   and every other acronymous concoction we can get our hands on(*). And
   that's to say nothing of how protocols are actually adopted and
   history actually gets written: the mythical eighth (Economic) and
   ninth (Political) layers of the stack. These are the kinds of new
   topics we'll tackle in 1999: new protocols, economic models, and
   standardization processes.
   
   (*) [Footnote: For the record, that's Remote Procedure Calls in
   Extensible Markup Language, HTTP-Next Generation, an extensible
   successor to RADIUS (Remote Authentication Dial-In User Service),
   Instant Messaging and Presence Protocol, Session Initiation Protocol,
   Internet-Scale Event Notification, Simple Workflow Access Protocol,
   Wireless Application Protocol, and the Secure Digital Music
   Initiative.]
   
   Johnny APPLCORE
   
   Conveniently, this raises a very timely topic to the head of our
   agenda: the IETF Applications Directorate's debate over the
   possibility and desirability of an Application Core Protocol
   (APPLCORE). This latest outbreak of an ancient meme bloomed rapidly,
   from its first mention in January to a Birds-of-a-Feather session at
   IETF-44 in Minneapolis by March. The proliferation of new applications
   derived -- or worse, cut-and-pasted by parts -- from SMTP and IMAP
   (Internet Message Access Protocol) has prompted an interest in
   documenting the common challenges of such extensions, and perhaps in
   developing an entirely new set of command building-blocks.
   
   Another variant of the same infectious hope motivates the HTTP-Next
   Generation project, sponsored by the World Wide Web Consortium with
   participation from Xerox and Digital (profiled in detail by Bill
   Janssen last issue). From their vantage point, the Grail looks like a
   programmable distributed object protocol, atop an efficient new
   multiplexing transport. They look ahead to interoperating with Java
   RMI, CORBA IIOP, and RPC protocols.
   
   And of course, HTTP/1.1 itself has proven flexible enough for a host
   of communities to (ab)use it to their own ends. From distributed
   authoring and versioning (WebDAV) to the Internet Printing Protocol
   (IPP), the freedom to define new methods, headers, and Internet Media
   Types at will has led to at least three schools of thought on
   extending the hypertext Web to other information transfer problems.
   Its proposed Mandatory extension mechanism aims to rationalize the
   patchwork of existing rules.
   
   These three camps -- Mail, Web, and Objects -- are each vying to
   establish a new order for Application Layer Protocol design. The quest
   for universal applicability, though, crosses several common fault
   lines in the design space. Each camp tends towards different tradeoffs
   between human- and machine-readability, command-oriented and stateless
   transactions, end-to-end and proxiable connections, inspectability and
   security, and whether to exploit the unique properties of UDP,
   multicast, broadcast, and anycast transmission.
   
   While it may seem quixotic to hope for a Grand Unified Protocol, part
   of the IETF ethos is that you never know unless you try. However slim
   the opportunity for clean reengineering, it still appears
   worthwhile to each camp to seek out an "application-layer TCP," a
   second neck in the Internet hourglass.
   
   Or, as mail protocol and security maven Chris Newman put it when
   calling for APPLCORE in the first place, "What protocol facilities are
   common to most of FTP, HTTP, IMAP, LDAP, NNTP, POP, SMTP/ESMTP, Telnet
   and our other successful protocols? ... I can't predict what a
   careful study of IETF protocol history and comparison of candidate
   solutions will suggest." Since that's the very mission statement of
   our column, let's take a shot at evaluating these camps'
   prospects...
   
   Diversity vs. the Melting Pot
   
   While "transport" protocol merely deliver piles of bits, TPs need to
   'tag-and-bag' their information, almost invariably using MIME syntax.
   IETF's eventual success in hammering out a standard for packaging
   information objects is a powerful inspiration for today's protocol
   unification.
   
   It's not much of a stretch to think of today's TPs as merely different
   strategies for delivering MIME objects -- real-time vs batch, push vs.
   pull, end-to-end vs. relayed, reliable vs. unreliable, and so on. Over
   the past year, this column has fleshed out a model of Transfer
   Protocols (TPs), as summarized in Table 1. We classify three aspects
   of TPs: addresses that identify participating nodes; distribution
   rules controlling transfers; and the message formats on the wire.
   
   Before proposing a universal solution by picking winners and losers,
   it behooves us to investigate the tensions and tradeoffs that
   motivated such diverse solutions in the first place.
   
   Engineering for Humans or Machines?
   
   While all of these protocols are executed by machines, they are
   created by programmers -- two actors with rather different priorities
   for standardization. Will the messages be binary (and faster for
   computers to parse) or English text (and easier for humans to debug
   and extend)? Will the application semantics be precisely defined, a la
   RPC, or open to extensible interpretation, like HTTP?
   
   Syntax. Two archetypal alternatives are RFC822 header fields and X
   Window System events. Email, news, and web headers from any
   information service are reasonably easy to understand, at the cost of
   marshaling complex data structures like dates and numbers as ASCII
   text and parsing all sorts of variations of case, line folding,
   comments, character sets and other gremlins to uphold the maxim "be
   liberal in what you accept". X Protocol messages are tightly packed
   binary data aligned on machine boundaries -- and can be processed
   several orders of magnitude more efficiently because the control flow
   through a server is clearly delineated by X Window System semantics
   and a strict extension mechanism.
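
   To make the tradeoff concrete, here is a minimal sketch in Python --
   an illustration, not any standard library's parser -- of the care an
   RFC822-style header reader must take with case-insensitive names and
   folded continuation lines:

     # A minimal sketch of why RFC822-style headers are easy to read but
     # fussy to parse: field names are case-insensitive and values may be
     # folded across lines.
     def parse_headers(raw: str) -> dict:
         headers = {}
         last_name = None
         for line in raw.split("\r\n"):
             if not line:
                 break                    # blank line ends the header block
             if line[0] in " \t" and last_name:
                 # Continuation line: unfold onto the previous field's value.
                 headers[last_name] += " " + line.strip()
             else:
                 name, _, value = line.partition(":")
                 last_name = name.strip().lower()    # case-insensitive names
                 headers[last_name] = value.strip()
         return headers

     example = ("Content-Type: text/plain;\r\n"
                "  charset=us-ascii\r\n"
                "DATE: Fri, 05 Mar 1999 12:00:00 GMT\r\n"
                "\r\n")
     print(parse_headers(example))
     # {'content-type': 'text/plain; charset=us-ascii',
     #  'date': 'Fri, 05 Mar 1999 12:00:00 GMT'}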
   
   Furthermore, both alternatives are relatively brittle.
   Internationalizing header text in various human languages is
   notoriously difficult because of the number of techniques available
   (RFC 2277, "IETF Policy on Character Sets and Languages"). IETF also
   has some (exasperating) experience with ISO's Abstract Syntax Notation
   One (ASN.1), the self-describing binary marshaling format used for
   SNMP and X.500 public-key certificates. The tools for compiling such
   data structures are expensive and still can't guarantee
   interoperability amongst all the possible Encoding Rules (Basic,
   Distinguished, Packed, ...).
   
   The silver bullet of the moment is XML. Roll-your-own tagsets offer
   some hope of jointly guaranteeing machine-readable validity and
   human-readable message text. As WebDAV encountered, using XML in a
   wire protocol has its own costs: complex machinery for XML text
   encoding ("entities"), whitespace, namespaces, and bloated message
   text. And for all that, there are still no standard data-formatting
   rules for basic data types like dates, numbers, arrays, and so on.
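
   As a rough illustration of that verbosity -- the element names and
   namespace below are invented, not WebDAV's actual schema -- compare
   the same piece of metadata as a header field and as XML:

     from xml.sax.saxutils import escape

     # The same piece of metadata as a one-line header field and as XML.
     # The element names and namespace are invented for illustration.
     as_header = "Last-Modified: Fri, 05 Mar 1999 12:00:00 GMT"
     as_xml = ("<?xml version='1.0' encoding='utf-8'?>"
               "<propertyupdate xmlns='http://example.org/hypothetical-ns'>"
               "<lastmodified>" +
               escape("Fri, 05 Mar 1999 12:00:00 GMT") +
               "</lastmodified></propertyupdate>")

     print(len(as_header), len(as_xml))  # the XML form is several times larger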
   
   The ability to simulate a protocol exchange with Telnet is another
   litmus test of whether a protocol caters to machines or humans. If the
   application itself expects to be supported by individual users and
   system administrators, it's useful to experiment and debug the system
   with a bare-minimum tool available on almost any host. If it's a core
   OS service supported by the vendor, then it's reasonable to expect
   libraries for inspecting binary messages and APIs to support it.
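
   A hedged sketch of the Telnet test in Python: the request below is
   exactly what a user could type by hand into "telnet www.example.com
   80"; the host and path are placeholders.

     import socket

     # The "Telnet test" in code: the request below is exactly what a
     # person could type by hand into an interactive telnet session.
     def fetch(host: str, path: str = "/") -> bytes:
         with socket.create_connection((host, 80)) as s:
             s.sendall(("GET %s HTTP/1.0\r\n"
                        "Host: %s\r\n"
                        "\r\n" % (path, host)).encode("ascii"))
             chunks = []
             while True:
                 data = s.recv(4096)
                 if not data:            # server closes when it is done
                     break
                 chunks.append(data)
         return b"".join(chunks)

     print(fetch("www.example.com")[:80])   # status line and first headers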
   
   Semantics. Another way of justifying the "Telnet test" is to estimate
   how many independent implementations are expected. While there are
   only a few X servers and client libraries -- including definitive
   editions from the X Consortium -- there are uncountable little
   HTTP-driven scripts and hacked applications babbling away in pidgin
   dialects developers learned by imitating other clients and servers.
   The smaller the community of developers, the greater the likelihood of
   establishing a common ontology. A system as multilateral as the Web
   dilutes the semantics of "GET" to the point it could be opening a
   database, running a Turing-complete program, instructing a robot, or
   any other process that eventually generates a MIME entity.
   
   Turn the argument on its head, and it illuminates the tension between
   APIs and protocols. If there is a suite of clearly defined operations,
   there's more ease-of-reuse by standardizing programming interfaces
   ("the Microsoft way," Generic Security Services (GSSAPI, RFC 1508)).
   If implementations are expected to diverge, just focus on the
   bytes-on-the-wire and the message sequence ("the IETF way," Simple
   Authentication and Security Layer (SASL, RFC 2222)).
   
   It's the Latency, Stupid!
   
   Performance constraints also vary with human-driven or machine-driven
   transactions. Interactive use must minimize latency, leading to
   stateless protocols, while batched server-to-server communications can
   optimize bandwidth utilization with stateful command protocols.
   
   In the messaging arena, POP and IMAP provide client-server access,
   while SMTP and NNTP relay between stores. POP optimizes latency by
   selectively listing headers and bulk data separately; IMAP further
   offers concurrent operations. SMTP and NNTP, though, operate in modal
   lockstep, pipelining transmission and reception of new messages.
   
   On the Web, last issue's column suggested that the earliest HTTP/0.9
   spec was even less powerful than FTP -- but it had a critical
   advantage of lower latency. While FTP requires separate commands (and
   round-trips) to login, authenticate, navigate to a path, and request
   transmission, an HTTP download begins within one round-trip after
   connection establishment. That's because its request message
   encapsulates all those commands in a single message (URLs for path and
   filename; headers for authentication information and media-type) --
   and each request can be processed on its own, without reference to the
   state of a connection.
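
   Here is a sketch of that single self-contained request (host, path,
   and credentials are placeholders):

     import base64

     # One self-contained HTTP request carries what FTP spreads across
     # several round-trips: the path (in the URL), credentials
     # (Authorization), and the desired representation (Accept).
     creds = base64.b64encode(b"anonymous:guest@example.org").decode("ascii")
     request = ("GET /pub/reports/summary.txt HTTP/1.1\r\n"
                "Host: ftp-replacement.example.org\r\n"
                "Authorization: Basic " + creds + "\r\n"
                "Accept: text/plain\r\n"
                "\r\n")
     # USER/PASS, CWD /pub/reports, TYPE A, RETR summary.txt -- each FTP
     # command would otherwise cost its own round-trip.
     print(request)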
   
   As the 'bandwidth-delay product' gets larger and larger for wireless,
   fiber, and satellite links, stateless protocols become more valuable
   -- it takes less time to pickle the added state information and
   transmit it than to wait for a command to complete.
   
   Relay Races
   
   End-to-end support is another contentious design decision facing our
   conquering heroes. Stateful command sequences are harder to proxy
   through firewalls or cache, but can manage concurrency explicitly.
   Store-and-forward messaging protocols (TPs) typically operate across a
   chain of relays, in contrast to interactive query and access protocols
   (APs), which directly connect clients and servers.
   
   Consider, then, the Calendaring and Scheduling WG's dilemma. Their
   requirements included connected and disconnected operation, queries
   against multiple stores, and low-bandwidth operation
   (draft-ietf-calsch-capreq-02.txt). There are some arguments for
   modeling a calendar as a Web resource, and operations upon it as HTTP
   transactions. In particular, the DAV and DAV Searching and Locating
   (DASL) extensions cover some of their requirement space. It's unclear,
   though, whether the HTTP caching model is sufficient for nomadic use.
   An access protocol like IMAP has richer support for reflecting
   local actions immediately, chains of actions, and conflict resolution,
   but at the expense of custom development (about four years in this
   case).
   
   Concurrency is also represented differently in each school. An AP can
   tag its requests by ID and thus reshuffle its responses to several
   outstanding operations in the same session. APs like SMTP and NNTP can
   also "turn" the connection around to make requests in the opposite
   direction. Stateless TPs typically require synchronous response,
   tacitly pushing concurrency control -- the timing and priority of
   responses -- to the transport layer. Thus Web browsers open multiple
   TCP connections, and HTTP-NG pursues its Message Multiplexing (MEMUX)
   effort.
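
   A minimal sketch of the tagging idea -- the class and the tag format
   are illustrative, not mandated by any specification:

     import itertools

     # Sketch of how an access protocol multiplexes work over one
     # connection: every command carries a client-chosen tag, and replies
     # may return in any order as long as they echo the tag.
     class TaggedSession:
         def __init__(self):
             self._counter = itertools.count(1)
             self.pending = {}                 # tag -> command in flight

         def send(self, command: str) -> str:
             tag = "A%03d" % next(self._counter)
             self.pending[tag] = command
             return "%s %s" % (tag, command)   # what goes on the wire

         def receive(self, line: str) -> str:
             tag, _, status = line.partition(" ")
             command = self.pending.pop(tag)   # match reply to its request
             return "%s -> %s" % (command, status)

     s = TaggedSession()
     print(s.send("SELECT INBOX"))                 # A001 SELECT INBOX
     print(s.send("FETCH 1:5 FLAGS"))              # A002 FETCH 1:5 FLAGS
     print(s.receive("A002 OK FETCH completed"))   # replies arrive out of order
     print(s.receive("A001 OK SELECT completed"))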
   
   Unlike end-to-end APs, proxying permits intermediaries to offer
   sophisticated services, from caching to Japanese translation. While
   HTTP cannot explicitly model the side-effects of a chain of operations
   (does the server reply to the outstanding GET or PUT first?), its
   statelessness does enable a rich caching model (I don't care as long
   as the reply's no older than five minutes).
   
   Checkpoint Charlie
   
   Firewalls are another kind of proxy. Network administrators have a
   right to inspect the contents of Internet connections. Today's
   baseline is filtering services by TCP port number, but as more and
   more applications attempt to extend HTTP or other APs, the port number
   isn't precise enough to enable or disable specific services. It's
   intellectually dishonest to use an existing, popular protocol as mere
   transport "because it gets through firewalls" -- they're there for a
   reason.
   
   For example, one totally flexible "universal protocol" is to map
   remote procedure calls onto HTTP, whether coded in XML or directly,
   say as Distributed COM (shipped with NT5, though thankfully off by
   default). Web traffic could then conceal information leaking out or
   hackers coming in...
   
   Even multiplexing makes firewalls more resource-intensive. One TCP
   connection running MEMUX could have several subchannels, each of which
   must be judged individually.
   
   End-to-end encryption of an AP is another tough scenario: while a TP
   proxy might be able to decrypt and then forward an entire information
   object, it can be a violation of the standard to interpose a proxy in
   the middle of a stateful conversation.
   
   Security is more than transport-layer encryption alone. There are far
   too many application-layer Authentication and Authorization schemes.
   SASL succeeded by providing a simple building block -- a sequence of
   challenge-response messages and status codes -- that allows developers
   to mix and match authentication algorithms.
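
   A hedged sketch of that building block: the framing loop is all SASL
   standardizes, while the mechanism and the scripted server below are
   invented for the demo.

     import base64

     # The SASL building block is only the framing: a loop of
     # base64-encoded challenge/response lines ending in a status line.
     # The "respond" callable stands in for a real mechanism.
     def authenticate(respond, send_line, read_line):
         while True:
             line = read_line()
             if line.startswith("+ "):                 # server challenge
                 challenge = base64.b64decode(line[2:])
                 answer = respond(challenge)           # mechanism-specific
                 send_line(base64.b64encode(answer).decode("ascii"))
             else:
                 return line.startswith("OK")          # final status code

     # A fake single-challenge server, just enough to exercise the loop.
     script = iter(["+ " + base64.b64encode(b"token?").decode("ascii"),
                    "OK authentication successful"])
     sent = []
     ok = authenticate(lambda ch: b"secret-for-" + ch,
                       sent.append, lambda: next(script))
     print(ok, len(sent))    # True 1 -- one encoded response was sent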
   
   Making the Medium the Message
   
   The layers below also offer application designers unique capabilities
   often overlooked. Telnet, for example, relied on TCP's urgent data and
   interrupt facilities, but HTTP serenely floats atop any 8-bit clean
   channel (even half-duplex!). Since there are so few applications that
   take advantage of broadcast, multicast, and anycast semantics, it's
   hard to plan ahead for a core protocol that could bridge those modes.
   And even though most services reserve TCP and UDP ports, datagrams are
   typically used for "small enough" messages -- it's the rare protocol
   that intelligently copes with lost packets.
   
   Link-layers also affect the evolution of application protocols. The
   Wireless Application Protocol suite is founded on a belief that every
   Internet layer must be reinvented for the cellular environment.
   Very-low-bandwidth and very-high-latency environments call for compact
   message encoding and pipelining, among other features.
   
   Camp the first: Mail
   
   The initial call for APPLCORE came from folks in the email and
   directory communities. A strawman proposal like Application Core
   Protocol (draft-earhart-acp-spec) can trace its heritage back to
   Postel's SMTP and FTP state machines and his theory of reply codes. It
   defines a framework for transitioning to an authenticated connection,
   issuing commands, and receiving interleaved, tagged responses a la
   IMAP.
   
   While it's easy to imagine, say, NNTP's current-group and
   current-article pointers and commands in this vein, the APPLCORE
   charter per se does not call for a stateful solution. It's just a
   coincidence that its author believes the proposed WG should "focus on
   a single "core protocol" based on the connection-based stateful
   client-server structure that most successful IETF application
   protocols follow."
   
   Camp the second: Web
   
   HTTP proponents would beg to differ with Chris Newman's imputation that
   it's one of the "protocol models with which we have far less
   experience...research problems and thus out-of-scope." They can
   point at a handful of significant, standardized extension packages for
   HTTP/1.1 -- and a massive set of informal experiments. Its stateless,
   textual model is particularly hackable, even by shell scripts and
   programs without any internal model of the Web.
   
   HTTP/1.1 enables all this by permitting new methods, header fields,
   and content types. WebDAV, for example, modifies the PUT method with
   lock fields; adds MOVE and COPY (limited by a Depth: header); and
   manipulates metadata with XML-coded requests and responses.
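
   For flavor, here is roughly what such a request looks like on the
   wire, sketched as a Python string with placeholder host and path:

     # The flavor of a WebDAV request: a new method (PROPFIND), a new
     # header (Depth:), and an XML-coded body riding on ordinary HTTP/1.1.
     body = ('<?xml version="1.0"?>'
             '<propfind xmlns="DAV:"><prop><getlastmodified/></prop>'
             '</propfind>')
     request = ("PROPFIND /docs/report.html HTTP/1.1\r\n"
                "Host: dav.example.org\r\n"
                "Depth: 0\r\n"
                'Content-Type: text/xml; charset="utf-8"\r\n'
                "Content-Length: %d\r\n"
                "\r\n" % len(body)) + body
     print(request)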
   
   Complexity is the natural consequence of such freedom. Consider how
   byte-ranges, the ability to request only a portion of the expected
   response (typically, for partial rendering, e.g. a single page of a
   PDF file), interact with all other possible extensions.
   
   This community's best solution is the Mandatory extension mechanism
   (draft-frystyk-http-mandatory), which provides a namespace to discover
   more about an extension, and switches to indicate whether to succeed
   or fail if an extension isn't recognized. This permits multiple
   extensions to interoperate by marking off their own method names,
   header fields, and error codes. Extensions are identified by URIs and
   marked as Mandatory or Optional obligations on the next server (proxy)
   or only the origin server.
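
   Roughly, a request under this mechanism might look like the sketch
   below; the extension URI, header field, and namespace prefix are
   invented for illustration:

     # A rough sketch of a Mandatory-extension request: the extension is
     # named by a URI, declared as a mandatory obligation via the "Man"
     # header, and its own header fields are scoped by a declared prefix.
     # The extension URI, header name, and prefix are invented here.
     request = ('M-GET /calendar/rohit HTTP/1.1\r\n'
                'Host: example.org\r\n'
                'Man: "http://example.org/ext/hypothetical-scheduling"; ns=12\r\n'
                '12-Free-Busy: request\r\n'
                '\r\n')
     print(request)
     # A server that does not recognize the extension must refuse the
     # request rather than silently ignore it.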
   
   The risk of continuing to subsume more application services within
   HTTP is the tendency for those services to abuse it as an opaque
   container for their own traffic. One of its authors, Roy Fielding,
   declaimed: "The Web uses
   HTTP as a transfer protocol, not a transport protocol. HTTP includes
   application semantics and any application that conforms to those
   semantics while using HTTP is also using it as a transfer protocol.
   Those that don't are using it as a transport protocol, which is just a
   waste of bytes."
   
   Camp the third: Objects
   
   The third tack is to directly model Internet applications as
   distributed programs, and the client-server protocol follows as a
   consequence of the API. The IETF's standards-track Remote Procedure
   Call v2 specification (RFC 1831) was issued four years ago, based on
   fifteen years' experience at Sun and elsewhere. The popularity of
   CORBA and Java RMI interfaces underscores the power of this approach.
   
   So while the Web camp could be said to provide OO interfaces through
   document transfer (FORMs and TABLEs and so on), the HTTP-NG effort
   aims to provide document transfer over an OO interface. Its three
   layers begin with MEMUX to optimize transmission for the Web by
   composing compression, encryption, and other services for its virtual
   channels; then a messaging layer that can marshal binary request and
   response messages; and its own upper layer for services like The
   Classic Web Application (TCWA), WebDAV, printing, and so on.
   
   It's certainly appealing to reuse application protocols by subclassing
   and adding your unique functionality, then composing it with other
   off-the-shelf modules for security and performance. But Chris Newman
   invokes the conventional wisdom at the IETF: "RPC mechanisms are a
   poor choice in general for standards-based protocols. It's much harder
   to design an extensible and simple API than it is to design an
   extensible and simple wire protocol."
   
   Prospects
   
   So with each camp set against the others, how likely is any compromise
   core protocol -- or one camp's hegemony? The answers are not at layer
   7 alone: we must climb higher. Technical analysis alone does not point
   to a clear intersection of services, nor does it justify any
   protocol's claim to universal utility.
   
   Layer 8: Economics
   
   First, investigate whose scarce resources are being reallocated.
   "Reuse" claims litter the 120+ message archive of the debate
   (http://lists.w3.org/Archives/Public/ietf-discuss/) -- but whose
   efforts are being reused?
   
   There are at least three plausible actors: programmers, spec writers,
   and standards committees. The first group doesn't benefit until
   there's a single core protocol on the wire and a code library -- and
   then only for developing 'multi-protocol' tools. The second benefits
   by consulting the accumulated wisdom of APPLCORE to navigate the
   archives and cite prior art. The third benefits from rigorous modeling
   of the design alternatives and a framework to classify inter-extension
   dependencies in order to judge which protocols ought to be blessed.
   
   Now we can identify institutions that would be motivated to invest in
   this effort. Look for vendors who implement lots of protocols within a
   single product. Microsoft, for example, promotes a single "Internet
   Information Server" with the vision of vending the same information by
   HTTP, DAV, and other protocols. Netscape has a new app for every
   protocol. Apache, focused purely on HTTP, has little interest in
   implementing printing, conference calls, or other whiz-bang
   applications. In the second group are authors with experience from
   multiple working groups. IETF luminaries of this sort lead each camp;
   Chris Newman, Henrik Frystyk Nielsen, and Mike Spreitzer are
   respective examples.
   
   Layer 9: Politics
   
   To identify the third class of actors, though, requires understanding
   motivations at the highest level: Politics. The standardization
   process is an institutional struggle, in this case to establish order
   within the IETF Directorates, and to defend its turf from ISO and
   other bodies.
   
   APPLCORE intends to "go only in directions that at least two
   successful IETF protocols have gone," Newman claims. This extends the
   "rough consensus" maxim to bless reuse of solutions as guidance to
   spec developers. But to envision an APPLCORE robust enough to measure
   which protocols are worthy is another order entirely -- and one the
   IESG may not support.
   
   For example, IESG member Brian Carpenter commented "that approach could
   imply that HTTPng is the basis for all future applications
   protocols...I doubt that the IETF is ready to make that step."
   There are similar interests against defining the Internet as mail-like
   or web-like.
   
   Instead, if no dominant camp emerges, a compromise that aims for
   their intersection raises the specter of another bogeyman: ISO. IETF
   culture alternately derides ISO for lowest-common-denominator designs
   with lots of options required for the useful cases, and for wasteful
   layering. Charges of ISOism have been leveled by all sides in this
   debate. Grand Unification Theories from the Object Management Group
   (OMG) don't go over well, either.
   
   All of these considerations feed into the decision whether IETF even
   charters a WG on this topic. Without viable candidate protocols, it
   smacks of design-by-committee -- which triggers quite an allergic
   reaction in this community.
   
   Implication
   
   But if -- just suppose -- APPLCORE succeeded, what a wonderful world
   it would be! An application designer would only have to focus on
   understanding the underlying interaction pattern. Leave it to this bit
   of middleware to choose the right addressing system, syntax, and
   distribution algorithm in order to get the right bag of bits to the
   right people by the right time. Internet application design would seem more
   like "rational drug discovery," adapting a protocol tuned to its
   requirements rather than merely chasing popular implementations and
   patching them to match.
   
   We expect new kinds of hybrid applications: Event notification with
   server-initiated delivery, not just polling. Smart pages that
   represent live processes. Multiprotocol message archives accessible
   over the web, on the phone, as a fax, by e-mail...
   
   Sure, it would only be 'good enough,' but a single interface would be
   immensely valuable. TCP isn't ideally tuned for many applications, but
   its ubiquity has a quality all its own. Its twenty-year dominance has
   relied on technical ingenuity to manage its evolution and on explicit
   political commitment. We have IP for packets, TCP for streams, and a
   clear vacancy for a message-middleware standard.
   
   The Perfect Beast
   
   Realistically, though, we suspect this outbreak of the Perfect Beast
   meme will only take us as far as cataloging design patterns of
   Internet application layer protocols. Thirty years' experience is long
   enough to write a textbook, even if the ultimate solution isn't at
   hand. We can capture common solutions to common problems along with
   the rationale. Guidance like "measure compatibility with capability
   lists rather than version numbers" would greatly reduce design-time
   costs -- and guide repairs to today's protocols.
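
   A toy sketch of that guidance in Python (the keywords happen to be
   real ESMTP extensions, but the check itself is the point):

     # Test for the specific features you need instead of comparing
     # version numbers; unknown requirements degrade gracefully.
     server_caps = {"PIPELINING", "STARTTLS", "8BITMIME"}

     def can_run(required: set) -> bool:
         return required <= server_caps   # subset test, not a version check

     print(can_run({"PIPELINING"}))           # True
     print(can_run({"PIPELINING", "DSN"}))    # False -- degrade gracefully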
   
   It's instructive to reflect upon other outbreaks. Mightn't the search
   for One True Application Protocol lead to the same cul-de-sac as the
   One True Programming Language? That community has also played a game
   of turtles: whether functional, imperative, or procedural ought to be
   the most fundamental representation. LISP has been the language of the
   future since 1959 -- even as it has grown to accommodate procedures
   (Scheme), objects (CLOS), and a vast array of time-tested tools (the
   Common Lisp environment). The risks of sprawling unification were
   incoherence, poor performance, and refragmentation into sublanguages.
   
   There is a fundamental phenomenon at work: ontology recapitulates
   community. Ask how many people need to understand a given sentence --
   whether source code or protocol message -- and that's how universal
   the solution needs to be. If we want every Internet-accessible
   resource in a global hypermedia system, we can't expect more than a
   snapshot representation; if we expect to manipulate a datebook to
   automatically schedule a meeting, then only special-purpose
   calendaring tools will be cognizant of other conflicts.
   
   All we can hope for is to make the common case easy and keep the hard
   stuff possible...
   
   
    Rohit Khare is a graduate student in the WebSoft group at the
    University of California, Irvine; and a principal of 4K Associates,
    a standards strategy practice.
    
   
   
   [Word count: ~4,000. Note to the editors: feel free to elide or
   substitute the intra-document headings. Table 1 can be copied
   directly from the March/April 1998 issue's Table 1, with a suitable
   change of tense in the caption.]
   
   
<PRE>   
     CAPTION: Three key aspects of the Transfer Protocols (TPs) Seventh
                           Heaven covered in 1998
                                      
Service    Protocol  Addressing          Distribution         Content
--------------------------------------------------------------------------
Terminal   Telnet    Host + Port         1-1   Sync    Both   Bytestream w/ interrupts
Files      FTP       Host + Path         1-1   Sync    Both   Text / Binary Files
E-mail     SMTP      Mailbox + Msgid     1-N   Async   Push   822 + MIME
Usenet     NNTP      Newsgroup + Msgid   N-N   Async   Push   822 + MIME
Web        HTTP      Host + URL Path     1-N   Sync    Pull   822 + MIME + HTTP caching

</PRE>
