Rich Web Backplane XG

(Draft Final Report)

Charlie Wiecha (Chair), IBM
John Boyer, IBM
Jack Jansen, CWI
Steven Pemberton, CWI

{More here possible}

Introduction

The W3C Architecture allows for markup languages for different application areas such as graphics, multimedia, maths and hypertext to be defined separately, and then to be combined into applications using the facilities defined in those markup languages. Examples of use of this approach include XHTML+Math+SVG [ref], Joost [ref], and XSmiles [ref].

However, since these markup languages are defined to a degree in isolation from the other markup languages that they are used with, it can arise that some incompatibilities become apparent on combining them. Examples that have arisen include how events are used, accessibility concerns, and how data is submitted from forms.

The initial remit of the Backplane XG was to identify and explore these areas, and to investigate the possibility of defining central facilities that application markups could use and plug into, without having to redefine them.

During its working period the XG began implementing its ideas. What evolved was a plan for an architecture for implementing XML-based markup languages within current-day browsers, using Javascript as the definition language. The idea is that, given a standard method for the subparts to communicate, both the syntax and the semantics of markup languages can be defined without those languages knowing about each other's existence, and the languages can then be combined and work together without further intervention.

Rich user interaction platforms and patterns

Web users today seek ever increasing interactivity and responsiveness in Web applications, particularly as those applications expand in scope and function to provide increasing levels of capability ("richness") in presentation, control, and data management. The drive for richness stems from a desire for authors to address greater functional goals in their applications -- moving from web content to commerce to collaborative services. The means for achieving richness stem largely from migrating function back to the client that historically has been resident on the server -- we do not seek to re-establish a world of client-server computing but rather to increasingly leverage client capabilities in an era of greatly expanded distributed computing and online services.

The Rich Web Application Backplane Incubator Group (XG) has had as its goal understanding, demonstrating, and documenting two areas of work: (1) authoring patterns for increased client-side capabilities in not only rich presentation or graphics but also in navigation control and data capture and validation for high-function web applications, and (2) techniques for extending current browser runtimes to support these authoring patterns across a variety of markup formats beyond the variants of HTML supported "natively" by existing browsers.

In this XG report, we begin in the first section by illustrating the manner in which a variety of markup formats including XHTML, SVG, XForms, SMIL, and other emerging formats such as the Open Document Format (ODF) must come together in a coherent working whole.

In section two, we describe an approach to implementing many of these formats in current browser platforms. We have come to focus on this point significantly in the XG's work since we believe that in the evolution of the web today it is not sufficient only to propose improved formats without corresponding emphasis on pragmatic routes to adoption.

In section three, we illustrate a set of relevant authoring patterns for RIAs. In many cases the patterns we describe have been known for some time but have not reached critical mass in the developer community due to lack of convenient runtime support rather than lack of consensus as to their utility. We are excited at this point as to the potential for greater cross-format adoption of concepts such as MVC data binding, event-based coordination, implicit creation of the UI through repeats over queries on the active client data, and so on precisely since we now see and are actively involved in the creation of practical, performant, and deployable techniques for their use in current browser technologies.

Structure of a Rich Web Document

Figure X shows a cut-away view of the "stack" of a running document. The base browser provides core HTML and XHTML parsing capabilities along with required styling (CSS) and network support as usual. We argued above that richness of interaction results from increased client-side support for presentation, data, and control. Within the W3C, and generally in the industry, separate markup formats have been defined specializing in one or the other of these application concerns. Presentation is managed by HTML, SVG, VoiceXML, and so on. Data is addressed strongly by storage, calculation, and validation features of XForms. Control vocabularies include SMIL for synchronous multimedia and State Chart XML (SCXML) for state-machine based control.

As a result, RIAs will increasingly be authored as a composition of multiple markup formats as shown in the diagram -- not as separable "modalities" of a mashed-up application, but as a single coherent user experience. The question, then, is not how to compose separate application fragments as a compound document application but rather how to intermix patterns from multiple formats in a single application to result in a coherent user experience.

Approaches to language extensibility

In contrast to other approaches to browser extension we do not view these markups as authoring-time only formats but wish to support them directly in the browser, whether in the HTML or XHTML DOM. Markups for RIAs will in general be interactive formats, i.e. have processing models controlling their behavior in addition to content models defining their structure. When interacting with other components in a page, it will be important to preserve the semantics of these processing models -- a task which seems easier if the original document structure remains intact and close to the relevant processing logic. In the discussion below we consider various approaches to supporting both the structural and behavioral requirements of non-HTML markups in the browser.

It's also important to track emerging markup formats from sources other than the W3C and ask whether they might be of interest as components of web-based applications. Our example below shows one such format, from OASIS -- the ODT text format used in our case for reporting expenses for reimbursement. There are potentially other emerging formats, particularly in the space of industry vertical standards, which may similarly be of interest for use client-side with first-class implementation of their behaviors. Examples might include HL7 and Clinical Document Architecture (CDA) from the healthcare industry, ACORD from insurance, and XBRL from the finance industry. These formats are of potential interest not just as data-exchange formats but to the extent they have interactive behaviors also as components of client-side web applications.

Server-side transcoding

Historically, the preferred approach to supporting "foreign" markups has been to transcode them to HTML on the server and then either to associate event handlers that implement markup semantics client side in script, or to delegate user events to the server, which remotely updates the running page. While popular (having been used to implement XForms, ODF, and a number of specialized formats), this approach suffers from the obvious performance handicap of remote event processing and the corresponding lack of scalability of server middleware.

A less obvious drawback, but perhaps more significant for RIAs, is that the client-side document structure resulting from transcoding bears little or no resemblance in general to the document as authored. Transcoding algorithms may be driven by convenience in terms of output markup placement, style, and naming conventions. For authors of other components on the page interested in mashing up with transcoded content, these issues can pose significant barriers to knowing where to dispatch events and how to update page content dynamically. The principle of "view source" is important not just for offline designers but also for page execution at runtime.

Progressive enhancement and unobtrusive javascript, e.g. Dojo

Client-side pre-processors

Client-side transcoding will in general suffer the same issues surrounding transparent page content as server-based approaches. Client processors have the option, however, of more easily maintaining a relationship between the original input content and its transcoded result. Bi-directional mappings between source and target formats are needed to permit interactive editing or dynamic modification of the original content incrementally to update the running page.

A well-known example is dojo.E...

Client-based tag library frameworks

In practice many client-side transcoders neither preserve the original document markup nor link it to the generated HTML content, as is required to support such an interactive relationship. A framework implementing tag libraries as "code behind" the HTML DOM, such as the Ample SDK [cite AmpleSDK], maintains the runtime identity of ...

XBL

Cross-platform client-side behaviors

The Ubiquity framework [cite Google code] achieves many of the objectives of XBL by attaching executable "behaviors" directly to the original source document elements as shown in figure X. The technique used for this behavioral "decoration" varies by browser, but such differences are localized to page loading time and are not visible to page authors. Authors continue to see a common document structure consistent with the document as originally authored (what view source shows is what is running, not a transcoding into HTML) and consistent behavior in all browsers supported by this library (currently Internet Explorer 7, Firefox 3, and the Safari/Chrome WebKit-based rendering engines).
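The idea of decorating source elements in place, rather than transcoding them, can be illustrated with a minimal sketch. This is not the Ubiquity API: the `defineBehavior` and `decorate` functions, and the plain objects standing in for DOM elements, are assumptions made for illustration only.

```javascript
// Hypothetical registry of behaviors keyed by element name (not the Ubiquity API).
const behaviors = new Map();

function defineBehavior(tagName, methods) {
  behaviors.set(tagName, methods);
}

// Walk a document tree and attach matching behaviors to the original
// elements in place. No element is replaced or renamed.
function decorate(element) {
  const methods = behaviors.get(element.tagName);
  if (methods) Object.assign(element, methods);
  for (const child of element.children ?? []) decorate(child);
  return element;
}

defineBehavior("xf:output", {
  render() {
    return `[${this.attributes.value}]`;
  },
});

// Plain objects stand in for DOM elements so the sketch runs outside a browser.
const doc = {
  tagName: "body",
  attributes: {},
  children: [
    { tagName: "xf:output", attributes: { value: "42.00" }, children: [] },
    { tagName: "p", attributes: {}, children: [] },
  ],
};

decorate(doc);
const rendered = doc.children[0].render();
```

After decoration the `xf:output` element keeps its authored name and position and has simply gained a `render` method, while the `p` element is untouched.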

Indeed, we see this approach to using Javascript as an important implementation technique for extensible markup elements whether they are XML or HTML-based. The examples we have given are from XML vocabularies (others may include State Chart XML -- SCXML, and ...) but a script-based "tag library" mechanism would be interesting for supporting incremental modules of HTML5, for example, as they appear in working drafts or perhaps separable Recommendations -- and also as a means to accelerate their implementation and adoption in a broad developer community before "native" implementations in each browser are fully available.

Standardization

Currently there is only one interoperable extension method available in mainstream browsers: using Javascript to add semantics to markup within a DOM tree. There are several methods for attaching the Javascript to the tree, for instance using XBL [ref], adding it directly to the tree by hand, or adding it indirectly via Javascript calls. Once that is done, however, there is no further standard or agreement, except within particular libraries, on how to ensure that different parts of the Javascript work amicably together.

Since extension via Javascript is such an essential and widely-used technology, it seems a suitable area for standardization, especially for allowing independent technologies to work together in one page. Using the experience gained with Ubiquity, a submission could be made to W3C as a basis for such standardization.

Application patterns supporting RIAs

In this section we present authoring patterns which are independent of the particular markup format in which they are used and which provide common approaches to data management, mapping of data to abstract and concrete presentation, and interaction control. These patterns form the essentials of what we have come to call the rich web "backplane" in that they provide a structural underpinning to a client-side web document independent of the particular content type being presented, facilitate the separation of concerns such as data vs. presentation vs. control, and hence provide a common skeleton -- aka backplane -- assisting in the composition of more complex web applications from a collaborating set of more primitive components.

While there are many widget or other component architectures emerging some of which [cite iWidget] also facilitate cross-component data sharing and communication, these approaches are coarser-grained units of composition than what we describe. The patterns which follow are embedded directly into the behavior of individual elements and groups of elements of their host languages and hence we view them as inherently integrated in those vocabularies rather than layered on top.

A simple example: Tracking expenses

Figure 1 is the main page of a simple expense tracking web application that we will use to illustrate authoring patterns and runtime integration of multiple markup formats. A list of expenses can be entered with a category, description, date, and amount for each. Expenses are stored in an XForms data model bound to this list.

Figure 2 shows a graphical view of expenses aggregated by category and drawn using an SVG piechart. The expense categories included in the piechart are determined dynamically by a data-driven aggregation of all non-zero categories from the underlying expense list as described and illustrated in markup in the next section. Figure 3 similarly shows a projection of expense entries by date again using SVG.

A set of expenses, once entered, may be submitted for reimbursement by a report-generation tab implemented using the Open Document Format (ODF) text format as shown in Figure 4. Open Document text (ODT) elements are used to define a template for expense reimbursement with the actual expense document containing only those categories with non-zero expenses. ODF documents in the OASIS 1.0 specification may include XForms data models. As shown in Figure 4 we use this capability to bind the expense report directly to data in the underlying expense tracker's data model. Further, the table of contents entries are computed dynamically based again on those expense categories having non-zero entries and thus needing to appear in the running document. Of particular note in this example is this enablement of ODF as a web-centric markup format participating directly in the overall page lifecycle with other web formats such as XHTML, XForms, and SVG.

Transparent composition -- event based patterns

We begin with a simple observation as to the importance of transparent control within web pages. By "transparent" we mean the ability for one component in a page to observe, participate in, and potentially alter the execution of another component in the page. This capability is desirable for authors adding function to an existing page, e.g. in "mashing up" content, in that it allows for their incremental content not only to provide additional presentation but also to augment the interactive behavior of the page in ways previously unanticipated by the original author.

The principal means for observing and augmenting control is through a consistent adoption of an event-based pattern of cross-component control rather than direct invocation of methods or procedures within a page. Direct invocation, while commonly used and simple to author, results in hidden paths of control which cannot be observed or intercepted by code elsewhere on the page.

An event-based pattern, on the other hand, begins with a signaling phase in which external observers are notified as to the impending execution of the "default action" of the event. Handlers called at either capture or bubbling phases may inject additional logic to prepare for or as a consequence of the default action.

For those events which are cancelable, handlers may suppress the default action, perhaps replacing it with logic of their own. A familiar example is an anchor tag, whose default action on selection is to traverse the link. Canceling that event does not stop its propagation to notify other handlers, but does prevent traversal of the link at the end of the bubble phase.
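The capture, bubble, and default-action sequence described above can be sketched in plain Javascript. This is a simplified model and not the DOM Events API: the `TreeNode` class, the `dispatch` function, and the two-node tree are hypothetical names introduced only to make the ordering visible.

```javascript
// Minimal tree node with capture- and bubble-phase listeners
// (hypothetical, not the DOM API).
class TreeNode {
  constructor(name, parent = null) {
    this.name = name;
    this.parent = parent;
    this.listeners = { capture: [], bubble: [] };
  }
  on(phase, handler) {
    this.listeners[phase].push(handler);
  }
}

// Dispatch an event to `target`: capture from the root down, bubble from the
// target up, then run the default action unless a handler canceled it.
function dispatch(target, type, defaultAction) {
  const event = {
    type,
    target,
    canceled: false,
    preventDefault() {
      this.canceled = true;
    },
  };
  const path = [];
  for (let n = target; n !== null; n = n.parent) path.push(n); // target ... root
  for (const n of [...path].reverse()) {
    for (const handler of n.listeners.capture) handler(event, n);
  }
  for (const n of path) {
    for (const handler of n.listeners.bubble) handler(event, n);
  }
  if (!event.canceled) defaultAction(event);
  return !event.canceled;
}

// Usage: a handler on the anchor cancels link traversal,
// yet the remaining handlers are still notified.
const page = new TreeNode("page");
const anchor = new TreeNode("a", page);
const log = [];
page.on("capture", (e, n) => log.push(`capture:${n.name}`));
anchor.on("bubble", (e, n) => {
  log.push(`bubble:${n.name}`);
  e.preventDefault();
});
page.on("bubble", (e, n) => log.push(`bubble:${n.name}`));

const traversed = dispatch(anchor, "click", () => log.push("default:traverse-link"));
```

In this run the page-level bubble handler is still called after the anchor's handler cancels the event, and only the default action (link traversal) is skipped.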

Event-based XForms Trigger
...TBD...

One can think of this pattern as a partial form of "aspect oriented" programming within web pages, where event propagation prior to the default action provides hooks for injecting code at various points in the event lifecycle. A secondary capture/bubble phase following default action execution would complete the analogy with AOP, but probably lacks sufficient use cases to balance the increased expense of additional event propagation.

From a "backplane" perspective, the importance of this discussion is not due to the novelty of event-based patterns. Indeed, web authors are quite familiar with adding handlers to click, onload, and other HTML-related page events. Rather, since extensibility through the creation of new elements rather than script-based code has not been the norm to date on the web we don't see authors readily creating custom elements and hence extending their pages with custom events conforming to a transparent event-based pattern. Providing a well-defined means for declarative extension of web pages may provide the corresponding incentive to adopt more aspect-like patterns of composition as well.

Adapting event-based patterns to "non-DOM event-based" markup formats

{{TBD - SMIL adoption of DOM events}}

Patterns for client-side rich data management

Moving additional data and its associated calculation, transformation, and validation to the client is a key feature of rich web applications. Often, however, this aspect of interactivity is overlooked in comparison to the rich presentational impact of raster or vector graphics or video in adding dynamic behavior to client-side web documents. This section focuses therefore on various patterns which add "rich data" behavior to web applications in a manner we see as applying horizontally to an increasing number of web formats.

The XForms data model is well described elsewhere [cite XForms 1.1] and we do not repeat that discussion here. Rather, we focus on the emerging use of that format beyond conventional "forms" applications for data management in web applications and indeed beyond that context to formats such as Open Document not traditionally thought of as web document markups. Again from a "backplane" perspective, it is clear to us that the ability to store "instances" of data (whether XML or otherwise), to validate that data, compute over it to derive related data, and to associate metadata indicating the validity, relevance, and required states of that data are features of not only forms, but of rich web applications generally and also of rich document applications based for example on ODF [cite ODF].

Data transformations

A key feature of assisting user interaction with complex data is adapting the format and structure of that data for more convenient presentation and input. Often, data is stored in back-end systems in a structure convenient for database performance or perhaps to conform to standards defined in a given industry. Such data may be inconvenient for display or input by being decomposed into too many separate fields, or conversely by being aggregated into too few compact fields (think ISO date-time for example). Data extracted from back-end systems is typically encoded using internal key values which also require translation for external display and input.

End-to-end rich web applications thus typically contain a pipeline of data transformations between back-end and on-the-glass presentation. Figure x shows a set of transformations in the expense tracking scenario which maintain subtotals over each expense category. These constraints are expressed in a declarative form as a set of data-driven XPath expressions linking inputs in the instance values to computed outputs elsewhere in the data model. The model maintains a dependency graph of these constraints and re-evaluates them as necessary whenever input values change driving corresponding updates to other fields on the client.
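The dependency-driven recalculation described above can be illustrated with a small Javascript sketch. It is not the XForms calculation engine: the `CalcModel` class and the field names are assumptions, plain functions stand in for XPath expressions, and the dependency graph is assumed to be acyclic. The propagation here is deliberately naive; a production engine would order the recalculation so that each field is evaluated once.

```javascript
// Hypothetical calculation model: each computed field lists its inputs
// and a pure function that derives its value.
class CalcModel {
  constructor() {
    this.values = new Map();
    this.rules = new Map(); // output name -> { inputs, compute }
    this.dependents = new Map(); // input name -> Set of output names
  }
  calculate(output, inputs, compute) {
    this.rules.set(output, { inputs, compute });
    for (const input of inputs) {
      if (!this.dependents.has(input)) this.dependents.set(input, new Set());
      this.dependents.get(input).add(output);
    }
    this.recompute(output);
  }
  set(name, value) {
    this.values.set(name, value);
    this.propagate(name);
  }
  get(name) {
    return this.values.get(name);
  }
  recompute(output) {
    const { inputs, compute } = this.rules.get(output);
    const args = inputs.map((input) => this.values.get(input) ?? 0);
    this.values.set(output, compute(...args));
    this.propagate(output);
  }
  // Re-evaluate only fields that depend, directly or transitively, on `name`.
  propagate(name) {
    for (const output of this.dependents.get(name) ?? []) this.recompute(output);
  }
}

const model = new CalcModel();
model.set("travel1", 120);
model.set("travel2", 80);
model.set("meals1", 30);
model.calculate("travelTotal", ["travel1", "travel2"], (a, b) => a + b);
model.calculate("mealsTotal", ["meals1"], (a) => a);
model.calculate("grandTotal", ["travelTotal", "mealsTotal"], (t, m) => t + m);

// Changing one input updates travelTotal and grandTotal; mealsTotal is untouched.
model.set("travel2", 100);
```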

XForms data transformations
...example TBD...

The current capabilities of this calculation engine allow for scalar dependencies (i.e. individual element values) for inputs and outputs and do not support iteration. We can thus compute subtotals over known expense categories as in Figure x, but currently do not have a declarative notation for example to project expenses onto a set of dates where the date ranges and values are not known (or conveniently expressed) at authoring time. The graphical view of expenses by date in Figure x, therefore, is computed procedurally by iterating over the instance data and creating output instance data in the desired format for use by subsequent stages in the transformation or display pipeline.
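A procedural projection of this kind might look like the following sketch. The entry shape (`category`, `date`, `amount`) and the sample values are hypothetical stand-ins for the expense instance data, and the result is a plain array rather than an XML instance.

```javascript
// Hypothetical expense entries, standing in for instance data in the model.
const expenses = [
  { category: "travel", date: "2009-03-05", amount: 80 },
  { category: "travel", date: "2009-03-02", amount: 120 },
  { category: "meals", date: "2009-03-02", amount: 30 },
];

// Iterate over the entries and build a new data set keyed by date,
// sorted chronologically, for use by a later display stage.
function projectByDate(entries) {
  const totals = new Map();
  for (const { date, amount } of entries) {
    totals.set(date, (totals.get(date) ?? 0) + amount);
  }
  return [...totals.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([date, total]) => ({ date, total }));
}

const byDate = projectByDate(expenses);
```

The set of dates is discovered from the data at run time, which is exactly the part that a fixed list of scalar calculations cannot express.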

Single node data binding

From a backplane perspective, we are interested in the ability to connect to the data model capabilities of maintaining instance values, computing derived values, and computing metadata (so-called Model-Item-Properties for validity, relevance, etc, see [XForms MIPs]) as a generic capability -- i.e. independent of any particular markup vocabulary for a user interface view, or indeed as a means to bind other elements of markup such as fragments of controller logic in SMIL (see section x, below). The xf:ref and xf:bind attributes define a lifecycle for model-view binding which can be applied not only to those elements defined in the XForms set of atomic and container level controls but indeed in any UI vocabulary needing data and MIP connectivity.
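As a rough illustration of this vocabulary-independent binding lifecycle, the sketch below binds two unrelated view objects to one data node. The `InstanceNode` class and `bindSingleNode` function are hypothetical, and only a validity property is modeled where XForms defines several MIPs.

```javascript
// Hypothetical instance node holding a value plus a validity property.
class InstanceNode {
  constructor(value, isValid = () => true) {
    this.value = value;
    this.isValid = isValid;
    this.observers = [];
  }
  get mips() {
    return { valid: this.isValid(this.value) };
  }
  setValue(value) {
    this.value = value;
    for (const notify of this.observers) notify(this);
  }
}

// Bind any view object to a node: the view is refreshed once on binding and
// again on every value change. The returned handle writes view input back.
function bindSingleNode(view, node) {
  const refresh = (n) => view.refresh(n.value, n.mips);
  node.observers.push(refresh);
  refresh(node);
  return { write: (value) => node.setValue(value) };
}

// Usage: two unrelated "controls" share one node and stay in step.
const amount = new InstanceNode("12.50", (v) => v !== "" && !Number.isNaN(Number(v)));
const field = {
  refresh(value, mips) {
    this.text = value;
    this.valid = mips.valid;
  },
};
const label = {
  refresh(value, mips) {
    this.caption = mips.valid ? `Amount: ${value}` : "Amount: invalid";
  },
};

const fieldBinding = bindSingleNode(field, amount);
bindSingleNode(label, amount);
const captionBefore = label.caption;
fieldBinding.write("abc"); // both views observe the new value and its validity
```

Neither view object knows about the other; each depends only on the node it is bound to, which is the property that lets the same binding serve different UI vocabularies.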

ODF is an example of a non-XForms specification that today uses this pattern of single-node-binding (we'll consider binding to sets of data in the next section). ODF allows authors to include an XForms model in text, spreadsheet, and presentation documents and to insert model data values into formatted content using a two-layer field and drawing architecture shown in Figure x. ODF fields are not XForms controls but they do bind to data using the xf:bind attribute and therefore obtain the value-change and MIP lifecycle behavior implied by that binding. We illustrate this capability in the expense tracker by including an implementation of a subset of ODF elements sufficient as a proof-of-concept of mixing ODF behavior in web documents.

Figure x shows the ODF binds selecting instance values in the data model. This, and all other ODF markup, was generated by use of production ODF editors such as Open Office and IBM Symphony and inserted into this web application with modification only to alter the placement of the data model in the root web page rather than in the ODF subtree. This extension does not alter the mechanism of data binding but is reflective of the use of ODF content in a mixed document rather than as a stand-alone office format.

ODF Form fields with xf:bind data binding
...ODF example...

ODF fields are abstract form fields which use xf:bind to connect to instance data but do not themselves draw that data directly in formatted content. ODF provides a separate draw:control element to manage that final layer of concrete presentation as described below in the discussion on mapping between layers of presentation. Thus ODF fields play a role similar to XForms abstract UI controls which are intended in general to be embedded in a host language for concrete styling and interaction control.

Container level controls and data binding

Single node data binding can be used as well with container-level controls, i.e. those that do not display or input data directly but have children which do. Container controls use xf:ref and xf:bind to set an evaluation context allowing for relative data binding expressions in their children but importantly also to receive MIP events for relevance, validation, and required status. This status is then inherited by child elements allowing, for example, for sections of ODF content in the expense reimbursement markup to display conditionally whenever the expense data in the category it is bound to is non-zero.
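The inheritance of relevance from a container to its children can be sketched as follows. The `Group` class, the section names, and the data shape are assumptions for illustration; real containers would receive MIP events from the model rather than being polled with a data object.

```javascript
// Hypothetical container: relevance is computed from the bound data
// and inherited by all descendants.
class Group {
  constructor(isRelevant, children = []) {
    this.isRelevant = isRelevant; // function of the data model
    this.children = children; // nested Groups or leaf section names
  }
  // Return the leaf sections that should be displayed for the given data.
  visible(data, inheritedRelevant = true) {
    const relevant = inheritedRelevant && this.isRelevant(data);
    return this.children.flatMap((child) =>
      child instanceof Group ? child.visible(data, relevant) : relevant ? [child] : []
    );
  }
}

const report = new Group(() => true, [
  "summary",
  new Group((d) => d.travel > 0, ["travel-heading", "travel-table"]),
  new Group((d) => d.meals > 0, ["meals-heading", "meals-table"]),
]);

const shown = report.visible({ travel: 200, meals: 0 });
```

With a zero meals total, every section inside the meals group is suppressed without any of those sections testing the data themselves.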

Conditional display of ODF content using xf:group
...ODF groups...

Nodeset binding

UI controls can bind to collections of data in addition to individual nodes using the "nodeset" level of data binding. Nodesets are constructed by XPath queries over instance data and the associated UI content is treated as a template to be repeated for each entry in the set. Relative binding (either nested nodesets or single node binding) applies within the set to continue the template expansion as many levels as required. The behavior of nodeset binding is particularly powerful as changes to the query are tracked dynamically and additional template content is instantiated, or existing content removed, accordingly to maintain a current relationship between data model and view. This mechanism provides an implicit, or data-driven, declaration of the structural relationship between model and view over and above the two-way data synchronization provided by single node bindings.
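A minimal sketch of template expansion over a nodeset is given below. The `makeRepeat` function is hypothetical, a filter function stands in for the XPath query, and a string template stands in for UI markup. For brevity the sketch rebuilds every row on refresh, whereas the behavior described above adds and removes content incrementally.

```javascript
// Hypothetical repeat: a query selects nodes and a template is expanded
// once per selected node.
function makeRepeat(query, template) {
  let rows = [];
  return {
    // Re-run the query and rebuild the rows so the view tracks the data.
    refresh(data) {
      rows = query(data).map((node, index) => template(node, index));
      return rows;
    },
    get rows() {
      return rows;
    },
  };
}

const data = {
  expenses: [
    { category: "travel", amount: 120 },
    { category: "meals", amount: 0 },
  ],
};

const repeat = makeRepeat(
  (d) => d.expenses.filter((e) => e.amount > 0), // stands in for an XPath nodeset query
  (e, i) => `${i + 1}. ${e.category}: ${e.amount}` // stands in for template markup
);

const firstRows = repeat.refresh(data); // only the travel entry qualifies

// Data changes alter the nodeset, and the view follows on the next refresh.
data.expenses.push({ category: "lodging", amount: 300 });
data.expenses[1].amount = 45;
const laterRows = repeat.refresh(data);
```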

Continuing our use of ODF as an example host language for data-aware web applications, two of us (Boyer, Wiecha) have proposed elsewhere [ODFNext] extensions to the ODF forms vocabulary to support repeating content as well as static form fields. We see use cases both for xf:repeat as a container control around existing ODF field elements such as those shown in Figure x, as well as the extension of the ODF forms vocabulary to allow for XForms controls to be used directly in repeats or elsewhere in ODF forms. Similarly, we would be interested to explore xf:repeat as a data-driven UI template for other UI vocabularies such as SVG, VoiceXML, and potentially for generating more dynamic content for controller vocabularies such as SMIL and SCXML.

Patterns for mapping abstract to concrete presentation

While not required, typically, UI controls that bind to data are abstract controls in that they provide data connections, manage the behavior of their local interaction state (working data entry fields, selection status of single or multiple selection lists, etc), but do not present their bound data directly. Rather, some means of mapping between abstract controls and one or more concrete controls accomplishes this "last mile" to the user.

Custom controls from AJAX libraries

In Figure x, abstract UI controls are extended dynamically at page load time (or following element creation as the page is later extended incrementally during execution). The runtime structure of the xf:input element bound to a data value with type xsd:date is shown in Figure x. In this case, the YUI calendar widget is used to provide the concrete realization of an interaction technique appropriate for the bound data type, but the synchronization behavior with the backplane data model is abstracted away into the xf:input parent element.

SVG as a format for custom controls

A second example leverages the increasing availability of SVG as a native UI format in modern browsers. The piechart and expense charting shown above are drawn dynamically by javascript functions attached as event listeners to data model change notifications received by their controlling xf:repeat elements. Both SVG examples are cases where the concrete presentation subtrees are built as siblings of the abstract xf:repeat container element to avoid conflict with the existing behavior of xf:repeat as it manages the set of replicated data as its own children.
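The geometry behind such a piechart listener can be sketched as a pure function that turns category totals into SVG path data. The `pieSlices` function, its default center and radius, and the sample totals are assumptions rather than the code used in the expense tracker. The sketch does not handle a single category covering the whole pie, which would need a circle element instead of an arc.

```javascript
// Build SVG path data for pie slices from category totals.
// Categories with a zero total are skipped.
function pieSlices(totals, cx = 100, cy = 100, r = 80) {
  const entries = Object.entries(totals).filter(([, amount]) => amount > 0);
  const sum = entries.reduce((acc, [, amount]) => acc + amount, 0);
  let angle = -Math.PI / 2; // start at 12 o'clock
  return entries.map(([category, amount]) => {
    const sweep = (amount / sum) * 2 * Math.PI;
    const x1 = cx + r * Math.cos(angle);
    const y1 = cy + r * Math.sin(angle);
    angle += sweep;
    const x2 = cx + r * Math.cos(angle);
    const y2 = cy + r * Math.sin(angle);
    const largeArc = sweep > Math.PI ? 1 : 0;
    // Move to center, line to arc start, clockwise arc to arc end, close.
    const d =
      `M ${cx} ${cy} L ${x1.toFixed(2)} ${y1.toFixed(2)} ` +
      `A ${r} ${r} 0 ${largeArc} 1 ${x2.toFixed(2)} ${y2.toFixed(2)} Z`;
    return { category, d };
  });
}

const slices = pieSlices({ travel: 200, meals: 0, lodging: 200 });
```

A listener on data model change notifications could call such a function and assign each `d` string to a `path` element built alongside the repeat.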

Linking ODF content to data

The ODF draw:control element is responsible for surfacing data model values into formatted content (whether text, spreadsheets, or presentations). Rather than achieving this embedding syntactically by nesting ODF fields directly in formatted content, the draw:control element is another example of sibling mapping resolved using cross-references by ID between abstract form fields and drawn content. The draw:control elements, like the SVG content in xf:repeat elements, function as listeners to data model changes and redraw their content as necessary.

Extension with new controller formats

Along with the migration of function from server to client typical in a rich web application there comes a corresponding need for control over the resulting increased level of complexity of behavior. Rich web documents may have multiple interacting components on the same page, requiring coordination to achieve a coherent aggregate user experience. Rich web documents very often have asynchronous interactions with remote services, also requiring coordination to track requests and update client-side data or UI as responses are received.

Many AJAX-based applications today share this behavior but implement their controller logic directly in a scripting language such as Javascript. While perfectly functional, there may be categories of control logic that are particularly suited to special-purpose controller formats. Two of these are explored in this section and their implementations as modules leveraging the Ubiquity extension framework are detailed in the Appendix below.

SMIL

SMIL [cite SMIL30] is a format centered on time-based control abstractions useful as a stand-alone web format (i.e. as the root document type) as well as a controller embedded in other formats such as the XHTML/ODF compound document used in our expense tracker application. Time based control is particularly prevalent in multimedia presentations and demonstrations, for example in kiosks and online training.

The NYC tour in Figure x consists of a coordinated set of images, audio, and captions. Each set is played in parallel, resulting in the UI as seen at one instant in the figure. The overall control structure of this simple example is then a sequence of parallel sections, each containing audio, image, and caption. The markup for this example is executable directly in the Safari, Chrome, IE, and Firefox browsers using the proof-of-concept SMIL Ubiquity implementation described below in the Appendix.
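The "sequence of parallel sections" structure can be made concrete with a small scheduling sketch. It is not the SMIL Ubiquity implementation: the node shapes, the `schedule` function, the file names, and the durations are all hypothetical, and every media item is assumed to have a fixed duration.

```javascript
// Hypothetical timing tree: "seq" plays children one after another,
// "par" plays them together and ends when the last child ends.
function schedule(node, start = 0, out = []) {
  if (node.kind === "media") {
    out.push({ src: node.src, begin: start, end: start + node.dur });
    return start + node.dur;
  }
  let end = start;
  for (const child of node.children) {
    const childStart = node.kind === "seq" ? end : start;
    const childEnd = schedule(child, childStart, out);
    end = node.kind === "seq" ? childEnd : Math.max(end, childEnd);
  }
  return end;
}

// One tour stop: image, audio, and caption presented in parallel.
const stop = (name, dur) => ({
  kind: "par",
  children: [
    { kind: "media", src: `${name}.jpg`, dur },
    { kind: "media", src: `${name}.mp3`, dur },
    { kind: "media", src: `${name}-caption.txt`, dur },
  ],
});

const tour = { kind: "seq", children: [stop("stop1", 10), stop("stop2", 15)] };
const timeline = [];
const totalDuration = schedule(tour, 0, timeline);
```

All three items of the second stop begin at the moment the first stop ends, which is the behavior the nested seq and par containers express declaratively.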

More interactive uses of SMIL, as a wizard or mash-up controller, require additional capabilities to not only present content to the user but prompt for inputs and then drive the application conditionally as a result of those inputs. We anticipate the direct use of xf:dispatch, xf:send and other actions as SMIL extension elements to achieve this function.

SCXML

The State Chart XML (SCXML) format being defined by the Voice Browser Working Group [SCXML] is centered around reactive systems modeled conveniently by state machine-based semantics. Reactive systems are particularly prevalent in less modal UIs where multiple components, agents, or avatars interact concurrently and where cross-component coordination is required. While we have a small subset of SCXML implemented in Javascript we have not to date based this on the Ubiquity library nor explored its integration with scenarios such as the expense tracker.

Like SMIL, State Chart XML offers interesting possibilities as an embedded controller for rich web applications. Indeed, achieving coherent user experience in a web application assembled as a mash-up of multiple components is a challenge in that they need to share not only data (accomplished by binding to common data model elements as above) but control. When selecting a stock symbol and date range in an input dialog, for example, the related stock widget needs not only to accept those values but also to trigger the "Get Quote" operation. A controller such as SMIL or SCXML can conveniently add this control layer over and above the implicit data synchronization provided by shared model state.
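The stock quote scenario can be sketched as a small state machine in Javascript. This is neither SCXML syntax nor the XG's SCXML subset: the state names, event names, and the `createMachine` function are hypothetical, and a plain object stands in for the state chart markup.

```javascript
// Hypothetical state chart for a stock-quote mash-up; names are illustrative only.
const chart = {
  initial: "editing",
  states: {
    editing: {
      on: { "input.complete": { target: "fetching", action: "getQuote" } },
    },
    fetching: {
      on: {
        "quote.received": { target: "showing" },
        "quote.failed": { target: "editing" },
      },
    },
    showing: {
      on: { "input.changed": { target: "editing" } },
    },
  },
};

function createMachine(definition, actions) {
  let state = definition.initial;
  return {
    get state() {
      return state;
    },
    // Events with no matching transition in the current state are ignored.
    send(eventName) {
      const transition = definition.states[state].on[eventName];
      if (!transition) return false;
      if (transition.action) actions[transition.action]();
      state = transition.target;
      return true;
    },
  };
}

const calls = [];
const machine = createMachine(chart, { getQuote: () => calls.push("getQuote") });

const ignored = machine.send("quote.received"); // no such transition while editing
machine.send("input.complete"); // triggers the "Get Quote" operation
machine.send("quote.received");
```

The controller decides when the operation fires, while the values it operates on would come from the shared data model, keeping control and data synchronization as separate layers.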

Appendix: Implementation of formats used in the financial services application

XForms and XML Events

SMIL

Open Document Format Text (ODT)

Other possibilities: SCXML, ACORD, HL7 -- many have behaviors not just data formats.

HTML and its extensions through HTML5/6/custom modules