Proposal for Improved/Simplified ViewsAndFormatting Segment *abstract* model

For previous discussion:

http://lists.w3.org/Archives/Public/www-dom/2002OctDec/0184.html

Our first goal has to be to generate interest in this specification by
convincing others of its importance and of its ease of implementation and
use.  In my opinion, this module is not well understood by the majority
(see my initial experience in the thread linked above).  I think that once
it is better understood, many others will see why I feel it is a critical
module.  In the previous discussion linked above, Ray Whitmer (DOM Chair)
also agreed, to some degree, on its potential importance.

Before we can do a better job of providing examples, convenience methods,
and otherwise explaining this module so as to generate the necessary
momentum, we first need to consider whether we have the best abstract model
to start from.  After further analyzing the existing Segment model, I have
some observations and suggestions for generalization and simplification.

First of all, we need to understand what we are modeling and how we plan to
relate this model to other DOM specifications for it to be useful.  Refer
to the link above for the previous, more verbose discussion.

My abstract understanding is that we are essentially modeling Segments
("chunks") of the View, each having one of the types that we choose to
expose.  Each View decides which of these exposed types it supports, which
will depend on media type and other View-specific factors.  We then wish to
expose attributes of those Segments.  A prime example of a Segment type
would be a character run.  But how should we group characters to make a run
for the Segment?  An obvious way is to group according to some shared
attribute(s), such as grouping characters which have the same font.  So
far, it has been proposed that these groupings be View specific, so the
internal grouping of a Segment type is not exposed.  Only the Segment's
abstract type is defined.  This makes some sense because, for example, an
aural media type would not group according to the same attributes as a
visual media type, e.g. font has no meaning in aural space.
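
As a rough illustration of the grouping idea above, here is a minimal
sketch of how a visual View might build character runs from characters
that share a font.  The names StyledChar, CharacterRun and groupByFont are
hypothetical and are not taken from the draft; a real View would be free
to group on other attributes, or not to expose its grouping at all.

```typescript
// Hypothetical shapes for illustration only; not the draft's interfaces.
interface StyledChar {
  ch: string;
  font: string;
}

interface CharacterRun {
  type: "CharacterRun";
  text: string;
  font: string;
}

// Group adjacent characters that share a font into a single run.
function groupByFont(chars: StyledChar[]): CharacterRun[] {
  const result: CharacterRun[] = [];
  for (const c of chars) {
    const last = result[result.length - 1];
    if (last !== undefined && last.font === c.font) {
      // Same font as the previous character: extend the current run.
      last.text += c.ch;
    } else {
      // Font changed (or first character): start a new run.
      result.push({ type: "CharacterRun", text: c.ch, font: c.font });
    }
  }
  return result;
}

const runs = groupByFont([
  { ch: "H", font: "serif" },
  { ch: "i", font: "serif" },
  { ch: "!", font: "mono" },
]);
```

An aural View could reuse the same Segment type while grouping on a
different attribute (a voice, say), which is why only the abstract type
needs to be defined.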

ELIMINATE MATCH
===============

Then the question is how to expose the Segments so as to correlate them
with something useful?  The existing model provides for limiting the
Segments to those that match Items, which can be ContentItems (e.g. DOM
Nodes) or more generic Items, such as StringItem.  The existing model
proposes a Match interface for polling the types *AND* the attributes based
on some predefined logical comparisons.

It is my opinion and observation that the Match interface reduces
generality and imposes an unnecessary implementation burden.  It reduces
generality because only predefined logical comparisons of predefined
attributes and types can be used to search.  I would prefer to expose a
more general model that lets others build their own match-like algorithms
on top of our layer.  This will also remove the burden of implementing a
Match interface at this layer.  This suggestion furthers the goal of OO
through layered modularization.

Instead I propose that only an Item criterion be used to return Segments.
This also has the benefit of eliminating a complex portion of the
specification, so that better focus (in examples, specification
finalization, etc.) can be applied to how we define the Item criterion.  By
narrowing the scope of the specification, we improve our chances of
completing it correctly.
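
To make the layering concrete, here is a minimal sketch in which the View
offers only an Item-based query and a match-like helper is written by the
caller on top of it.  Every name here (Item, Segment, SimpleView,
getSegments, matchSegments) is an assumption made for illustration; none
of it is the draft's actual API.

```typescript
// All names below are hypothetical; they are not the draft's interfaces.
interface Item {
  id: string; // stands in for a ContentItem such as a DOM Node
}

interface Segment {
  type: string;
  item: Item;
  attributes: { [name: string]: string | number };
}

class SimpleView {
  private segments: Segment[];

  constructor(segments: Segment[]) {
    this.segments = segments;
  }

  // The only query this layer offers: the Segments for a given Item.
  getSegments(item: Item): Segment[] {
    return this.segments.filter((s) => s.item.id === item.id);
  }
}

// A match-like helper built by the caller on top of the Item query.
// Any comparison can be expressed, not only predefined ones.
function matchSegments(
  view: SimpleView,
  item: Item,
  predicate: (s: Segment) => boolean
): Segment[] {
  return view.getSegments(item).filter(predicate);
}

const para: Item = { id: "p1" };
const heading: Item = { id: "h1" };
const view = new SimpleView([
  { type: "CharacterRun", item: para, attributes: { font: "serif", width: 120 } },
  { type: "CharacterRun", item: para, attributes: { font: "mono", width: 40 } },
  { type: "Box", item: heading, attributes: { width: 300 } },
]);

// Character runs of the paragraph that are wider than 100 units.
const wideRuns = matchSegments(
  view,
  para,
  (s) => s.type === "CharacterRun" && Number(s.attributes["width"]) > 100
);
```

The View implementor only has to write getSegments; the predicate logic
lives entirely in the caller's layer.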

Perhaps an even bigger achievement of this suggestion is that we eliminate
the question of whether to expose string parameters and/or medium-specific
APIs for the Match interface.  We simply eliminate that issue. :-)  One
*MAJOR* procedural disadvantage of string parameters is that they do NOT
get validated at interpreter (or compiler) time.  Instead, an error in a
string is not caught until run time, and then only once *ALL* possible
branches of a program have been exercised.  The exception is a language
that supports string enumerations as a language data type.  In other words,
string-parameter-based APIs likely remove the power of syntax checking from
many language bindings.  Also, I had already noted that string enumeration
queries (as a replacement for class attributes) obscure normal string usage
in programs, which is afaik probably why most (if not all) languages do not
have a string enumeration data type.
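
The difference can be sketched as follows.  The RunAttributes interface
and the getAttributeByName function are hypothetical stand-ins, not
anything from the draft; the point is only where a misspelled attribute
name gets noticed.

```typescript
// Hypothetical attribute set for a character-run Segment.
interface RunAttributes {
  fontName: string;
  fontSize: number;
}

const runAttrs: RunAttributes = { fontName: "serif", fontSize: 12 };

// String-parameter style: the attribute name is ordinary string data,
// so a misspelling is only discovered when this call actually runs.
function getAttributeByName(
  attrs: RunAttributes,
  name: string
): string | number | undefined {
  const table: { [key: string]: string | number } = {
    fontName: attrs.fontName,
    fontSize: attrs.fontSize,
  };
  return table[name];
}

// Compiles without complaint, but yields undefined at run time.
const misspelled = getAttributeByName(runAttrs, "fontNmae");

// Typed-member style: writing runAttrs.fontNmae here instead would be
// rejected by the compiler before any branch of the program executes.
const typed = runAttrs.fontName;
```

If the misspelled call sat in a rarely taken branch, it could go
unnoticed until that branch happened to run.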

However, we must consider the potential tradeoffs.  One potential tradeoff
is whether returning so many Segments (less qualified results) places an
unreasonable burden on resources.  I think this question is orthogonal to
the question of whether the abstract design is correct.  We can deal with
resource usage by, for example, returning only the nth Segment and the m
Segments following it in one call.  And/or we can consider whether the
View-specific tree model of the Segments should be exposed.
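
One way to picture the "nth Segment and m following" idea is a windowed
query.  This is only a sketch under assumptions: the names PagedSegment
and getSegmentWindow are made up here, n is taken to be zero-based, and
the Segments are assumed to be available as an ordered list.

```typescript
// Hypothetical Segment shape; only its position matters for this sketch.
interface PagedSegment {
  type: string;
  index: number;
}

// Return the Segment at zero-based position n plus up to m Segments
// following it, so at most m + 1 Segments per call.
function getSegmentWindow(
  all: PagedSegment[],
  n: number,
  m: number
): PagedSegment[] {
  if (n < 0 || m < 0) {
    throw new RangeError("n and m must be non-negative");
  }
  return all.slice(n, n + 1 + m);
}

// Ten Segments, numbered 0 through 9.
const allSegments: PagedSegment[] = [];
for (let i = 0; i < 10; i++) {
  allSegments.push({ type: "CharacterRun", index: i });
}

const windowed = getSegmentWindow(allSegments, 3, 2); // positions 3, 4, 5
const tail = getSegmentWindow(allSegments, 8, 5);     // positions 8, 9
```

A caller that needs every Segment simply advances n by m + 1 until a
call returns fewer than m + 1 results.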


SEGMENT VIEW TYPE
=================

Also I observe that the View's attributes could be returned as a Segment of
type View.  This would fold those attributes into the general Segment
model, which is preferable for reducing redundancy and interface clutter.
More importantly, it allows us to expose different View (media) types as
different Segment types.  A caller can poll for Segment types to find out
what kind of View it is.  For example, this allows a View to have more than
one media type simultaneously, which is a generalization over the proposed
model.  This is semantically consistent because a View contains its own
View-type Segment(s).
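
A minimal sketch of polling a View for its media kinds, assuming the View
exposes its own attributes as View-type Segments.  The type names
"VisualView" and "AuralView", and the MultiMediaView class, are invented
for this example and do not come from the draft.

```typescript
// Hypothetical Segment type names, including two View (media) types.
type ViewSegmentType = "VisualView" | "AuralView" | "CharacterRun";

interface ViewSegment {
  type: ViewSegmentType;
  attributes: { [name: string]: string | number };
}

class MultiMediaView {
  private segments: ViewSegment[];

  constructor(segments: ViewSegment[]) {
    this.segments = segments;
  }

  // Distinct Segment types this View exposes, in first-seen order.
  getSegmentTypes(): ViewSegmentType[] {
    const seen: ViewSegmentType[] = [];
    for (const s of this.segments) {
      if (seen.indexOf(s.type) === -1) {
        seen.push(s.type);
      }
    }
    return seen;
  }

  // All Segments of one type; for a View type, these carry the
  // attributes that would otherwise sit on the View itself.
  getSegmentsOfType(type: ViewSegmentType): ViewSegment[] {
    return this.segments.filter((s) => s.type === type);
  }
}

const mixedView = new MultiMediaView([
  { type: "VisualView", attributes: { width: 800, height: 600 } },
  { type: "AuralView", attributes: { volume: 70 } },
  { type: "CharacterRun", attributes: { font: "serif" } },
]);

const segmentTypes = mixedView.getSegmentTypes();
const isVisual = segmentTypes.indexOf("VisualView") !== -1;
const isAural = segmentTypes.indexOf("AuralView") !== -1;
const visualWidth = mixedView.getSegmentsOfType("VisualView")[0].attributes["width"];
```

Here the same View reports itself as both visual and aural, and its
width is read like any other Segment attribute.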


ITEM SETS
=========

We need to give more thought to a general model for specifying Item sets
for the Item matching criterion, and to ask whether the proposed model is
sufficiently robust and general.


-Shelby Moore

Received on Saturday, 21 December 2002 14:57:58 UTC