Re: minutes from today's call

At 11:49 AM 2000-04-17 -0400, Wendy A Chisholm wrote:
>available at
>included below:
>17 April 2000 ER telecon
>LK since a bunch of new people and those with development experience, would 
>like to push the envelope. we have tended to discuss things straightforward 
>strategies. i'd like to see us put in a serious effort on complicated 
>things. this should attract developers. include: schemas for describing 
>accessibility, pattern recognition or AI to sense what's going on on the 
>page, tool to deduce the headers
>WL a "structuralizer"?

The basic capability is not that complicated.  

It is a structure type classifier.  

The main thing is that it has a major element of "best fit" reasoning in
its decisions, and it has a balance of bottom-up (depth first) and top-down
(breadth first) methods for deciding what roles things play in the page.  

We could get fancy with neural nets, but actually we could go a long way
with just a library of structure templates and heuristics about which
template-matching operations to try first.  Most pages have a head section
and a foot section, and a heuristic recognizer would rarely be wrong in
identifying their scope.  The inward-looking navigation bar would be harder
to isolate if it weren't always down the left margin...

This relates to the oft-repeated line "Where are the design-level
guidelines?" that we heard again at the CSUN F2F of the GL group.  One
version of a repair action is to add metadata indicating what type of
structure was found in the received page.  The User Agent behavior
associated with this is to navigate the page based on the inferred
structure determined by the analysis process.

In the talking book production process, structure recognition is done
manually.  The editor reviews the print edition and decides what parts of
the print version (they may be called chapters or cantos or whatever) plays
a role identified in the talking-book model of a securely navigable book
structure (top level subdivision within the body of the book, for example).
 This decision ultimately exists for all part-instances in the
book-instance.  It may be expressed as a rule "all chapters are nested
immediately within the bodymatter part" or it may just be done by
annotating all parts with this property.

We need to link our model of how a repair or author-coaching tool would
analyze page structure with the model and process of talking book
production.  These two activities are very similar in logical form, and
even in the climate of performance preferences that apply to the before and
after states of the document structure.


Received on Monday, 17 April 2000 15:47:31 UTC