RE: FW: Textual Equivalents

>I'm not sure I caught all your comments, since I also used CAPS myself
>for most element/attrib names... Please tell me.

Yes, you got them all (i.e., I did not say that much, and things fit in
multiple categories).

>> * IMAGE LOCATION (ie masthead usually centered at top)

>Can you give a little more detail on what you mean here ?

You introduced the idea of heuristics (I think). I mean, as a last resort,
use logic like:
1. If an image is small and is repeated on multiple consecutive lines, it
probably is a bullet graphic.
2. If there is a centered graphic in the first couple of lines, there is a
good chance it is a masthead.
3. Etc.
What I'm trying to do here is probably impossible: I'm trying to apply a
common syntax to web page structure.
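
To make the two rules above concrete, here is a rough sketch in Python. It is
only an illustration, not a tested tool: the Image record, the guess_role
function, and the SMALL_PX and TOP_LINES cutoffs are all names and values I am
assuming for the example, and "repeated on consecutive lines" is simplified to
"the same image also appears on an adjacent line".

```python
from dataclasses import dataclass
from typing import List

# Assumed cutoffs; the right values would have to be tuned on real pages.
SMALL_PX = 24   # an image at most this wide and tall counts as "small"
TOP_LINES = 3   # "the first couple of lines" of the page


@dataclass
class Image:
    """Hypothetical record for one image found on a page."""
    src: str            # image URL or file name
    line: int           # line of the rendered page where it appears (1-based)
    width: int          # pixels
    height: int         # pixels
    centered: bool = False


def guess_role(im: Image, images: List[Image]) -> str:
    """Apply the last-resort rules to one image, given all images on the page."""
    small = im.width <= SMALL_PX and im.height <= SMALL_PX
    # Rule 1: same small image also appears on the line just before or after.
    repeated = any(
        other is not im and other.src == im.src and abs(other.line - im.line) == 1
        for other in images
    )
    if small and repeated:
        return "bullet"
    # Rule 2: centered graphic near the top of the page.
    if im.centered and im.line <= TOP_LINES:
        return "masthead"
    return "unknown"


page = [
    Image("banner.gif", 1, 400, 80, centered=True),
    Image("dot.gif", 5, 12, 12),
    Image("dot.gif", 6, 12, 12),
    Image("dot.gif", 7, 12, 12),
    Image("photo.jpg", 20, 200, 150),
]
roles = [guess_role(im, page) for im in page]
```

An image that matches neither rule is left as "unknown" rather than guessed
at, since these are last-resort heuristics.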

>> * REDUNDANCY OF IMAGE (both intrapage and interpage)

>Do you mean maintain some sort of technique cache, so that if some
>text version was extracted at some point for an element, there's no
>need to redo it for the same thing later on ?

No, this is more like an extension of my previous point, across multiple
pages. For example, if an image is repeated in the same relative location on
multiple pages, it probably is a logo.
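
A rough sketch of that cross-page check, again in Python and again only under
assumptions of mine: the likely_logos function, the coarse position labels
such as "top-left", and the min_pages threshold are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each page maps to the images on it, as (src, position) pairs.
# "position" is an assumed coarse label for the relative location on the page.
Placements = Dict[str, List[Tuple[str, str]]]


def likely_logos(pages: Placements, min_pages: int = 2) -> List[str]:
    """Return images that sit in the same relative location on several pages."""
    seen = defaultdict(set)  # (src, position) -> set of page URLs
    for url, placements in pages.items():
        for src, position in placements:
            seen[(src, position)].add(url)
    # Keep images whose (src, position) pair shows up on enough distinct pages.
    return sorted({src for (src, _pos), urls in seen.items() if len(urls) >= min_pages})


site = {
    "index.html": [("logo.gif", "top-left"), ("photo1.jpg", "middle")],
    "about.html": [("logo.gif", "top-left"), ("photo2.jpg", "middle")],
    "contact.html": [("logo.gif", "bottom-right")],
}
logos = likely_logos(site)
```

Here logo.gif is reported because it appears at "top-left" on two pages; the
photos differ from page to page, so they are not.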

Hope this helps.

David Clark
CAST, Inc.

Received on Thursday, 10 September 1998 08:54:18 UTC