Re: Client-side highlighting; tag proposal

Joe English (jenglish@crl.com)
Mon, 13 Mar 1995 22:23:01 -0800


Message-Id: <199503140622.AA01929@mail.crl.com>
To: dsr@hplb.hpl.hp.com (Dave Raggett)
Cc: www-html@www19.w3.org (with any luck)
Subject: Re: Client-side highlighting; tag proposal 
In-Reply-To: <199503131709.AA214614550@dragget.hpl.hp.com> 
Date: Mon, 13 Mar 1995 22:23:01 -0800
From: Joe English <jenglish@crl.com>


Dave Raggett <dsr@hplb.hpl.hp.com> wrote:

> If you look at the current proposal for HTML 3.0 at:
> 
>         http://www.hpl.hp.co.uk/people/dsr/html3/CoverPage.html
> 
> you will see a MARK element matches your needs. Unfortunately, I have been
> strongly advised by SGML Open to avoid using paired empty elements:
> 
> "Many optimizations that programs can use because they know they are
>  dealing with trees are lost if such structures are permitted. For example, 
>  a program can no longer tell how to format part of a document without going
>  all the way back to the beginning, on the off chance that there was a MARK
>  element a long way back.

This is not much of an issue for HTML documents on the Web,
since they tend to be small and are rendered as a single unit 
anyway.  It's not like a browser is going to display the book of 
Leviticus and have to worry about a marked region starting in Exodus 
and ending in Deuteronomy.

> Likewise, one cannot easily build a stack-based
>  formatter, e.g. that keys styles off the list of element types in one's
>  ancestry.

This is only partly true, and irrelevant besides.
If the browser is going to include this functionality -- 
highlighting regions that may cross element boundaries -- 
it can't use ancestor-driven style resolution in any case,
regardless of how the regions are identified.

As for efficiency, the Tk text widget performs well without
using any hierarchical information at all: all formatting
attributes are specified with discontiguous, potentially
overlapping tagged regions.
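
To make the flat model concrete, here is a minimal sketch in Python.
It is not Tk's implementation; the names Region and split_into_runs
are made up for illustration, and regions are assumed to be
(start, end, tag) triples over character offsets, end exclusive.

```python
from typing import List, Set, Tuple

# Hypothetical flat model: (start, end, tag), end exclusive.
# Regions may overlap freely and carry no nesting information.
Region = Tuple[int, int, str]


def split_into_runs(text: str, regions: List[Region]) -> List[Tuple[str, Set[str]]]:
    """Cut text into runs, each with the set of tags that cover it."""
    # Every region boundary is a point where the active tag set may change.
    cuts = {0, len(text)}
    for start, end, _tag in regions:
        cuts.add(max(0, min(start, len(text))))
        cuts.add(max(0, min(end, len(text))))
    points = sorted(cuts)

    runs = []
    for lo, hi in zip(points, points[1:]):
        # A tag applies to a run if its region covers the whole run.
        active = {tag for start, end, tag in regions if start <= lo and hi <= end}
        runs.append((text[lo:hi], active))
    return runs
```

Given "Hello, world" with a "bold" region over 0..5 and a "highlight"
region over 3..9, the middle run "lo" carries both tags, and no
element tree is consulted at any point.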

And lastly, you *can* use a single-pass parser with a stack-based 
formatter to keep track of marked spans.
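
A rough sketch of what I mean, in Python.  The event kinds ("start",
"end", "text", "mark_start", "mark_end") and the function name are
assumptions for illustration, not any real parser's interface: the
element stack is kept as usual, and the set of open marks is carried
alongside it.

```python
from typing import Iterable, List, Set, Tuple

Event = Tuple[str, str]  # (kind, value); the kinds used here are made up


def format_stream(events: Iterable[Event]) -> List[Tuple[str, Tuple[str, ...], bool]]:
    """Return (text, ancestor path, highlighted?) for each text chunk."""
    stack: List[str] = []          # open elements, outermost first
    open_marks: Set[str] = set()   # ids of marked spans currently open
    out = []
    for kind, value in events:
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            if not stack or stack[-1] != value:
                raise ValueError(f"unexpected end tag: {value}")
            stack.pop()
        elif kind == "mark_start":
            open_marks.add(value)       # independent of the element stack
        elif kind == "mark_end":
            open_marks.discard(value)
        elif kind == "text":
            out.append((value, tuple(stack), bool(open_marks)))
    return out
```

A mark opened in one paragraph and closed in the next simply stays in
open_marks while the stack pops and pushes around it; the formatter
never has to look back.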

> An editor is in even worse shape. There is no way to validate
>  that such pairs even match, because "matching" is not a generic notion --
>  it has to be custom-built for each kind of pair.

Any SGML parser can do the ID/IDREF validation, and HyTime reftype 
constraints can do (most of) the rest, if it's that important.
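
For reference, the ID/IDREF part amounts to no more than the check
sketched below in Python.  The Marker tuple and the function name are
hypothetical; each marker is assumed to carry an optional ID and an
optional IDREF.  It checks only that IDs are unique and that every
IDREF resolves, nothing about ordering or reference types.

```python
from typing import Iterable, List, Optional, Tuple

# (id, idref) for each marker element, in document order; either may be None.
Marker = Tuple[Optional[str], Optional[str]]


def check_id_idref(markers: Iterable[Marker]) -> List[str]:
    """Report duplicate IDs and IDREFs that match no ID in the document."""
    ids = set()
    refs = []
    errors = []
    for ident, ref in markers:
        if ident is not None:
            if ident in ids:
                errors.append(f"duplicate ID: {ident}")
            ids.add(ident)
        if ref is not None:
            refs.append(ref)
    # References are resolved after the whole document has been seen,
    # so forward references are accepted.
    for ref in refs:
        if ref not in ids:
            errors.append(f"IDREF with no matching ID: {ref}")
    return errors
```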

Michael Johnson suggests a <SPOT> element that marks a
single point in the document, with a separate <HIGHLIGHT>
element carrying two IDREF attributes to identify each span.
This might be even better, though search agents would have to
buffer the entire document body before sending the head; this
would not be the case with <MARK>.

>  Many DTDs have inserted such element-pairs in their first drafts; they
>  end up removing them later, because they prove to be a pain for both
>  implementors and users, and to have surprising side-effects. We strongly
>  recommend avoiding them completely."

Guess I can't argue with that.

> As a result, I am now looking at a way of specifying both the start and
> end of a highlighted region separately from the document body, using
> a single element in the document head, e.g. something like:
> 
>         <highlight from=3096 until=4013>
> 
> Where the numbers are byte offsets into the document body.

But this doesn't solve the first problem -- highlighting
regions that cross element boundaries -- and it introduces
a new one:  Now the rendering agent has to keep track of 
byte offsets in the source document.

So does the agent which generates highlight information, for that
matter; this is a bigger problem since most SGML processors don't 
keep track of that information at all.  
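
To show the bookkeeping involved, here is a deliberately crude Python
sketch.  The tokenizer regex and the function name are made up, and
"until" is assumed to be exclusive since the proposal doesn't say.
The renderer has to carry a byte position through tokenization for
every text chunk, markup included, just to find out where the
highlight falls.

```python
import re
from typing import List, Tuple

# Crude tokenizer for illustration only: a tag, or a run of text.
TOKEN = re.compile(rb"<[^>]*>|[^<]+")


def highlighted_text(body: bytes, start: int, until: int) -> List[Tuple[bytes, bool]]:
    """Split text chunks at the byte offsets start..until (until exclusive)."""
    out = []
    for m in TOKEN.finditer(body):
        if m.group().startswith(b"<"):
            continue  # markup is not rendered, but it still consumes bytes
        lo, hi = m.start(), m.end()
        a, b = max(lo, start), min(hi, until)
        if a >= b:
            out.append((body[lo:hi], False))  # chunk lies outside the range
            continue
        if lo < a:
            out.append((body[lo:a], False))
        out.append((body[a:b], True))
        if b < hi:
            out.append((body[b:hi], False))
    return out
```

With the body "<p>one two</p><p>three</p>" and offsets 7 and 19, the
highlight starts in the first paragraph and ends in the second, so
the formatter faces the same boundary-crossing span as before, only
now it is counting bytes as well.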

Using HyTime or HyTime-like locators that point into the
document from outside is probably a good idea, but it
*still* wouldn't solve the first problem, and it is much, 
much more difficult to get right than implementing (and using) 
<MARK> would be.


--Joe English

  jenglish@crl.com

[ P.S. to list maintainers:
  Mail to www-html@www.w3.org keeps bouncing with this message:

    ----- Transcript of session follows -----
    Connected to www.w3.org:
    >>> HELO www12
    <<< 553 www12 host name configuration error
    554 <www-html@www.w3.org>... Service unavailable

  I tried twice this afternoon and once just now (~10PM PST),
  with the same result each time.  I've addressed this to 
  www19.w3.org in hopes that it will get through. 
]

[ P.P.S. to Dave:
  I'm Cc: ing you this time; if it bounces again I'm giving up 
  on www-html, but I still have hope for <MARK>.  Sorry if you
  end up with a bazillion copies.
]