Re: Client-side highlighting; tag proposal from Phillip M. Hallam-Baker on 1995-03-14 (www-html@w3.org from March 1995)

From: Phillip M. Hallam-Baker <hallam@dxal18.cern.ch>
Date: Tue, 14 Mar 1995 14:02:03 +0900
To: dsr@hplb.hpl.hp.com, www-html@www10.w3.org
Cc: hallam@dxal18.cern.ch
Message-Id: <95Mar14.140212+0900_met.63660-3+18@dxal18.cern.ch>
>As a result, I am now looking at a way of specifying both the start and
>ends of highlighted region separately from the document body, e.g. using
>a single element in the document head, e.g. something like:
>
>       <highlight from=3096 until=4013>

I would like to second this proposal as being much more flexible all round.
In fact I would like to suggest that we have a completely separate annotations
section. This is because of the need to handle multiple annotations on the same
document.

Let us consider scenarios:

1) Simple Annotation, a group of text is highlighted. Note that cut n' paste is
a special case of such annotation.

2) Group annotation, multiple users add multiple annotations to the same 
document. These annotations may overlap.

3) A user is editing program code and running a compiler over it. The compiler
spits out annotations on the source. It is MUCH easier to handle such
annotations entirely separately because the compiler element that handles the
error reporting probably has access only to a token stream, not the original
text. In addition there is the intermediate edit problem where the user carries
on editing.

4) A filter produces annotation on a document, eg converting text to hypertext. 
It is most convenient to do this in two stages, first building the annotations, 
then doing a merge.
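
A minimal sketch of the two-stage filter in scenario 4, in Python. The names
Annotation, build_annotations and merge are hypothetical, the offsets are
character offsets into plain text, and the merge assumes the annotations do not
overlap and that the text needs no escaping:

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int   # character offset into the text
    length: int  # number of characters covered
    href: str    # link target (hypothetical field name)


def build_annotations(text):
    """Stage 1: record every bare URL without touching the text."""
    return [Annotation(m.start(), m.end() - m.start(), m.group())
            for m in re.finditer(r"http://[^\s<>\"]+", text)]


def merge(text, annotations):
    """Stage 2: splice anchors into the text; assumes no overlaps."""
    out, pos = [], 0
    for ann in sorted(annotations, key=lambda a: a.start):
        out.append(text[pos:ann.start])
        body = text[ann.start:ann.start + ann.length]
        out.append('<A HREF="%s">%s</A>' % (ann.href, body))
        pos = ann.start + ann.length
    out.append(text[pos:])
    return "".join(out)
```

Because stage 1 leaves the text alone, the same annotation list could equally
be shipped separately from the document instead of being merged.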


There are two distinct types of annotation:

	Simple highlight
	Hypertext link

It is essential that hypertext links be allowed. This seems to point to using
two tags, e.g. <ANN> and <ANNANCHOR HREF="" KEY= START= LENGTH=>.

On the positioning problem there are two approaches, using the parse tree and
using absolute byte offsets. I would propose we combine both. Clients should be
able to handle a byte offset from within an element. This is mainly for ease of
annotation building tools. Given a choice of where to put the complexity, it is
better to load it onto browser writers than onto tool writers. This is because
a browser is inevitably a large group effort whereas tool building should be
feasible by `privateers'.

The normal method for specifying an annotation would be as a character offset 
from the character following the close angle bracket of a tag. Note that 
character does not imply byte since we have to consider UTF. The simplest 
convention would be to give an offset relative to the body. This allows 
annotations to be added into the head element thus allowing one pass parsers to 
work:-

START=/body/345
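
A rough Python sketch of resolving such an offset, and of why a character
offset is not a byte offset. The function names are hypothetical. It assumes
the offset counts every character after the '>' of the opening body tag,
markup included, that the document is encoded as UTF-8, and that a simple
pattern match is enough to find the tag (comments and quoted '>' characters
are ignored):

```python
import re


def body_start(html: str) -> int:
    """Index of the character following the '>' of the opening body tag."""
    match = re.search(r"<body\b[^>]*>", html, re.IGNORECASE)
    if match is None:
        raise ValueError("no <body> tag found")
    return match.end()


def resolve_body_offset(html: str, offset: int) -> int:
    """Turn the 345 in START=/body/345 into an index into the document."""
    position = body_start(html) + offset
    if offset < 0 or position > len(html):
        raise ValueError("offset outside document")
    return position


def char_to_byte_offset(html: str, position: int, encoding: str = "utf-8") -> int:
    """Byte offset of a character position once the text is encoded."""
    return len(html[:position].encode(encoding))
```

Any non-ASCII character before the target makes the two offsets diverge, which
is the reason to specify the offset in characters.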

Does someone know the HyTime mechanism for this?

Support for fully implemented trees would be very useful, i.e. to offset from
the second level 2 heading within the third H1:

START=/h1.3/h2.2/23
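
One possible reading of this syntax, sketched in Python. The function name
parse_start is hypothetical, and treating a step without a ".n" suffix as the
first such element is an assumption made so that /body/345 and /h1.3/h2.2/23
parse the same way:

```python
def parse_start(spec: str):
    """Split '/h1.3/h2.2/23' into ([('h1', 3), ('h2', 2)], 23)."""
    parts = [p for p in spec.split("/") if p]
    offset = 0
    # A trailing all-digit component is the character offset.
    if parts and parts[-1].isdigit():
        offset = int(parts.pop())
    steps = []
    for part in parts:
        name, _, index = part.partition(".")
        # Assumption: no ".n" suffix means the first matching element.
        steps.append((name.lower(), int(index) if index else 1))
    return steps, offset
```

Walking the parse tree with the resulting steps is left to the client, which
already has to build that tree in order to render the document.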

I prefer using LENGTH instead of END since it's easier to calculate and shorter.
It might be useful to allow either END or LENGTH.

If no offset is given it should default to 0. If no end point is defined (i.e.
no length or end) it should default to the close tag of the structure defined
in the start. This allows easy identification of sections.
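
These defaulting rules could be sketched as follows in Python. The name
resolve_span is hypothetical, END is assumed to be an exclusive position in
the same coordinates as the start, and element_end stands for the position of
the close tag of the element named in the start:

```python
from typing import Optional, Tuple


def resolve_span(element_end: int, start: int = 0,
                 length: Optional[int] = None,
                 end: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, stop) after applying the defaults described above."""
    if length is not None and end is not None and start + length != end:
        raise ValueError("LENGTH and END disagree")
    if length is not None:
        stop = start + length
    elif end is not None:
        stop = end
    else:
        # Neither given: run to the close tag of the start element.
        stop = element_end
    if stop < start:
        raise ValueError("span ends before it starts")
    return start, stop
```

With no attributes beyond the element itself, the span covers the whole
element, which is what makes marking a complete section cheap.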


The tree based annotation would be most useful in collaborative work tool
environments. I know we can't build these on HTTP/1.0 but I do not accept as an 
argument that we should only think about our current needs. The IETF standard 
process has a lead time of about two years. We will be needing the more 
sophisticated feature set long before we will get agreement on HTTP 3.0.

I don't think the programming demands would be too onerous. Basically it's an
addition into the FSR and tag translation components of the SGML module. It's
not that hard a job to do both tree based and absolute offset based annotation.


We should also consider (yes there is more!) adding annotation TEXT into the
body of a document. This could be displayed by callouts, i.e.

<ANN START="">This is annotation text</ANN>


And why not allow annotation on other documents? In Hyper-G annotation and
documents are entirely separate. Why not have a model in which an annotated link
may be made to another document? This is a powerful feature and very easy to
implement. Essentially it means that the page one travels from can annotate the
next. The simplest use of this would be a link to an annotated copy of a
document, i.e. one clicks on the error log of a compilation and gets returned
the source code annotated with errors. There are a wide range of other uses:

1) An annotated index to a Web is created. This has its own previous/next
operations which may be very different to the previous/next operations of the
documents themselves. Consider searching for the occurrences of "frying pan" in
a large database. It turns up 60 odd references to hypertexts on the Web. It is
helpful for the index to be able to annotate the location of the search item
and also provide a previous/next facility. This cannot be stored in the
documents themselves because they have no knowledge of being part of a search
operation for frying pans.

2) Judge Lance Ito has decided to go 100% electronic. He is reviewing his
transcript of the O.J. Simpson trial which is being produced in real time. CNN 
wish to provide an annotated commentary of this transcript. There are two 
models, either the transcript and annotation are fed into a junction box and the 
result served or the browser independently collects both the transcript and 
annotation.

The second model is vastly more powerful. It allows annotations to be performed
in batch on realtime events. Consider that the annotations are issued once an 
hour. A reader does not want to have the annotated feed separate from the 
realtime feed. CNN do not want the hassle of providing a realtime server. They 
provide only annotation so that is what they want to distribute. 

In a charging model this is very important. The transcript feed might cost $10
an hour while the opinions of CNN may be worth only a few cents. Alice, who is
an O.J. Simpson trial junkie, subscribes to both the CNN and ABC annotation
feeds but does not want to pay two lots of $10 for the transcript itself. This
is much more important when one considers that Alice is also an IRC junkie and
wants to sit on an IRC/WWW transcript annotation channel in addition.
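
A small Python sketch of what the browser side of the second model might do:
take one copy of the transcript and overlay several independently fetched
annotation feeds, which may overlap. The function name overlay and the
(start, length) span format are assumptions made for illustration:

```python
def overlay(transcript, feeds):
    """Split the transcript into runs, each tagged with the feeds covering it.

    feeds maps a feed name to a list of (start, length) character spans.
    """
    cuts = {0, len(transcript)}
    for spans in feeds.values():
        for start, length in spans:
            cuts.update((start, start + length))
    # Ignore span edges that fall outside the transcript received so far.
    edges = sorted(c for c in cuts if 0 <= c <= len(transcript))
    runs = []
    for lo, hi in zip(edges, edges[1:]):
        covering = sorted(name for name, spans in feeds.items()
                          if any(s <= lo and hi <= s + n for s, n in spans))
        runs.append((transcript[lo:hi], covering))
    return runs
```

For example, with a transcript of "The court is now in session." and feeds
{"CNN": [(4, 5)], "ABC": [(4, 12)]}, the word "court" comes back tagged with
both feeds while " is now" is tagged with ABC only. The transcript is fetched
and paid for once, however many annotation feeds are layered on top.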


Summary :- 
   * Need links and annotations
   * Start, end and length attributes, using tree structure of text with offsets
   * Normally stored inside the Head element.
   * May apply to documents referenced FROM a document.
   * Should consider extremes of the model to get the right structure.
   * Easy to implement.

   * Someone should look at HyTime and see IF it's useful and grab the
	good ideas.

		Phill
Received on Tuesday, 14 March 1995 11:59:05 UTC