- From: Phillip M. Hallam-Baker <hallam@dxal18.cern.ch>
- Date: Tue, 14 Mar 1995 14:02:03 +0900
- To: dsr@hplb.hpl.hp.com, www-html@www10.w3.org
- Cc: hallam@dxal18.cern.ch
>As a result, I am now looking at a way of specifying both the start and
>ends of highlighted region separately from the document body, e.g. using
> single element in the document head, e.g. something like:
>
>        <highlight from=3096 until=4013>

I would like to second this proposal as being much more flexible all
round. In fact I would like to suggest that we have a completely
separate annotations section. This is because of the need to handle
multiple annotations on the same document.

Let us consider scenarios:

1) Simple annotation: a group of text is highlighted. Note that cut n'
   paste is a special case of such annotation.

2) Group annotation: multiple users add multiple annotations to the same
   document. These annotations may overlap.

3) A user is editing program code and running a compiler over it. The
   compiler spits out annotations on the source. It is MUCH easier to
   handle such annotations entirely separately because the compiler
   element that handles the error reporting probably has access only to
   a token stream, not the original text. In addition there is the
   intermediate edit problem where the user carries on editing.

4) A filter produces annotation on a document, e.g. converting text to
   hypertext. It is most convenient to do this in two stages, first
   building the annotations, then doing a merge.

There are two distinct types of annotation:

        Simple highlight
        Hypertext link

It is essential that hypertext links be allowed. This seems to point to
using two tags, e.g. <ANN> and <ANNANCHOR HREF="", KEY= START=, LENGTH=>.

On the positioning problem there are two approaches, using the parse
tree and using absolute byte offsets. I would propose we combine both.
Clients should be able to handle a byte offset from within an element.
This is mainly for ease of annotation building tools. Given a choice of
complexity it's better to load it onto browser writers than onto tool
writers.
This is because a browser is inevitably a large group effort whereas
tool building should be feasible by `privateers'.

The normal method for specifying an annotation would be as a character
offset from the character following the close angle bracket of a tag.
Note that character does not imply byte since we have to consider UTF.

The simplest convention would be to give an offset relative to the body.
This allows annotations to be added into the head element, thus allowing
one pass parsers to work:-

        START=/body/345

Does someone know the HyTime mechanism for this???

Support for fully implemented trees would be very useful, i.e. to offset
from the second level 2 heading within the third H1:

        START=/h1.3/h2.2/23

I prefer using LENGTH instead of END since it's easier to calculate and
shorter. It might be useful to allow either END or LENGTH.

If no offset is given it should default to 0. If no end point is defined
(i.e. no length or end) it should default to the close tag of the
structure defined in the start. This allows easy identification of
sections.

The tree based annotation would be most useful in collaborative work
tool environments. I know we can't build these on HTTP/1.0 but I do not
accept as an argument that we should only think about our current needs.
The IETF standard process has a lead time of about two years. We will be
needing the more sophisticated feature set long before we will get
agreement on HTTP 3.0.

I don't think the programming demands would be too onerous. Basically
it's an addition into the FSR and tag translation components of the SGML
module. It's not that hard a job to do both tree based and absolute
offset based annotation.

We should also consider (yes there is more!) adding annotation TEXT into
the body of a document. This could be displayed by callouts, i.e.

        <ann START="">This is annotation text</ANN>

And why not allow annotation on other documents? In Hyper-G annotation
and documents are entirely separate.
Why not have a model in which an annotated link may be made to another
document? This is a very easy to implement and powerful feature.
Essentially it means that the page one travels from can annotate the
next.

The simplest use of this would be a link to an annotated copy of a
document, i.e. one clicks on the error log of a compilation and gets
returned the source code annotated with errors.

There are a wide range of other uses:

1) An annotated index to a Web is created. This has its own
   previous/next operations which may be very different to the
   previous/next operations of the documents themselves.

   Consider searching for the occurrences of "frying pan" in a large
   database. It turns up 60 odd references to hypertexts on the Web. It
   is helpful for the index to be able to annotate the location of the
   search item and also provide a previous/next facility. This cannot be
   stored in the documents themselves because they have no knowledge of
   being part of a search operation for frying pans.

2) Judge Lance Ito has decided to go 100% electronic. He is reviewing
   his transcript of the O.J. Simpson trial which is being produced in
   real time. CNN wish to provide an annotated commentary of this
   transcript.

   There are two models: either the transcript and annotation are fed
   into a junction box and the result served, or the browser
   independently collects both the transcript and annotation.

   The second model is vastly more powerful. It allows annotations to be
   performed in batch on realtime events. Consider that the annotations
   are issued once an hour. A reader does not want to have the annotated
   feed separate from the realtime feed. CNN do not want the hassle of
   providing a realtime server. They provide only annotation so that is
   what they want to distribute.

   In a charging model this is very important. The transcript feed might
   cost $10 an hour while the opinions of CNN may be worth only a few
   cents.
   Alice, who is an O.J. Simpson trial junkie, subscribes to both the
   CNN and ABC annotation feeds but does not want to pay two lots of $10
   for the transcript itself. This is much more important when one
   considers that Alice is also an IRC junkie and wants to sit on an
   IRC/WWW transcript annotation channel in addition.

Summary :-

* Need links and annotations.
* Start, end and length attributes, using tree structure of text with
  offsets.
* Normally stored inside the Head element.
* May apply to documents referenced FROM a document.
* Should consider extremes of the model to get the right structure.
* Easy to implement.
* Someone should look at HyTime and see IF it's useful and grab the
  good ideas.

        Phill
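
To make the START path syntax and the LENGTH/END defaults described
above concrete, here is a rough Python sketch. It is only an
illustration under stated assumptions, not part of the proposal: the
names Start, parse_start and span_length are hypothetical, a path step
without ".n" is assumed to mean the first occurrence, and END is assumed
to be a character offset within the same element as START.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Start:
    # (tag name, 1-based occurrence) pairs, e.g. ("h1", 3) for "h1.3"
    steps: List[Tuple[str, int]]
    # character offset counted from just after the tag's closing '>'
    offset: int


def parse_start(value: str) -> Start:
    """Parse a START value such as "/body/345" or "/h1.3/h2.2/23"."""
    parts = [p for p in value.split("/") if p]
    offset = 0  # "If no offset is given it should default to 0"
    if parts and parts[-1].isdigit():
        offset = int(parts.pop())
    steps = []
    for part in parts:
        name, _, index = part.partition(".")
        # Assumption: a step with no ".n" refers to the first occurrence.
        steps.append((name.lower(), int(index) if index else 1))
    return Start(steps, offset)


def span_length(start: Start,
                length: Optional[int] = None,
                end: Optional[int] = None) -> Optional[int]:
    """Return the annotated length in characters.

    None means "run to the close tag of the structure named in START",
    the default when neither LENGTH nor END is given.
    """
    if length is not None:
        return length
    if end is not None:
        # Assumption: END is an offset in the same element as START.
        return end - start.offset
    return None


if __name__ == "__main__":
    print(parse_start("/h1.3/h2.2/23"))
    print(span_length(parse_start("/body/345"), end=400))
```

Keeping the path steps apart from the character offset means a one pass
parser only has to count matching open tags until the last step is
reached, and then count characters from there.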
Received on Tuesday, 14 March 1995 11:59:05 UTC