Re: Radical cure for BOS confusion (2 CCs deleted). from David G. Durand on 1997-01-11 (w3c-sgml-wg@w3.org from January 1997)

From: David G. Durand <dgd@cs.bu.edu>
Date: Sat, 11 Jan 1997 15:03:05 -0500
To: w3c-sgml-wg@www10.w3.org
Cc: w3c-sgml-wg@www10.w3.org
Message-Id: <v02130501aefd971f516b@[128.148.157.29]>
At 5:07 PM 1/10/97, Derek Denny-Brown wrote:
>That is not the issue I was worried/concerned about.  One reason contextual
>links (html's <A>, TEI <XREF> & <XPTR>) are the primary hyperlink style
>implemented in today's software is that it is very easy to figure out what
>the start for the hyperlink is, trivial even.  Using contextual links, you
>only need to traverse the document when the user explicitly asks for it, and
>thus a small delay is acceptable.  Using ilink style links means that the
>document load also incurs a similar penalty _for_every_ilink_ just to figure
>out what parts of the document are anchors.  The BOS defines which documents
>must be processed to ensure that all anchors have been located.  As a
>simple example, if you have three documents, A, B, C, where you are loading
>document A, but B and C are in A's BOS, then all three documents must be
>processed in order to determine what parts of A are anchors.  This is one of
>the reasons HyTime has not already replaced HTML.  As Steven Newcomb is fond
>of saying, how does an anchor "know" it is an anchor?  In order to answer
>that question, every thing that _might_ point at it must be resolved and
>_all_ ilinks must be processed to determine if _any_ use that object as an
>anchor.

All _relevant_ ilinks must be examined. This (currently undefined) notion
of relevance is the key to defining what we want. We can define the set of
relevant ilinks by defining a set of relevant documents, and then saying
that ilinks in that set of documents must be processed with respect to a
given starting point. We have several possible definitions of the set of
relevant documents (the XML BOS, as I think we might call it) on the table
at the moment:

   1. (Steve N.) The HyTime BOS. This means that all documents declared in
entity declarations are potentially relevant (though some control can be
explicitly provided to reduce this set). Indirection is supported via
entity declarations. The user must explicitly

   2. (Steve D., Jon Bosak) Some particular document mentioned as an
associate document. I'm not sure if more than one "link database document"
can be associated as relevant. Indirection in the creation of such sets is
not supported.

   3. (David D.) A particular set of documents declared by pulling in
"associated" documents explicitly requested by the user. (The user must
explicitly ask for a document to enter this set, but may use indirection to
manage sets if desired).

   4. (Martin Bryan). I'm least sure about what Martin wants but I think
he'd prefer a mechanism that enforced indirection in specifying the set of
relevant documents.

   5. (Derek). The "relevant document set" is always just the current
document. (no direction, or indirection).
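To make the differences concrete, here is a minimal Python sketch of two of the definitions above, under a toy model assumed purely for illustration: each document name maps to the documents its author explicitly associated with it (the hypothetical `associations` table). Following those entries transitively is one way to get the managed indirection of option 3; option 5 ignores the table entirely. It also reproduces the three-document example: if A's set pulls in B and C, all three must be processed.

```python
from collections import deque


def relevant_set(start, associations):
    """Option 3 (sketch): documents reachable through explicit associations.

    `associations` is a hypothetical dict: document name -> list of
    documents the author explicitly asked to associate. A "set" document
    that only lists other documents provides indirection.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        for other in associations.get(doc, ()):
            if other not in seen:  # guards against association cycles
                seen.add(other)
                queue.append(other)
    return seen


def current_only(start, associations):
    """Option 5 (sketch): the relevant set is just the current document."""
    return {start}
```

For example, with `{"A": ["linkset"], "linkset": ["B", "C"]}`, option 3 yields A, the link-set document, B and C, while option 5 yields only A.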

>I like ilinks, but having worked on developing software to implement HyTime,
>they are not easy.  Careful restrictions should be made to reduce overhead.

   As I said, ilinks are definitely harder than simple 1-way embedded
links! The way to make implementing ilinks easy is to somehow guarantee
that an application has parsed every ilink before it encounters any of that
ilink's endpoints. Then one need only keep a dictionary of ilinks sorted by
address, and track addresses as you parse, checking each new address in the
ilink database as you go.
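A rough sketch of that easy case, assuming every ilink is known before parsing starts and that an address is a hypothetical (document, element id) pair. A dictionary keyed by address (a hash table here, which serves the same lookup purpose as a sorted one) lets each element be checked exactly once, at the moment the parser reaches it.

```python
from collections import defaultdict


def build_ilink_index(ilinks):
    """Index ilinks by endpoint address.

    Each ilink is a hypothetical (link_id, [address, ...]) pair, where an
    address is a (document, element_id) tuple.
    """
    index = defaultdict(list)
    for link_id, endpoints in ilinks:
        for address in endpoints:
            index[address].append(link_id)
    return index


def parse_and_mark(doc_name, element_ids, index):
    """Simulate a single parsing pass over one document.

    Since all ilinks were indexed beforehand, each element is looked up
    as soon as it is seen; no second pass is needed. Returns
    (element_id, [link_id, ...]) pairs in document order.
    """
    anchors = []
    for element_id in element_ids:
        links = index.get((doc_name, element_id))  # .get does not add keys
        if links:
            anchors.append((element_id, links))
    return anchors
```

The catch, as the next paragraph says, is the "before" requirement: an ilink that is parsed after its endpoint has gone by is simply missed.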

   That said, I don't think this is actually a very good way to implement
ilinks. For one thing, the constraints on authors are hard to explain, and
for another, we will need to hard-wire the rules about when entity
references are resolved, in order to enable authors to tell if they are
making a forward reference or not.

   The other way is simply to parse all the relevant documents, saving the
ilinks. Then one can bounce through the list of ilinks, applying any
endpoints found in the relevant document set to the application's internal
representation of the documents. I think this is, for a multiple document
model, the essentials of Lee's in-memory condition.
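The two-phase approach might look like this, again under an assumed toy model (each document is a dict holding its element ids and the ilinks it contains; none of these names come from any spec). Load order does not matter, because no endpoint is applied until everything is in memory, and endpoints that fall outside the relevant set are simply ignored.

```python
def resolve_after_loading(documents):
    """Two-phase ilink resolution over an in-memory document set (sketch).

    `documents` maps a document name to a hypothetical dict with
    "elements" (list of element ids) and "ilinks" (list of
    (link_id, [(doc, element_id), ...]) pairs).
    Returns doc name -> {element_id: [link_id, ...]}.
    """
    # Phase 1: "parse" every relevant document, saving its ilinks.
    pool = []
    trees = {}
    for name, doc in documents.items():
        trees[name] = {eid: [] for eid in doc["elements"]}
        pool.extend(doc["ilinks"])

    # Phase 2: bounce through the pool, marking endpoints held in memory.
    for link_id, endpoints in pool:
        for doc_name, element_id in endpoints:
            tree = trees.get(doc_name)
            if tree is not None and element_id in tree:
                tree[element_id].append(link_id)
    return trees
```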

   Note that the latter strategy can combine with the first strategy -- you
need to delay processing an ilink only until you parse the document it applies
to. For a browser, this means that incremental displays may not show all
links immediately, depending on when the ilink is actually processed. The
only way to avoid this is to somehow require that all ilinks be processed
first.
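A sketch of the combined strategy, with hypothetical names throughout: an endpoint is applied immediately if its document has already been parsed, and otherwise waits in a per-document pending list until that document arrives. This is exactly where the browser effect shows up: a document can be displayed before the ilinks pointing into it have been seen.

```python
class IncrementalResolver:
    """Apply each ilink endpoint once both the link and its target are parsed.

    Hypothetical sketch: endpoints whose document has not been parsed yet
    wait in `pending` and are applied when that document is added.
    """

    def __init__(self):
        self.trees = {}    # doc name -> {element_id: [link_id, ...]}
        self.pending = {}  # doc name -> [(link_id, element_id), ...]

    def _attach(self, doc_name, element_id, link_id):
        tree = self.trees[doc_name]
        if element_id in tree:
            tree[element_id].append(link_id)

    def add_document(self, name, element_ids, ilinks=()):
        self.trees[name] = {eid: [] for eid in element_ids}
        # Endpoints that were waiting for this document can now be applied.
        for link_id, element_id in self.pending.pop(name, []):
            self._attach(name, element_id, link_id)
        # Process this document's own ilinks, which may point anywhere.
        for link_id, endpoints in ilinks:
            for doc_name, element_id in endpoints:
                if doc_name in self.trees:
                    self._attach(doc_name, element_id, link_id)
                else:
                    self.pending.setdefault(doc_name, []).append((link_id, element_id))
```

Whatever remains in `pending` after the last relevant document is loaded points outside the set; inspecting that list is one possible form of the final cleanup pass.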

   Given the structural orientation of XML, and the kinds of location
addresses we are proposing, I think that it is not a hardship for
applications to have to keep a representation of their documents in memory.
So the second strategy makes the author's life much easier, and still has
an easy implementation strategy (process the ilink pool after bringing all
documents into memory), as well as a nicer but harder one (combining
on-the-fly ilink resolution with a final cleanup pass).

>I think I just managed to say the exact same thing twice, so I'll try again,
>as a proposal:
>
>An XML Hyperlinking processor should (is required to?) notify the
>application of all anchors of hyperlinks only if the anchor and the
>hyperlink are declared in the same XML document.  External resources, as
>provided-by/restricted-by the BOS (and any other constraining mechanism we
>choose to define), may be used to locate the anchor.
>
>The problem with the above proposal is that it breaks the usefulness of
>ilinks for annotations.  I am not very happy about this, and would be happy
>to hear a better solution.

I think we need to allow the processing of multiple documents at a time. I
don't see that this is hard for any but the simplest of applications...
This kind of simplification makes ilinks useful only for a few things. No
solution to the annotation problem is going to be based on single-document
parsing, and that could be one of the real selling points of XML (for
people other than Terry, who would rather not have this feature).

> I worry that an XML hyperlink processor will need
>to keep a complete representation of every document in the BOS around
>because any part of the BOS might use some other part as an anchor.

yes, that is why we need to be careful about defining the XML BOS. I think
we should have indirection, but that only an explicit request by the
author/publisher should add a document to the XML BOS.

> Another
>possibility is to define the BOS as an ordered list of documents such that if
>a hyperlink processor were to process the documents in order it would not
>need to report anchors (though it should report a path to the anchor) which
>are located in documents earlier in the ordered list of documents (which is
>the BOS).  Thus, once a document is processed, the hyperlink processor only
>needs to keep around some representation of locator/addressing elements
>which might be used by later documents.  Hyperlinks and anchors in the
>earlier documents will have already been passed to the application (or
>stored somehow) and are not necessary for the hyperlink processor to
>process the later documents.
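One possible reading of the ordered-BOS proposal, as a sketch (the data model and names are assumptions, not part of the proposal). Documents are processed strictly in order; an endpoint in an earlier document is reported only as a path, an endpoint in a later one is remembered as pending addressing information, and nothing else about a finished document is kept.

```python
def process_ordered_bos(ordered_docs):
    """Process an ordered BOS one document at a time (hypothetical sketch).

    ordered_docs: list of (name, element_ids, ilinks), where each ilink is
    (link_id, [(doc, element_id), ...]).
    Returns the events handed to the application, in order.
    """
    events = []
    position = {name: i for i, (name, _, _) in enumerate(ordered_docs)}
    pending = {}  # later doc name -> [(link_id, element_id), ...]

    for i, (name, element_ids, ilinks) in enumerate(ordered_docs):
        present = set(element_ids)
        waiting = pending.pop(name, [])  # endpoints recorded by earlier docs
        for link_id, endpoints in ilinks:
            for doc_name, element_id in endpoints:
                j = position.get(doc_name)
                if j is None:
                    continue  # outside the BOS
                if j < i:
                    # Earlier document already handed off: report a path only.
                    events.append(("path", link_id, doc_name, element_id))
                elif j == i:
                    waiting.append((link_id, element_id))
                else:
                    pending.setdefault(doc_name, []).append((link_id, element_id))
        for link_id, element_id in waiting:
            if element_id in present:
                events.append(("anchor", link_id, name, element_id))
        # `present` is discarded here; only `pending` survives to later docs.
    return events
```

With A, B, C in that order and a link in B touching all three, A's element comes back only as a path while B's and C's come back as anchors, which is the restriction this proposal trades for a cheaper processor.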

As I said above, I think this makes it too difficult to define which
ilinks take effect, and also will require us to make a much more
constrained and "procedural" definition of XML parsing strategies in order
to support making the author's life more difficult.

I don't think keeping a data structure around and updating it later is an
insurmountably difficult problem. Model/View/Controller and other update
strategies have been around for decades, and are easier than background
formatting and processing of documents anyway.
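For the update problem, a model object with subscribers is enough to show the idea. This is a generic observer-pattern sketch with invented names, not a description of any particular browser: a late-resolved ilink calls `add_anchor`, and each registered view is told which address changed so it can redraw just that part.

```python
class AnchorModel:
    """Model half of a Model/View/Controller split (hypothetical names).

    Anchors may be added at any time, e.g. when an ilink is resolved after
    the document is already on screen; subscribed views are notified.
    """

    def __init__(self):
        self._anchors = {}    # address -> [link_id, ...]
        self._observers = []  # callables taking (address, link_id)

    def subscribe(self, callback):
        self._observers.append(callback)

    def add_anchor(self, address, link_id):
        self._anchors.setdefault(address, []).append(link_id)
        for callback in self._observers:
            callback(address, link_id)

    def links_at(self, address):
        return list(self._anchors.get(address, []))
```

A view might subscribe with something like `model.subscribe(lambda addr, link: redraw(addr))`, where `redraw` is whatever the display layer provides.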

>*David* since you were the one who responded to my previous post, does this
>at least make sense to you?  Have I explained why I want to restrict the
>power of ilinks?

   I understand, but I had (in my own mind) already discarded that kind of
strategy as too limiting, and not enough easier to make the limitations
palatable. So I understand, but I'm not convinced yet.

   -- David

I am not a number. I am an undefined character.
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Saturday, 11 January 1997 15:01:11 UTC