Re: unmarked linkend awareness by XML engines from Martin Bryan on 1996-12-29 (w3c-sgml-wg@w3.org from December 1996)

From: Martin Bryan <mtbryan@sgml.u-net.com>
Date: Sun, 29 Dec 1996 09:50:57 +0000
To: dgd@cs.bu.edu (David G. Durand), w3c-sgml-wg@www10.w3.org
Message-Id: <1.5.4.32.19961229095057.0069e8a4@mail.u-net.com>
At 18:35 28/12/96 -0500, David G. Durand wrote:
>>In HyTime, the BOS is always
>>application-definable, but it is also expressible interchangeably, as
>>a suggestion, in terms of an arbitrarily pruned entity tree.  If XML
>>supports ilinks, or n-directional linking, or, therefore, what I have
>>been imprecisely terming "anchor awareness," I think it's necessary to
>>have a way of expressing this pruned entity tree (BOS) notion in XML,
>>too.
>
>No, we simply have to face the fact that end users are the only ones who
>can decide what documents need to be in their processing set. We can't
>check the whole world, and we can't just leave it to the author (without
>damaging the ability to create external annotations), so we have to leave
>it to the user (via their application). In other words, XML as a standard
>cannot enforce this kind of scoping for the user. All that we need to
>specify is how ilinks behave in documents where they are _parsed_ by an XML
>processor. In HyTime terms, the BOS will always be the entire web (one of
>the reasons I always thought the notion was limited in usefulness).
>However, the set of documents an XML processor will be required to
>process is an application (user) decision. As an example, we could not
>force a browser to pre-fetch linked documents in case they might reference
>ilinks that _might_ be of interest.

This is where my call for thinking about link management comes in. Firstly,
let's not get confused by this much misunderstood term "users", by which I
take Gavin to mean "reader". What is important is that authors, at the time
of creating links, record sufficient information to ensure that a reader can
reconstitute the environment (BOS) the author was working in. To do this the
author must:

a) Create a formal entity definition for each document referenced. This
entity declaration must contain the fullest possible information about the
referenced document. In particular it must record what the HyTime extensions
to SGML refer to as the SOIBase - a clear definition of which machine the
file was originally obtained from.

b) Create an independently referenceable set of links that use the recorded
entity declarations as their BOS.

The main point is that the entity definitions and the link information need
to be stored as a separately referenceable document/fragment that can be
extracted for evaluation outside of the accompanying text, so that link
validation can be done efficiently without having to scan through vast
amounts of data. If the link information is stored externally from the data
(which it ideally should be) there should be a way of pointing to the relevant
file(s) from the XML document element, e.g. <XML -XML-links="<url
SOIBase='http://sgml.u-net.com/'>mylinks1.xml mylinks2.xml</url>">
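
As a rough illustration only, here is a minimal Python sketch of how link
validation could run against the separately stored entity declarations and
link set, without touching the document text. The class names, field names,
sample entities and the validate_links function are all hypothetical; nothing
here is defined by HyTime or XML.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass(frozen=True)
class EntityDecl:
    """Hypothetical record of one entity declaration."""
    name: str        # entity name that links refer to
    soi_base: str    # where the file was originally obtained from
    system_id: str   # file name, relative to soi_base

    def resolved(self) -> str:
        # Combine the recorded base with the relative file name.
        return urljoin(self.soi_base, self.system_id)


@dataclass(frozen=True)
class Link:
    """Hypothetical link whose anchors are entity names."""
    link_id: str
    anchors: tuple


def validate_links(entities, links):
    """Map each link id to the anchors that no entity declaration covers."""
    declared = {e.name for e in entities}
    problems = {}
    for link in links:
        missing = [a for a in link.anchors if a not in declared]
        if missing:
            problems[link.link_id] = missing
    return problems


# Made-up sample data standing in for the contents of a link file.
entities = [
    EntityDecl("spec", "http://sgml.u-net.com/", "spec.xml"),
    EntityDecl("notes", "http://sgml.u-net.com/", "notes.xml"),
]
links = [
    Link("l1", ("spec", "notes")),
    Link("l2", ("spec", "glossary")),  # "glossary" is never declared
]

print(validate_links(entities, links))
```

Only the declarations and the links are read, which is the point of keeping
them in a separately referenceable fragment.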

Unless we get the idea of _formally_ recording the sources of our links as
part of the authoring process, we will get hung up on the red herring of the
Web being the BOS - it isn't. The BOS is defined by the set of files
identified by each file visited. While in total this could be as wide as the
Web if we tried to visit too many files at once, in practice we only need
to deal with that part of the Web associated with the files currently being
viewed. Provided we can obtain this information easily, the process of
managing it will be easy.
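
To make the scoping concrete, the following Python sketch computes a working
BOS from the files currently being viewed. It assumes a hypothetical table,
declared_by, mapping each file to the files its entity declarations identify,
and it assumes a viewed file counts as part of its own BOS; the file names are
invented.

```python
def current_bos(viewed, declared_by):
    """Union of the viewed files and the files each of them identifies."""
    bos = set()
    for doc in viewed:
        bos.add(doc)                          # assumption: a viewed file is in scope
        bos.update(declared_by.get(doc, ()))  # files this document declares
    return bos


# Hypothetical declarations: file -> files identified by its entity declarations.
declared_by = {
    "a.xml": {"b.xml", "c.xml"},
    "b.xml": {"d.xml"},
    "z.xml": {"y.xml"},
}

print(sorted(current_bos(["a.xml"], declared_by)))
```

Viewing only a.xml brings in b.xml and c.xml but not d.xml, z.xml or y.xml:
the set grows only as further files are actually visited.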

Martin Bryan
----
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK 
Phone/Fax: +44 1452 714029   WWW home page: http://www.u-net.com/~sgml/
Received on Sunday, 29 December 1996 04:52:27 UTC