Re: unmarked linkend awareness by XML engines from Martin Bryan on 1996-12-31 (w3c-sgml-wg@w3.org from December 1996)

From: Martin Bryan <mtbryan@sgml.u-net.com>
Date: Tue, 31 Dec 1996 11:19:01 +0000
To: "Steven R. Newcomb" <srn@techno.com>
Cc: w3c-sgml-wg@www10.w3.org
Message-Id: <1.5.4.32.19961231111901.006dc12c@mail.u-net.com>
>> Not unless you have proper versioning control and fully record the state of
>> the referenced file at the time you created the links. In practice what you
>> suggest however does not "limit the amount of information over which power
>> is exercised" but unnecessarily extends it to a level where it is
>> unmanageable.
>
>For the life of me, Martin, I can't see why you say this.  It's quite
>manageable.  I can only conclude that you don't understand what I'm
>trying to say, and/or that we have disturbingly different
>understandings about what a HyTime BOS is.

Both: Firstly you are misusing the term HyTime BOS - as your other message
shows. (I think of what you are talking about as a Session BOS - which I
equate with Browser History - which makes much more sense of what you were
saying.) A HyTime BOS is a BOUNDED object set created by recursively (to a
controllable level) identifying all entities referenced from a particular
hub document. As such it is simply a catalog of all entities referenced from
a particular start point. You cannot build such catalogs for Web pages.
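
To make that definition concrete, here is a minimal sketch in Python of
building such a catalog. The reference table, the entity names and the
max_depth parameter are all assumptions made up for illustration; nothing
here is taken from the HyTime standard itself.

```python
from collections import deque

def bounded_object_set(hub, references, max_depth):
    """Collect every entity reachable from `hub` within `max_depth` levels.

    `references` maps an entity name to the entities it refers to.
    """
    seen = {hub}
    queue = deque([(hub, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_depth:
            continue  # the bound: do not look inside entities at this level
        for target in references.get(entity, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

# Hypothetical hub with two sections and a chain of outside documents.
references = {
    "hub": ["video", "control"],
    "video": ["mpeg-spec"],
    "mpeg-spec": ["far-away"],
}
print(sorted(bounded_object_set("hub", references, max_depth=2)))
```

The sketch only works because the whole reference table is known before
the walk starts, which is exactly what is missing for arbitrary Web pages.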

What I think you are talking about is a record of _all_ the files
referenced during a session: a sort of history with embedded catalogs. This
is what I was referring to as an Unbounded Web Object Set (UWOS).

I think what David was talking about, and what I concur with, is a
restricted list of files: those that _must_ be accessible to make sense of a
particular document.

Let me put this in context. For the last 2-3 weeks I've been struggling with
maintaining a set of web pages for the European Commission. I have something
like 20 possible hub documents, which cross refer to each other heavily, and
also point out to many other source documents on the web. I can maintain the
links between my own sets with difficulty. One problem I have is that I have
to keep reordering my data as I split it up into finer fragments in an
attempt to ensure that I do not cover more than 20 or 30 standards per
section (delivered document). For example, I recently split the multimedia
section so that video encoding standards were listed separately from control
standards. Checking that all of my pages referring to the original composite
file now point to the correct part of the split file is pain enough. How do
I let others pointing to my pages know I have changed them?

The real problem I have is with maintaining links to other servers. For
example, I point to the EWOS server for the specifications they create. This
month they have reordered their site, putting a message at the original
site to tell people that the site has been reordered but not taking them
there. Now all my links to EWOS are invalid. I need to change all the links
in the 20 odd files. How the hell do I do this easily? I want my locators
clearly separated from my hub documents so that I can manage them.
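
A minimal sketch in Python of what I mean by keeping locators apart from
the hub documents. The table layout, the locator names and the
ewos.example addresses are invented for illustration; this is one
possible arrangement, not a proposal for XML syntax.

```python
# Hypothetical locator table: one entry per external target.
locators = {
    "ewos-home": "http://ewos.example/old/index.html",
    "ewos-specs": "http://ewos.example/old/specs.html",
}

# Hub documents refer to locator names, never to addresses.
hub_documents = {
    "multimedia": ["ewos-specs"],
    "networking": ["ewos-home", "ewos-specs"],
}

def relocate(table, old_prefix, new_prefix):
    """Return a copy of the table with addresses under old_prefix moved."""
    return {
        name: new_prefix + url[len(old_prefix):] if url.startswith(old_prefix) else url
        for name, url in table.items()
    }

def resolve(hub, documents, table):
    """Look up the current address of every locator a hub document uses."""
    return [table[name] for name in documents[hub]]

moved = relocate(locators, "http://ewos.example/old/", "http://ewos.example/new/")
print(resolve("networking", hub_documents, moved))
```

When the remote site is reordered, only the table changes; none of the
hub documents has to be touched.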

I don't want people to have to access through the site's home page (which is
an 11-language selector) but want them to be able to start from any 'chapter'
of the site and then access any other chapter the site may be connected to.
However, I want the site's copyright rules made available to everyone
accessing the site, no matter which point they start from, and I want people
to be able to access all the information about using the site from any start
point. (I do this easily by having standard headers and footers to my
documents that point to the relevant location. This is no big deal, but
needs to be borne in mind for XML - where an equivalent of the proposed HTML
Banner is required for electronic headers/footers.)

>> Not according to the model you give below - there is no guidance, simply a
>> mass copying of pointers found in related documents.
>
>What "mass copying of pointers"?  I have no idea what you are talking
>about.  There is nothing in my mind about copying anything at all.
>Please explain what you are thinking.

The recording of the BOS of preceding documents in a viewed link chain as
part of the document's inherited history.

>> We cannot expect link management to be
>> done by either an authoring system or an XML browser (which I presume is
>> what David is referencing when he talks about a user's application here).
>> Link management has to be a separate function, run at regular intervals
>> between link creation and link navigation.
>
>What do you mean by "link management"?  If it's link maintenance and
>periodic testing, then it's not what I have been talking about
>(although it's a good idea for anyone who owns links to dynamic
>documents).

As the above scenario shows, link management to me is the combination of
link testing and link address maintenance. I want this subject on the XML
agenda at an early stage.
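
As a rough sketch in Python of the two halves, run as a separate pass
between link creation and link navigation. The function names, the
locator data and the probe are assumptions for illustration; a real pass
would replace the probe with an actual network check.

```python
def find_stale(locators, is_reachable):
    """Link testing: return the names of locators whose address fails the probe."""
    return sorted(name for name, url in locators.items() if not is_reachable(url))

def apply_corrections(locators, corrections):
    """Link address maintenance: return a copy with corrected addresses."""
    unknown = set(corrections) - set(locators)
    if unknown:
        raise KeyError(f"no such locator: {sorted(unknown)}")
    return {**locators, **corrections}

# Made-up data; the `live` set stands in for a real network probe.
live = {"http://a.example/spec.html", "http://b.example/new.html"}
locators = {
    "spec": "http://a.example/spec.html",
    "guide": "http://b.example/old.html",
}

def probe(url):
    return url in live

stale = find_stale(locators, probe)
fixed = apply_corrections(locators, {"guide": "http://b.example/new.html"})
print(stale, find_stale(fixed, probe))
```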

>> >According to HyTime, the BOS *suggestion* presumably takes effect when
>> >the user begins a session by entering a particular document.  Nothing
>> >requires that the user accept the document author's suggestion.
>> 
>> How on earth can a browser or its user determine which of the "BOS
>> suggestions" are relevant?
>
>If you're only in one document at a time, there's only one suggestion
>in effect at a time.

This presumes that each document has a BOS associated with it. I was
presuming, following the HyTime model, that a BOS would only be created for
the hub documents of a site.

>  If you accept the document's suggested BOS, then
>you are trusting the author of that document to take you where you
>want to go.  There's no "which".  There's only one "whether" per
>document, and therefore only one "whether" at a time.  What's the
>problem?

I presumed that if I went to a linked document I would inherit the BOS of
the document I linked from (otherwise how could I determine links in
read-only documents?). If I followed a link from that read-only document to
another document I presumed I would have available to me the BOS suggestions
of the start document, the BOS suggestions of the read-only document and the
BOS suggestions of the new document. Is this not what you intended? Are you
now saying that only the latest set of BOS suggestions is in force? If so,
how do you inherit links in read-only documents?

>> >Indeed, if the user insists on making the whole Web the BOS of his
>> >session, he can sit there and wait until his beard hits the floor.
>> 
>> How will he ever know that he was insisting on this? Surely he is not going
>> to be given a browser that has functions that say "follow all links in all
>> documents linked to this one"? Even crawlers run on supercomputers are not
>> stupid enough to adopt this approach: I don't see anyone realistically
>> suggesting that this approach would work on a 286!
>
>We can only assume that people who insist on things know what they're
>insisting on and are prepared to live with the consequences. 

I could not presume this for my readers - they could have no idea of the
complexity of my document's links. I would need to put a warning up to tell
them not to try to resolve all the links in the set as they would never
finish the task.

> On the
>other hand, if an author of a document provides a suggested BOS of
>such immensity that it causes users' computers to stop dead for hours
>at a time, then that's a pretty bad document and a pretty
>irresponsible author, or it's a document intended for the use of a
>pretty specialized audience.  Ordinary users would do well not to take
>that author's suggestions seriously.

The point is that what I want to do as an author is to specify a set of
"must know" documents and say that the other links are to be used on a need
to know basis.

>> In Web terms what the "UWOS" [Unbounded Web Object Set] needs to state is
>> the set of entities for which access _must_ be provided by the application
>> to make sense of the "document" being accessed by the browser via that
>> start-page. 
>
>Are you proposing a new term, "UWOS"?  What do you mean by this?
>In what sense is it "unbounded" if only certain entities need to
>be accessed?

See above. It is unbounded in the sense that there are no constraints on
where you have to look. You do not need to look for all links at all levels
up to a specific point. It is a set of objects in unbounded web space. (I
considered using Non-bounded, but could not pronounce NWOS!)

>> The concept of "arbitrary limits on how deeply these entities
>> can recursively declare other entities" is both irrelevant and highly
>> dangerous on the web.
>
>Why is it irrelevant, and why is it dangerous to limit how deeply
>entities can recursively declare other entities?  On the contrary, I
>think it would be highly dangerous *not* to limit such recursion.

It is highly dangerous if the requirement is, as in the HyTime BOS
definition, that you must identify all linked files to any level if no
constraints are applied. If, however, you are referring to just resolving
what you now call a Session BOS it is not dangerous.

>> >If, in the course of traversing the links in the session-start
>> >document, the user encounters another document, the user certainly has
>> >the option of accepting *that* document's suggestion as to what the
>> >BOS should be (making that document effectively the session-start
>> >document), or, of adding that document's BOS to his existing,
>> >session-dependent BOS, or of simply ignoring the suggestion and
>> >keeping the present BOS, or of doing something else entirely.
>> 
>> This will not work, for a number of reasons. 
>> 
>> Firstly look at the situation where I make a reference to just part of one
>> of your documents. The links in the part of the document that I refer to
>> _may_ be relevant to the BOS, but those that occur in the rest of your
>> document are unlikely to be relevant.
>
>So what?  If they do not have any linkends in the BOS, then they
>have no effect.

This is OK if they do not have any linkends in my BOS, but if they have
linkends in the referenced document's BOS and that is inherited then they are
likely to be irrelevant. Again we seem to be talking about different things!

>> (otherwise I would have referred to that part of the document as
>> well).
>
>There is no reason to make any such assumption.  Maybe the author of
>the annotation document got tired and went to bed.

He should not publish an incomplete review!!!

>> How can I determine which of the entries
>> in the referenced document's set would be relevant? If I copy them all then
>> my object set is no longer "bounded" - it is distinctly unbounded as far as
>> my document is concerned!
>
>There is no requirement that you add any traversed-to document's
>suggested BOS to your BOS.  You insist that I am saying this, but I
>have repeatedly and very explicitly said exactly the opposite.  Yes,
>the traversed-to document itself is necessarily in the BOS.  BUT THE
>TRAVERSED-TO DOCUMENT'S SUGGESTED BOS IS NOT ADDED TO THE BOS UNLESS
>THE USER DECIDES TO DO THAT!  The user is under no compulsion to do
>any such thing, and neither is the application.

This is the point of our misunderstanding. How can a user know whether he
should add it or not?

>One way to keep down the number of documents in the BOS is to add them
>only when the user requests them, perhaps by attempting to traverse a
>link to a document which is not currently in the BOS.  The fact that
>the document is not currently in the BOS only means that the
>application is not currently responsible for the links contained in
>that document.  Once you do enter that document, however, the
>application becomes responsible for the links in that document that
>have linkends in any documents that are already in your BOS, as well
>as any linkends in the new document of links that are already in your
>BOS.

A user of a document set needs guidance from the author to know whether he
should extend his BOS when he does a specific traversal. My readers cannot
know the relevance of my links to future traversal possibilities until after
they have traversed them. As an author I could suggest that this set could
usefully be added. (For example, if they are referencing another of my hub
documents that is constantly referred to from this one I want to suggest
that they add the BOS as otherwise they will constantly be swapping BOS sets.)
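
A sketch in Python of the behaviour I am asking for, under my reading of
the session model described above: the traversed-to document always joins
the BOS, and its suggested BOS is added only when the user accepts it or,
failing a user decision, when the author has recommended it. The class,
its fields and the document names are hypothetical.

```python
class Session:
    """Tracks the set of documents the application is currently responsible for."""

    def __init__(self, suggested_bos, recommended):
        self.suggested_bos = suggested_bos  # document -> set its author suggests
        self.recommended = recommended      # documents whose BOS the author advises adding
        self.bos = set()

    def enter(self, document, accept=None):
        """Enter a document; add its suggested BOS only if accepted.

        With accept=None the author's recommendation decides.
        """
        self.bos.add(document)
        if accept is None:
            accept = document in self.recommended
        if accept:
            self.bos |= self.suggested_bos.get(document, set())
        return self.bos

# Made-up site: two of my hubs and one external document.
suggested = {"hub-a": {"a1", "a2"}, "hub-b": {"b1"}, "ext": {"x1", "x2"}}
session = Session(suggested, recommended={"hub-b"})
session.enter("hub-a", accept=True)  # user takes the start document's suggestion
session.enter("ext")                 # no recommendation: only the document is added
session.enter("hub-b")               # author recommends adding this hub's BOS
print(sorted(session.bos))
```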

>> Now look what happens when you edit the referenced document, and
>> delete or change some of the links in it. If those links are in the
>> area I have referenced then the copy of the entity list that I took
>> when authoring is now out of date. How do I determine this? If they
>> are outside the area that I am referring to I don't care about the
>> changes, even if this means that the "complete copy" of the entity
>> set I took earlier is now out of date. But how can I tell that the
>> change in the entity set is one that does not affect my UWOS?
>
>What is all this about copying?  What's being copied, where is it
>being copied to, and why?  I didn't say anything about copying.  If
>you are discussing the creation of documents that record the BOSs that
>are accumulated in sessions, that's an interesting idea, but I have
>never suggested it.

I'm thinking in terms of recording Histories (nested BOS sets) in the way
you currently can Bookmarks, to save having to regenerate them each time you
revisit the site. (This is important for my users, who will often want to
break off a session and return to it at a later date with all the BOS
information in place.)
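
Something along these lines, sketched in Python. The idea that a History
is an ordered list of (document, BOS) steps, and the choice of JSON as the
storage format, are assumptions of mine for illustration only.

```python
import json

def save_history(history):
    """Serialise a session history: an ordered list of (document, BOS) steps."""
    return json.dumps([{"document": doc, "bos": sorted(bos)} for doc, bos in history])

def load_history(text):
    """Rebuild the list of (document, BOS) steps from its saved form."""
    return [(step["document"], set(step["bos"])) for step in json.loads(text)]

# Made-up session: two steps, each with the BOS in force at that point.
history = [
    ("hub-a", {"hub-a", "a1"}),
    ("hub-b", {"hub-a", "a1", "hub-b", "b1"}),
]
saved = save_history(history)
restored = load_history(saved)
print(restored == history)
```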

>> (What I need is to be able to reference the BOS of the referenced
>> document independently of the document itself in such a way that I
>> can identify when it has been changed so that I can then revalidate
>> my own document's references.)
>
>The BOS is forgotten as soon as the session is over.  It doesn't exist
>anywhere but transiently and very dynamically in the memory of a
>computer running a HyTime application.  Are you worrying about the
>fact that a BOS's physical addresses can go stale during a session?
>If so, I submit that we don't need to worry about that now.

Agreed, but I was hoping that a Session BOS would be reusable. You obviously
are not thinking along such lines, but might like to reconsider.

>Your remark seems to indicate that you think I am suggesting that
>each session-start document contain a copy of all of the suggested
>BOSs of all the entities in its own suggested BOS.  I am not
>suggesting that, and I have never intended to suggest that.

Sorry to misinterpret you.

>> >The existence of a BOS suggestion *enables* ilinks by making them
>> >practical and scalable, which in turn *enable* the creation of
>> >annotations of read-only materials which can be seen in the context of
>> >the read-only document.
>
>> If I reference your read-only material by an existing link, how can
>> I determine whether there are already sets of annotations associated
>> with this document?
>
>The answer to this question is obvious: you can't.  Without recourse
>to something else -- such as a publisher's catalog, or the online
>equivalent of _Books In Print_, you can't know whether, for example,
>someone has published a commentary on Winston Churchill's _The
>Gathering Storm_.  You certainly can't tell from any copy of _The
>Gathering Storm_, even those printed after the publication of the
>commentary.  Despite the fact that the commentator would love to make
>an advertisement for his commentary appear right inside all copies of
>_The Gathering Storm_, fortunately or unfortunately, he doesn't have
>write access to that document, so he can't.  I can't believe you would
>suggest that everyone have write access to everything, so I'm now
>wondering if you're suggesting that XML should undertake some sort of
>cataloging task to produce something like _Books In Print_?  If so, I
>think this idea is impractical for the XML project and unnecessary in
>any event.  People are already doing that and making money at it.

But wouldn't it be nice to ask AltaVista to search for all XML locators that
pointed to a particular electronic copy of The Gathering Storm? My wife, who
is a reference librarian, would just love such a facility to help her deal
with just the sort of query you have used to ridicule my idea above. The
point is that XML on the Web should allow us to overcome some of the
restrictions we find in today's unautomated world.
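
The query itself is simple once somebody has gathered the locators. A
sketch in Python, assuming a hypothetical table of which documents point
where; how a search service would collect that table is left open, and
no document needs write access to any other.

```python
from collections import defaultdict

def build_backlinks(outgoing):
    """Invert a source -> targets table into a target -> sources index."""
    backlinks = defaultdict(set)
    for source, targets in outgoing.items():
        for target in targets:
            backlinks[target].add(source)
    return backlinks

# Made-up documents and locators.
outgoing = {
    "commentary": ["gathering-storm"],
    "reading-list": ["gathering-storm", "other-book"],
    "gathering-storm": [],
}
index = build_backlinks(outgoing)
print(sorted(index["gathering-storm"]))
```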

>> The only way I can know that a set of annotations exists for a
>> document at present is to access the annotations separately
>> first. What I really need to be able to do is to have the document
>> identify for me which sets of annotations point to it. In other
>> words the document needs to "keep a record" of all files which
>> reference it in their UWOS. All a BOS associated with a set of
>> annotations could possibly tell you is the set of files the
>> annotations refer to.
>
>I take it all back.  Incredibly, you *are* suggesting that everybody
>have write access to everything.  To put it bluntly, this proposal
>will be as popular with publishers and information owners as a turd in
>a punch bowl.

Yes, but publishers and information owners are only a small part of the Web
community. If you are saying that XML is only for publishers and information
owners then you had better make this clear to the world upfront. Most of us
believe that XML has a wider potential. I personally prefer to think of the
web as a research tool. That's how I use it every day.

>> I think David is pointing more to my idea of "the files that you
>> need access to to make sense of this document", rather than a
>> bounded object set.  What I think is more relevant is to use the
>> History of links visited by the author of the annotations to create
>> a list of the files that must be made available to users, rather
>> than just saying that any file linked to any file that is referenced
>> should be recorded as part of the BOS.
>
>I have not proposed that "any file linked to any file that is
>referenced should be recorded as part of the BOS."  I have repeatedly
>said something quite different and much more subtle.  Hello?

You used the term HyTime BOS. By definition, therefore, you did say,
indirectly, "any file linked to any file that is referenced should be
recorded as part of the BOS."

>Whether a document may be created by recording the author's travels in
>browsing sessions is irrelevant, as is the broader question of how
>documents are created in general.
>
>Martin, please explain to me what you think a HyTime BOS is, so we can
>determine why we evidently have such different ideas about it.

Hopefully my initial comments have done that.
----
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK 
Phone/Fax: +44 1452 714029   WWW home page: http://www.u-net.com/~sgml/
Received on Tuesday, 31 December 1996 06:20:37 UTC