[dgd@cs.bu.edu: BOS confusion (analysis; suggestion to resolve Newcomb/Bryan conflict)] from Steven R. Newcomb on 1997-01-01 (w3c-sgml-wg@w3.org from January 1997)

From: Steven R. Newcomb <srn@techno.com>
Date: Wed, 1 Jan 1997 17:40:43 -0500
To: w3c-sgml-wg@www10.w3.org
Message-Id: <199701012240.RAA11794@bruno.techno.com>
------- Start of forwarded message -------
>    I've been reading Steve and Martin's notes, and I may have some idea of
> the disconnect going on:

Thanks for your help, David.  I've been pretty boggled by that
discussion, and I'd like to apologize to Martin for any offense.

...

>    Steve is assuming that I can tell that only _some_ documents are "real"
> hub documents, so that I can tell, when I follow a link, if I have left the
> region governed by one hub, and entered the region governed by another hub.
> At this point my application could merge the document sets defined by the
> hubs, save them in a history, or anything else it wants to.

You're correct.  I was assuming that a hub document (er, session-start
document) would tell you that it needs the rest of its BOS (er,
working document set) to be complete and work right.  Then a user
could say whether he was willing to incur the overhead and delay of
assembling that BOS (er, working document set).

>    My contention is that the notion of (even restricted) recursive document
> requirement could be harmful.

How so, if it's restricted?

> I think we all agree that for ilinks to be
> useful, we need a way to pull in documents known to contain ilinks. And we
> all agree that there should be a way for a user to explicitly add a set of
> ilinks to their working document set. I have never been persuaded that the
> notion of a single starting point document is essential, though it's
> certainly useful in some cases. And I think that requiring automatic
> following of all links (even to one level) is impractical.

No argument.  Suggesting it is one thing; requiring it is another.

>    So I suggest:
> 
>    1. We have identified a clear requirement for XML documents to be able
> to identify other documents that need to be processed along with them
> (mostly for ilink resolution).
> 
>    2. The term BOS is not that useful in describing browser behavior, as it
> is unfamiliar outside of HyTime, and the HyTime notion can be applied to
> the web in at least two distinct ways. I suggest that we use a new term
> "working document set" to mean a set of documents that need to be processed
> together, and within which ilinks in all the documents are supposed to be
> accessible. Applications may mess with the working document set, under
> arbitrary user control; but an author can augment the browser's working set
> for a particular document by declaring "companion documents", which will be
> added to the working document set when that particular document is
> processed.

I don't see why it's helpful to invent yet another "de novo" term,
"working document set", when we have an internationally standard
HyTime term "bounded object set", that means exactly the same thing,
except that it includes the notion of bounded recursion -- something I
believe we're going to deeply regret not having, in any case, if we
can't do it in XML.  (In fact, isn't it already there, by virtue of
the fact that external entities can declare external entities
recursively, essentially *without* bounds?)
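To illustrate what "bounded recursion" buys, here is a minimal sketch, assuming each document simply declares a list of companion documents (modelled as a hypothetical `companions` map with made-up file names) and that the caller supplies a depth bound. It is not HyTime's actual BOS machinery, just an illustration of how a bound limits how far assembly reaches:

```python
from collections import deque


def assemble_working_set(start, companions, max_depth):
    """Collect the documents reachable from `start` via companion declarations.

    `companions` maps a document name to the names it declares as companions
    (a hypothetical stand-in for entity declarations). The traversal is
    breadth-first, so each document is first reached at its shallowest depth;
    documents at depth `max_depth` are kept but not expanded. A `max_depth`
    of None means unbounded, which still terminates because each document
    is visited only once, even when declarations form a cycle.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        doc, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # bound reached: keep doc, but don't follow its companions
        for comp in companions.get(doc, []):
            if comp not in seen:
                seen.add(comp)
                queue.append((comp, depth + 1))
    return seen


companions = {
    "hub.xml": ["a.xml", "b.xml"],
    "a.xml": ["c.xml"],
    "c.xml": ["d.xml"],
}
print(sorted(assemble_working_set("hub.xml", companions, max_depth=1)))
# ['a.xml', 'b.xml', 'hub.xml']
print(sorted(assemble_working_set("hub.xml", companions, max_depth=None)))
# ['a.xml', 'b.xml', 'c.xml', 'd.xml', 'hub.xml']
```

With a bound of 1, only the starting document's own companions are pulled in. Without a bound, the whole chain is assembled, much as nested external entity declarations can be followed without limit.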

Moreover, "working document set" is pre-overloaded and likely to
cause confusion in just those situations where you really need a
no-nonsense technical term, such as "bounded object set", for a very
specific technical concept.  There's something to be said for a term
that advertises the fact that you must sit down and learn exactly what
it means in order to use it.

<philosophy>Anyway, there is no winning the terminology war.  The more
we try to be precise, or, on the other hand, intuitively
understandable, the more people we offend by making them feel either
ignorant (if they don't already know the field), or ignored (if they
do), or confused (either way).  One of the things the XML group must
face is the fact that the "on-ramp to SGML" (as I recall Tim putting
it) is necessarily going to have some upward slope.  We're all in
favor of having a smoothly upward slope, and as little slope as
possible.  Maybe it's best to use the term "working document set",
because then it's easier for folks to get on the on-ramp, but I don't
think so -- I think it's just confusion waiting to happen.  But I have
another concern.  If at the end of the ramp there's a huge step up
to HyTime, or if the ramp doesn't even take you anywhere near HyTime, we
have blown a unique opportunity to interconnect the world's knowledge
bases.  The XML on-ramp to SGML is pretty much already designed.
Since we're now considering hyperlinks, I hope what we're working on,
among other things, is an on-ramp to HyTime.  Am I wrong to hope for
this?  Is there any design, conceptual, or terminological question the
XML group faces that is so moot that it will be decided on the basis
of consistency with HyTime?  If so, which way will it be decided?
</philosophy>

Basically, though, I agree with what David says.  I think there are
three distinct flavors of "working document set" (just as there are three
distinct flavors of "bounded object set (BOS)"):

(1) the set that was intended by the author ("HyTime BOS" in HyTime),

(2) the set that an application, with or without the complicity of its
    user, tries to assemble ("application BOS" in HyTime), and

(3) the set that an application actually winds up using (I'm proposing
    the term "effective BOS" for this one).

>    3. The notion of explicit "hub documents" is foreign to the web, and
> easy to synthesize from the "companion document" mechanism, so we might as
> well leave it be.

There's no explicit "hub document" in HyTime, either; it's just the
document you nominally started with when assembling a hyperdocument
from a bounded object set.  I don't yet see the conceptual difference
between HyTime's hub document concept and the idea of having a
document declare its companion documents.

And I don't see the difference between allowing the use of a pruned
entity tree rooted in the hub document's SGML document entity to
determine the bounded object set (er, working document set), versus
providing a list of all the "companion" documents that would have
resulted from traversing that same tree, except:

* It's more trouble to specify the explicit companion list, instead
  of just the entities that declare the companions in their own entity
  declarations; and

* Maintenance is much harder, because any change in the physical
  address of any companion document must be reflected in the hub
  document's entity declarations.  I can't delegate that maintenance
  to the maintainer of a companion document.

>    Note: To get the effect of a well-known hub document, we declare that
> hub as the companion for all of its desired members -- and in the hub
> itself, we declare all the desired members as _its_ companions. _Voila_:
> wherever we start, we automatically pull in whatever documents the "hub"
> indicates as essential.

But doesn't this mean you must have all this already-redundant
information kept in synchronized copies all over the place?  Sounds
like a dreadful and unnecessary hyperdocument maintenance headache.
Ouch!  Also, doesn't this mean that an author can't specify companion
documents that he can't write to?  Sounds impractical.
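To make the synchronization concern concrete, here is a hypothetical sketch of the "hub as companion" arrangement quoted above. It again assumes a plain map from each document to its declared companions, and `find_out_of_sync` is an illustrative name rather than part of any proposal. Every member must name the hub, and the hub must name every member. The helper reports where those two redundant copies of the same information disagree:

```python
def find_out_of_sync(hub, companions):
    """Report mismatches in a 'hub as companion' arrangement.

    Returns a pair of sorted lists:
    - members the hub lists that do not declare the hub back;
    - documents that declare the hub but are not listed by the hub.
    """
    hub_members = set(companions.get(hub, []))
    declares_hub = {doc for doc, comps in companions.items()
                    if doc != hub and hub in comps}
    return sorted(hub_members - declares_hub), sorted(declares_hub - hub_members)


companions = {
    "hub.xml": ["a.xml", "b.xml", "c.xml"],
    "a.xml": ["hub.xml"],
    "b.xml": ["hub.xml"],
    "c.xml": [],           # member whose author forgot to declare the hub
    "d.xml": ["hub.xml"],  # declares the hub, but hub.xml was never updated
}
print(find_out_of_sync("hub.xml", companions))
# (['c.xml'], ['d.xml'])
```

In this model, adding or moving a member means editing two documents, and one of them may belong to someone else. That is the maintenance burden described above.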

Happy New Year (1997 has *got* to be better than 1996)!

--Steve

             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA
Received on Wednesday, 1 January 1997 18:20:46 UTC