Re: [dgd@cs.bu.edu: BOS confusion (analysis; suggestion to resolve Newcomb/Bryan conflict)] from David G. Durand on 1997-01-02 (w3c-sgml-wg@w3.org from January 1997)

From: David G. Durand <dgd@cs.bu.edu>
Date: Wed, 1 Jan 1997 22:25:35 -0500
To: w3c-sgml-wg@www10.w3.org
Message-Id: <v02130503aef0ce077460@[165.90.139.110]>
At 5:40 PM 1/1/97, Steven R. Newcomb wrote:
>>    Steve is assuming that I can tell that only _some_ documents are "real"
>> hub documents, so that I can tell, when I follow a link, if I have left the
>> region governed by one hub, and entered the region governed by another hub.
>> At this point my application could merge the document sets defined by the
>> hubs, save them in a history, or anything else it wants to.
>
>You're correct.  I was assuming that a hub document (er, session-start
>document) would tell you that it needs the rest of its BOS (er,
>working document set) to be complete and work right.  Then a user
>could say whether he was willing to incur the overhead and delay of
>assembling that BOS (er, working document set).

I changed the terminology, not for its own sake, but because I am not
convinced that a HyTime BOS meets the explicitness requirements I think we
need. I could also argue that "working set" is a commonly understood
term in storage management, and in this case the analogy reveals the
intention of the term better. I have another reason as well, which we will
get to later on.

>>    My contention is that the notion of (even restricted) recursive document
>> requirement could be harmful.
>
>How so, if it's restricted?

Because I think that any restriction that's more general than the explicit
selection of individual files for inclusion is leaving too many files open
for inclusion. There's a big difference between what I want to reference,
and what I want to tell a browser to download -- they may sometimes be the
same, but they are at least as frequently different.

HyTime BOS is defined in terms of entity references, and not all XML links
will be via an external entity. At least, I take it as axiomatic that we
are dead in the water if we don't _at least allow_ the creation of links
that contain just a URL in an attribute value.

   So some references will not be part of the BOS. I am also one of the
people who argued vociferously (and successfully) that an XML parser _must
not_ be required to follow all entity references. This also makes the BOS
potentially problematic, even for the entities making up a single document.

>>    2. The term BOS is not that useful in describing browser behavior, as it
>> is unfamiliar outside of HyTime, and the HyTime notion can be applied to
>> the web in at least two distinct ways. I suggest that we use a new term
>> "working document set" to mean a set of documents that need to be processed
>> together, and within which ilinks in all the documents are supposed to be
>> accessible. Applications may mess with the working document set, under
>> arbitrary user control; but an author can augment the browser's working set
>> for a particular document by declaring "companion documents", which will be
>> added to the working document set when that particular document is
>> processed.
>
>I don't see why it's helpful to invent yet another "de novo" term,
>"working document set", when we have an internationally standard
>HyTime term "bounded object set", that means exactly the same thing,
>except that it includes the notion of bounded recursion -- something I
>believe we're going to deeply regret not having, in any case, if we
>can't do it in XML.  (In fact, isn't it already there, by virtue of
>the fact that external entities can declare external entities
>recursively, essentially *without* bounds?)

As noted above, and for the same reasons we have been re-arguing here for
links, XML processors are _not_ required to pull in a whole entity tree at
once.

>On the other hand, "working document set" is pre-overloaded and likely
>to cause confusion in just those situations where you really need a
>no-nonsense technical term, like "bounded object set" for a very
>specific technical concept.  There's something to be said for a term
>that advertises the fact that you must sit down and learn exactly what
>it means in order to use it.

Well, my problem is that I think that the term that we need does _not_
match BOS in every particular, but only in some particulars. So I can gain
both perspicuity (my opinion) and accuracy.

> If at the end of the ramp there's a huge step up
>to HyTime, or if the ramp doesn't even take you anywhere near HyTime, we
>have blown a unique opportunity to interconnect the world's knowledge
>bases.

   There is a working example of interconnecting the world's knowledge
bases. It is the WWW. This is in some ways sad, but undeniably true.

>The XML on-ramp to SGML is pretty much already designed.
>Since we're now considering hyperlinks, I hope what we're working on,
>among other things, is an on-ramp to HyTime.  Am I wrong to hope for
>this?  Is there any design, conceptual, or terminological question the
>XML group faces that is so moot that it will be decided on the basis
>of consistency with HyTime?  If so, which way will it be decided?

   I don't have an explicit goal either way with respect to HyTime. The
brief of the group does _not_ include HyTime compatibility. Personally, for
functionality, I am applying a standard to HyTime that I describe as "no
gratuitous incompatibility". I see no reason, technical or official, why we
should stick to HyTime if we think that HyTime got something wrong. On the
other hand, there is a large amount of good work on the relationship of
hypertext to markup in HyTime. So I think the semantics are mostly right. I
think we can manage to be mostly compatible with the standard as well
(modulo a few PIs a user can add if they need them). I only say mostly in
case there are places (I feel anchor roles are one) where HyTime may impose
a restriction that we can avoid.

   As to terminology, I am not so convinced. Most HyTime (and SGML)
technical terms are rather long idiosyncratic phrases. I would really have
no problem using terminology tailored to our application and providing a
HyTime glossary for those who want one. Some of the terms like clink,
ilink, and location seem clear to me. Further, because they describe ways
of tagging hypertext features, there are no pre-existing terms for those
concepts (or closely analogous terms). I also want to avoid using uniquely
HyTime terminology in a sloppy or different sense than HyTime does. So now
we come to another beef with the term BOS:

>Basically, though, I agree with what David says.  I think there are
>three distinct flavors of "working document set" (just as there are three
>distinct flavors of "bounded object set (BOS)"):
>
>(1) the set that was intended by the author ("HyTime BOS" in HyTime),
>
>(2) the set that an application, with or without the complicity of its
>    user, tries to assemble ("application BOS" in HyTime), and
>
>(3) the set that an application actually winds up using (I'm proposing
>    the term "effective BOS" for this one).

I think we will only need 1 of these senses (Application BOS). This
reflects the reality that an application can do what it wants. We also need
a method (to be determined, though I suggested one route) for an author to
suggest a minimal set of documents that should be in the (application BOS,
Working Document Set) for a document to be at its best.

   So I don't want to just say BOS, since that is confusing to those who
know that there are other senses, and I don't want to say "application BOS"
because then we will be using a modifier ("application") on every
occurrence of a relatively recondite term none of whose other senses need
come into our standard.

>>    3. The notion of explicit "hub documents" is foreign to the web, and
>> easy to synthesize from the "companion document" mechanism, so we might as
>> well leave it be.
>
>There's no explicit "hub document" in HyTime, either; it's just the
>document you nominally started with when assembling a hyperdocument
>from a bounded object set.  I don't yet see the conceptual difference
>between HyTime's hub document concept and the idea of having a
>document declare its companion documents.

I don't think that there's a conceptual difference in the motivation, but
there are practical differences in the execution: in my suggested scheme
you become a companion only by explicit designation. Other forms of
reference do _not_ cause any necessary change in the working document set
-- and there is no work required to make this so. This means that the
typical table-of-contents page on the web can have no untoward working-set
effects. A document must _explicitly_ ask that another document be
processed along with it.

>And I don't see the difference between allowing the use of a pruned
>entity tree rooted in the hub document's SGML document entity to
>determine the bounded object set (er, working document set), versus
>providing a list of all the "companion" documents that would have
>resulted from traversing that same tree, except:

But that's not what I said. I suggested that for the BOS case you could
cite _one_ directory document that would be responsible for tracking the
BOS.

>* It's more trouble to specify the explicit companion list, instead
>  of just the entities that declare the companions in their own entity
>  declarations; and

I don't understand this. Or rather, I don't want the "convenience" of
automatically fetched entity trees; eliminate that "convenience" and the
two schemes are the same amount of work.

>* I have much more maintenance difficulty because changes in any of
>  the physical addresses of any of the companion documents
>  must be reflected in the hub document's entity declarations.  I
>  can't delegate that maintenance to the maintainer of a companion
>  document.

    You could change it in the "hub" or in the local companion declaration.
But you have to change it _somewhere_, and only in _one_ place. The original
problem was to centralize address management, and I showed how to do that. If
you just list companions in each document directly, you can ease local
maintenance, but you lose central control. I don't see that this tradeoff
can be different in any system whatsoever.

    On my proposal, the companion of a companion is a companion (we take
the transitive closure of the companion relation). So we get the same
ability to express things that we have following an entity tree. What we have
gained is decoupling of "companionship declaration" from entity
declaration. I can imagine authors who want to do a lot of linking, but not
use entity declarations -- they would be accommodated by this. I can also
imagine that many authors will have links that should _not_ indicate
companions.
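
A minimal sketch of the transitive-closure rule described above, assuming
companion declarations are available as a simple mapping from a document to
the documents it names as companions. The file names, the `companions` and
`links` mappings, and the function name are all hypothetical; nothing here
is proposed syntax. Ordinary links are kept in a separate mapping that the
function never consults, which is the decoupling being argued for.

```python
from collections import deque


def working_document_set(start, companions):
    """Return every document reachable from `start` via companion declarations."""
    seen = {start}
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        # Only explicit companion declarations are followed.
        for other in companions.get(doc, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


# Hypothetical declarations: a companion of a companion is a companion.
companions = {
    "paper.xml": ["glossary.xml"],
    "glossary.xml": ["abbrevs.xml"],
}

# Hypothetical plain links: references that do NOT declare companionship.
links = {"paper.xml": ["toc.xml"]}

result = working_document_set("paper.xml", companions)
```

Here `result` contains `paper.xml`, `glossary.xml`, and `abbrevs.xml`, while
`toc.xml` stays out because it is only linked to, never declared.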

>
>>    Note: To get the effect of a well-known hub document, we declare that
>> hub as the companion for all of its desired members -- and in the hub
>> itself, we declare all the desired members as _its_ companions. _Voila_
>> wherever we start, we automatically pull in whatever documents the "hub"
>> indicates as essential.
>
>But doesn't this mean you must have all this already-redundant
>information kept in synchronized copies all over the place?  Sounds
>like a dreadful and unnecessary hyperdocument maintenance headache.
>Ouch!  Also, doesn't this mean that an author can't specify companion
>documents that he can't write on?  Sounds impractical.

No, it means that if we want centralized control of a companion set, we
make a document that represents the set, and then an individual document
can indicate the hub for that set to have all its members as companions.
That's one declaration in each document (that never changes unless you move
the hub, and didn't use FPIs), and one declaration for each member stored
in the hub document. No redundancy. If you don't want centralized control,
you can arrange things differently. I was answering the charge that
document-by-document specification does not let you centralize control. I
think it's obvious how to use it to _de_-centralize control.
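
The hub arrangement can be sketched the same way, again as an illustration
only. The hub and member names are hypothetical, and the `closure` helper is
an assumed stand-in for whatever a browser would actually do. Each member
carries one declaration (the hub), the hub carries one declaration per
member, and the resulting set is the same wherever you start.

```python
def closure(start, companions):
    """Transitive closure of the companion relation, starting from `start`."""
    seen, stack = {start}, [start]
    while stack:
        for other in companions.get(stack.pop(), ()):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen


# Hypothetical hub and members.
hub = "hub.xml"
members = ["a.xml", "b.xml", "c.xml"]

# The hub lists every member; each member names only the hub.
companions = {hub: list(members)}
for member in members:
    companions[member] = [hub]

# Same working document set from any starting point.
sets = [closure(doc, companions) for doc in [hub] + members]

# One declaration per member in the hub, plus one per member document.
declaration_count = sum(len(targets) for targets in companions.values())
```

Moving a member means editing its one entry in the hub; moving the hub means
editing the single declaration in each member.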


>Happy New Year (1997 has *got* to be better than 1996)!

Kalh' Xronia'  (Happy New Year) to all. So far it's looking good. We'll see
what the morning brings!

  -- David

I am not a number. I am an undefined character.
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Wednesday, 1 January 1997 22:19:04 UTC