Re: anchor awareness (was Re: Richer & richer semantics?)

At 06:50 PM 12/22/96 -0800, Tim Bray wrote:
>I think we are making progress here.
>
>It seems obvious from Steve & Eliot's remarks that we might as well adopt
>the Hytime nomenclature and define "anchor" to be something that is 
>actually participating in a link relationship.  It's easy to explain and
>understand, and is totally unambiguous; definitely in the XML style.  (And
>web-head friendly; an HTML anchor *is* in fact an anchor when it's being
>used; the fact that it's not when it's not doesn't really muddy the waters).

Cool. Of course it scares me when Tim agrees with me this strongly :-)

>Probe 1:
>
>1. Web links contain lots of ends that don't know they are anchors, thus
>2. if we require all anchors to be self-aware, we won't be able to subsume 
>   the current Web mechanisms.

I suppose this is true, but I'm not sure it's useful.  In other words, I
can't imagine a workable system that would *always* require anchors to be
aware of their anchor status (or said another way, that all servers must
know which objects they *could* address are anchors at any given moment).

Therefore I would say that XML cannot *require* that all anchors be "self
aware" at all times, but neither should it assume that no anchors are ever
self aware.  In other words, XML should suggest a system where, from the
point of view of any given client examining a body of XML documents, as many
anchors as practical are known *in advance of traversal to them*.

For example, in an Intranet, it might be completely practical to have
complete knowledge of all anchors in documents *within* the scope of the
Intranet but knowledge of none or almost none for anchors outside it.  This
is the way Hyper-G works, for example.  Clients communicating to servers
within the Intranet are informed by those servers about all the anchors
within the documents.  When those clients traverse to an anchor whose
address puts it outside the Intranet, you get today's Web behavior: you
don't know what you're getting until you get there and the thing you got to
didn't know (necessarily) that it was a target.
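Below is a minimal Python sketch of the client-side bookkeeping this implies. The class name, the host set, and the idea that Intranet servers push anchor reports to the client are assumptions made for illustration. It is not Hyper-G's actual protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set
from urllib.parse import urlparse


@dataclass
class AnchorIndex:
    """Hypothetical client-side record of which addresses are known anchors.

    Assumption: servers on `intranet_hosts` report every anchor in the
    documents they serve; nothing is known in advance about other hosts.
    """
    intranet_hosts: Set[str]
    # anchor address -> ids of the links known to use it as an anchor
    known: Dict[str, Set[str]] = field(default_factory=dict)

    def _inside(self, address: str) -> bool:
        return urlparse(address).hostname in self.intranet_hosts

    def register(self, address: str, link_id: str) -> None:
        """Record an anchor report from an Intranet server."""
        if not self._inside(address):
            raise ValueError(f"no anchor reports expected for {address}")
        self.known.setdefault(address, set()).add(link_id)

    def links_at(self, address: str) -> Optional[Set[str]]:
        """Links known *before* traversing to `address`.

        Returns an empty set for an Intranet address that is not an anchor,
        and None outside the Intranet, where (as on today's Web) nothing is
        known until you get there.
        """
        if not self._inside(address):
            return None
        return set(self.known.get(address, set()))
```

Register anchors as Intranet servers report them. After that, `links_at` separates "known not to be an anchor" (an empty set) from "cannot know" (None).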

>Probe 2:
>
>1. If we support N-way links with some anchors not being self-aware,
>   there remains the question of whether we require *one* end to be
>   self-aware, but
>2. the benefits of self-awareness depend on an ability to state with 
>   confidence that all anchors are self-aware, so
>3. there seems no particular benefit to forbidding what Eliot calls
>   "completely independent" links.

As there already exist Web-based systems that support (or could easily
support) independent links, point three seems completely self evident (I'm
thinking of Hyper-G, Aqui, systems buildable with HyMinder, etc.).  I can't
see what benefit there would be to forbidding such things even though we
know they can't be used for all links.

Point 2 is correct, so it's important when discussing these issues to
distinguish scenarios where the system is closed or mostly closed (as in
Hyper-G) or is completely open (as in the general Internet Web case).  I
personally think XML's greatest benefit will be in Intranets, where it is
potentially possible to manage all the links and know all the anchors (at
least those not addressed by pathological queries).  Certainly within
Intranets there will be significant information management and analysis
benefit to being able to apply new webs of links to existing documents
(imagine an MS Project schedule as a web of hyperlinks anchored to a
time-based coordinate space).

For point 1, I don't think it requires discussion, because the requirement
is inherent in the environment in which the link is expected to play.  If
you want to make the link available to the Internet masses, at least one
anchor must be completely and irrevocably self aware, i.e., the link must
be one of its own anchors (a "contextual link" in HyTime terms).  When you
are in an Intranet environment, it need not be.

Note that any contextual link can be made independent by a 100% complete
automatic process, so on one level, there's no conflict between contextual
links and independent links.  XML could also include an "invisible" linking
element architectural form that is designed to be wrapped around anchors to
enable the conversion of independent links into contextual ones [by
"invisible" I mean that the linking element doesn't affect the contextual
constraints of the things it surrounds, so that its presence or absence
does not affect the semantic interpretation of the data (except to mark it
as an anchor)].
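Here is a rough sketch of that automatic conversion. It assumes a link is represented as a map from anchor role to addresses, and that a contextual link marks its self-anchor with a placeholder. SELF, Link, and make_independent are hypothetical names, not HyTime constructs.

```python
from dataclasses import dataclass
from typing import Dict, List

SELF = "#self"  # hypothetical placeholder: "this linking element is the anchor"


@dataclass
class Link:
    link_type: str
    # anchor role -> addresses of the objects playing that role
    anchors: Dict[str, List[str]]


def make_independent(link: Link, location: str) -> Link:
    """Replace every self-reference in a contextual link with `location`,
    the address of the linking element in its source document.

    The result addresses all of its anchors explicitly, so it no longer
    depends on where it is stored (i.e. it can be treated as independent).
    """
    anchors = {
        role: [location if addr == SELF else addr for addr in addrs]
        for role, addrs in link.anchors.items()
    }
    return Link(link.link_type, anchors)
```

The conversion is purely mechanical. That is why, on this representation, contextual and independent links need not conflict.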

>Probe 3:
>
>1. Current web links are untyped, un-roled (in fact carry no metadata),
>   and are either single-ended or are a completely opaque query, whereas

By "single-ended" I assume you mean "contextual, binary links", meaning
that only one anchor is explicitly addressed, the other being the link
itself.  Or do you mean that the anchor consists of a single object?
Because HyTime allows anchors that are lists of objects, it's important to
distinguish the number of distinct *anchor roles* from the number of
objects addressed for any single role.  Confusion is compounded because
people who are not used to thinking in terms of anchor roles are also not
used to thinking about grouping all the objects addressed by a role.  Note that
HyTime (with the TC) does have a hyperlink that represents simple,
typeless, aggregation: the agglink element form.
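The following small Python sketch makes the roles-versus-objects distinction concrete. The representation is hypothetical and is not HyTime syntax. In it, the arity of a link is the number of roles, regardless of how many objects each role addresses. Modelling an agglink-style aggregation as a single role is an assumption.

```python
from typing import Dict, List

# Hypothetical representation: anchor role -> objects addressed for that role.
Link = Dict[str, List[str]]


def arity(link: Link) -> int:
    """Number of distinct anchor roles; this is what 'n-ary' should count."""
    return len(link)


def object_count(link: Link) -> int:
    """Total number of objects addressed, summed over all roles."""
    return sum(len(objects) for objects in link.values())


# A binary link whose single 'target' role addresses three objects.
footnote: Link = {"source": ["doc1#p4"], "target": ["doc2#n1", "doc2#n2", "doc2#n3"]}

# Typeless aggregation modelled (by assumption) as one role with many members.
aggregate: Link = {"member": ["a#x", "b#y", "c#z", "d#w"]}
```

Here `footnote` is binary (two roles) but addresses four objects. `aggregate` also addresses four objects, but through a single role.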

Certainly the link elements in HTML have very general types and no explicit
anchor roles.  Obviously this meets the requirements of a large number of
Web users, but we also know it doesn't meet the requirements of a
significant number of others.  I think it's useful to make a distinction
between "casual linkers" and "thoughtful linkers".  Sometimes it's enough
just to make the connection; other times you really need strongly-typed
links.  The problem with the Web today is that it supports the former well and
the latter almost not at all.  XML will help support the latter, but we
shouldn't ignore the casual linkers either.

>2. HyTime ilinks (OK, HyLinks real soon now) are multi-way and have
>   several useful places to encode what I think of as per-link or 
>   per-anchor or per-role metadata, so 
>3. on the surface it seems that an ilink, even if it involved all sorts
>   of non-self-aware anchors, would offer substantially greater utility
>   and flexibility than what the Web offers.

I think so.  Even the ability to have more than two anchors is a
significant bonus.

As an aside, I'd like to stipulate that for these discussions the shorthand
"ilink" be taken to mean "any link that is potentially independent of its
anchors" and not worry about the ilink/hylink distinction until we get into
details of syntax.

>Probe 4:
>
>1. Enforcing anchor-awareness is not done in terms of syntax, but in
>   terms of allowed behavior during link traversal, so

Yes.

>2. The spec we're about to write would need to specify, not just
>   a language, but processor behavior, just as the XML spec does.

I'm not sure we need to specify the processor behavior (although it might
be wise to do so).  It might be enough to suggest various ways the behavior
could be provided.

>*If* I've understood the issues, I think I'm ready to take a position:
>no, we should not require universal anchor-awareness, because it's
>hard, and we can deliver significant benefits without it.  Also, as a
>hypertext-theory-challenged web-head, I have derived significant benefit
>from pointing freely at lots of other people's stuff; while I acknowledge
>that this creates problems in the area of intellectual property and so on

I think this position contains two orthogonal propositions:

A. Required anchor awareness is not a good thing and we shouldn't do it.
B. Unilateral addressing (possibly without the anchor's awareness or
   permission) is a good thing and we should allow it.

I agree with both propositions.

>(a) it seems unlikely that these problems are amenable to a technical
>    solution, and

Certainly no technical solution short of a one-world distributed
operating system and document manager can do this (Xanadu anyone?).  But
there are things we can add to XML to support contracts people might be
willing to abide by, e.g., activity tracking policies.

>(b) this semantic [publish and let anyone point to it anonymously] is
>    something that is actively desired by many people.

So are free love and anarchy--that doesn't mean we should either condone or
provide them :-).  However, I think there's not much we (or probably
anybody) can do about it.  I'd prefer to let the big IP brains and
politicians battle this one out--I think the most we can do is acknowledge
the problem.  Documents don't violate copyright laws, people do.

>If Steve and I have a disagreement here, it seems to be crystallized in
>the following:
>
>Steve: 
>>What's the
>>use of ilink if none of the anchors know that they are linked?  After
>>all, an ilink is nothing more or less than a link whose own location
>>can be different from [Independent of] the locations of *all* of its
>>anchors.
>
>To which I'd answer; the use is that of the existing Web mechanisms,
>only made multidimensional, and with a place for metadata, and applicable
>to read-only media.  The usefulness of the existing Web mechanisms is not 
>open to realistic challenge; and multidimensionality and metadata seem, on 
>the face of it, to be major steps forward.  And easy to achieve.

I agree with Tim here (and I suspect that Tim and Steve don't actually have
a disagreement).  By "multidimensional", I assume Tim means n-ary links
with n>2.

>Follow-on questions... *if* we specify some self-aware anchor machinery, 
>*and* we define behaviors for an XML processor for accessing these things, 
>then:

>a) could an XML processor, by offering an anchor-retrieval mode that
>   refuses to retrieve anything not self-aware in the approved way, 
>   act as a gateway that would deliver the benefits that Steve is 
>   discussing?

I'm not sure an XML processor needs to be this restrictive.  It should just
do as much as it can. 
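This sketch contrasts the gateway mode asked about in (a) with a best-effort processor. The `aware` table, the `fetch` callback, and the `Traversal` record are hypothetical. The point is only that the policy lives in traversal behavior rather than in syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class Traversal:
    content: Optional[str]           # what was retrieved, if anything
    known_links: Optional[Set[str]]  # links known at the target; None if unknown


def traverse(address: str,
             aware: Dict[str, Set[str]],
             fetch: Callable[[str], str],
             strict: bool = False) -> Traversal:
    """Hypothetical traversal policy.

    `aware` maps addresses of self-aware anchors to the links they know about.
    strict=True: gateway mode, refuse any target that is not self-aware.
    strict=False: best effort, retrieve anyway and report whatever is known.
    """
    links = aware.get(address)
    if links is None and strict:
        return Traversal(None, None)  # refused: target is not self-aware
    return Traversal(fetch(address), links)
```

The two modes differ only in a single branch. A processor could therefore offer the strict mode as an option without making it the default.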

>b) could such machinery and behavior be described simply and compactly,
>   with a degree of difficulty not greater than that of the XML spec?

Don't know about the last bit, but probably.  At its most basic, it's not that
hard: you're just managing a relational table that relates objects to the
links they participate in and a table that relates links to their metadata.
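A minimal sketch of those two tables, using Python's built-in sqlite3 module. The schema is an assumption for illustration: the table and column names, and storing the link type as a metadata row. None of it is taken from the HyTime property set.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_store() -> sqlite3.Connection:
    """Create the two (hypothetical) tables in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE anchor (      -- object -> links it participates in
            object  TEXT NOT NULL,
            link_id TEXT NOT NULL,
            role    TEXT NOT NULL
        );
        CREATE TABLE link_meta (   -- link -> its metadata
            link_id    TEXT NOT NULL,
            meta_key   TEXT NOT NULL,
            meta_value TEXT NOT NULL
        );
        CREATE INDEX anchor_by_object ON anchor(object);
    """)
    return db


def links_for(db: sqlite3.Connection, obj: str) -> List[Tuple[str, str, Optional[str]]]:
    """(link_id, role, link type or None) for every link in which `obj` is an anchor."""
    return db.execute("""
        SELECT a.link_id, a.role, m.meta_value
        FROM anchor AS a
        LEFT JOIN link_meta AS m
          ON m.link_id = a.link_id AND m.meta_key = 'type'
        WHERE a.object = ?
        ORDER BY a.link_id
    """, (obj,)).fetchall()


db = build_store()
db.executemany("INSERT INTO anchor VALUES (?, ?, ?)", [
    ("doc1#p4", "L1", "source"),
    ("doc2#n1", "L1", "target"),
    ("doc1#p4", "L2", "member"),
])
db.executemany("INSERT INTO link_meta VALUES (?, ?, ?)", [("L1", "type", "footnote")])
print(links_for(db, "doc1#p4"))  # [('L1', 'source', 'footnote'), ('L2', 'member', None)]
```

Answering "which links touch this object?" is a single indexed lookup. The hard parts are distribution and keeping the tables current, not the data model.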

When we were working on the HyTime design this last summer, one of the
things we did was try to express the grove model the Newcombs had worked
out for the HyTime property set as the equivalent relational tables.  It
wasn't too hard.

I know there are some hard implementation problems in doing link management
in a distributed networked environment (probably the same problems faced by
any distributed system that requires concurrency in rapidly-changing data,
i.e., transaction-support systems and parallel operating systems).  But I
don't think the basic data model or resolution algorithms are very
complicated.  And we can certainly simplify from the general HyTime model
(which I have always assumed we'd do).

Cheers,

E.
--
W. Eliot Kimber (eliot@isogen.com) 
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 fax
http://www.isogen.com (work) http://www.drmacro.com (home)
"Rats in the morning, rats in the afternoon...if they don't go away, I'll be
re-educated soon..."                 --Austin Lounge Lizards, "1984 Blues"

Received on Monday, 23 December 1996 11:43:42 UTC