Re: anchor awareness (was Re: Richer & richer semantics?) from W. Eliot Kimber on 1996-12-22 (w3c-sgml-wg@w3.org from December 1996)

From: W. Eliot Kimber <eliot@isogen.com>
Date: Sun, 22 Dec 1996 11:11:15 -0900
To: w3c-sgml-wg@www10.w3.org
Message-Id: <3.0.32.19961222111106.00c608e4@uu10.psi.com>
At 05:51 PM 12/21/96 -0800, Tim Bray wrote:
>I think that Steve was making an important point, but I think that I
>didn't really get it.  So this is a request for amplification, with some 
>questions
>
>>The question is:
>>"Does an anchor know that it is an anchor?"
>
>What does it mean for an anchor to know it's an anchor... and I guess,
>what exactly are you terming an anchor?  Consider the following:

In HyTime terminology, an "anchor" is an object (or list of objects) that
is addressed by a hyperlink as a particular anchor role.  This definition
of "anchor" is slightly different from that used in HTML and in some other
hypertext formalisms (e.g., the Dexter hypertext model).  

In HyTime, anything that can be addressed by any means can *potentially* be
an anchor.  As you can address things without explicit identifiers, things
can be addressed without their knowledge.

Also, in HyTime terms, a thing becomes an anchor only when a hyperlink
addresses it, not before.  Thus, putting an ID on something does not, in
and of itself, make it an anchor.
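
A minimal sketch of that distinction, in Python rather than HyTime syntax. The Node and Link classes and the is_anchor helper are hypothetical names invented for illustration, not HyTime constructs. The point it tries to show is that anchor status is derived from the links, not stored on the object, and that an object can be addressed by position without having an ID:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    text: str
    id: Optional[str] = None  # an ID is only an addressing convenience


@dataclass
class Link:
    # Maps an anchor role name to the list of objects addressed in that role.
    anchors: dict = field(default_factory=dict)


def is_anchor(node, links):
    """A node is an anchor only if at least one link addresses it."""
    return any(
        any(member is node for member in members)
        for link in links
        for members in link.anchors.values()
    )


doc = [Node("Introduction", id="intro"),
       Node("Second paragraph"),
       Node("Third paragraph")]

# Address the second node by position (no ID needed) in a "target" role.
link = Link(anchors={"source": [doc[2]], "target": [doc[1]]})

print(is_anchor(doc[0], [link]))  # False: has an ID, but nothing links to it
print(is_anchor(doc[1], [link]))  # True: no ID, but a link addresses it
```

Nothing on the addressed node changes when the link is created, which is the sense in which it can be addressed "without its knowledge".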

A hyperlinking element can be one of its own anchors (a "self anchor",
meaning the link links to itself).

To relate this to HTML, the A element is a hyperlink when the HREF
attribute is used.  It is a "contextual link" because it is also one of its
own anchors. The other anchor is whatever the URL points to, which could be
another entire page, a named A element, or something returned by a query
(HyTime considers CGI scripts to be a form of query, using the notion of
"query" to mean "anything that HyTime doesn't define directly").

>Example 1: http://www.textuality.com/sgml-erb/mprdv.html
>
>not as an example, but in and of itself, embedded in the email you
>are now reading.  I would assume this is not an anchor in the sense that you
>mean; 1-way www semantics make it a link-end but not an anchor.  

Actually in this example, the URL is an anchor, but not a link.  The link
exists virtually in the mailer.  In HyTime, it would be defined something
like this:

<link refmark="url-in-text"
      refsub="objects-with-url">
Links all occurrences of URL strings in body of mail
message (reference marks) to the objects with that URL
(reference subjects).   Uses the Perl
support of the mail program to resolve the addresses.
Location source for both queries is current mail message.
</link>

<!-- Queries that implement semantics of link shown
     above. -->
<!NOTATION perl PUBLIC "-//L.Wall//NOTATION Programming perl//EN"
                       "<isbn>0-937175-64-1"
>
<queryloc id=url-in-text notation=perl>
@URLS = ();
open(MAIL, "< current-message") or die "Cannot open message: $!";
while (<MAIL>) {
  chomp;  # strip the trailing newline
  # Collect every URL-like string on the line.
  while (/((?:http:|ftp:|mailto:)[\w.\/\-~?#]+)/g) {
     # Above regex probably flawed but you get the point.
     push(@URLS, $1);
  }
}
close(MAIL);
return(@URLS); # returns list of URLs addressed
</queryloc>
<queryloc id=objects-with-url notation=perl>
@objects = ();
foreach $url (@URLS) {
   push(@objects, resolve_url($url));
}
return(@objects);
</queryloc>

In other words, you have a two-anchor link with two anchor roles,
"refmark" and "refsub".  In this case, the strings addressed
by the "url-in-text" function are the "refmark" anchor. The
objects with those URLs are the "refsub" anchor.  Both anchors
are lists in this case and defined using inter-dependent queries
(a very useful technique).
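
The same two-query arrangement can be sketched outside HyTime. The following Python is only an illustration under stated assumptions: the function names mirror the query IDs above, resolve_url is a stand-in that fetches nothing, the URL pattern is as loose as the one in the Perl, and the second URL in the sample message is made up. It shows the second query consuming the result of the first, which is what makes the two anchors inter-dependent:

```python
import re

# Hypothetical pattern, deliberately as loose as the Perl regex above.
URL_PATTERN = re.compile(r"(?:http:|ftp:|mailto:)[\w./\-~?#]+")


def url_in_text(message):
    """'refmark' query: every URL string occurring in the message body."""
    return URL_PATTERN.findall(message)


def resolve_url(url):
    """Stand-in resolver; a real one would retrieve the addressed object."""
    return {"url": url, "object": "<object at " + url + ">"}


def objects_with_url(urls):
    """'refsub' query: defined in terms of the result of url_in_text."""
    return [resolve_url(u) for u in urls]


def build_link(message):
    """Build the two-anchor link: each role holds a list of objects."""
    refmark = url_in_text(message)
    refsub = objects_with_url(refmark)
    return {"refmark": refmark, "refsub": refsub}


message = ("See http://www.textuality.com/sgml-erb/mprdv.html "
           "and ftp://ftp.example.org/pub/readme for details.")
link = build_link(message)
print(link["refmark"])
print([obj["url"] for obj in link["refsub"]])
```

Both anchors come out as lists of the same length, one entry per URL occurrence in the message.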

You probably wouldn't actually implement a mail program's
functionality this way (but you could, and being able to define new
links of this sort would make for a very interesting general facility).
But it shows how you can use links and queries to express, in a standard
way, both the individual functions (matching URLs in text and resolving
them) and the relationship between the two functions.
 
>                                                                The person 
>who placed mprdv.html at www.textuality.com and sent the URL out by email 
>was consciously creating an anchor that in some sense knows it's an anchor
>since there is an httpd server that will give anyone a copy, no questions
>asked.

But the anchor doesn't know it's an anchor, it only knows it's a string.
The application of linking semantics that makes it an anchor is separate
from the string (and there's no requirement that they ever be applied even
though the author may reasonably expect them to be most of the time--if I'm
reading the mail with RN or something, the URL is not an anchor at that
moment).

>On the other hand, when 
>
>Example 2: <A NAME="sec3.17"> 
>
>appears in an HTML document, I assume you would call this an anchor that 
>knows it's an anchor?  It exists only to provide addressing hooks.

Even though the *semantic* of the A element when the NAME attribute is used
is to provide a point that *can* be linked to, as far as HyTime is
concerned, it isn't an anchor until someone *does* link to it.  In other
words, just putting an ID on something doesn't make it an anchor.  Or, said
another way, because you can potentially address anything, everything is
always potentially an anchor.  Putting an ID on something doesn't make it
more or less likely to be an anchor except to the degree that limits in
your addressing functionality make it easier or harder to address things
with particular properties.

>On the other hand, (Example 3: ) with some analogue of ilink, where you can 
>point into a document from outside using locaddrs or some such, you clearly
>have a case that what's being pointed-at does not and cannot in principle
>know it's an anchor... or am I missing your point?

I think Steve might be asking about three things:

1. Should we allow completely independent links (unilateral addressing of
   all anchors of a link)?  If we disallow them, then at least one anchor
   of every link will know it's an anchor because the link is always one
   of its own anchors.  I think we're all agreed that we need independent
   links.

2. Should the elements always indicate, in their syntactic representation,
   that they are anchors (e.g., an attribute called "anchored-by" that
   lists the addresses of those things that point at it)?  This could be
   reasonable in a tightly-controlled, closed system of documents, but
   is probably not reasonable or possible in a Web environment and I doubt
   that anyone would seriously propose it.

3. Should the *methods* associated with objects *always* be informed when
   they are addressed as an anchor?  This is a bit more subtle, because
   it can be difficult or impossible to do this in all environments (e.g.,
   when the anchors are addressed by a query against the entire Web).
   In other words, in the general case it's useful or necessary to defer
   resolving some anchor addresses until the anchor is traversed to (or
   access to the anchor is otherwise requested).  This means that there
   will always be anchors that do not know they are anchors at the time
   the link is created, only at the time an attempt is made to address
   the anchor.
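
Point 3 can be made concrete with a small sketch of deferred resolution. This is Python, not anything HyTime defines; DeferredAnchor, expensive_query and the notified list are hypothetical names used only for illustration. The anchor holds a query and runs it only when traversed, so the addressed objects cannot be informed any earlier than that:

```python
class DeferredAnchor:
    """Holds an address (here a query callable) and resolves it on demand."""

    def __init__(self, query):
        self._query = query
        self._objects = None
        self.resolved = False

    def traverse(self):
        # Resolve at most once, the first time the anchor is traversed.
        if not self.resolved:
            self._objects = self._query()
            self.resolved = True
        return self._objects


notified = []  # objects whose methods have been told they are anchors


def expensive_query():
    # Stand-in for a query against a large or remote collection.
    result = ["object-a", "object-b"]
    notified.extend(result)  # only now can the objects be informed
    return result


anchor = DeferredAnchor(expensive_query)  # link created, nothing resolved
print(notified)     # [] : no object knows it is an anchor yet
anchor.traverse()   # traversal forces resolution
print(notified)     # ['object-a', 'object-b']
```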

The data represented by XML documents is dead and lifeless--it is just
data.  But there are always presumably processing methods associated with
the data--browser styles, retrieval methods, whatever.  Thus there will
always be *something* that is "responsible" for the data objects and could
potentially be informed about their anchor status.  These methods might be
specialized link management systems, e.g., HyTime engines, Xanadu systems,
Hyper-G servers, etc.
["Dynamic" or "active" documents are, I presume, documents where the
methods for certain data objects are tightly bound to the data object,
e.g., a script embedded in an HTML document.  However, such documents are
no more or less active than documents to which the equivalent function is
applied by external methods--in fact I'd argue they're *less* dynamic
because the tight binding tends to limit you to only one possible
behavior, instead of an infinity of possible behaviors.]

Note that in the HyTime model where you have a HyTime engine, the method
for any object can always interrogate the HyTime semantic grove to
determine if the object is an anchor, because the HyTime semantic grove is
where the information about all the links it is managing is maintained.
Thus you should expect any general-purpose HyTime engine to have an
"anchored-by?" function that will return a list of all the links of
which the object is an anchor.  Given that list of links, you can then ask
the HyTime engine about the other anchors of those links. [And note that
since the HyTime semantic grove is a grove, it can be interrogated by DSSSL
styles and transforms using normal DSSSL functions, so there's an obvious
path to using DSSSL to apply presentation and/or behavior to HyTime-managed
hyperlinks.  Groves are so cool.]
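
A rough sketch of what such an "anchored-by?" lookup might amount to, written in Python. The LinkStore class and its method names are assumptions made for illustration and are not a HyTime engine API or the grove model itself. It keeps a reverse index from each object to the links that address it, then answers the two questions described above:

```python
from collections import defaultdict


class LinkStore:
    """Toy link database: knows every link and which objects each addresses."""

    def __init__(self):
        self._links = {}                      # link name -> {role: [object ids]}
        self._anchored_by = defaultdict(set)  # object id -> {link names}

    def add_link(self, name, anchors):
        self._links[name] = anchors
        for members in anchors.values():
            for obj in members:
                self._anchored_by[obj].add(name)

    def anchored_by(self, obj):
        """All links of which obj is an anchor (empty if it is not an anchor)."""
        return sorted(self._anchored_by.get(obj, ()))

    def other_anchors(self, link_name, obj):
        """The anchors of a link, by role, excluding obj itself."""
        return {role: [m for m in members if m != obj]
                for role, members in self._links[link_name].items()}


store = LinkStore()
store.add_link("see-also", {"refmark": ["para-3"],
                            "refsub": ["sec3.17", "fig-2"]})
store.add_link("glossary", {"term": ["para-3"],
                            "definition": ["def-anchor"]})

print(store.anchored_by("para-3"))    # ['glossary', 'see-also']
print(store.anchored_by("unlinked"))  # []
print(store.other_anchors("see-also", "para-3"))
```

The objects themselves carry no link information; everything is answered from the store, which matches the idea that the link manager, not the data, holds the knowledge of what is anchored.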

This is essentially what systems like Hyper-G/HyperWave or Xanadu do:
maintain a database of what is linked to what.  HyTime just defines a
specific data model and some base semantics for it.  No magic here, just
standardization of typical practice (I'm not sure there's enough
consensus to say "standard practice").

Cheers,

E.
--
W. Eliot Kimber (eliot@isogen.com) 
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 fax
http://www.isogen.com (work) http://www.drmacro.com (home)
"Rats in the morning, rats in the afternoon...if they don't go away, I'll be
re-educated soon..."                 --Austin Lounge Lizards, "1984 Blues"
Received on Sunday, 22 December 1996 13:13:01 UTC