Re: [dgd@cs.bu.edu: BOS confusion (analysis; suggestion to resolve Newcomb/Bryan conflict)] from Martin Bryan on 1997-01-04 (w3c-sgml-wg@w3.org from January 1997)

From: Martin Bryan <mtbryan@sgml.u-net.com>
Date: Sat, 04 Jan 1997 14:23:57 +0000
To: "W. Eliot Kimber" <eliot@isogen.com>, w3c-sgml-wg@www10.w3.org
Message-Id: <1.5.4.32.19970104142357.006a0dc0@mail.u-net.com>
Eliot Kimber wrote

>This is all true, but it's not the whole story.  The TC also defines rules
>for "implied location sources".  For queries, the location source may be
>inherent in the query--thus, the query may define the grove that is its
>ultimate location source and it may do so *without reference to a declared
>entity*.

While this is true, I don't see its relevance to XML. Are you implying that
an XML query should be able to build a grove of a subtree of an XML document,
rather than having to start from the root of the XML document? (By default
the implied location source is the root.)

>  In the case of URLs used as a query, the location source is
>inherent in the URL and the grove would be the grove constructed from the
>object at the address specified by the URL. 

What do you mean by "at the address specified by the URL"? Is this the grove
of the whole document or the grove starting from the fragment identifier of
the URL?

>Thus, there is no requirement
>to declare entities in order to establish the location source of query
>location addresses.

Agreed, but that does not solve my link management problem. Let me explain.

To manage my set of highly inter-related files I need to be able to set up a
database of all the URLs my files point to. Each file pointed to needs to be
listed once only in the database. Each entry in the database needs to be
given a unique identifier that acts as a key to that URL (the URL is too
long to be a key). So that a change in the location of a particular file
requires a change only to my database, and not to my links, my links need
to reference subcomponents of the referenced file by adding fragment
identifiers, TEI location statements or queries to the unique id (location
source) assigned to the URL in the database. Before transmission the
location source key is replaced by the URL currently stored in the database.
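
A minimal sketch of the scheme described above, in Python. The class name
LinkRegistry, the generated key format ("loc1", "loc2", ...) and the
"key#fragment" link syntax are assumptions made for illustration only; TEI
location statements and queries are not handled.

```python
class LinkRegistry:
    """Maps short keys (location sources) to the current URL of each file."""

    def __init__(self):
        self._url_by_key = {}
        self._key_by_url = {}

    def register(self, url):
        """Return the key for url; each URL is listed once only."""
        if url in self._key_by_url:
            return self._key_by_url[url]
        # Keys are never removed, so a simple counter stays unique.
        key = f"loc{len(self._url_by_key) + 1}"
        self._url_by_key[key] = url
        self._key_by_url[url] = key
        return key

    def relocate(self, key, new_url):
        """Record that the file behind key has moved; links are untouched."""
        old_url = self._url_by_key[key]
        del self._key_by_url[old_url]
        self._url_by_key[key] = new_url
        self._key_by_url[new_url] = key

    def resolve(self, link):
        """Before transmission, replace the key in 'key#fragment' by its URL."""
        key, sep, fragment = link.partition("#")
        return self._url_by_key[key] + sep + fragment


registry = LinkRegistry()
key = registry.register("http://example.org/reports/report.html")
link = key + "#point1"            # what the stored link looks like
registry.relocate(key, "http://example.org/archive/report.html")
resolved = registry.resolve(link)  # the link now follows the moved file
```

The stored link never changes; only the database entry does, which is the
property the scheme relies on.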

If I need to transmit part of my database to another location, either as
part of an application-specific BOS or as a catalog, all I need to do is to
create SGML entity definitions with the database key as the entity name and
the URL as the (hopefully formal) system identifier.
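
As an illustration, such an entity set could be generated from the
database along these lines. This is only a sketch: the function name
entity_declarations and the sample keys are hypothetical, the URL is
written as a plain (not formal) system identifier, and no notation is
declared.

```python
def entity_declarations(entries):
    """Build one SGML external entity declaration per database entry.

    entries maps database key (entity name) to the current URL.
    """
    lines = []
    for key, url in sorted(entries.items()):
        # A system identifier literal may be delimited by either quote
        # character, so pick one that does not occur in the URL.
        if '"' not in url:
            literal = f'"{url}"'
        elif "'" not in url:
            literal = f"'{url}'"
        else:
            raise ValueError(f"cannot quote system identifier: {url!r}")
        lines.append(f"<!ENTITY {key} SYSTEM {literal}>")
    return "\n".join(lines)


declarations = entity_declarations({
    "loc2": "http://example.org/minutes.html",
    "loc1": "http://example.org/reports/report.html",
})
```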

If someone who has loaded one of my transmitted entity sets into his own BOS
or catalog needs the entry to be validated I will provide a form whereby he
can enter the entity name of the received locator and I will return to him
the current value of the URL as I have it in my database. That way my users
can ensure that their BOS/catalog entries remain as up-to-date as their
checking against my database allows.

From a link management point of view SGML entity declarations are about the
simplest way of exchanging up-to-date information about URL databases.

>Note also that the term "entity" is the *only* term by which SGML (and thus
>HyTime) can refer to SGML documents as all SGML documents are, by
>definition "document entities".  Therefore, any HTML, SGML, or XML document
>located by a URL is an entity, whether it has been declared explicitly or not.
>
>>What I am unclear about is what would constitute the grove of something
>>pointed to using a URL of the form report.html#point1 if the HTML page
>>pointed to was constructed as follows:
>>
>><p>The points raised were:
>><ul><li><a name=point1></a>How useful is HyTime
>><li><a name="point2"></a>How useful is XML
>
>The grove would have to be constructed by an "HTML grove constructor" that
>knows how to account for these differences (the HTML parser in SoftQuad's
>HoTMetaL is an example of a tool that has the smarts to put most HTML
>documents into an SGML "grove"--in this case, HoTMetaL's internal object
>model of SGML documents, which is functionally a grove).

Yes, but what is that grove? Is it the whole of the HTML document entity, or
the grove created from the pointed-to fragment? In the case above the
fragment is an empty named anchor rather than the text of the list item,
because I could not put the anchor around the whole text in case I wanted to
make HyTime or XML an anchor, e.g. entered as:

<p>The points raised were:
<ul><li><a name=point1></a>How useful is <a href="#HyTime">HyTime</a>
<li><a name="point2"></a>How useful is <a href="#XML">XML</a>
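
To make the problem concrete, here is a small sketch using Python's
standard html.parser module (the class name NamedAnchorText is
hypothetical) that collects the text enclosed by each named anchor in the
fragment above. It makes no claim about what an HTML grove constructor
should do; it only shows that the element named by "point1" itself
contains nothing.

```python
from html.parser import HTMLParser


class NamedAnchorText(HTMLParser):
    """Collects the character data enclosed by each <a name=...> element."""

    def __init__(self):
        super().__init__()
        self.anchors = {}   # anchor name -> enclosed text
        self._open = []     # stack of open <a> elements (None if unnamed)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            self._open.append(name)
            if name is not None:
                self.anchors[name] = ""

    def handle_endtag(self, tag):
        if tag == "a" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Credit the text to every named anchor that is currently open.
        for name in self._open:
            if name is not None:
                self.anchors[name] += data


page = """<p>The points raised were:
<ul><li><a name=point1></a>How useful is <a href="#HyTime">HyTime</a>
<li><a name="point2"></a>How useful is <a href="#XML">XML</a>
"""

parser = NamedAnchorText()
parser.feed(page)
parser.close()
anchors = parser.anchors   # both named anchors enclose the empty string
```

Whether a reference to #point1 should address that empty element, the
enclosing list item, or the whole document is exactly the open question.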

>Unfortunately, behavioral specifications are usually easier to state
>initially and appear to be easier to understand (until you run into
>ambiguous cases, then it gets very difficult to determine what should
>happen). 

The point of my previous example is that unless we unambiguously define the
behavioural specification of what it means to point to a named HTML anchor
in XML, it will be impossible to avoid ambiguous cases such as the one shown
above.

> Of course, only providing a behavioral specification does let the
>first implementor define the real rules through their implementation
>choices, which is to the implementor's proprietary advantage....
>
>>Given the fact that most documents on the Web would not allow sensible
>>groves to be constructed from existing named anchors the relevance of HyTime
>>location sources for linking XML files to HTML anchors must be considered.
>
>As demonstrated above, your inference is incorrect. 

I'm not sure you've proved this at all. Note the presence of the term
"existing named anchors". Until I know what you mean by "the grove" of a
referenced HTML document I am unconvinced that I can define rules that will
be obeyed by every XML browser pointed to an HTML file in exactly the same way.

> Sensible groves can be
>built from HTML documents given the appropriate grove construction process
>(using either the SGML property set or some other property set--the
>property set used doesn't affect the ability to use HyTime location
>addressing).  Note that the "grove construction process" doesn't have to be
>literally implemented--it can consist simply of a design document that says
>how to *behave as if* you had constructed the grove--in other words, it's
>an API spec where the input is location addresses of a particular form and
>the output is the data addressed (or pointers to it, obviously).

Agreed entirely, but the API spec must be defined as part of XML in a way
that cannot be misinterpreted. Doing this in the form of a behavioural
specification would worry me, though it would, of course, make XML links
much more acceptable to uninitiated potential users.

----
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK 
Phone/Fax: +44 1452 714029   WWW home page: http://www.u-net.com/~sgml/
Received on Saturday, 4 January 1997 09:25:59 UTC