Re: [dgd@cs.bu.edu: BOS confusion (analysis; suggestion to resolve Newcomb/Bryan conflict)] from W. Eliot Kimber on 1997-01-04 (w3c-sgml-wg@w3.org from January 1997)

From: W. Eliot Kimber <eliot@isogen.com>
Date: Sat, 04 Jan 1997 09:06:37 -0900
To: w3c-sgml-wg@www10.w3.org
Message-Id: <3.0.32.19970104090631.00b3e4f8@uu10.psi.com>
At 02:23 PM 1/4/97 +0000, Martin Bryan wrote:
>Eliot Kimber wrote
>
>>This is all true, but it's not the whole story.  The TC also defines rules
>>for "implied location sources".  For queries, the location source may be
>>inherent in the query--thus, the query may define the grove that is its
>>ultimate location source and it may do so *without reference to a declared
>>entity*.
>
>While this is true I don't see its relevance to XML - are you implying that
>an XML query should be able to build a grove of a subtree of an XML document,
>rather than having to start from the root of the XML document? (By default
>the implied location source is the root.)

No.  I'm saying that, in HyTime, any query is free to define its own
location source (rather than using the normal location source attributes of
location address elements) when the implied location source value is
"inherent" (meaning the location source is inherent in the query).  For
SGML documents (and presumably XML and HTML), the SGML pgrove is always
constructed from the entire document (I don't think you could satisfy the
SGML property set requirements if you didn't).  Having constructed the
pgrove, you then locate the node or nodes within that grove that you want,
e.g., the element whose ID matches the value following the "#" in a URL (or
the things addressed by a TEI extended pointer or whatever).
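
The two steps above (construct the grove from the whole document, then
locate nodes inside it) can be sketched roughly as follows. This is only an
illustration: Python's ElementTree stands in for a real SGML grove, and
treating an attribute literally named "id" as the ID is an assumption made
for the example, since no DTD is read here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this sketch.
DOC = """<doc>
  <chapter id="intro"><title>Introduction</title></chapter>
  <chapter id="groves"><title>Groves</title>
    <para id="p1">Text</para>
  </chapter>
</doc>"""


def build_tree(source):
    """Parse the ENTIRE document first; nothing is addressed before this."""
    return ET.fromstring(source)


def locate_by_id(root, wanted):
    """Walk the complete tree and return the nodes whose id equals `wanted`."""
    return [el for el in root.iter() if el.get("id") == wanted]


root = build_tree(DOC)
hits = locate_by_id(root, "groves")
print([el.tag for el in hits])
```

The point of the split is that locate_by_id never sees a partial tree: the
fragment only selects nodes from a grove that already covers the document.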

>>  In the case of URLs used as a query, the location source is
>>inherent in the URL and the grove would be the grove constructed from the
>>object at the address specified by the URL. 
>
>What do you mean by "at the address specified by the URL". Is this the grove
>of the whole document or the grove starting from the fragment identifier of
>the URL?

I mean the HTML or XML document located at the location specified by the
URL (not including any subsequent names or queries tacked onto the end of
the URL).
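
As a small sketch of that split, assuming "names" means the fragment after
"#" and "queries" means the part after "?", the document address can be
separated from whatever is tacked onto the end like this (the function name
and the example URLs are made up for illustration):

```python
from urllib.parse import urlsplit, urlunsplit


def document_address(url):
    """Return (document_url, query, fragment) for `url`.

    document_url is the URL with any query and fragment removed; under the
    reading above it is what the grove would be constructed from.
    """
    parts = urlsplit(url)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base, parts.query, parts.fragment


print(document_address("http://www.example.com/docs/a.html#sec2"))
```

The fragment and query are returned rather than discarded because they are
what gets applied to the grove once it has been built.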

>>Thus, there is no requirement
>>to declare entities in order to establish the location source of query
>>location addresses.
>
>Agreed, but that does not solve my link management problem. Let me explain.

But it enables a solution that does not use entity declarations.

>To manage my set of highly inter-related files I need to be able to set up a
>database of all the URLs my files point to. Each file pointed to needs to be
>listed once only in the database. Each entry in the database needs to be
>given a unique identifier that acts as a key to that URL (the URL is too
>long to be a key). To avoid having to change my links, rather than my
>database, whenever the location of a particular file changes my links need
>to reference subcomponents of the referenced file by adding fragment
>identifiers, TEI location statements or queries to the unique ids (location
>source) assigned to the URL in the database. Before transmission the
>location source key is replaced by the URL currently stored in the data.

You can do this by creating a document that consists of nothing but
queryloc elements (e.g., my "URLLOC" element from my earlier example).
Each one has a unique ID, it contains the URL (and serves to isolate it for
easy update)--in short, it is functionally identical to using entity
declarations.  Ergo, *it solves your problem without entity declarations*.
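
A rough sketch of how such a document could act as the URL database
follows. The markup is hypothetical: the element name "urlloc", the
"locations" wrapper, and the "key#fragment" link form are assumptions made
for this example and are not the actual URLLOC declaration from the earlier
message.

```python
import xml.etree.ElementTree as ET

# Hypothetical location document: one element per URL, each with a unique ID.
LOCATIONS = """<locations>
  <urlloc id="spec">http://www.example.com/docs/spec.html</urlloc>
  <urlloc id="guide">http://www.example.com/docs/guide.html</urlloc>
</locations>"""


def load_locations(source):
    """Build a key -> URL table, rejecting duplicate keys."""
    table = {}
    for el in ET.fromstring(source).iter("urlloc"):
        key = el.get("id")
        if key in table:
            raise ValueError("duplicate location id: " + key)
        table[key] = (el.text or "").strip()
    return table


def resolve(link, table):
    """Replace the key in 'key' or 'key#fragment' with its current URL."""
    key, sep, fragment = link.partition("#")
    return table[key] + sep + fragment


table = load_locations(LOCATIONS)
print(resolve("spec#sec2", table))
```

When a file moves, only the content of its one urlloc element changes; the
links that refer to it by key are untouched.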

>From a link management point of view SGML entity declarations are about the
>simplest way of exchanging up-to-date information about URL databases.

I would suggest that querylocs containing URLs are just as simple.  Entity
declarations may give you more information (because they include a notation
name and could, in SGML (but not XML today), provide data attributes).  I'm
not saying querylocs with URLs are necessarily better, just pointing out
that they will meet your stated requirements.

>>The grove would have to be constructed by an "HTML grove constructor" that
>>knows how to account for these differences (the HTML parser in SoftQuad's
>>HoTMetaL is an example of a tool that has the smarts to put most HTML
>>documents into an SGML "grove"--in this case, HoTMetaL's internal object
>>model of SGML documents, which is functionally a grove).
>
>Yes, but what is that grove - the whole of the HTML document entity, or the
>grove created from the pointed to fragment, 

What the grove is depends on the data type and the grove constructor
used.  I've been assuming that for SGML, XML, and HTML, you would use the
SGML property set and an SGML grove constructor.  That means the grove
constructed from an HTML document would include the entire document.  From
that grove you could then address a fragment (say by NAME attribute value).

>The point of my previous example is that unless we unambiguously define the
>behavioural specification of what it means to point to a named HTML anchor
>in XML it will be impossible to avoid ambiguous cases such as the one shown
>above.

I think I just did: use the SGML property set, build an SGML grove
(presumably using a very pared-down grove plan, no need to include stuff
HTML doesn't support and browsers don't care about), and then address it.
As Terry pointed out, for HTML, the "#name" syntax is really a query of the
form "find all A element nodes whose NAME attribute's value property is
'name'".  For XML, we can define "#name" to be "find that member of the
'elements' property whose name value matches 'name'". [In the SGML property
set, the "elements" property is a "named node list" listing those elements
in the document that have a unique ID, indexed by ID value (e.g., a lookup
table of elements by ID)].
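
The two readings of "#name" can be sketched side by side. This is an
approximation under stated assumptions: Python's html.parser and
ElementTree stand in for grove constructors, and an attribute literally
named "id" stands in for a declared ID. The helper names are invented for
the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class AnchorFinder(HTMLParser):
    """Collect the positions of all A elements whose NAME equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a" and dict(attrs).get("name") == self.wanted:
            self.matches.append(self.getpos())


def find_html_anchors(html, name):
    """HTML reading: a query that may match zero, one, or many anchors."""
    finder = AnchorFinder(name)
    finder.feed(html)
    finder.close()
    return finder.matches


def elements_by_id(xml):
    """XML reading: a lookup table of elements indexed by ID value."""
    return {el.get("id"): el
            for el in ET.fromstring(xml).iter()
            if el.get("id") is not None}


HTML = ('<html><body><A NAME="top">Top</A><p>x</p>'
        '<a name="top">again</a><a name="end">End</a></body></html>')
XML = '<doc><sec id="a"/><sec id="b"><p id="c"/></sec><p/></doc>'

print(len(find_html_anchors(HTML, "top")))
print(elements_by_id(XML)["b"].tag)
```

The HTML query returns a list because nothing stops a document from
repeating a NAME value, whereas the ID table yields at most one element per
key.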

>>>Given the fact that most documents on the Web would not allow sensible
>>>groves to be constructed from existing named anchors the relevance of HyTime
>>>location sources for linking XML files to HTML anchors must be considered.
>>
>>As demonstrated above, your inference is incorrect. 
>
>I'm not sure you've proved this at all. Note the presence of the term
>"existing named anchors". Until I know what you mean by "the grove" of a
>referenced HTML document I am unconvinced that I can define rules that will
>be obeyed by every XML browser pointed to an HTML file in exactly the same
>way.

I've defined what I mean by the grove of a referenced HTML document.

Cheers,

E.
--
W. Eliot Kimber (eliot@isogen.com) 
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 fax
http://www.isogen.com (work) http://www.drmacro.com (home)
"Rats in the morning, rats in the afternoon...if they don't go away, I'll be
re-educated soon..."                 --Austin Lounge Lizards, "1984 Blues"
Received on Saturday, 4 January 1997 11:08:17 UTC