Re: XML catalog draft

> From: "Christopher R. Maden" <crm@ebt.com>
> 
> 1) The resolution process for a public identifier is that all PUBLIC
>    entries be searched in order, then all DELEGATE entries.  It
>    appears from the description of when to use DELEGATE that PUBLIC
>    entries are searched only from the first catalog entry entity, then
>    DELEGATE entries from that entity, then PUBLIC entries from the
>    second catalog entry entity, and so on.  However, that wasn't
>    completely explicit, and seems out of keeping with the concept of
>    the catalog as a logical construct.

Your reading of the algorithm (PUBLIC, then DELEGATE, one catalog entry
entity at a time) agrees with the proposal.  The proposal says:

 If there are no PUBLIC entries in a given catalog entry entity that
 result in an exact match with the interpreted value of the public
 identifier of the ExternalID, then all DELEGATE catalog entries in 
 that catalog entry entity are considered. . . .

 If there are no PUBLIC or DELEGATE matches for the interpreted value 
 of the public identifier of this ExternalID in the current catalog entry 
 entity, match processing continues with the next catalog entry entity 
 in the catalog list (if any). 

Perhaps you could suggest more explicit wording for the editors of the
XML spec to consider.

I'm not sure how this process is out of keeping with the catalog as a
logical construct.  Note that completely processing one catalog entry
file/entity before going to the trouble of accessing and processing the
next one is basically the same algorithm as described in TR9401, and it
seems to be the obvious thing you'd want to do over the internet.
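
As a rough illustration of that per-entity ordering (a sketch only, not
text from the proposal), here is a small Python model.  The function
name and the tuple layout (keyword, key, target) are hypothetical, and
matching a DELEGATE entry by public identifier prefix is an assumption
carried over from TR9401.

```python
def resolve_public(catalog_list, public_id):
    """Search one catalog entry entity at a time: PUBLIC first, then DELEGATE.

    catalog_list is a list of catalog entry entities; each entity is a list
    of (keyword, key, target) tuples.  This layout is purely illustrative.
    """
    for entity in catalog_list:
        # Step 1: look for an exact PUBLIC match in this entity only.
        for keyword, key, target in entity:
            if keyword == "PUBLIC" and key == public_id:
                return ("PUBLIC", target)
        # Step 2: no PUBLIC match here, so consider this entity's DELEGATE
        # entries (assumed to match on a public identifier prefix).
        delegated = [target for keyword, key, target in entity
                     if keyword == "DELEGATE" and public_id.startswith(key)]
        if delegated:
            return ("DELEGATE", delegated)
        # Step 3: nothing matched, move on to the next catalog entry entity.
    return None
```

Under this model a DELEGATE match in the first entity is taken before a
PUBLIC entry in the second entity is ever looked at, which is the
behavior asked about in point 1).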

> 
> 2) Does the DELEGATE process recurse?  I know that in the header notes
>    it indicates that this is left to implementations, but it should be
>    explicit in the specification.

Yes, it "recurses."  No, you misread the header note, which seems quite
clear to me.  The header note doesn't say that whether or not the
DELEGATE process is recursive is left to implementations; it says:
 How to avoid undesirable recursion is left as an implementation issue.

The proposal says:
 The catalog lookup process for this public identifier continues with
 this new (replacement) catalog. . . .  This newly defined catalog is
 then processed in much the same manner as if it had been the originally
 specified catalog. . .

Perhaps you could suggest more explicit wording for the editors of the
XML spec to consider.  Personally, I do not think using the word
"recursive" is going to improve the readability, clarity, or general
acceptability of the spec.
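
For what it's worth, one way an implementation might continue the
lookup in the replacement catalog while avoiding undesirable recursion
is sketched below.  Everything here is an assumption for illustration:
the names, the in-memory dict standing in for fetching catalog entry
entities, prefix matching for DELEGATE, and the choice of a "seen" set
as the guard.  The proposal prescribes none of these.

```python
def resolve(entities, catalog_list, public_id, seen=None):
    """Resolve public_id, following DELEGATE entries into replacement catalogs.

    entities maps a (hypothetical) catalog entry entity name to its list of
    (keyword, key, target) tuples; catalog_list is an ordered list of names.
    """
    seen = set() if seen is None else seen
    for name in catalog_list:
        if name in seen:
            continue  # guard: never process the same entity twice
        seen.add(name)
        entries = entities.get(name, [])
        for keyword, key, target in entries:
            if keyword == "PUBLIC" and key == public_id:
                return target
        delegated = [target for keyword, key, target in entries
                     if keyword == "DELEGATE" and public_id.startswith(key)]
        if delegated:
            # The delegated-to entities form the new (replacement) catalog,
            # which is processed much like the original one.  Assumption:
            # the lookup does not fall back to the original catalog list.
            return resolve(entities, delegated, public_id, seen)
    return None
```

With the guard in place, two catalogs that delegate to each other simply
produce "no match" instead of looping forever.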

> 
> 3) The DELEGATE description says that other parts of the entity are
>    not available, as opposed to SGML Open TR9401...  This is
>    confusing, as no other reference to TR9401 nor to other modes of
>    lookup have been made.  Mention at the beginning of the lookup
>    process description that only the public identifier is available
>    for referencing, and don't mention it again.  References to TR9401
>    should be restricted to an SGML Geek appendix.  The ambiguity here
>    contributed to my confusion on point 2).

Yes, I too found the mention of TR9401 unfortunate, and I would think
suggestions for avoiding it might well be welcomed by the editors.

The problem this reference is attempting to address is as follows.

In an XML catalog as proposed, the only two "lookup" entry types
are PUBLIC and DELEGATE.  If a given catalog entry entity has no 
PUBLIC match for the public id, matches for one or more DELEGATE entries
are attempted.  If there are any such matches, the newly "delegated to"
catalog is then processed, one entity at a time, each first for PUBLIC
matches then DELEGATE matches.

However, an XML catalog does allow "extensions" (catOtherEntry),
and the most obvious XML extensions will be valid TR9401 entries.
So even avoiding an explicit mention of TR9401, the question is
whether or not an XML processor should consider catOtherEntry
extensions in a "delegated to" catalog.  The proposal addresses this
by saying that the only info about the ExternalID available to a 
"delegated to" catalog is the PublicID.  What this means is that
entries such as
	ENTITY "foo" "bar.xml"
	DOCTYPE "html" "html.dtd"
	SYSTEM "fred.sgm"  "http://www.acme.com/fred/fred.htm"
	. . . [any extension requiring a match on anything but public ID]
(seen as catOtherEntry extensions to an XML processor) would be
unmatchable in a "delegated to" catalog, but not unmatchable in
an "un-delegated to" catalog.  Note that extensions such as
	SGMLDECL "blerf.dcl"
	OVERRIDE "YES"
	. . . [any extension requiring no match or matching on public ID]
would, per the current proposal, still be processable in a "delegated
to" catalog.
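
To make the distinction concrete, here is a small sketch of how a
processor that understands these extensions might filter them.  The
keyword set is taken only from the examples above, and the function and
variable names are hypothetical; a real processor would have to
classify whatever extensions it actually supports.

```python
# Extension keywords from the examples above whose match needs something
# other than the public identifier (entity name, doctype name, system id).
NEEDS_MORE_THAN_PUBLIC_ID = {"ENTITY", "DOCTYPE", "SYSTEM"}


def usable_entries(entries, delegated):
    """Return the entries a processor could still act on.

    entries is a list of tuples whose first item is the entry keyword.
    In a "delegated to" catalog only the public identifier is available,
    so entries that must match on anything else are dropped.
    """
    if not delegated:
        return list(entries)
    return [entry for entry in entries
            if entry[0] not in NEEDS_MORE_THAN_PUBLIC_ID]
```

For example, a catalog holding the ENTITY, DOCTYPE, SYSTEM, SGMLDECL and
OVERRIDE entries shown above would keep all five when read directly,
but only SGMLDECL and OVERRIDE when it is reached through a DELEGATE.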

The rationale for "shedding" all other info about the ExternalID in
a "delegated to" catalog is that the process of delegation is one in
which you've already decided, in effect, that you're trying to resolve
the public id, so it makes sense to continue the resolution process
using just that public id.  Furthermore, a "delegated to" catalog is
very likely to be on another system, so it is questionable to try to
resolve something based on its entity name (and system id) namespace.

> 
> 4) The behavior of CATALOG is not completely clear - is it like an
>    #include instruction?  That is, is the referenced catalog entry
>    entity inserted in its entirety into the catalog at the point of
>    the CATALOG reference, or is it appended to the current logical
>    catalog?  This has implications on point 1) - is the inserted
>    catalog then part of the current catalog entry entity for purposes
>    of the PUBLIC/DELEGATE cycle, or should things after the CATALOG
>    entry be searched before searching members of the referenced
>    catalog entry entity?

No, the referenced catalog entry entity is neither inserted in its
entirety into the catalog at the point of the CATALOG reference, nor is
it appended to the current logical catalog.  As the proposal indicates:
 
 The CATALOG entry can be used to insert new catalog entry entities into
 the current list of catalog entry entities.  The right hand side of a
 CATALOG entry is used to locate another catalog entry entity that is
 read after the current catalog entry entity if the current catalog
 entry entity does not provide a match for the public identifier.
 Multiple CATALOG entries are allowed, and the referenced catalog entry
 entities will be inserted into the current catalog list in order.

If the idea of inserting a new catalog entry entity into the list
of catalog entry entities (what the first quoted sentence says) is
not clear enough, the next quoted sentence restates the intent by
saying explicitly that the referenced catalog entry entity is read
after the current catalog entry entity.
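
A sketch of that reading, with the usual caveats: the names and data
layout are hypothetical, DELEGATE handling is left out for brevity, and
placing the referenced entities immediately after the current one
(ahead of any later entities already in the list) is my interpretation
of "inserted", not something the quoted text spells out.

```python
def resolve_with_catalog(entities, catalog_list, public_id):
    """PUBLIC lookup in which CATALOG entries extend the catalog list.

    entities maps a (hypothetical) entity name to a list of entries:
    ("PUBLIC", public_id, target) or ("CATALOG", entity_name).
    """
    pending = list(catalog_list)  # the current list of catalog entry entities
    seen = set()                  # avoid reading the same entity twice
    i = 0
    while i < len(pending):
        name = pending[i]
        i += 1
        if name in seen:
            continue
        seen.add(name)
        entries = entities.get(name, [])
        for keyword, *args in entries:
            if keyword == "PUBLIC" and args[0] == public_id:
                return args[1]
        # No match in this entity: the entities named by its CATALOG
        # entries are read next, in the order the entries appear.
        referenced = [args[0] for keyword, *args in entries
                      if keyword == "CATALOG"]
        pending[i:i] = referenced
    return None
```

Note that the referenced entity is read only after the whole current
entity has failed to match, so entries that follow a CATALOG entry in
the same entity are still searched before it.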

paul

Received on Monday, 3 February 1997 14:15:53 UTC