Re: XML catalog draft

Paul wrote:
> I hear you expressing unhappiness with the catalog proposal as it
> stands, but all I can determine as your problem with it is that it
> doesn't specify how the XML implementation will determine what the
> catalog is.  I'm failing to understand why go on at length about this.

I'm not actually compalaining about that proposal particularly.
Sorry if I wasn't clear about that.  I am saying that we can't stop
there, because it doesn't solve enough of the problem to be useful.

I think that there are two acceptable scenarios which will work:
(1) no FPIs at all.  I favour this very strongly indeed.

(2) FPIs are used, and any XML application that needs to can turn
    the FPI into a system identifier and proceed.  (This is what
    Paul Prescod is opposing.  I _think_ he wants all resolution methods
    to be proprietary, so that we should not recommend one.)

> I understand you to say you don't want a final XML spec that says
> nothing about how an implementation should find its catalog.  Okay.
> Point taken.  Some might disagree and prefer to leave it unspecified,
> but I suspect a majority of us would at least like to see a non-binding
> recommendation as to how an XML implementation could find its catalog.

Yes.  Otherwise, if ArborText and SoftQuad and Synex and InSo/EBT and
TechnoTeacher and thirtyfive other companies all have XML products,
and you put an XML file on the web, you may need to deal with 40
different ways of handling PUBLIC identifiers in order that your XML
file "just works".

> So either (a) suggest some words right now that you'd like to add to
> the proposal or (b) let's agree now that, once we determine how an XML
> application will find its style sheet, we'll define an analogous way
> for it to find its catalog.

I am happy with either of these.  I will append a suggested mechanism.
Thanks for the prod :-)

> It seems more important to me to determine first (1) if we will add
> public identifiers to XML

I don't think we should decide to do that.  When this first came up,
I asked what problems were being solved.  The answers were
* backward compatibility with a different (not XML) language
  with which we were not otherwise compatible (i.e. existing SGML
  documents are very unlikely to work unchanged as XML, because of the
  different EMPTY syntax, the lack of CONREF, content model changes, etc.)

* warm fuzzy feelings that public identifiers are better than system

* the possibility of indirection -- which can be done in other ways --
  to achieve document longevity

Of these, only the last made or makes any sense to me.  If you have URNs,
you can put a URN in the system identifier and solve the problem that
way.  No extra language keyword is needed for that.  The people who
were arguing for the need for indirection seem to include those who
are now arguing that we shouldn't say how to do it.

I think PUBLIC is being added for no other reason than warm fuzzies

> and (2) if so, if we will recommend a
> catalog-based resolution scheme in XML,

I don't have feelings on this as long as it's no more than a way
to go from PUBLIC to SYSTEM identifiers (as James pointed out), and
as long as we document carefully, completely and exactly how to get
from a PUBLIC identifier to a CATALOG file.

> and (3) if so, will we limit
> the right hand side of catalogs to the same things allowed in XML
> system identifiers,
Yes.  Obviously we must.  If you need to add more features to system
identifiers, add them everywhere.  If not, don't.

> and (4) if so, will that be URLs or will we expand
> what we allow as XML system identifiers.

They must be the same.


OK, here is a suggested way to get to a CATALOG file.

[0] do not look for a catalog file unless you come across a
    PUBLIC identifier.

[1] if there is a processing instruction tht occurs before the DOCTYPE
    line of the form <?-XML-CATALOG "system-id"?> then that is the system
    identifier of the catalog and all other mechanisms are disregarded.

[2] determine the base address of the XML document as a system identifier.

    This may come from a <?-XML-BASE "system-id"?> processing
    instruction before the DOCTYPE line, or may be known externally.
    The processing instruction always takes precedence.

[3] search for "catalog.soc" as if it had been given as a relative URL
    embedded in the document at the point of the PUBLIC Idenntifier,
    using the mechansism described in RFC 1808, Relative Uniform
    Resource Locators.

[4] an application may choose to keep a cache of PUBLIC entities.  If so,
    it is up to the application whether to use CATALOG resolution
    if it has already stored a system identifier for that public Id;
    however, it should if possible check to see if any remote files
    have changed before using cached copies.

What have I left out?