Re: the return of the Public Identifier Question

(Stop me if I'm wandering from the topic.)

Michael wrote:
 [ clarifying that it's the processor that's under discussion  ... ]
| My two cents:  we want the ability to have all XML processors handle
| some resolution / indirection mechanism, in the same way, so that
| publishers don't have to provide SGML Open catalogs for Panorama, and
| something else for Vistabrowser, and a third thing for MSIE, and a
| fourth thing for ..., and an nth thing for Netscape.  (Well, the nth
| thing may not be under our control, but it would be nice if all the
| browsers that actually *support* XML are interoperable at some level.)
| That means all XML processors have to be *able* to handle the Minimum
| Resolution Method (MRM).

I don't want XML processors to handle resolution at all, I want them
to send URLs to the systems generalized URL-fetching mechanism to
be resolved.  I think what we are talking about is the process of
determining what it is that is to be resolved, which identifier
to choose ("how to manage indirection").

| I don't think (speaking for myself here) that we particularly want or
| need to forbid processors for experimenting with other resolution
| methods that might work better in particular environments, so I don't
| want to say the MRM must be the Only Sole Exclusive Resolution Method,
| or even that it be the primary method.
| I'd be open to saying "At user option, conforming processors must
| support the MRM" or "At user option, conforming processors may support
| resolution methods other than MRM" -- I think these two are equivalent
| in effect, but my brain has not yet had its minimum daily requirement of
| caffeine so I may be wrong.  What I think these mean, and what I'd be
| open to having the spec say, is "You can support any methods of
| public-identifier resolution you like, but (a) you must support the MRM
| and (b) you must give the user the option of telling you to *use* the
| MRM."  (N.B. I am not seriously proposing we use the abbreviation MRM in
| the spec; it's just a placeholder.)

This sounds reasonable, but I'm still confused about the user and
the processor.

What is the model of interaction between processor and application that
the SGML ERB is implicitly relying on?  When I download an XML file
that contains a public identifier BEACH, 

 - is it the a or the p that the file is directed to first?
 - if the a, does the a do something with BEACH before passing
   the file to the p?
 - is the string BEACH part of the p's output?
 - does the p resolve BEACH [or send BEACH out to a system utility
   for resolution] as part of parsing?
 - does resolution mean converting BEACH to a URL, or (in the case of
   PIs) converting it to a system id for comparison with cache contents?
 - does the p resolve BEACH as part of parsing only for certain
   purposes described in the spec?
 - does the p resolve BEACH only if it is used as a system or public ID
   and not if it is the target of a link?
 - if the p passes BEACH to the a, does the a resolve BEACH directly,
   without bothering the p, or does the a tell the p, "Go to the BEACH"?

| I'm also open to saying only that conforming processor must support the
| MRM and may support other methods, and how they decide which to use is
| to be decided by the designer, the implementor, the user, and anyone
| else who horns in on the discussion, but is not constrained by XML.
| (Sole difference:  implementations are not required to provide a
| user-settable option saying "do it this way".)

What are implementations?  If they are applications, as a user I
want one with all the knobs.  If they are processors, under what
circs are they resolving these IDs (see list above)?  

| >| 3 There appear to be three approaches to resolution that command
| >| or could command non-negligible support:
| >|
| >|   a SGML Open Catalogs, as specified in the current version of the
| >|     relevant SGML Open technical resolution
| >
| >If a catalogue can give as the rhs another public identifier, this
| >choice does not really result in specifying a resolution mechanism;
| >and if that's okay, then punting to mechanisms entirely outside
| >XML should be okay, too.  Catalogues are not resolution mechanisms,
| >they are indirection mechanisms, and that's just right.
| I'll have to reread the SGML Open spec in its current incarnation,
| but the last time I looked I thought the rhs of a PUBLIC entry
| had to be a system identifier.

My error.  Although just what the bounds of a system ID are is
a bit fuzzy in the Internet context, because you can regard as
much of the Internet as you like as part of your system (a point
Lee has made).

| But even if it can be another PUBLIC identifier, this is an issue
| only if the full SGML Open catalog is taken as the MRM, which seems
| unlikely.
| > ...
| >My point is that resolution (having power working) is not
| >indirection (choosing among PG&E, windmills, solar power, etc.),
| >and that any choice of method may result in failure of resolution.
| I think this is true, but so universally true that I'm not sure I
| can derive any consequences from it.  No matter what resolution
| method we choose, it can fail.  If we don't choose one at all, but

[ I'm reading "indirection" for "resolution" here.]

| leave the choice to implementors, it can still fail.  An implementation
| can add support for arbitrarily many resolution methods, but unless
| someone has a new software delivery method I don't know about, the
| set of methods supported will always be finite, and failure will
| always be possible.  (Well, perhaps not:  I suppose one might
| contract somehow for *guaranteed* service, but I suspect that means
| not that you will always have service but that they will pay you
| when you don't.)

So will it fail worse with or without specification?  I think the
other interoperability issues (see list of questions above) need
answers, and those answers might inform the choice of what to do
about public identifiers.

| >| With regard to these, the ERB leanings appear to be:
| >|
| >|   a Support for full SGML Open catalogs ...
| >|   b ... a suitable simplification of SGML Open catalogs ...
| >
| >Either way you get indirection, not resolution.  If that's okay,
| >it needs to be said what this requirement means for the processor/parser,
| >the application (a term not used since the executive summary), or
| >the implementation (=application?).  I nag at this point because
| >the concept of an XML application is currently a loose cannon.
| I'm not sure why this is so, unless I've carelessly given the wrong
| idea again.  As I understand it, either (a) or (b) would naturally
| be coupled with information about where to find the catalogs, and
| with some constraints that would ultimately, for any public identifier,
| either produce a system identifier or fail.  I think of producing a
| system identifier for a resource accessible to the processor as
| constituting resolution of the public identifier; am I misusing the
| term?

I'm thinking of resolution as actually obtaining the thing identified.
Merely producing a system identifier doesn't guarantee success
(that is, it can leave the matter unresolved).  And if anything on the
net is accessible to the processor if it wants to send a request for
a URL, production of a system identifier is only part of the process
(of course, if the thing is returned as a result of the request,
a system identifier can be produced for it).

| >But in the current environment of push for solutions to simple problems,
| >I would much rather that XML cut loose of this issue.  On the Internet,
| >the SGML notions of PUBLIC and SYSTEM aren't too useful, and eliminating
| >both in favor of URLs is a win (whatever the metaphysics of URLs).  If
| >I want to use URNs, nothing in XML 1.0 prevents me.
| Two points of apparent disagreement here.
| I think the notions of PUBLIC and SYSTEM are extremely useful on
| the Internet, since the ability to provide a *name* (public id) is
| a key to enabling software to locate a nearby copy of the thing named,
| possibly a local copy, rather than forcing a reload from a distant
| server.

I don't think we disagree about the utility of names.  But I think
that for XML the appropriate concepts are URL and URN, which are
somewhat askew to PUBLIC and SYSTEM.

| And the current draft spec of XML 1.0 does say that system identifiers
| are URLs.  There was some sentiment for changing that to say URIs,
| and letting the fates decide whether that meant URLs or URNs, but
| that change has not been made, that I remember.

True, and no disagreement here, although if system identifiers are
URLs, then the concept "system identifier" is unneeded (unless it
must be retained for SGML compatibility).

| >More generally, resolution of indirect names is not unique to XML
| >and ought to be dealt with on a system-wide basis, just like resolution
| >of URLs.
| I always have mixed feelings about insisting on system-wide solutions to
| problems that clearly should have such solutions.  If such an insistence
| helps *produce* a system-wide solution I can use, then it's good.  But
| frequently I have the problem precisely because those responsible for
| system-wide solutions have bollixed things up so badly, and cannot be
| persuaded to fix them.  In such a case, insisting on the Right Thing (a
| system-wide solution) punishes only ourselves, and we'd do better to
| find a partial solution we can actually do something about, rather than
| wait for a full solution we cannot ourselves create and push through.
| (An example:  character-set problems such as identification and
| translation ought to have a system-wide solution, at the OS level for
| all users of a local system, and at the network level for networks.  But
| neither of these is the case for any OS or any network I've yet heard
| of, let alone worked on.  All the complaints in the world were not
| enough to persuade Rutherford Labs to fix its manifestly false,
| unusable, and data-corrupting translate table, when it functioned as the
| sole gateway between JANET and EARN/BITNET.  In such cases, one needs to
| seek solutions that work around the problems imposed by system-wide
| stupidity, and that can be implemented even if those with system-wide
| responsibility don't achieve enlightenment.  I suspect this means,
| in this context, that I'll believe URNs work when I see them work,
| and in the meantime I'm quite happy with SOCats, which do work thank
| you very much.

Fair enough; I would suggest that the spec be worded such that the
processor (if that's the piece involved) be allowed to hand off
indirection management to another component of the user's system
without loss of conformance.

What matters is that indirection gets handled, not necessarily which piece
of the machinery handles it.  To determine whether it is necessary
to specify which piece of the machinery does the work, you have to
articulate a model of the machinery or of its operations.  


Regards, Terry