Re: the return of the Public Identifier Question from Terry Allen on 1997-03-20 (w3c-sgml-wg@w3.org from March 1997)

From: Terry Allen <tallen@sonic.net>
Date: Thu, 20 Mar 1997 12:41:40 -0800
To: U35395@UICVM.UIC.EDU, w3c-sgml-wg@w3.org
Message-Id: <199703202041.MAA27469@bolt.sonic.net>
Michael replied helpfully:
| >What is the model of interaction between processor and application that
| >the SGML ERB is implicitly relying on?  When I download an XML file
| 
| The ERB is explicitly trying to stay away from prescribing an
| implementation model, so these questions are hard to answer.  I think
| the answers to your questions are implicitly as follows, however
| (personal opinions only):
| 
| >that contains a public identifier BEACH,
| >
| > - is it the a or the p that the file is directed to first?
| 
| An implementation choice.  It is certainly possible to imagine a
| situation in which P handles all the entities and passes a grove or
| some simple format (e.g. sgmls output format) to a downstream
| application; in this case, P sees the file and passes to A some
| representation of it.  It's equally possible to imagine A being in
| the driver's seat, and A reading a file or data stream and periodically
| passing pieces of it to P saying "Here, parse this and give me a grove
| back".  In this case, A sees the file first.

It seems to me that if the XML spec constrains only processors, and
the resolution (or indirection handling) associated with public identifiers
can be the responsibility of either the processor or the application,
little is gained by specifying how the processor should do this work,
because if the app does it instead, it may be done differently.
But then I simply do not understand how XML can be made to work 
without somewhat more implementation model than there is now (or
an abandonment of the notion of separate processors and apps),
so I'll let go of this thread after a few clarifications.
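
The two arrangements Michael describes can be sketched roughly as
follows.  This is only a toy under stated assumptions, not anything the
XML draft defines: the names parse, processor_resolver, app_driven and
both catalogs are hypothetical, and the reference name is treated as if
it were the public identifier itself.

```python
import re
from typing import Callable, Dict

# A resolver maps a public identifier to replacement text (toy assumption).
Resolver = Callable[[str], str]

# Hypothetical catalog owned by the processor; contents are invented.
P_CATALOG: Dict[str, str] = {"BEACH": "sand and surf"}


def processor_resolver(public_id: str) -> str:
    """Resolution policy built into the toy processor P."""
    return P_CATALOG[public_id]


def parse(text: str, resolve: Resolver = processor_resolver) -> str:
    """Toy processor P: expand every &NAME; reference using `resolve`."""
    return re.sub(r"&(\w+);", lambda m: resolve(m.group(1)), text)


def app_driven(text: str) -> str:
    """Toy application A in the driver's seat, supplying its own policy."""
    app_catalog = {"BEACH": "a local cached copy"}  # invented for illustration
    return parse(text, resolve=lambda public_id: app_catalog[public_id])
```

The same input yields different results depending on who resolves,
which is the worry above: a rule that constrains only P says nothing
about what app_driven does.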

| > - does resolution mean converting BEACH to a URL, or (in the case of
| >   PIs) converting it to a system id for comparison with cache contents?
| 
| XML 1.0 defines system IDs as URLs, so I do not understand this
| question.  

I meant BEACH to be a public identifier.  

| > - does the p resolve BEACH as part of parsing only for certain
| >   purposes described in the spec?
| 
| If 'BEACH' is given as the public identifier of an entity, and the
| entity is referenced, then if P is responsible for expanding entity
| references, P is responsible for finding the data stream named by
| 'BEACH', at entity-reference-expansion time.  Is this a trick question?

Maybe.  Public identifiers could occur in XMLdecls, in [other] processing
instructions, and as the targets of XML links and non-XML links.  I can
imagine the processor handling resolution of public identifiers
in processing instructions but not in the case of links.  But I guess
this is to be implementation dependent, which is, I think, what you
say below.

| > - does the p resolve BEACH only if it is used as a system or public ID
| >   and not if it is the target of a link?
| 
| I think XML 1.0 makes P responsible for expanding entity references,
| either a priori or on demand from A.  But clause 4.3 rule 8 could also
| be read as allowing A to handle it, and does not explicitly say that P
| *must* provide the service on request, so perhaps 1.0 is underspecified
| here.
| 
| Who translates public identifiers into system ids/urls, and when, is
| not yet specified, as far as I know.  I've been assuming above that
| P handles it at least for entities referred to.  That seems to suggest
| P could handle it for links, too.  Whether to translate from public id
| to url at parse time or at link traversal time seems to me best left
| to P to work out.
 ... 
| I like knobs, too.  But the WG and ERB may or may not feel that our
| personal preferences suffice as a reason to require that XML processors
| (and apps?) *always* provide *all* the knobs *as a condition of
| conformance*.  Whether a knob should be required here is an open
| question; I'm agnostic (as I said), and you haven't actually said
| what you think XML should do, only what you hope apps will do.

In the absence of a defined model of a-p interaction I'm at a loss
to say which should do what.  But I don't want XML to say that 
the user can't adjust this functionality.  (No language saying
I can't have a particular knob.)

| >So will it fail worse with or without specification?  I think the
| >other interoperability issues (see list of questions above) need
| >answers, and those answers might inform the choice of what to do
| >about public identifiers.
| 
| I don't think having or not having a required resolution method is
| likely to affect the frequency or severity of resolution failure,
| given that the publisher has made appropriate information available to
| enable resolution.  So I still don't see that your point has a bearing
| on our decision, indubitable though it may be.

The issue is really important when the publisher *hasn't* provided
enough info, or the info provided has grown stale.

| Wait, hang on a moment.
| 
| In practice, specifying a Minimum Required Method of public-id
| resolution will mean fewer failures, I think.  Here's the logic:
| 
|   - some failures will be due to network outages, permissions problems,
|     etc.; the frequency of these is unaffected by the MRM/no-MRM choice
|     though it may be affected by the choice of a resolution mechanism
|     (a mechanism that provides several levels of fallback is likely to
|     fail less often than one with no fallbacks at all)
| 
|   - some failures will be due to unavailability of required information
|     (e.g. failure to provide an appropriate SGML Open Catalog,
|     failure to register public ID with the new Public-ID server,
|     failure to install the correct version of the Sortes Vergilianae
|     Name Resolver, ...); these will be more frequent if a publisher
|     must provide the required information (always in effect a
|     public:system map) in more than one form.  So specifying an MRM
|     reduces the failure rate.
| 
|   - some failures will be due to inaccuracy of required information
|     (e.g. provision of an SGML Open Catalog with bad entries,
|     registration of the wrong public ID with the new Public-ID server,
|     installation of the Sortes Vergilianae Name Resolver with the
|     wrong config file, ...); these will be more frequent if a publisher
|     must provide the required information in more than one form.
|     So here too specifying an MRM reduces the failure rate.
| 
| OK.  I don't know if this is what you were driving at, but I've
| convinced myself that the second and third laws of thermodynamics
| do provide an argument for specifying an MRM.
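
The remark in the first bullet about several levels of fallback can be
illustrated with a minimal sketch.  Everything here is an assumption
made for illustration: the function names, the idea of an ordered list
of lookups, and the catalog entries and URLs are invented, and no
particular catalog format or resolution method is implied.

```python
from typing import Callable, Dict, List, Optional

# A lookup returns a system identifier, or None if it has no entry.
Lookup = Callable[[str], Optional[str]]


def catalog_lookup(catalog: Dict[str, str]) -> Lookup:
    """Wrap a public:system map as one level of the fallback chain."""
    def lookup(public_id: str) -> Optional[str]:
        return catalog.get(public_id)
    return lookup


def resolve_public_id(public_id: str, lookups: List[Lookup]) -> str:
    """Try each lookup in order; the first one with an entry wins."""
    for lookup in lookups:
        system_id = lookup(public_id)
        if system_id is not None:
            return system_id
    raise LookupError(f"no system identifier found for {public_id!r}")


# Hypothetical maps: a local catalog first, then one from the publisher.
local_catalog = {"BEACH": "file:///catalog/beach.xml"}
publisher_catalog = {
    "BEACH": "http://example.org/beach.xml",
    "DUNE": "http://example.org/dune.xml",
}
chain = [catalog_lookup(local_catalog), catalog_lookup(publisher_catalog)]
```

With this chain, BEACH resolves from the local map and DUNE falls
through to the publisher's map; an identifier in neither map raises
LookupError, the case where the publisher has not provided enough
information.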

Pretty much what I was after.  I'll sit down now.  Thanks, Michael.



Regards,
  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
       specializing in Web publishing, SGML, and the DocBook DTD 
                   http://www.sonic.net/~tallen/
  A Davenport Group Sponsor:  http://www.ora.com/davenport/index.html
Received on Thursday, 20 March 1997 15:42:36 UTC