Re: the return of the Public Identifier Question from James Clark on 1997-03-20 (w3c-sgml-wg@w3.org from March 1997)

From: James Clark <jjc@jclark.com>
Date: Thu, 20 Mar 1997 15:02:54 +0700
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <2.2.32.19970320080254.00a47c38@jclark.com>

>  c Public first, then system (if the public id is not found in the
>    catalog).  One vote for this.
>
>  d Implementations may choose which to try first, but if the first
>    ID it tries fails, then the implementation should try the other
>    one.  I.e. implementations may *not* say "If both a PUBLIC and
>    a SYSTEM identifier are given, the XXXXX one is processed and
>    the YYYYY one is ignored."  Strong support for this view.

There are two ways in which use of a public id can fail:

1. resolution can fail (e.g. because there's no entry in the catalog)

2. access to the entity can fail (e.g. because the specified file does not
exist) even though resolution succeeded

I think it's a reasonable design to say that if a public identifier fails in
the sense of (1) and there's a system identifier, then that system identifier
should be used.
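
To make the distinction between (1) and (2) concrete, here is a minimal
sketch of that rule in Python.  It is illustrative only: the catalog is
modelled as a plain dict from public identifiers to system identifiers, and
the names resolve_external_id and ResolutionError are hypothetical, not part
of any real parser's API.

```python
from typing import Mapping, Optional


class ResolutionError(Exception):
    """Raised when neither identifier yields a system identifier (hypothetical)."""


def resolve_external_id(
    public_id: Optional[str],
    system_id: Optional[str],
    catalog: Mapping[str, str],
) -> str:
    """Return the system identifier that should be used to access the entity.

    Only failure (1), a missing catalog entry, triggers the fallback to the
    declared system identifier. Failure (2), where the returned identifier
    later cannot be accessed, is deliberately not handled here: it is left
    to surface as an error to the user.
    """
    if public_id is not None:
        mapped = catalog.get(public_id)
        if mapped is not None:
            # Resolution succeeded; no further fallback is attempted.
            return mapped
    if system_id is not None:
        # Failure (1): no catalog entry, so use the declared system identifier.
        return system_id
    raise ResolutionError(f"cannot resolve external identifier {public_id!r}")
```

Note that once the catalog produces a mapping, the declared system
identifier is never consulted, even if the mapped file turns out not to
exist.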

I'm not in favour of saying that if a public identifier fails in the sense
of (2) and there's a system identifier then that system identifier should be
used, and I'm not in favour of saying that if a system identifier fails,
then any public identifier should be tried:

- If the user has put an incorrect system identifier in a catalog or a
document, then that's an error, and a validating parser should tell the user
about it.  If the user, for example, mistypes a filename in a catalog, I
don't think you are doing them any favours by trying to silently work around
their error.  Why do users need to put invalid system identifiers in
documents or catalogs?

- In a general SGML context, a system identifier can consist of multiple
storage objects.  What does it mean for such a system identifier to succeed?
Does it mean that access to the first storage object succeeded, access to
all of them succeeded or access to one of them succeeded?  What does the
implementation do if access to the first storage object succeeds, but access
to the second storage object fails?

- Access to a single storage object can fail in multiple ways: for example,
the object may not exist, the user may not have permission to access it, or
some sort of I/O failure may occur.  Should all modes of failure be treated
alike?  It might be reasonable to press on if the storage object doesn't
exist, but do you really want to press on silently if there's some sort of
permissions or I/O problem?

- Implementation, for SP at least, would be non-trivial.  SP's approach is
to use the external identifier to generate a system identifier when it
encounters the declaration.  The generated system identifier is then used to
access the entity when needed.  Documents may declare many entities that are
never accessed (for example for use as link ends) so it's not desirable to
access the entity until it's needed.  The application that needs access to
the entity may be very loosely coupled with the parser, so it's desirable to
make it as simple as possible for the application to access the entity; this
is achieved by having the application access the entity using only the
generated system identifier.  I would have to implement something like the
FSI altsos option whereby a system identifier
"<osfile>foo.sgm</osfile>|<osfile>bar.sgm</osfile>" means try either foo.sgm
or bar.sgm.  Do we really want to require this sort of complexity of XML
implementors?
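
As a rough illustration of what "try either" involves, here is a small
Python sketch.  It is not SP's implementation and not a real FSI parser: it
only recognises the literal form shown in the example above, and the names
split_alternatives and open_first_available are hypothetical.  It also has
to take a position on the failure-mode question: this sketch assumes that
only "does not exist" lets it press on, while permission and other I/O
errors are reported.

```python
import re
from typing import Any, Callable, List

# Matches one <osfile>...</osfile> storage object in the example syntax only.
_OSFILE = re.compile(r"<osfile>(.*?)</osfile>")


def split_alternatives(sysid: str) -> List[str]:
    """Extract the file names from '<osfile>a</osfile>|<osfile>b</osfile>'."""
    return [m.group(1) for m in _OSFILE.finditer(sysid)]


def open_first_available(sysid: str, opener: Callable[[str], Any] = open) -> Any:
    """Try each alternative in order and return the first that opens.

    Only FileNotFoundError moves on to the next alternative. Any other
    failure (PermissionError, other OSError) propagates to the caller
    instead of being silently worked around.
    """
    names = split_alternatives(sysid)
    if not names:
        raise ValueError(f"no storage objects found in {sysid!r}")
    missing = []
    for name in names:
        try:
            return opener(name)
        except FileNotFoundError:
            missing.append(name)
    raise FileNotFoundError("none of the alternatives exist: " + ", ".join(missing))
```

Even this toy version forces decisions about ordering and about which
errors count as "failure", which is the sort of complexity in question.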

James
Received on Thursday, 20 March 1997 03:14:06 UTC