Public Identifiers and CATALOGS

   The issue of naming versus resolution is still coming up, in
misunderstood form. For example, Lee's 6 equivalent mechanisms did not in
fact offer equivalent "naming" behavior, though they do offer equivalent
_resolution_ behavior. I will try to point out an effective operational
property of names that no resolution mechanism can provide. I also point
out that CATALOGS are the best solution to the stylesheet problem. In fact,
they are a better solution to that than to the name resolution problem!

   The most important fact about PUBLIC IDs is that they define equivalence
classes of documents. In order for these classes to make sense, we need to
ensure global uniqueness of names. While this seems hard, in practice
hierarchical namespace delegation, plus simply encouraging long identifiers,
gets us 99% of the way there. Even unregistered FPIs have turned out to be
pretty robustly unique in practice.

   This is a social contract, but it's in fact less onerous than the social
contract implied by a URL -- which is that the owner will maintain an HTTP
server, at a specified DNS location, with a specified resource on that
server. People fail this contract all the time, of course, because meeting
it requires money.

   Any time I see a PUBLIC ID that I've _ever_ seen before, it means that I
am justified in re-using the results of my last resolution. So, PUBLIC IDs
are like a super-distributed cache. A browser using FPIs (even without ever
resolving one) can cache the first copy of the TEI DTD it ever sees, based
on the SYSTEM ID provided along with that PUBLIC ID, and then _never fetch
it again_.
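
   As a rough sketch of the caching behavior just described (in Python; the
class name, the `fetch` callback, and the identifiers in the usage note are
all made up for illustration, not taken from any real browser or parser):

```python
from typing import Callable, Dict


class PublicIdCache:
    """Cache entity text keyed by PUBLIC ID (hypothetical helper).

    Assumption: two entities with the same PUBLIC ID are equivalent,
    so the SYSTEM ID is only consulted the first time an ID is seen.
    """

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch              # retrieves text for a SYSTEM ID
        self._store: Dict[str, str] = {}
        self.fetch_count = 0             # number of real fetches performed

    def get(self, public_id: str, system_id: str) -> str:
        # First sighting: resolve via the SYSTEM ID that came with the name.
        if public_id not in self._store:
            self.fetch_count += 1
            self._store[public_id] = self._fetch(system_id)
        # Every later sighting: reuse the copy, whatever SYSTEM ID is given.
        return self._store[public_id]
```

   Calling `get` twice with the same PUBLIC ID but two different SYSTEM IDs
(say, an origin server and a mirror) performs only one fetch; the second
SYSTEM ID is never contacted.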

   Further, even though FPIs (and all names) may become broken, a broken
FPI is actually more informative than a broken URL, as it generally
contains a lot of human-readable information that may help someone who
absolutely _must_ resolve it. And anyway, it's certainly no _more_ broken
than a broken URL, and somehow people seem to be able to live with a lot of

The foregoing is just to note that the objection that an unbreakable URN/FPI
service is required for useful naming is just as bogus as the claim that a
foolproof location mechanism is required for URLs to be useful. In fact,
the biggest insight that TBL had (in my opinion) is that the academic
hypertext community's insistence on consistent linkbases (unbreakable
links) was a demand for an unnecessary perfection. We all dismissed the WWW
as (possibly) interesting but obviously broken... and failed to see that it
was damned useful, even if it was "broken". FPIs are also damned useful,
even if we don't have _one way_ to resolve them.

    Now, as to CATALOGs per se, I'm agnostic. They are certainly a
reasonable mechanism for resolution, and with delegate, they are a
plausible draft of a distributed resolution mechanism.
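
    To make the delegation idea concrete, here is a minimal sketch in
Python. It is a simplification under stated assumptions: catalogs are modeled
as in-memory objects with only PUBLIC and DELEGATE entries, `load_catalog`
is a hypothetical callback that returns the catalog at a given location, and
the real catalog rules are richer than this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Catalog:
    # PUBLIC entries: PUBLIC ID -> SYSTEM ID
    public: Dict[str, str] = field(default_factory=dict)
    # DELEGATE entries: PUBLIC ID prefix -> location of another catalog
    delegate: Dict[str, str] = field(default_factory=dict)


def resolve(
    public_id: str,
    catalog: Catalog,
    load_catalog: Callable[[str], Catalog],
    _depth: int = 0,
) -> Optional[str]:
    """Return a SYSTEM ID for public_id, or None if nothing matches."""
    if _depth > 8:                       # arbitrary guard against delegation loops
        return None
    if public_id in catalog.public:      # a direct PUBLIC entry wins
        return catalog.public[public_id]
    # Otherwise try delegated catalogs, longest matching prefix first.
    prefixes = sorted(
        (p for p in catalog.delegate if public_id.startswith(p)),
        key=len,
        reverse=True,
    )
    for prefix in prefixes:
        sub = load_catalog(catalog.delegate[prefix])
        found = resolve(public_id, sub, load_catalog, _depth + 1)
        if found is not None:
            return found
    return None
```

    The point of the sketch is only that the owner of a prefix can run the
catalog for everything under it, which is what makes the scheme distributed.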

    So I don't object to them. I actually rather favor them for another
reason: we are heading down the road to including PIs to locate style
sheets, and I don't like that for a number of reasons:
  1. I _don't_ want to be forced to put processing information in my documents.
  2. I think it's a bad idea to _ever_ put processing information in documents.
  3. Practically, even with HTTP 1.1, it's not an effective implementation.

   1 and 2 are very controversial, and I'll state them without further
discussion. 3, however, has been bothering me for a while.

    HTTP 1.1 fixes the multiple connection problems of HTTP 1.0 and allows
several commands to be bouncing down the wire at a time (which is very
good). _But_ it is not a multiplexed connection -- one whole resource must
be transmitted before the next one can be fetched. This means that once I
start fetching the document, I can start fetching the stylesheet, and the
catalog, but I can't use either of them until the document is all there.
What we really want (if we are to use HTTP 1.1 effectively) is a way to
fetch a "manifest" for the document, and then select and fetch the
document, (one or more) stylesheets, and so forth, in whatever order makes
most sense. In practice, probably style-sheet (if we don't already have
it), DTD (if we need it, and don't already have it), document, then
external entities. Now a CATALOG looks like a great nominee for the
manifest. So much so, that I'm very tempted to say that we _should_ require
catalogs, not for the PUBLIC->SYSTEM mapping, though that is a useful side
effect, but so that we can require that the standard URL for a "document"
may identify a CATALOG with a DOCUMENT entry, identifying where the XML
parsing should begin.
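
    The fetch ordering suggested above can be sketched as follows (Python).
This is only an illustration under assumptions: the `Manifest` fields, the
`fetch_plan` function, and the idea of a `cached` set of already-held
resources are names I am introducing here, not part of any catalog format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Manifest:
    """What a catalog used as a manifest might tell us (hypothetical)."""
    document: str                          # where XML parsing should begin
    stylesheet: Optional[str] = None
    dtd: Optional[str] = None
    entities: List[str] = field(default_factory=list)


def fetch_plan(m: Manifest, cached: Set[str], need_dtd: bool = True) -> List[str]:
    """Order the requests: stylesheet, DTD, document, then external entities."""
    plan: List[str] = []
    if m.stylesheet and m.stylesheet not in cached:
        plan.append(m.stylesheet)          # skip if we already have it
    if need_dtd and m.dtd and m.dtd not in cached:
        plan.append(m.dtd)                 # skip if unneeded or already held
    plan.append(m.document)                # the document itself is always fetched
    plan.extend(e for e in m.entities if e not in cached)
    return plan
```

    With the manifest in hand first, the client decides the order before
requesting anything else, so the document never has to arrive in full
before the stylesheet or DTD can be used.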

   So I vote for adding CATALOG as a requirement (if we use it as a
manifest), and even without that (if it's the only way we can get PUBLIC) I
don't mind if we suggest/require it as a default/fallback resolution
mechanism (though I am not convinced that we need one).

In any case, I think that a clear property that PUBLIC has that has not
been stated clearly before is that _any_ particular resolution mechanism is
allowed, if the application knows that a URL is a legitimate copy of a
PUBLIC ID'ed item.  In particular, implementors and authors should
understand that a PUBLIC is an explicit license to cache a resource and
access it any other time that its PUBLIC ID is found. We need to state this
quite explicitly, so that it is clear to authors that _if_ they use PUBLIC,
then they _must_ use a different PUBLIC ID for each non-equivalent version
of an entity -- and if they don't, their documents may _not_ work, or will
not work _consistently_. This is the same kind of commitment you make when
you advertise or use a URL -- you have made a commitment to keep that URL
valid, if you want your documents to work.

  -- David

David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________