Re: What to do given both SYSTEM and PUBLIC? from Paul Prescod on 1997-02-19 (w3c-sgml-wg@w3.org from February 1997)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Tue, 18 Feb 1997 22:47:21 -0500 (EST)
To: lee@sq.com
Cc: w3c-sgml-wg@w3.org
Message-Id: <199702190347.WAA22644@calum.csclub.uwaterloo.ca>
> > But if software knows about both, I would urge that the public
> > identifier should be tried first.
> > 
> > a) It will probably be faster (since it will be a catalog lookup more
> > often than a network transaction using a still-under-development URN
> > protocol).
> 
> No, the still-under-dev. URN protocol is for FPIs, as an alternative for
> CATALOG; the SYSTEM ID is a plain old URL or file.

I haven't seen anything to indicate that the URN protocol will be limited
to FPIs. I would presume it will support all public identifiers. 

But I think I wasn't clear in my sentence above. I was making the point
that for the near future, Public Identifier lookup should be as fast
as getting a catalog, which should be comparable in speed to retrieving 
a graphical "bullet". At some point in the future, we may implement some 
more complicated URN-based lookup system, and that may be slow. I was just
pointing out that we should not presume public identifier lookup will be
slow based on a hypothetical URn system.

What I didn't say and should have is that public identifiers should 
encourage many cache hits that would now be misses. This feature, together 
with the fast retrieval of catalogs, should speed up the resolution of 
objects.  More on both of these points below.

> You still have to fetch the remote CATALOG, which _then_ tells you how
> to use http (say) for the DTD.  So the SYSTEMlookup is likely to be
> much faster.  For performance, do SYSTEM first.  For generality, do
> PUBLIC first.

The browser can get the catalog immediately while downloading the document.
It might even get it before downloading the document. The server could also
send the catalog before the client asks for it. I believe HTTP 1.1 allows
the server to specify what the client will need and send it before the
client asks for it. Either way the overhead of getting a catalog is minimal.
It is especially minor if you don't set up a separate connection for it. 
I would hope that a separate connection for every file will go the way of 
the dodo very soon.

In fact, we should SERIOUSLY consider standardizing an archive format like
SDIF-for-the-Web of JAR-for-XML. I am really sick of having to manually
figure out what SGML entities, DTDs, and stylesheets go with a particular
SGML document. Let's correct that for XML. Wouldn't it be nice to be able
to take a single file, with all of the formatting, catalog and entities and
use it in Author/Editor/XML or AdeptXMLitor.

> > b) It provides opportunity for client-side overriding.
> Not unless you allow the user to override the CATALOG fetching.  I am
> assuming that a remote catalog is fetched by some unspecified mechanism
> (e.g. http) and that the *viewer* user cannot easily override it, short
> of actually editing a configuration file or something else equally
> unacceptably arcane to the average home owner :-)...

As long as we allow the user agent to choose its own resolution mechanism,
or override our default resolution mechanism, it may present a nice, GUI
user interface to the user for overriding public identifier lookups. Heck,
it could let the user reroute system identifier looks to an ODBC database
in Visual Basic. =) The important thing is that we provide the hook to
allow them to use their imagination in this regard.

> > c) It could allow "smarter" caching
> If two SYSTEM URls are the same, the existing URL manager will cache
> them, modulo pragma no-cache and the expiry time from HTTP.

Sure. But the example I gave is where they would not be the same. How many
copies of popular DTDs and stylesheets will be littered across the Web? Nobody
wants to depend on someone else's computer for their document to parse
correctly, but neither does it make sense to treat two different copies
of html.3.2.dtd differently. The public identifier gives a standard way for
tagging a document with a system independent name.

 Paul Prescod
Received on Tuesday, 18 February 1997 23:10:18 UTC