- From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Date: Tue, 18 Feb 1997 22:47:21 -0500 (EST)
- To: lee@sq.com
- Cc: w3c-sgml-wg@w3.org
> > But if software knows about both, I would urge that the public
> > identifier should be tried first.
> >
> > a) It will probably be faster (since it will be a catalog lookup more
> > often than a network transaction using a still-under-development URN
> > protocol).
>
> No, the still-under-dev. URN protocol is for FPIs, as an alternative for
> CATALOG; the SYSTEM ID is a plain old URL or file.

I haven't seen anything to indicate that the URN protocol will be limited
to FPIs. I would presume it will support all public identifiers. But I
think I wasn't clear in my sentence above. I was making the point that,
for the near future, public identifier lookup should be as fast as
getting a catalog, which should be comparable in speed to retrieving a
graphical "bullet". At some point in the future, we may implement some
more complicated URN-based lookup system, and that may be slow. I was
just pointing out that we should not presume public identifier lookup
will be slow based on a hypothetical URN system.

What I didn't say and should have is that public identifiers should
encourage many cache hits that would now be misses. This feature,
together with the fast retrieval of catalogs, should speed up the
resolution of objects. More on both of these points below.

> You still have to fetch the remote CATALOG, which _then_ tells you how
> to use http (say) for the DTD. So the SYSTEM lookup is likely to be
> much faster. For performance, do SYSTEM first. For generality, do
> PUBLIC first.

The browser can get the catalog immediately while downloading the
document. It might even get it before downloading the document. The
server could also send the catalog before the client asks for it. I
believe HTTP 1.1 allows the server to specify what the client will need
and send it before the client asks for it. Either way the overhead of
getting a catalog is minimal. It is especially minor if you don't set up
a separate connection for it.
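
To make the "public first" order concrete, here is a rough sketch of what a user agent might do. It is only an illustration under assumptions of my own: the `EntityResolver` class, its `catalog` dictionary (public identifier to URI) and the injected `fetch` function are hypothetical names, not part of any proposal or existing tool. The catalog is consulted first, the system identifier is the fallback, and the cache is keyed on the public identifier so that two copies of the same DTD at different URLs count as one object.

```python
from typing import Callable, Dict, Optional


class EntityResolver:
    """Hypothetical resolver: try the public identifier, then the system one."""

    def __init__(self, catalog: Dict[str, str], fetch: Callable[[str], bytes]):
        self.catalog = catalog            # public identifier -> URI (assumed format)
        self.fetch = fetch                # retrieval function supplied by the caller
        self.cache: Dict[str, bytes] = {}

    def resolve(self, public_id: Optional[str], system_id: Optional[str]) -> bytes:
        # Cache on the system-independent name when there is one, so that
        # different URLs carrying the same public identifier share an entry.
        key = public_id or system_id
        if key is None:
            raise ValueError("need a public or a system identifier")
        if key in self.cache:
            return self.cache[key]

        # Public identifier first: a catalog lookup, no network needed.
        uri = self.catalog.get(public_id) if public_id else None
        # Fall back to the system identifier (a plain URL or file).
        if uri is None:
            uri = system_id
        if uri is None:
            raise LookupError("cannot resolve " + repr(public_id))

        data = self.fetch(uri)
        self.cache[key] = data
        return data
```

Because `catalog` and `fetch` are handed in by the caller, a user agent could substitute its own mapping or retrieval mechanism, which is the kind of hook for client-side overriding discussed in this thread.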

I would hope that a separate connection for every file will go the way
of the dodo very soon. In fact, we should SERIOUSLY consider
standardizing an archive format like SDIF-for-the-Web or JAR-for-XML. I
am really sick of having to manually figure out what SGML entities,
DTDs, and stylesheets go with a particular SGML document. Let's correct
that for XML. Wouldn't it be nice to be able to take a single file, with
all of the formatting, catalog and entities, and use it in
Author/Editor/XML or AdeptXMLitor?

> > b) It provides opportunity for client-side overriding.
> Not unless you allow the user to override the CATALOG fetching. I am
> assuming that a remote catalog is fetched by some unspecified mechanism
> (e.g. http) and that the *viewer* user cannot easily override it, short
> of actually editing a configuration file or something else equally
> unacceptably arcane to the average home owner :-)...

As long as we allow the user agent to choose its own resolution
mechanism, or override our default resolution mechanism, it may present
a nice GUI to the user for overriding public identifier lookups. Heck,
it could let the user reroute system identifier lookups to an ODBC
database in Visual Basic. =) The important thing is that we provide the
hook to allow them to use their imagination in this regard.

> > c) It could allow "smarter" caching
> If two SYSTEM URLs are the same, the existing URL manager will cache
> them, modulo pragma no-cache and the expiry time from HTTP.

Sure. But the example I gave is where they would not be the same. How
many copies of popular DTDs and stylesheets will be littered across the
Web? Nobody wants to depend on someone else's computer for their
document to parse correctly, but neither does it make sense to treat two
different copies of html.3.2.dtd differently. The public identifier
gives a standard way of tagging a document with a system-independent
name.

 Paul Prescod
Received on Tuesday, 18 February 1997 23:10:18 UTC