- From: <lee@sq.com>
- Date: Fri, 7 Feb 97 17:23:17 EST
- To: w3c-sgml-wg@w3.org
[this is a long article because of the notes at the end on how Panorama actually fetches SGML OPEN CATALOG files, and some (by no means all) of the issues involved. Lee ]

Paul Prescod wrote:
> The proposal leaves the resolution mechanism up to the application as it
> should.

No it shouldn't. I want something that works. In the same way. Everywhere.
That is what we all need.
There is no point saying the market will produce lots of competing mechanisms and the best one will win. They will all lose.

[...]

> > > Either way, some means of associating
> > > catalogues or ilinksets with documents is required.
> > Clearly -- otherwise we haven't solved the problem, but only made it
> > more complicated. A way of getting from instance to catalog is needed.
>
> I don't agree here. Catalogs are useful without a transmission mechanism.

I didn't say they weren't. Nor did Terry.

> If I send you a file with <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
> I have a feeling that your software will resolve it correctly.

Wrong.
And what if I put up on the Web an XML file using
PUBLIC "-//Liam Quin//DTD b359//EN"
which the draft allows. How are you going to resolve = find = use the DTD?
If you say that's not specified, please start again.

> On the other hand! I think that a mechanism for associating catalogs with
> instances is useful, and important.

Which is what I said in my article, and all I said.

> I raised this in the catalog group, and
> we agreed that it was outside of our mandate and did not bother to discuss it
> further. Some of us felt that it was something that the ERB should
> add when they integrate the catalog proposal with the XML spec.

Then I hope the ERB sends you all back to the drawing board.

What is the catalog _for_?
It is for turning a PUBLIC ID into a SYSTEM ID.
That is _all_ it is for.
You now have to solve the problem of finding the CATALOG in the first place, or you have not solved the required problem.
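
To make "turning a PUBLIC ID into a SYSTEM ID" concrete, here is a minimal Python sketch of catalog lookup. It assumes a catalog containing only `PUBLIC` entries and `--` comments (real SGML Open catalogs have many more entry types), and the function names, the catalog text and `html32.dtd` are hypothetical. Note that `resolve` can only use a catalog that someone has already found, which is exactly the gap described above.

```python
import re
from typing import Dict, List, Optional

# One token: a "--" comment, a double- or single-quoted literal, or a bare word.
_TOKEN = re.compile(r'--.*?--|"([^"]*)"|\'([^\']*)\'|(\S+)', re.S)


def _normalize(public_id: str) -> str:
    # Collapse whitespace runs so layout differences don't block a match.
    return " ".join(public_id.split())


def parse_catalog(text: str) -> Dict[str, str]:
    """Map PUBLIC identifiers to system identifiers.

    Sketch only: accepts PUBLIC entries and comments, rejects anything else.
    """
    tokens: List[str] = []
    for m in _TOKEN.finditer(text):
        if all(g is None for g in m.groups()):
            continue  # a "-- ... --" comment
        tokens.append(next(g for g in m.groups() if g is not None))

    catalog: Dict[str, str] = {}
    i = 0
    while i < len(tokens):
        if tokens[i].upper() != "PUBLIC":
            raise ValueError(f"unsupported catalog entry: {tokens[i]!r}")
        if i + 2 >= len(tokens):
            raise ValueError("PUBLIC entry needs a public and a system identifier")
        # This sketch keeps the first mapping if a public identifier repeats.
        catalog.setdefault(_normalize(tokens[i + 1]), tokens[i + 2])
        i += 3
    return catalog


def resolve(public_id: str, system_id: Optional[str],
            catalog: Optional[Dict[str, str]]) -> str:
    """Prefer the catalog; fall back to an explicit SYSTEM identifier."""
    if catalog is not None:
        found = catalog.get(_normalize(public_id))
        if found is not None:
            return found
    if system_id is not None:
        return system_id
    raise LookupError(f"no catalog entry and no SYSTEM identifier for {public_id!r}")


CATALOG_TEXT = '''
-- a hypothetical catalog file --
PUBLIC "-//W3C//DTD HTML 3.2//EN" "html32.dtd"
'''

catalog = parse_catalog(CATALOG_TEXT)
print(resolve("-//W3C//DTD HTML 3.2//EN", None, catalog))  # html32.dtd

try:
    # No catalog reachable and no SYSTEM identifier: nothing to go on.
    resolve("-//Liam Quin//DTD b359//EN", None, None)
except LookupError as err:
    print("unresolved:", err)
```

The fallback order (catalog first, then SYSTEM) is an assumption of the sketch; the point is that without a catalog, a bare PUBLIC identifier resolves to nothing.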

If I say, I'd like a brandy but I don't have one, telling me, "you can get brandy by pouring it out of a brandy bottle" doesn't help me very much; it only frustrates me, because if I had a brandy bottle, I wouldn't have said I didn't have any brandy.
If I say, I'd like a SYSTEM ID, telling me I can get one in a CATALOG file that I don't have is similar, except I need a SYSTEM ID to get the CATALOG, whereas I need money to get the brandy (usually -- anyone willing to swap brandy for PUBLIC identifiers??)

> I planned to recommend a mechanism to the ERB independent from the catalog
> group, but could not decide between the Socat-way, with a file named
> "catalog" which is very convenient, but tromps on the user's filename
> space,

If SGML users are keeping files around called catalog.soc that are not SGML OPEN catalogs, that's their problem.

> (perhaps less if the file was named xml-cat) or with a processing
> instruction, which is syntactically ugly and a little inconvenient to
> add to each document.

There are no files on the web -- a URL is not a filename. Now, they usually map into filenames, but given the following URL to an XML file, how do I get to the catalog?

http://some.where.com/cgi-bin/documentation/bk12/ch3;level=novice

hint: cgi-bin/documentation is a program, a CGI interface to a document management system doing dynamic fragmentation of SGML/XML; I cannot store CATALOG files in the database (since they are not SGML), let's say.
This is a real, common example (but with a fake URL here!)

> > I will say right now that we spent a lot of effort on this topic for
> > SoftQuad Panorama, and didn't get it right in the 1st release.
> >
> > It's still not perfect, but we have backward compatibility issues.
> > Let's do it right for XML.
>
> What is "right"? Your experience with this issue will be useful to us.
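
One way to see why "look for a file called catalog next to the document" breaks down is to resolve that name against the CGI URL above. This Python sketch uses the standard `urllib.parse.urljoin`; `naive_catalog_url` is a hypothetical helper, not something Panorama or the catalog proposal defines.

```python
from urllib.parse import urljoin, urlsplit


def naive_catalog_url(document_url: str, name: str = "catalog") -> str:
    """Guess a catalog location by resolving `name` against the document URL."""
    return urljoin(document_url, name)


doc = "http://some.where.com/cgi-bin/documentation/bk12/ch3;level=novice"
guess = naive_catalog_url(doc)
print(guess)  # http://some.where.com/cgi-bin/documentation/bk12/catalog

# The guess is still inside the CGI program's URL space, so "fetching the
# catalog" really means asking the document management system for it.
print(urlsplit(guess).path.startswith("/cgi-bin/documentation/"))  # True
```

The computed URL is well-formed, but nothing guarantees the server behind it has (or can have) a catalog there.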

(1) allow links from the doc to the DTD directly (no catalog) even if there is a PUBLIC ID (Pano does this -- you'll see why in a sec)

(2) allow a way of identifying a "base" URL of the current document, so that relative paths can work in links & sys-ids. Allow this to be prepended to the file with no other processing (it would come before the <?-XML- ...?> header in this case, I expect) so that it can be done by a non-XML-aware proxy server or very simple CGI script.
Panorama 2 uses a processing instruction for this -- we didn't have it for Panorama 1, and this was a big problem, as you couldn't get bookmarks and annotations working from GET-style search queries without it.

(3) use the same mechanism to link style sheets to instances as you use to link documents to instances. Panorama uses a separate file, "entityrc", but I now wish that the information had all been in one place, e.g. "catalog".
If you use public identifiers to link to style sheets, you will need to be able to give both a PUBLIC and a SYSTEM for the DTD as in (1), but you will need the SYSTEM identifier to override the PUBLIC one in the case where you don't actually have a catalog file.

(4) remember that you can't do file system probes.
The original CATALOG spec said that the filename for catalog was case insensitive. Originally, because of its Windows heritage (despite the first version ("darc") being on Unix!), Panorama looked for CATALOG on the remote server. But more than half of all web servers are running Unix today, and the path parts of URLs are case sensitive. We got so many support calls about this that today Panorama looks first for "catalog" and then for "CATALOG", but the failed probe does cause an obscure (to the user) error message on many systems. We never implemented the TR requirement of supporting Catalog, cAtalog, and so forth, as each one would take a separate HTTP transaction...
This has been fixed (the TR was changed, as I recall), but it is best if you never have to look and see if a URL works or not. Sometimes, a URL probe might actually cost money -- e.g. if you're paying for documents -- or might require a password, or might simply fail silently with a zero-length "document" being returned, or a document being returned saying "the URL you requested was not found; please check your spelling..."!!!

(5) allow an instance to indicate that no CATALOG exists, and to give all the information in some other way. Same for style sheet linkage, whether using CATALOG or ENTITYRC (I hope not) or something else.
This need follows from a combination of the need to support database queries and the inadvisability of trying to do something like file system probes.

(6) you need to be able to associate multiple style sheets with each instance (e.g. for printing), and possibly other things, such as Java programs, active table of contents definitions, metadata, collection information, location on navigational maps, and so forth. Public identifiers can be used as part of this, as can processing instructions.

However you do it, it's essential that the same files can be viewed locally with no web server and locally or remotely using a web server, as otherwise it's impossible to test them without putting them on a web server.

The best way to do this is to treat all system identifiers as partial URLs, relative to the file containing them.
This means that if you open
/users/liam/docs/barefoot/ankle1.xml
and it refers to
SYSTEM "walking.dtd"
then an XML application ought to look for
/users/liam/docs/barefoot/walking.dtd
but if exactly the same unchanged bytestream had been downloaded as
http://www.sq.com/people/liam/ankle1.xml
(this is a fake URL) then the same XML application should resolve "walking.dtd" as
http://www.sq.com/people/liam/walking.dtd
and if it had been ftp://.... then the same procedure should be used.
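
The rule above (system identifiers are partial URLs, relative to the file containing them) can be sketched in Python with `urllib.parse.urljoin` for URLs and `posixpath` for local files. Assumptions: local paths are POSIX paths, only the listed schemes count as URLs, and `declared_base` is a hypothetical stand-in for the "base" URL declaration of item (2), whose syntax is not specified here. The `http://some.where.com/docs/bk12/` base is made up for illustration.

```python
import posixpath
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Schemes treated as URLs; anything else is assumed to be a local POSIX path.
_URL_SCHEMES = {"http", "https", "ftp", "file"}


def resolve_system_id(system_id: str, document_location: str,
                      declared_base: Optional[str] = None) -> str:
    """Resolve a system identifier relative to the document containing it.

    declared_base (hypothetical) models an explicit base URL; when given,
    it wins over the place the document was actually fetched from.
    """
    base = declared_base or document_location
    if urlsplit(system_id).scheme in _URL_SCHEMES:
        return system_id  # already absolute
    if urlsplit(base).scheme in _URL_SCHEMES:
        return urljoin(base, system_id)
    # Local file: resolve against the directory holding the document.
    return posixpath.normpath(posixpath.join(posixpath.dirname(base), system_id))


# Same bytestream, two locations, one rule:
print(resolve_system_id("walking.dtd", "/users/liam/docs/barefoot/ankle1.xml"))
# /users/liam/docs/barefoot/walking.dtd
print(resolve_system_id("walking.dtd", "http://www.sq.com/people/liam/ankle1.xml"))
# http://www.sq.com/people/liam/walking.dtd

# A document served by a CGI script, with an explicit (hypothetical) base:
print(resolve_system_id(
    "walking.dtd",
    "http://some.where.com/cgi-bin/documentation/bk12/ch3;level=novice",
    declared_base="http://some.where.com/docs/bk12/"))
# http://some.where.com/docs/bk12/walking.dtd
```

No probing is involved: every result is computed from the document's own location (or declared base), so it works the same with or without a web server.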

You have to consider what to do with a URL such as:
http://...../ankle1.xml;version=3
and
SYSTEM "walking.dtd;version=2"
where presumably we should look for
http://...../walking.dtd;version=2
and not try to apply both sets of MIME parameters.
(the ; is a preferred alternative to using & in queries, too)

Sorry, I have probably written too much already.

All of these issues need to be solved, or you won't end up with interchangeable SGML on the Web. I know. I've been there. I don't want to force the same solutions we used on people necessarily, but the same issues do need to be addressed.

Lee
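
Python's standard `urllib.parse.urljoin` happens to behave the way suggested here: the `;` parameters on the relative reference are kept, and those on the base document's URL are dropped rather than combined. The base URL below is a hypothetical stand-in for the elided one.

```python
from urllib.parse import urljoin

# Hypothetical base standing in for the elided "http://...../ankle1.xml;version=3".
base = "http://example.com/docs/ankle1.xml;version=3"

# The system identifier's own parameters travel with it...
print(urljoin(base, "walking.dtd;version=2"))  # http://example.com/docs/walking.dtd;version=2

# ...and the document's ";version=3" is not inherited by a bare reference.
print(urljoin(base, "walking.dtd"))            # http://example.com/docs/walking.dtd
```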
Received on Friday, 7 February 1997 17:23:28 UTC