Re: XML catalog draft
[this is a long article because of the notes at the end on how Panorama
actually fetches SGML OPEN CATALOG files, and some (by no means all)
of the issues involved.]
Paul Prescod wrote:
> The proposal leaves the resolution mechanism up to the application as it
No it shouldn't.
I want something that works. In the same way. Everywhere.
That is what we all need.
There is no point saying the market will produce lots of competing
mechanisms and the best one will win. They will all lose.
> > > Either way, some means of associating
> > > catalogues or ilinksets with documents is required.
> > Clearly -- otherwise we haven't solved the problem, but only made it
> > more complicated. A way of getting from instance to catalog is needed.
> I don't agree here. Catalogs are useful without a transmission mechanism.
I didn't say they weren't. Nor did Terry.
> If I send you a file with <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
> I have a feeling that your software will resolve it correctly.
And what if I put up on the Web an XML file using
PUBLIC "-//Liam Quin//DTD b359//EN"
which the draft allows?
How are you going to resolve = find = use the DTD?
If you say that's not specified, please start again.
> On the other hand! I think that a mechanism for associating catalogs with
> instances is useful, and important.
Which is what I said in my article, and all I said.
> I raised this in the catalog group, and
> we agreed that it was outside of our mandate and did not bother to discuss it
> further. Some of us felt that it was something that the ERB should
> add when they integrate the catalog proposal with the XML spec.
Then I hope the ERB sends you all back to the drawing board.
What is the catalog _for_? It is for turning a PUBLIC ID into a SYSTEM ID.
That is _all_ it is for. You now have to solve the problem of finding
the CATALOG in the first place, or you have not solved the required
If I say, I'd like a brandy but I don't have one, telling me, "you can get
brandy by pouring it out of a brandy bottle" doesn't help me very much;
it only frustrates me, because if I had a brandy bottle, I wouldn't have
said I didn't have any brandy.
If I say, I'd like a SYSTEM ID, telling me I can get one in a CATALOG file
that I don't have is similar, except I need a SYSTEM ID to get the CATALOG,
whereas I need money to get the brandy (usually -- anyone willing to swap
brandy for PUBLIC identifiers??)
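
To make the point concrete, here is a minimal Python sketch of the one job a
catalog does. It assumes entries of the form PUBLIC "public id" "system id";
the system identifiers html32.dtd and b359.dtd are made up for illustration.
Note that it only works once you already have the catalog text in hand, which
is exactly the part that is not specified.

```python
import re

# Matches entries of the form: PUBLIC "public id" "system id"
PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def parse_catalog(text):
    """Return a dict mapping each PUBLIC identifier to its SYSTEM identifier."""
    return {pubid: sysid for pubid, sysid in PUBLIC_ENTRY.findall(text)}


def resolve(pubid, catalog):
    """Look up a PUBLIC identifier; None means the catalog cannot help."""
    return catalog.get(pubid)


# Hypothetical catalog contents, for illustration only.
CATALOG_TEXT = '''
PUBLIC "-//W3C//DTD HTML 3.2//EN" "html32.dtd"
PUBLIC "-//Liam Quin//DTD b359//EN" "b359.dtd"
'''

catalog = parse_catalog(CATALOG_TEXT)
print(resolve("-//Liam Quin//DTD b359//EN", catalog))   # found in the catalog
print(resolve("-//Unknown//DTD none//EN", catalog))     # not found: None
```

Without the catalog text there is nothing to look anything up in, so the
lookup is the easy half of the problem.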
> I planned to recommend a mechanism to the ERB independent from the catalog
> group, but could not decide between the Socat-way, with a file named
> "catalog" which is very convenient, but tromps on the user's filename
If SGML users are keeping files around called catalog.soc that are not
SGML OPEN catalogs, that's their problem.
> (perhaps less if the file was named xml-cat) or with a processing
> instruction, which is syntactically ugly and a little inconvenient to
> add to each document.
There are no files on the web -- a URL is not a filename. Now, they
usually map into filenames, but given the following URL to an XML file,
how do I get to the catalog?
hint: cgi-bin/documentation is a program, a CGI interface to a document
management system doing dynamic fragmentation of SGML/XML; I cannot store
CATALOG files in the database... (since they are not SGML) let's say.
This is a real, common example (but with a fake URL here!)
> > I will say right now that we spent a lot of effort on this topic for
> > SoftQuad Panorama, and didn't get it right in the 1st release.
> > It's still not perfect, but we have backward compatibility issues.
> > Let's do it right for XML.
> What is "right"? Your experience with this issue will be useful to us.
(1) allow links from the doc to the DTD directly (no catalog) even if
there is a PUBLIC ID (Pano does this -- you'll see why in a sec)
(2) allow a way of identifying a "base" URL of the current document, so
that relative paths can work in links & sys-ids. Allow this to be
prepended to the file with no other processing (it would come before
the <?-XML- ...?> header in this case, I expect) so that it can be
done by a non-XML-aware proxy server or very simple CGI script.
Panorama 2 uses a processing instruction for this -- we didn't have it
for Panorama 1, and this was a big problem, as you couldn't get
bookmarks and annotations working from GET-style search queries
(3) use the same mechanism to link style sheets to instances as you use
to link documents to instances. Panorama uses a separate file,
"entityrc", but I now wish that the information had all been in
one place, e.g. "catalog".
If you use public identifiers to link to style sheets, you will need to
be able to give both a PUBLIC and a SYSTEM for the DTD as in (1), but
you will need the SYSTEM identifier to override the PUBLIC one in the
case where you don't actually have a catalog file.
(4) remember that you can't do file system probes. The original CATALOG
spec said that the filename for catalog was case insensitive. Originally,
because of its Windows heritage (despite the first version ("darc")
being on Unix!), Panorama looked for CATALOG on the remote server.
But more than half of all web servers are running Unix today, and
the path parts of URLs are case sensitive. We got so many support
calls about this that today Panorama looks first for "catalog" and
then for "CATALOG", but the failed probe does cause an obscure (to
the user) error message on many systems.
We never implemented the TR requirement of supporting Catalog, cAtalog,
and so forth, as each one would take a separate HTTP transaction...
This has been fixed (the TR was changed, as I recall), but it is
best if you never have to look and see if a URL works or not.
Sometimes, a URL probe might actually cost money -- e.g. if you're paying
for documents -- or might require a password, or might simply fail
silently with a zero-length "document" being returned, or a document
being returned saying "the URL you requested was not found; please
check your spelling..."!!!
(5) allow an instance to indicate that no CATALOG exists, and to give
all the information in some other way. Same for style sheet linkage,
whether using CATALOG or ENTITYRC (I hope not) or something else.
This need follows from a combination of the need to support database
queries and the inadvisability of trying to do something like
file system probes.
(6) you need to be able to associate multiple style sheets with each
instance (e.g. for printing), and possibly other things, such as
Java programs, active table of contents definitions, metadata,
collection information, location on navigational maps, and so forth.
Public identifiers can be used as part of this, as can processing
instructions. However you do it, it's essential that the same files
can be viewed locally with no web server and locally or remotely
using a web server, as otherwise it's impossible to test them without
putting them on a web server. The best way to do this is to treat
all system identifiers as partial URLs, relative to the file containing
them. This means that if you open
and it refers to
then an XML application ought to look for
but if exactly the same unchanged bytestream had been downloaded as
http://www.sq.com/people/liam/ankle1.xml (this is a fake URL)
then the same XML application should resolve "walking.dtd" as
and if it had been ftp://.... then the same procedure should be used.
You have to consider what to do with a URL such as:
where presumably we should look for
and not try to apply both sets of MIME parameters.
(the ; is a preferred alternative to using & in queries, too)
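
Treating a system identifier as a partial URL, relative to the document that
contains it, can be sketched in Python with the standard library's urljoin.
This is only an illustration, not what Panorama does internally; the local
path /home/liam/ankle1.xml and the host ftp.example.org are made up, and the
http URL is the fake one used above.

```python
from urllib.parse import urljoin


def resolve_system_id(document_url, system_id):
    """Resolve a system identifier relative to the document containing it."""
    return urljoin(document_url, system_id)


# The same unchanged document, reached three different ways.
# The file: and ftp: locations are hypothetical.
for document_url in (
    "file:///home/liam/ankle1.xml",
    "http://www.sq.com/people/liam/ankle1.xml",
    "ftp://ftp.example.org/pub/liam/ankle1.xml",
):
    print(resolve_system_id(document_url, "walking.dtd"))
```

The point of the sketch is that the rule is the same in every case: the DTD
is looked for next to the document, however the document was fetched.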
Sorry, I have probably written too much already.
All of these issues need to be solved, or you won't end up with
interchangeable SGML on the Web. I know. I've been there.
I don't want to force the same solutions we used on people necessarily,
but the same issues do need to be addressed.