Re: Public Identifiers, and CATALOGS from lee@sq.com on 1997-04-02 (w3c-sgml-wg@w3.org from April 1997)

From: <lee@sq.com>
Date: Tue, 1 Apr 97 19:13:27 EST
To: w3c-sgml-wg@w3.org
Message-Id: <9704020013.AA25541@sqrex.sq.com>

Bill Smith's comments are very much to the point, I think.

He wrote:

> The scheme specific part of a URL can be a name. To my knowledge nothing
> precludes defining the scheme specific part of a URL as a name. An equally
> intelligent cache can be constructed using fpi:<fpi specific part>.

and that's correct.  Of course, one can also use urn: there too.
There was provision early on in HTML for <A> to take a URN attribute
as well as a URL; if that's still in the DTD, it's really there for
backwards compatibility, I would guess.  People quickly realised that
you can start a URN with urn: and have it work.  You can even do that
with existing browsers, if they are configured to use a proxy server --
I just tried with Netscape, for example, which knows to pass urn: URLs
on to a proxy server (which in our case then doesn't know how to handle
them yet, of course).

> > And of course it's not at all unhelpful to be able to give a name, and a
> > default location in case the client cannot process the name.

I'd rather allow a list of URLs in a SYSTEM identifier to accomplish
this.  Note that space is not allowed in URLs.
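As a rough sketch of why the no-spaces rule matters here: a SYSTEM
identifier holding several URLs could be split on whitespace with no
extra quoting rules.  The function name and the example.org URLs are
hypothetical, and nothing like this is in the XML draft.

```python
from typing import List


def split_system_identifier(system_id: str) -> List[str]:
    """Split a SYSTEM identifier into candidate URLs.

    Relies on the fact that URLs cannot contain spaces, so
    whitespace is an unambiguous separator.
    """
    return system_id.split()


# Hypothetical identifier listing two locations for the same DTD.
urls = split_system_identifier(
    "http://example.org/dtd/doc.dtd ftp://example.org/pub/doc.dtd"
)
print(urls)
```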

However, if you find yourself giving multiple URLs to access the
same thing, and expecting software to try them one at a time until one
"works", you're asking for trouble.

For ftp: URLs, a message saying "too many people logged in" is often
indistinguishable from success, for example...  and many HTTP servers
now return a default document if you ask for a URL that isn't there.
At least some of them also give the HTTP 404 error (NOT FOUND) in that
case.  But at least some servers also have a built-in time delay for
that case, so as to avoid the case where errant robots bring a server
to its knees.

Netscape doesn't tell helper apps when a download failed -- for Panorama,
we have a timeout -- if we've asked more than 40 times for a URL and still
not got it, or it's taken more than 6 minutes, chances are that Netscape
is sitting there displaying an error message, or silently failed.
So you <blink>_*_*_*_*_really_*_*_*_*_</blink> want to avoid this.
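
The give-up rule described above can be sketched roughly as follows.
This is not Panorama's actual code; the function and constant names are
made up for illustration, and only the two thresholds (40 requests,
6 minutes) come from the description above.

```python
import time

MAX_REQUESTS = 40          # "asked more than 40 times for a URL"
MAX_WAIT_SECONDS = 6 * 60  # "taken more than 6 minutes"


def should_give_up(requests_made, started_at, now=None):
    """Return True once either threshold has been exceeded.

    started_at and now are time.monotonic() readings in seconds.
    """
    if now is None:
        now = time.monotonic()
    too_many_requests = requests_made > MAX_REQUESTS
    too_slow = (now - started_at) > MAX_WAIT_SECONDS
    return too_many_requests or too_slow
```

A helper application would call this each time it polls for the
document, and treat a True result as "the browser probably failed
without telling us".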

Finally, it's not necessary to have "if this doesn't work, try _this_"
in situations where the publisher _does_ have control.

If you put an XML file up on the web, you know its URL.

This is not to say that indirection should not be supported, but that
if you provide a fallback URL, it will be the one you could have provided
in the first place, and fetching the document would have been faster
(no need to connect for CATALOG).

If all XML clients use CATALOG in the same way, the chances are high that
if you publish a document, you'll put a CATALOG file there that everyone
else's application can read.  In that case, no fallback URL is needed.

If some XML clients use "CATALOG", some use "catalog", some use "Catalog",
and some use whois++ and/or URN resolution instead, you'll need to put
several different kinds of catalog file on your server along with your
document, and also make a whois++ entry, and perhaps do other things
as well, as yet undreamt of, if you want to reach a wide audience.
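
To make the cost concrete, here is a small sketch of what differing
client conventions would mean for a single document.  It assumes, purely
for illustration, that each client looks for its catalog in the same
directory as the document; the function name and the example.org URL are
hypothetical.

```python
from urllib.parse import urljoin

# The three spellings mentioned above; each stands for a different client.
CATALOG_NAMES = ["CATALOG", "catalog", "Catalog"]


def candidate_catalog_urls(document_url):
    """List every catalog URL a publisher would have to populate."""
    return [urljoin(document_url, name) for name in CATALOG_NAMES]


for url in candidate_catalog_urls("http://example.org/docs/report.xml"):
    print(url)
```

On a case-sensitive server these are three separate files to create and
keep in step, before whois++ or URN resolution is even considered.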

Someone (I forget who) said that at first, only SGML implementors will be
using XML.  I hope that's false.  If it isn't, we didn't need to change
SGML: as Peter and James and others who have written SGML parsers will
I am sure agree, the goal is to make something that a CS grad with little
or no SGML background but some basic web and HTML knowledge will look
at and want to use and be able to implement quickly.

So it's NOT okay to allow PUBLIC without specifying _exactly_ what other
files to fetch to find out how to look up the PUBLIC identifier to
find out what URL to try to fetch.

If new methods come along, the XML spec can be revised.  That is why
there is provision for a version ID in the header.

Bill wrote:
> We only need involve the IESG if we expect FPIs to interoperate. This is a
> red herring since (as best I can tell) the list has reached consensus on
> resolution interoperability - there won't be any. 

I don't think we've reached consensus.  If we do, I hope it will be to
leave out PUBLIC, but if not I think we have to have CATALOG.

In that case, I hope we don't have to admit in public that CATALOG is
an ASCII non-XML non-SGML file because the SGML vendors said it would
be too hard to implement if it was in SGML.  Ooops.

But suppose that we go with a CATALOG in XML.

Paul wrote:
>> One problem is that it will be _only_ one mechanism once it's defined.
>> That violates one of the desired properties of FPIs. (resolution-mechanism
>> independence)
In fact, this is not the case.  The actual resolution is done at the URL
level.  So if you discover a URN scheme, you can have CATALOG and put in
it entries like
    PUBLIC "-//something//JP" "urn:-//something//JP"
if you like.  If you're willing to do the http fetch of catalog, another
layer of indirection is probably OK until XML is revised.
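
A minimal sketch of that lookup, assuming a catalog made only of
one-line entries in the form shown above.  Real catalog syntax is
richer than this, and the function names are hypothetical.

```python
import shlex


def parse_catalog(text):
    """Map each PUBLIC identifier to the URL (or URN) given for it."""
    entries = {}
    for line in text.splitlines():
        fields = shlex.split(line)  # honours the double-quoted strings
        if len(fields) == 3 and fields[0] == "PUBLIC":
            entries[fields[1]] = fields[2]
    return entries


def resolve_public(entries, public_id):
    """Return the mapped identifier, or None if the catalog has no entry."""
    return entries.get(public_id)


catalog = parse_catalog('PUBLIC "-//something//JP" "urn:-//something//JP"')
print(resolve_public(catalog, "-//something//JP"))
```

The value that comes back is handed to whatever resolves URLs, so a
urn: result simply adds one more layer of indirection.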

Bill said:
> One resolution mechanism that works for (one or more) location-independent
> namespaces would be preferable to many abstract resolution mechanisms. For
> 10 years we've had PUBLIC and yet we still don't have a single mechanism
> that interoperates. I'd vote for one concrete mechanism as opposed to an
> infinity of abstract ones. XML should be concrete, SGML is abstract.
Yes, I agree.

> The rest of us will be left with "404 Not Found" because (I suspect)
> fallback XML objects will be maintained about as well as HTML pages.

Or worse, because if the PUBLIC ID works, you don't need to test the
fallback.  And there's always the malicious user, as per Ken Holman's
comment about using "", but I'm not sure it's necessary to worry about that.

> I'm not convinced PUBLIC is required and suspect that it will be (for
> the most part) ignored. If we persist in adding features that will
> be ignored, we can predict the fate of XML.
Agreed.  Very much so.

I think maybe some people are underestimating the effect of the first
dozen or so "file not found" messages from an XML application.

Yes, you can ignore the DTD itself.  What about declared entities?
What about image files?  You don't see inline images in this document
because you and I used different browsers?  Come on, folks!  The Web
continues to grow because of compatibility.  Use any browser.  On any
platform.  Yes, there are mistakes -- like ActiveX :-) -- but basic
HTML interoperability is there today.

Let's not do worse.

Lee
Received on Tuesday, 1 April 1997 19:13:23 UTC