Re: Public Identifiers, and CATALOGS from Paul Prescod on 1997-04-02 (w3c-sgml-wg@w3.org from April 1997)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Tue, 01 Apr 1997 23:18:56 -0500
To: w3c-sgml-wg@w3.org
Message-ID: <3341DE30.61E5@csclub.uwaterloo.ca>

lee@sq.com wrote:
> People quickly realised that
> you can start a URN with urn: and have it work.

Every existing browser in the world will return "protocol not found"
unless it is connected to a special proxy. If we could get people to
install special proxies all at once, we could probably get them all to
upgrade their browsers at once.

I don't know how to reconcile these two statements:

> I'd rather allow a list of URLs in a SYSTEM identifier to accomplish
> this.  Note that space is not allowed in URLs.
> 
> However, if you find yourself giving multiple URLs to access the
> same thing, and expecting software to try them one at a time until one
> "works", you're asking for trouble.

Please clarify. Are multiple URLs a good idea as long as you don't abuse
them? And what does this have to do with public identifiers? Did you
mean to say that you would rather allow a list of URLs and public
identifiers? Seems like over-engineering to me: just put back in the
PUBLIC keyword from SGML and be done with it.
 
> This is not to say that indirection should not be supported, but that
> if you provide a fallback URL, it will be the one you could have provided
> in the first place, and fetching the document would have been faster
> (no need to connect for CATALOG).

David has described how getting the catalog can speed up your download
time.
 
> If all XML clients use CATALOG in the same way, the chances are high that
> if you publish a document, you'll put a CATALOG file there that everyone
> else's application can read.  In that case, no fallback URL is needed.

Agreed.

> If some XML clients use "CATALOG", some use "catalog", some use "Catalog",
> and some use whois++ and/or URN resolution instead, you'll need to put
> several different kinds of catalog file on your server along with your
> document, and also make a whois++ entry, and perhaps do other things
> as well, as yet undreamt of, if you want to reach a wide audience.

You just do the things you need to reach the audience you need. If that
audience includes the entire Internet and the only way to do that is to
hardcode a System identifier beside your public identifier, then you can
do that.
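
As an illustration of that fallback, here is a minimal Python sketch. It
assumes a catalog made of lines of the form PUBLIC "public-id" "system-id",
in the style of SGML Open catalogs. The function names, the sample public
identifiers and the example.org URLs are hypothetical, and nothing here is
specified by the XML draft.

```python
import re

# One catalog entry per line: PUBLIC "public identifier" "system identifier"
# (assumed syntax, modelled on SGML Open style catalogs).
ENTRY = re.compile(r'^\s*PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*$')


def parse_catalog(text):
    """Return a dict mapping public identifiers to system identifiers."""
    mapping = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            public_id, system_id = match.groups()
            # First entry wins, so earlier lines take priority.
            mapping.setdefault(public_id, system_id)
    return mapping


def resolve(public_id, fallback_system_id, catalog):
    """Prefer the catalog entry; otherwise use the hardcoded system identifier."""
    return catalog.get(public_id, fallback_system_id)


# Hypothetical catalog content, for illustration only.
CATALOG_TEXT = '''
PUBLIC "-//Example//DTD Memo//EN" "http://mirror.example.org/dtd/memo.dtd"
'''

catalog = parse_catalog(CATALOG_TEXT)

# Known public identifier: the catalog entry is used.
print(resolve("-//Example//DTD Memo//EN",
              "http://www.example.org/memo.dtd", catalog))

# Unknown public identifier: the system identifier beside it is used.
print(resolve("-//Example//DTD Letter//EN",
              "http://www.example.org/letter.dtd", catalog))
```

A client with no catalog at all would simply pass an empty dict and always
end up with the hardcoded system identifier.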
 
> Someone (I forget who) said that at first, only SGML implementors will be
> using XML.  I hope that's false.  If it isn't, we didn't need to change
> SGML: as Peter and James and others who have written SGML parsers will
> I am sure agree, the goal is to make something that a CS grad with little
> or no SGML background but some basic web and HTML knowledge will look
> at and want to use and be able to implement quickly.

Thus the "at first." I think that that "prediction" has already been
proven true: the existing implementations of XML are all from people in
the SGML community (depending on how you classify Microsoft). When XML
is finished 3 months from now I expect the same people will be the first
implementors.

> If new methods come along, the XML spec can be revised.  That is why
> there is provision for a version ID in the header.

You keep saying this, but the point is that I do not want to go through
my documents and change all of my URLs. No, I cannot automate this
because two different logical entities can temporarily share the same
URL. Only by "tagging" the logical entities explicitly, as I would tag
content explicitly, can I be confident that my transformation is
accurate. That is why a public identifier, even one right beside the
system identifier, is valuable.
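
A small Python sketch of the problem, under made-up assumptions: the entity
records, public identifiers and URLs below are hypothetical. The point is
only that a rewrite keyed on the URL cannot tell two entities apart when
they share a URL, while a rewrite keyed on the public identifier can.

```python
# Two different logical entities that temporarily share one URL
# (hypothetical data, for illustration only).
entities = [
    {"name": "logo", "public": "-//Example//ENTITY Logo//EN",
     "system": "http://www.example.org/shared.gif"},
    {"name": "banner", "public": "-//Example//ENTITY Banner//EN",
     "system": "http://www.example.org/shared.gif"},
]


def remap_by_url(records, old_url, new_url):
    """Rewrite every record whose system identifier equals old_url."""
    return [dict(r, system=new_url) if r["system"] == old_url else dict(r)
            for r in records]


def remap_by_public_id(records, public_id, new_url):
    """Rewrite only the record tagged with the given public identifier."""
    return [dict(r, system=new_url) if r["public"] == public_id else dict(r)
            for r in records]


NEW_URL = "http://www.example.org/banner.gif"

by_url = remap_by_url(entities, "http://www.example.org/shared.gif", NEW_URL)
by_public = remap_by_public_id(entities, "-//Example//ENTITY Banner//EN", NEW_URL)

# Keyed on the URL, both entities move, although only the banner should.
print([r["system"] for r in by_url])
# Keyed on the public identifier, only the banner moves.
print([r["system"] for r in by_public])
```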
 
> In that case, I hope we don't have to admit in public that CATALOG is
> an ascii non-XML non-SGML file because the SGML vendors said it would
> be too hard to implement if it was in SGML.  Ooops.

There is a difference between "too hard to implement" and "not worth the
extra typing for a purely political point." But I agree that we should
at least slap a DOCTYPE on it, in the same way that we do for simple DSSSL
stylesheets, so that the document type is self-describing.
 
> Or worse, because if the PUBLIC ID works, you don't need to test the
> fallback.  

Sounds like an opportunity for XML tools vendors. The exact same problem
exists today with URLs that work on your machine but not on the server
for any of a variety of reasons.

 Paul Prescod
Received on Tuesday, 1 April 1997 23:13:08 UTC