Re: Public Identifiers, and CATALOGS
I think I agree with much of this. One point to keep in mind with this, is
that nobody on the Web except for us SGML weenies has the vaguest idea what a
Public Identifier is and will believe whatever we tell them, and almost
everyone who will use them at first is us.
I'll make two assumptions:
1) a public identifier is immutable, and a client can ultimately use any
resolution method it wants.
2) if I send you a document, I should be able to supply you with sufficient
extra information for you to be able to process that document.
The second point says I need to know how to resolve the public identifier, but
the first says you don't need to ask me. I think this is fair. I also don't
think this should be a catalog, not because I don't like catalogs, but because
they are extraneous to the document - I want the name resolved, not several K
of catalog information. In any case, I'd prefer to see a catalog as a name
server you can interface with. So I think the server should include a pragma,
disguised as a processing instruction, eg.,
<?XML-public-identifier type="resolver" href="pubid/cgi"?>.
The enclosed URL will accept one or more pubid's (as text strings) and return
one or more entity definitions.
I understand that adding PIs is generally a bad thing, but I think it's
excusable in this case. I also allows client and server to do the actual name
resolution anyway they see fit, but with a narrow interface between them if
they need to talk. And it puts the onus of resolution, in the end, with the
server, where I think it belongs.
Again, I'd like to point out that the only people who will use this initially,
and who will implement it, are the SGML members of the crew, who will make the
effort to build what is not a particularly complicated CGI script.
On Mar 31, 4:38pm, David Durand wrote:
> Subject: Public Identifiers, and CATALOGS
> The issue of naming versus resolution is still coming up, in
> misunderstood form. For example, Lee's 6 equivalent mechanisms did not in
> fact offer equivalent "naming" behavior, though they do offer equivalent
> _resolution_ behavior. I will try to point out an effective operational
> property of names that no resolution mechanism can provide. I also point
> out that CATALOGS are the best solution to the stylesheet problem. In fact
> a better solution to that, than to the name resolution problem!
> The most important fact about PUBLIC IDs is that they define equivalence
> classes of documents. In order for these classes to make sense, we need to
> ensure global uniqueness of names. While this seems hard, in practice using
> hierarchical namespace delegation, and simply encouraging long identifiers
> gets us 99% of the way there. Even unregistered FPIs have turned out the be
> pretty robustly unique in practice.
> This is a social contract, but it's in fact less-onerous than the social
> contract implied by a URL -- which is that the owner will maintain an HTTP
> server, at a specified DNS location, with a specified resource on that
> server. People fail this contract all the time, of course, because to meet
> it requires money.
> Any time I see a PUBLIC ID that I've _ever_ seen before, it means that I
> am justfied in re-using the results of my last resolution. So, PUBLIC IDs
> are like a super-distributed cache. A browser using FPIs (even without ever
> resolving one) can cache the first copy of the TEI DTD it ever sees, based
> on the SYSTEM ID provided along with thar PUBLIC ID, and then _never fetch
> it again_.
> Further, even thought FPIs (and all names) may become broken, a broken
> FPI is actually more informative than a broken URL, as it generally
> contains a lot of human-readable information that may help someone who
> absolutely _must_ resolve it. And anyway, it's certainly no _more_ broken
> than a broken URL, and somehow people seem to be able to live with a lot of
> The forgoing is just to note that the objection that an unbreakable URN/FPI
> service is required for useful naming is just as bogus as the claim that a
> foolproof location mechanism is required for URLs to be useful. In fact,
> the biggest insight that TBL had (in my opinion) is that the academic
> hypertext community's insistence on consistent linkbases (unbreakable
> links) was a demand for an unnecessary perfection. We all dismissed the WWW
> as (possibly) interesting but obviously broken... and failed to see that it
> was damned useful, even if it was "broken". FPIs are also damned useful,
> even if we don't have _one way_ to resolve them.
> Now, as to CATALOGs per se, I'm agnostic. They are certainly a
> reasonable mechanism for resolution, and with delegate, they are a
> plausible draft of a distributed resolution mechanism.
> So I don't object to them. I actaully rather favor them for another
> reason: we are heading down the road to including PIs to locate style
> sheets, and I don't like that for a number of reasons:
> 1. I _don't_ want to be forced to put processing information in my
> 2. I think it's a bad idea to _ever_ put processing information in
> 3. Practically, even with HTTP 1.1, it's not an effective implementation
> 1, and 2 are very controversial, and I'll state them without further
> discussion. 3, however, has been bothering me for a while.
> HTTP 1.1 fixes the multiple connection problems of HTTP 1.0 and allows
> several commands to be bouncing down the wire at a time (which is very
> good). _but_ it is not a multiplexed connection -- one whole resource must
> be transmitted before the next one can be fetched. This means that once I
> start fetching the document, I can start fetching the stylesheet, and the
> catalog, but I can't use either of them until the document is all there.
> What we really want (if we are to use HTTP 1.1 effectively) is a way to
> fetch a "manifest" for the document, and then select and fetch the
> document, (one or more) stylesheets, and so forth, in whateever order makes
> most sense. In practice, probably style-sheet (if we don't already have
> it), DTD (if we need it, and don't already have it), document, then
> external entities. Now a CATALOG looks like a great nominee for the
> manifest. So much so, that I'm very tempted to say that we _should_ require
> catalogs, not for the PUBLIC->SYSTEM mapping, though that it a useful side
> effect, but so that we can require that the standard URL for a "document"
> may identifiy a CATALOG with a DOCUMENT entry, identifying where the XML
> parsing should begin.
> So I vote for adding CATALOG as a requirement (if we use it as a
> manifest), and even without that (if it's the only way we can get PUBLIC) I
> don't mind if we suggest/require it as a default/fallback resolution
> mechanism (though I am not convinced that we need one).
> In any case, I think that a clear property that PUBLIC has that has not
> been stated clearly before is that _any_ particular resolution mechanism is
> allowed, if the application knows that a URL is a legitimate copy of a
> PUBLIC ID'ed item. In particular, implementors and authors should
> understand that a PUBLIC is an explicit license to cache a resource and
> access it any other time that its PUBLIC ID is found. We need to state this
> quite explicitly, so that it is clear to authors that _if_ they use PUBLIC,
> then they _must_ use a different PUBLIC ID for each non-equivalent version
> of an entity -- and if they don't their documents may _not_ work, or will
> not work _consistently_. This is the same kinf of commitment you make when
> you advertise or use a URL -- you have made a commitment to keep that URL
> valid, if you want your documents to work.
> -- David
> David Durand firstname.lastname@example.org \ david@dynamicDiagrams.com
> Boston University Computer Science \ Sr. Analyst
> http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
> --------------------------------------------\ http://dynamicDiagrams.com/
> MAPA: mapping for the WWW \__________________________
>-- End of excerpt from David Durand