Re: Public Identifiers, and CATALOGS

At 7:22 PM -0800 3/31/97, Bill Smith wrote:
>David Durand wrote:
>> What we really want (if we are to use HTTP 1.1 effectively) is a way to
>> fetch a "manifest" for the document, and then select and fetch the
>> document, (one or more) stylesheets, and so forth, in whatever order makes
>> most sense. In practice, probably style-sheet (if we don't already have
>> it), DTD (if we need it, and don't already have it), document, then
>> external entities. Now a CATALOG looks like a great nominee for the
>> manifest. So much so, that I'm very tempted to say that we _should_ require
>> catalogs, not for the PUBLIC->SYSTEM mapping, though that is a useful side
>> effect, but so that we can require that the standard URL for a "document"
>> may identify a CATALOG with a DOCUMENT entry, identifying where the XML
>> parsing should begin.
>This sounds very much like exchange from IETF Mime/SGML days. While a
>proponent of exchange, I think this is too heavyweight for XML.

Well, I would agree, except that we need a way to attach stylesheets and
(possibly) other external resources to document entities. And we have this
ongoing "resolution" red-herring that can also be answered by the Exchange
proposal. _And_ using exchange does not have the bad effect of tying my
document instances (or the entities that contain them) to specific
processing specifications -- so it greatly enhances the modularity and
reusability of my documents. Nor does it tie my documents to a resolution
method, as I don't _have_ to use catalogs if I can assume that the client
can find its own stylesheets or resolve its own PIs (or if I don't care
about either of these).

Further, the only competing proposal is to add _more_ PIs to document
entities. The rules for these PIs are not as straightforward as everyone
makes out: What happens if I want to treat an entity sometimes as a
well-formed document and sometimes as a well-formed document part? I have to
have several occurrences of the Stylesheet PI in the document (each in
different entities, of course), and some complex set of rules as to _which_
default stylesheet is the _real_ one, and also rules as to _when_ such
duplication is legal.

So now CATALOG and Exchange don't really look so hard, since they solve
more than one problem, more elegantly than the competing proposals, and
there are no problematic "special cases" where we have to define
non-obvious behavior.
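
A rough sketch of the catalog-as-manifest idea, in Python. This is an
illustration under assumptions, not anything from the draft: the STYLESHEET
keyword is invented for the example, every PUBLIC entry is treated as a DTD
for simplicity, and the URLs and the names parse_catalog and fetch_plan are
made up. External entities are left out because a client only learns about
them while parsing the document.

```python
import shlex

# Hypothetical catalog text. DOCUMENT and PUBLIC follow the familiar
# catalog entry style; STYLESHEET is an invented keyword for this sketch.
CATALOG = '''
DOCUMENT "http://example.org/doc/main.xml"
STYLESHEET "http://example.org/style/default.dsssl"
PUBLIC "-//Example//DTD Report//EN" "http://example.org/dtd/report.dtd"
'''

def parse_catalog(text):
    """Return a list of (keyword, args) tuples, one per non-empty line."""
    entries = []
    for line in text.splitlines():
        tokens = shlex.split(line)  # handles the quoted identifiers
        if tokens:
            entries.append((tokens[0].upper(), tokens[1:]))
    return entries

def fetch_plan(entries, cached=()):
    """Order fetches as stylesheets, then DTDs, then the document.

    Any URL already present in `cached` is skipped.
    """
    stylesheets = [args[0] for kw, args in entries if kw == "STYLESHEET"]
    dtds = [args[1] for kw, args in entries if kw == "PUBLIC"]
    documents = [args[0] for kw, args in entries if kw == "DOCUMENT"]
    plan = stylesheets + dtds + documents
    return [url for url in plan if url not in cached]

plan = fetch_plan(parse_catalog(CATALOG))
```

The document entity itself carries no stylesheet PI here; the same entity
could appear in a second catalog with a different stylesheet.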

>I've watched the debate and still don't understand the need to add PUBLIC
>to the current draft. We have external entities whose declarations specify
>URLs "which may be used to retrieve the content of the entity".

My note pointed out that even without a resolution mechanism for PUBLIC IDs
the fact that they are _defined_ as names for unique entities allows
smarter caching behavior. This is because knowing that something is a name
is useful information.
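
To make the caching point concrete, here is a minimal sketch, assuming a
client that keeps fetched entities in memory. The class name EntityCache and
the example identifiers are hypothetical. The only idea it shows is that a
cache keyed on the public identifier can score a hit even when the entity is
declared with a different URL.

```python
class EntityCache:
    """Cache entity text under its public identifier when it has one."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(public_id, system_id):
        # A public identifier names the entity itself, so it is the better
        # cache key; the URL only says where one copy happens to live.
        return ("public", public_id) if public_id else ("system", system_id)

    def get(self, public_id, system_id):
        return self._store.get(self._key(public_id, system_id))

    def put(self, public_id, system_id, text):
        self._store[self._key(public_id, system_id)] = text

cache = EntityCache()
cache.put("-//Example//DTD Report//EN",
          "http://a.example/report.dtd",
          "<!ELEMENT report ANY>")

# Same name, different location: still a cache hit.
hit = cache.get("-//Example//DTD Report//EN",
                "http://mirror.example/report.dtd")
```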

And of course it's not at all unhelpful to be able to give a name, and a
default location in case the client cannot process the name. This last is
what allows non-resolving clients to get benefit from PUBLIC. I can see the
pragmatic reason to specify URLs, but I see real advantages to having a
location-independent namespace. I think being able to specify PUBLIC and
SYSTEM both gives us this with very little trouble. We're about to add
CONCUR to XML, and people are complaining about the burden of implementing
PUBLIC!???!! Give me a break.
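
For scale, the client-side logic for PUBLIC plus a default SYSTEM location
can be sketched in a few lines. The function name resolve and the local_map
argument are assumptions of this sketch; local_map stands in for whatever
resolution mechanism a client happens to have, if any.

```python
def resolve(public_id, system_id, local_map=None):
    """Pick the location to fetch for an entity declared with both IDs.

    `local_map` is a hypothetical stand-in for a client's own resolution
    mechanism (a catalog, a name service, ...).
    """
    if local_map and public_id in local_map:
        return local_map[public_id]
    # A non-resolving client ignores the name and uses the default location.
    return system_id

PUBID = "-//Example//DTD Report//EN"
SYSID = "http://example.org/dtd/report.dtd"

plain_client = resolve(PUBID, SYSID)
smart_client = resolve(PUBID, SYSID, {PUBID: "file:///usr/lib/sgml/report.dtd"})
```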

I love CONCUR, and I think that non-hierarchical markup is _really
important_ in the long term, but it seems really silly to put it in XML
right now... especially if we are taking out features that are in wide and
productive use (i.e. PUBLIC and SDATA). CONCUR may be useful, but it is
not much _used_.

>To my knowledge, there is nothing preventing something like:
>  fpi:<fpi specific part>

One problem is that we would have to submit a _resolution_ protocol to the
IESG, thus defeating the _purpose_ of FPIs. We could use URNs, but since
the draft specified URL and not URI, that is only open to those who don't
care about conformance -- and in that case, why not implement PUBLIC and
get more SGML compatibility coupled with more functionality?

>This is valid URL syntax and can be used by new applications as a means to
>perform fpi resolution. Some applications (like today's browsers) will not
>understand the fpi scheme and will fail. At some point in the future,
>other applications will understand the fpi scheme and will perform resolution
>based on some, as-yet-undefined mechanism. What's wrong with this picture?
>It requires *no* modification to the current XML draft, is web-compliant,
>is easy to understand, does not require catalogs, and in short keeps XML

One problem is that it will be _only_ one mechanism once it's defined. That
violates one of the desired properties of FPIs. (resolution-mechanism

>While not ideal, we get all the benefit of FPIs on the publishing side,
>and *some* of the benefit on the browsing side. With XML browsers, we
>would use:

We don't get a fallback SYSTEM ID. When I suggested this previously (about
4 months ago), I pointed out that to make it equivalent, we'd have to
change SYSTEM to a _list_ of alternative URIs. This gives us all the
advantages you are claiming, and would be acceptable to me....

But it is much less obvious on the face of it than PUBLIC.

On the other hand, allowing multiple URIs would remove the minimum literal
character set problems Terry has been complaining about.

It's a reasonable option.
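
A minimal sketch of the list-of-alternatives reading of SYSTEM, assuming the
client simply takes the first URI whose scheme it understands. The function
name first_usable, the scheme list, and the example URIs are all
hypothetical, and URI escaping is ignored.

```python
def first_usable(uris, supported_schemes=("http", "ftp", "file")):
    """Return the first URI whose scheme the client supports, else None."""
    for uri in uris:
        scheme = uri.partition(":")[0].lower()
        if scheme in supported_schemes:
            return uri
    return None

alternatives = [
    "fpi:-//Example//DTD Report//EN",     # name first, for clients that resolve it
    "http://example.org/dtd/report.dtd",  # fallback location
]

todays_browser = first_usable(alternatives)
fpi_aware = first_usable(alternatives, supported_schemes=("fpi", "http"))
```

A client that does not know the fpi scheme skips the first entry instead of
failing, which is the fallback that a single fpi: URL lacks.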

>(Apologies to anyone who proposed fpi:<fpi specific part> in the past. I'm
>sure it has been mentioned as an obvious way to allow for fpi resolution.)

No problem. If it solved the problem by itself, it would be a fine solution.

   -- David

David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________