
Re: XML catalog draft



At 2:43 PM 2/11/97, Murray Altheim wrote:
>dgd@cs.bu.edu (David Durand) writes:
>>Since PUBLIC is likely to be a point of user-tailorability, it should be
>>looked at first -- implementations that don't implement PUBLIC resolution
>>will simply ignore the PUBLIC, thus causing it to "fail". I can't think of
>>a case where someone who _has_ working public resolution, would prefer to
>>use the system ID -- and if they did, it seems they could always ensure that
>>any given public ID (or all) would fail to resolve.
>
>Actually, that's the opposite situation in my experience:
>
>    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"
>                          "http://www.cm.spyglass.com/dtd/html.dtd">
>
>This announces to the world that the document conforms to HTML 2.0, but
>tells the processor that a local copy of 'html.dtd' will provide the
>resource without resorting to a PUBLIC catalog lookup. IOW, why bother
>resolving the reference if the document seems to know where to look. Then,
>if the SYSTEM fails, resort to the more generalized process of a catalog
>lookup using PUBLIC.

This is correct for a local file system, but incorrect, I think, when
SYSTEM IDs are URLs. A URL in general requires the invocation of a network
query, whereas a catalog may be resolvable with purely local (and hence
cheap) operations. In your example, I can, without a cache, determine from
the PUBLIC ID that I have a local file containing the correct DTD. The
SYSTEM ID just tells me that your HTTP server also has a copy in that
place. Of course a local URL cache could invert this preference...

Perhaps we should leave the resolution strategy to the client, as I can
imagine rather complex but very sensible strategies. For instance:

  Prefer local SYSTEM IDs to any CATALOG method
  Otherwise, use the local CATALOG to resolve by PUBLIC ID
  Otherwise, try any (non-local) SYSTEM ID
  Finally, try non-local CATALOG resolution (perhaps extended SOCAT).
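
As a rough illustration, the four steps above could be sketched as follows.
This is only a sketch under stated assumptions: the LOCAL_CATALOG table, its
file path, the is_local() test, and the remote_catalog callback are all
hypothetical names invented for the example, and the function only decides
where to look; it does not fetch or verify anything.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical local catalog mapping PUBLIC IDs to local files.
# The path is an assumption made up for this example.
LOCAL_CATALOG = {
    "-//IETF//DTD HTML 2.0//EN": "/usr/local/lib/sgml/html2.dtd",
}


def is_local(system_id: str) -> bool:
    """Assume a SYSTEM ID is local if it has no URL scheme or a file: scheme."""
    return urlparse(system_id).scheme in ("", "file")


def resolve(
    public_id: Optional[str],
    system_id: Optional[str],
    remote_catalog: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[Tuple[str, str]]:
    """Return (method, location) for the first step that applies, else None."""
    # 1. Prefer a local SYSTEM ID to any catalog method.
    if system_id and is_local(system_id):
        return ("local-system", system_id)
    # 2. Otherwise, resolve the PUBLIC ID through the local catalog.
    if public_id in LOCAL_CATALOG:
        return ("local-catalog", LOCAL_CATALOG[public_id])
    # 3. Otherwise, fall back to the non-local SYSTEM ID.
    if system_id:
        return ("remote-system", system_id)
    # 4. Finally, ask a non-local catalog service, if one is configured.
    if public_id and remote_catalog is not None:
        location = remote_catalog(public_id)
        if location:
            return ("remote-catalog", location)
    return None
```

With the DOCTYPE quoted earlier, resolve() would pick the local catalog entry
and never touch the network, which is the point of putting step 2 ahead of
step 3.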

The above strategy may actually be quite good for a typical browser
application. In fact, each of the SYSTEM resolutions could be further
modified by a SYSTEM ID cache check, which should precede any
network-invoking operations...
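
A SYSTEM ID cache check of this kind might look like the sketch below. The
function name, the dictionary used as a cache, and the fetch callback are
assumptions made for illustration; a real client would also have to decide
on expiry and on-disk storage, which are ignored here.

```python
from typing import Callable, Dict


def fetch_system_id(
    system_id: str,
    cache: Dict[str, bytes],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Return the entity for system_id, consulting the cache before the network."""
    # Cheap local check first: no network operation if we already hold a copy.
    if system_id in cache:
        return cache[system_id]
    # Cache miss: invoke the (possibly expensive) network fetch and remember it.
    data = fetch(system_id)
    cache[system_id] = data
    return data
```

Only the first request for a given SYSTEM ID reaches the fetch callback;
later requests are answered from the cache.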

Depending on how the software caches data, the optimal strategy may be
very complex. I'm now thinking we should not get in the way of processing
agents defining their own strategies _however_ they want to.


   -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________