- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Thu, 20 Mar 97 12:35:42 CST
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
On Thu, 20 Mar 1997 09:22:33 -0800 Terry Allen said:
>I don't want XML processors to handle resolution at all, I want them
>to send URLs to the systems generalized URL-fetching mechanism to
>be resolved.  I think what we are talking about is the process of
>determining what it is that is to be resolved, which identifier
>to choose ("how to manage indirection").

If a system offers that service, a processor can avail itself of it; I didn't mean to suggest the spec would or should require implementors of XML to re-implement a sockets and URL-fetching library ...

> ... I'm still confused about the user and
>the processor.
>
>What is the model of interaction between processor and application that
>the SGML ERB is implicitly relying on?  When I download an XML file

The ERB is explicitly trying to stay away from prescribing an implementation model, so these questions are hard to answer. I think the answers to your questions are implicitly as follows, however (personal opinions only):

>that contains a public identifier BEACH,
>
> - is it the a or the p that the file is directed to first?

An implementation choice. It is certainly possible to imagine a situation in which P handles all the entities and passes a grove or some simple format (e.g. sgmls output format) to a downstream application; in this case, P sees the file and passes to A some representation of it. It's equally possible to imagine A being in the driver's seat, and A reading a file or data stream and periodically passing pieces of it to P saying "Here, parse this and give me a grove back". In this case, A sees the file first.

> - if the a, does the a do something with BEACH before passing
>   the file to the p?

Implementation dependent.

> - is the string BEACH part of the p's output?

If I read clause 4.3 correctly, the answer here is *always* 'yes', at least in the cases where 'BEACH' is the public identifier of an entity to which reference is made in the document.
Actually, of course, 4.3 says this only of SYSTEM identifiers; I am assuming that rule 4 would be extended to say "For an external entity, the processor must inform the application of the entity's system identifier, if any, and public identifier, if any." If it's not so extended, then the answer to your question is 'Maybe; if P wants it to be, yes, otherwise no.'

In some cases, P will have contracted with A to handle all entity resolution; if BEACH is part of a public identifier on an entity declaration, and the entity is referenced, then if I were writing A, I would probably be willing to settle for the contents of the entity so named. Tim Bray, on the other hand, building a full-text index, will want to see the identifier(s).

In other cases, P will have promised A *not* to resolve entity references, and so A will need to have a way to ask P to resolve a reference at an appropriate time (e.g. when the user asks that this be done, or at load time if the publisher's policy so wills it and A agrees to follow the policy); if clause 4.3 is modified in what seems to me the natural way, then the way to do this is prescribed: P makes the string 'BEACH' available to A, probably along with the system identifier given, if any. Some parsers will probably also hand A the storage object identifier they generate from 'BEACH', but that's not written into 4.3 now.

> - does the p resolve BEACH [or send BEACH out to a system utility
>   for resolution] as part of parsing?

I'm in favor of defining this as a responsibility of P, rather than of A. But I don't think a decision has been made. And I'm not sure I want to say *when* this has to happen; I think initial-parse-time and link-traversal or entity-expansion time are both plausible. And nothing can stop A from doing what it likes with 'BEACH', including sending it to a public-id-to-URL server to see what comes back.
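
To make the division of labor concrete, here is a minimal sketch of one possible arrangement, not anything the draft prescribes. It assumes a hypothetical processor P that always reports both identifiers to A and that, when asked, maps a public identifier to a system identifier through a simple in-memory public:system map, falling back to the declared system identifier. The names `ExternalEntity`, `Processor`, `report`, and `locate`, the lookup order, and the example URL are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ExternalEntity:
    """An external entity declaration as P might record it (hypothetical)."""
    name: str
    public_id: Optional[str] = None
    system_id: Optional[str] = None


class Processor:
    """Hypothetical P: reports identifiers to A and locates entities on request."""

    def __init__(self, catalog: Dict[str, str]):
        # Assumed public:system map; a real system might use a catalog file.
        self.catalog = catalog

    def report(self, entity: ExternalEntity,
               notify: Callable[[str, Optional[str], Optional[str]], None]) -> None:
        # Inform A of the public and system identifiers, if any.
        notify(entity.name, entity.public_id, entity.system_id)

    def locate(self, entity: ExternalEntity) -> Optional[str]:
        # Assumed order: try the public id in the map, then the system id.
        if entity.public_id is not None and entity.public_id in self.catalog:
            return self.catalog[entity.public_id]
        return entity.system_id


# Usage: A records what P tells it, then asks P where BEACH lives.
seen = []
p = Processor({"BEACH": "http://example.org/beach.xml"})
beach = ExternalEntity("beach", public_id="BEACH", system_id="beach.xml")
p.report(beach, lambda name, pub, sys_id: seen.append((name, pub, sys_id)))
print(seen)
print(p.locate(beach))
```

Because A receives the raw identifiers through `report`, it remains free to ignore `locate` and do its own lookup, which is the other arrangement discussed above.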
> - does resolution mean converting BEACH to a URL, or (in the case of
>   PIs) converting it to a system id for comparison with cache contents?

XML 1.0 defines system IDs as URLs, so I do not understand this question.

> - does the p resolve BEACH as part of parsing only for certain
>   purposes described in the spec?

If 'BEACH' is given as the public identifier of an entity, and the entity is referenced, then if P is responsible for expanding entity references, P is responsible for finding the data stream named by 'BEACH', at entity-reference-expansion time. Is this a trick question?

> - does the p resolve BEACH only if it is used as a system or public ID
>   and not if it is the target of a link?

I think XML 1.0 makes P responsible for expanding entity references, either a priori or on demand from A. But clause 4.3 rule 8 could also be read as allowing A to handle it, and does not explicitly say that P *must* provide the service on request, so perhaps 1.0 is underspecified here.

Who translates public identifiers into system ids/urls, and when, is not yet specified, as far as I know. I've been assuming above that P handles it at least for entities referred to. That seems to suggest P could handle it for links, too. Whether to translate from public id to url at parse time or at link traversal time seems to me best left to P to work out.

One could, I think, decide otherwise and say it's all left to A: P just has to pass the identifiers through, leaving A to work out its own salvation with catalog lookups, etc. I don't think this is logically untenable, but I do think it's a bad implementation strategy and a bad decision. If the author of A really wants to handle it all herself, then she can do so, by ignoring P's services and doing her own work on the public ids which P is required (can be required) to provide.

> - if the p passes BEACH to the a, does the a resolve BEACH directly,
>   without bothering the p, or does the a tell the p, "Go to the BEACH"?
If P is handling the TCP/IP port for A, A will ask P to go to the BEACH. If A controls the port, or chooses to use another port handler, P needn't be involved. I don't think XML 1.0 does or should constrain this; I think both possibilities are and should be legal.

>| I'm also open to saying only that conforming processor must support the
>| MRM and may support other methods, and how they decide which to use is
>| to be decided by the designer, the implementor, the user, and anyone
>| else who horns in on the discussion, but is not constrained by XML.
>| (Sole difference: implementations are not required to provide a
>| user-settable option saying "do it this way".)
>
>What are implementations?  If they are applications, as a user I

Processors. Sorry, I'm out of practice being disciplined in my terminology.

>want one with all the knobs.  If they are processors, under what
>circs are they resolving these IDs (see list above)?

I like knobs, too. But the WG and ERB may or may not feel that our personal preferences suffice as a reason to require that XML processors (and apps?) *always* provide *all* the knobs *as a condition of conformance*. Whether a knob should be required here is an open question; I'm agnostic (as I said), and you haven't actually said what you think XML should do, only what you hope apps will do.

>| > ...
>| >My point is that resolution (having power working) is not
>| >indirection (choosing among PG&E, windmills, solar power, etc.),
>| >and that any choice of method may result in failure of resolution.
>|
>| I think this is true, but so universally true that I'm not sure I
>| can derive any consequences from it.  No matter what resolution
>| method we choose, it can fail.  If we don't choose one at all, but
>| leave the choice to implementors, it can still fail. ...
>
>So will it fail worse with or without specification?  I think the
>other interoperability issues (see list of questions above) need
>answers, and those answers might inform the choice of what to do
>about public identifiers.

I don't think having or not having a required resolution method is likely to affect the frequency or severity of resolution failure, given that the publisher has made appropriate information available to enable resolution. So I still don't see that your point has a bearing on our decision, indubitable though it may be.

Wait, hang on a moment. In practice, specifying a Minimum Required Method of public-id resolution will mean fewer failures, I think. Here's the logic:

 - some failures will be due to network outages, permissions problems, etc.; the frequency of these is unaffected by the MRM/no-MRM choice though it may be affected by the choice of a resolution mechanism (a mechanism that provides several levels of fallback is likely to fail less often than one with no fallbacks at all)

 - some failures will be due to unavailability of required information (e.g. failure to provide an appropriate SGML Open Catalog, failure to register public ID with the new Public-ID server, failure to install the correct version of the Sortes Vergilianae Name Resolver, ...); these will be more frequent if a publisher must provide the required information (always in effect a public:system map) in more than one form. So specifying an MRM reduces the failure rate.

 - some failures will be due to inaccuracy of required information (e.g. provision of an SGML Open Catalog with bad entries, registration of the wrong public ID with the new Public-ID server, installation of the Sortes Vergilianae Name Resolver with the wrong config file, ...); these will be more frequent if a publisher must provide the required information in more than one form. So here too specifying an MRM reduces the failure rate.

OK.
I don't know if this is what you were driving at, but I've convinced myself that the second and third laws of thermodynamics do provide an argument for specifying an MRM.

>| ... I think of producing a
>| system identifier for a resource accessible to the processor as
>| constituting resolution of the public identifier; am I misusing the
>| term?
>
>I'm thinking of resolution as actually obtaining the thing identified.
>Merely producing a system identifier doesn't guarantee success
>(that is, it can leave the matter unresolved).  And if anything on the
>net is accessible to the processor if it wants to send a request for
>a URL, production of a system identifier is only part of the process
>(of course, if the thing is returned as a result of the request,
>a system identifier can be produced for it).

OK. I have been silently assuming that once P or A has a system id, issuing the network request for the actual resource is (a) not very hard and (b) equally likely to succeed or fail no matter what method we've used to identify the resource and/or translate from public id to system id. But if you want to reserve 'resolution' for actually coming up with the data in your input buffer, that's OK with me.

>Fair enough; I would suggest that the spec be worded such that the
>processor (if that's the piece involved) be allowed to hand off
>indirection management to another component of the user's system
>without loss of conformance.
>
>What matters is that indirection gets handled, not necessarily which piece
>of the machinery handles it.  To determine whether it is necessary
>to specify which piece of the machinery does the work, you have to
>articulate a model of the machinery or of its operations.

Yes. The current draft spec tries to be both precise and general, in defining "the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application" -- and *not* in terms of who calls whom when.
I think this leaves implementors free to organize the interaction of A and P however they wish, while still making clear what P has to do, when asked. I think we should strive for the same Who and What, not How and When, specificity in future revisions and in XML-Link.

-C. M. Sperberg-McQueen
Received on Thursday, 20 March 1997 14:41:10 UTC