- From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Date: Sun, 09 Feb 1997 09:50:32 -0500
- To: w3c-sgml-wg@w3.org
lee@sq.com wrote:

> > Similarly, SGML did not specify a special syntax for system identifiers or
> > a resolution mechanism for them. Thanks to that "omission" we can now use URLs
> > in XML. Once again, that was a Good Decision.
>
> That "omission" has been corrected, and SGML now has FSIs, because of
> problems with SGML systems not being sufficiently interoperable in practice.

I don't think that FSIs (formal system identifiers) meet your criteria of being usable and reliable in the same way everywhere. They seem to me to be just a syntax for saying: "Here's something you may or may not understand. It has notation FOOBAR, so if you know how to deal with that, you know how to deal with this." So they are still open-ended in the way I am arguing public identifiers should be. But I'm certainly happy to be educated if I'm wrong.

> I think that again we are at cross-purposes. I am not proposing mandating
> a single resolution mechanism. I am saying that the proposal has to work,
> in the sense that its proponents have to show how you can deliver XML
> on the web (our Purpose) using it.

What I am arguing for, primarily, is a separation of the baby from the bathwater. We can add a default catalog resolution mechanism; I voted for that, but some may still vote against it. Jon is voting for a powerful and complex delegation mechanism; I voted against that. James wants us to work out how public identifiers interoperate with URNs, FSIs, and everything else.

The baby is the simple, opaque PUBLIC string that SGML people use to good effect today. The bathwater is the resolution argument, which may or may not ever be resolved. If we can suggest a solution to the latter, great! Let's do it. I'm just arguing that we needn't tie the former to the latter.

What about interoperability? Interoperability is primarily what we came here for, but I don't think it is the *only* thing we came here for. For instance, we decided to let XML have processing instructions, even though we know that they can cause problems.
But we decided that in the tug of war between a) the private interoperability cost and b) the private "usefulness" benefit, they were worth keeping. We made the same choice with character sets (and you can imagine that the potential interoperability costs there are huge). What about the fact that you can use arbitrary URL schemes, even ones that have not been invented yet? Or link to arbitrary binary objects that some clients may not know how to render?

In fact, we made the same choice with DTDs. If we wanted to take interoperability to the extreme, we could fix the tag set available in XML. You and I know that multiple tag sets are a MASSIVE source of interoperability problems. But they are also a massive source of POWER and usefulness, and that's why we're here. On the other hand, we went the opposite way on some choices. For instance, requiring a concrete delimiter set is a little less useful than allowing an abstract one, but it is much more interoperable.

Back to the point: the baby is the string "PUBLIC foobarbaz". By itself, it can't hurt anything. Used in conjunction with a system identifier, it can be skipped with a call to a single C function and totally ignored. If we decide that we cannot work out the interoperability problems today, we should not conclude that we must remove a syntactic feature that a) is useful in legacy systems and in systems under construction today, and b) will allow us to build a globally interoperable system tomorrow.

> Yes, there's a legacy problem, you can say -- if I can have a PUBLIC
> identifier in an XML file that is ignored by all standard conforming
> applications, then I can have my own private non-XML system using
> them and that gives me a nice feeling. (is that a fair summary?)

I would put it more strongly than "gives me a nice feeling." I would put it: "SGML-based systems already know how to use them, and use them to good effect, and some people are building large, useful, important systems based upon them.
I suspect that they could also be put to good use by new systems such as HTML editors and website maintenance tools." And the system only becomes "non-XML" if we make them illegal in XML.

> But when XML 2 comes along (say) and PUBLIC identifiers are now
> required to be sock weaving patterns, you're still hosed.

How could this happen? The semantics of public identifiers are well understood, whether the resolution mechanism is or not. Was there any danger of the SOCAT spec declaring that all public identifiers must be sock weaving patterns? In any case, there is always the danger that a future spec will tromp on your namespace unless its authors are required to avoid that by namespace segmentation or something similar.

> And if
> someone else's implementation tries to look up PUBLIC IDs before
> SYSTEM IDs, and produces an error message on failure, your files are
> not interoperable. By not specifying how PUBLIC IDs work, that's
> the sort of problem we'll have.

If the language of the XML spec makes the system identifier primary, this will not be a problem. I think it is clear that if we do not specify a mechanism for public identifier resolution and catalog resolution, the system identifier should be primary.

> Perhaps you should give us some clear,
> concrete examples of how PUBLIC helps interchange XML documents
> over the web, or how there is no circumstance, now or future, in
> which it can hinder such interchange, and what other gain is made.

There are several features in XML that could hinder interchange, including the fact that XML is a meta-language, but each must be weighed against its potential benefit. Being a meta-language will certainly impede interoperability (think of indexing metadata, or conversion to another file format), but the power offered by that feature is greater. I believe the same would be true of PUBLIC identifiers in the event that we cannot agree on a default handling mechanism for them.

Paul Prescod
Received on Sunday, 9 February 1997 09:46:23 UTC