- From: <lee@sq.com>
- Date: Mon, 31 Mar 97 01:56:07 EST
- To: w3c-sgml-wg@w3.org

I've heard the following sentiment several times:

> Let this be an area for market differentiation. As long as the processor
> faithfully tries to follow the instructions from the author, in the
> remote catalog, its mechanisms for finding "better" instructions should
> be unconstrained. It could be a remote catalog, a URN resolution service
> or annoying popup dialog box. Let the market decide.

I can't imagine many people choosing which XML product to buy based chiefly on the way it resolves PUBLIC identifiers. We don't often get asked how Panorama does this when people are choosing between Panorama Pro and DynaText, that's for sure. People work out which is cheaper, or which will work in their own environment, or which can be customised more easily to interoperate with some particular document management system.

No, Paul and others, the market won't decide PUBLIC for us. Author/Editor and Arbortext's Adept Editor may have similar initials, but they use incompatible PUBLIC resolution methods, and although we'd like very much to move to CATALOG soon, we haven't seen sales suffer because of that particular issue. If we had, we'd have changed Author/Editor long ago.

No, that's not what incompatibilities will do. What they will do is frustrate users. Once people have bought Author/Editor and RulesBuilder, then their fun begins, and the sound of corks popping is heard throughout the land, on the grounds that three days of partying is more enjoyable than understanding how rb.map and extid.map work. But by then they are for the most part blaming the complexities of SGML, not us -- and they are looking forward to seeing it fixed with XML.

But if CATALOG isn't required, an XML A/E would continue to use extid.map, I expect. Why change when you're having so much fun??

The reasoning has to be based on what will maximise interoperability.
Our market is not going to prefer one mutually incompatible browser or editor or whatever over another -- it is going to prefer HTML or PDF, where these hassles go away.

If we make the mistake of allowing PUBLIC, we have at least to _try_ and ensure that every XML processor can handle every XML file on the web without human intervention. That includes no intervention by system administration. In non-web environments, different amounts of intervention may be acceptable. But let's _try_ to make it work.

Paul's CATALOG proposal as redrafted & posted by Michael is a good step. If it is accepted as a minimum requirement for all XML processors, even DTD-less ones, we've probably lost our Dirty Perl Hacker. If it is optional, we have an optional language feature.

People hoping to put URNs in PUBLIC identifiers will have to check that it's OK not to have ! @ # % ^ & _ { } [ ] | \ ~ ` ; < > , in URNs, as they are forbidden in PUBLIC identifiers. Perhaps SGML could be changed here, as there doesn't seem to be any advantage to restricting the character set, and it's going to look odd to allow Kanji or Devanagari or accented Latin characters in SYSTEM IDs such as file names and URLs (URL internationalisation is in progress, but file: URLs are already OK in practice at least, and you can escape characters in URLs with %, a character not allowed in a PUBLIC Id) and have A-Za-z 0-9 and a little punctuation in PUBLIC identifiers, which are supposed to be more powerful.

Probably we'll need to reserve yet another character for escaping, since you can't (I think) use &#999; in a PUBLIC identifier. Well, even if you can, the resulting data character has to be legal there. So we could say that ?dddd? would be the escape for an arbitrary Unicode character.

So + for space in URLs, %dd for other characters in URLs, &#dddd; in text, and ?dddd? only in PUBLIC identifiers. Still want it?
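
To make the pile-up of escaping conventions concrete, here is a minimal Python sketch of the four forms side by side. It is an illustration under assumptions, not anything proposed to the working group: the function names are made up, the ?dddd? form is only the hypothetical escape floated above (written without zero padding), and the forbidden-character set is copied from the list in this message rather than checked against ISO 8879.

```python
from urllib.parse import quote_plus

# Copied from the list in the message; not checked against ISO 8879.
FORBIDDEN_IN_PUBLIC = set("!@#%^&_{}[]|\\~`;<>,")


def escape_url(text: str) -> str:
    """URL form: '+' for space, %XX (hex, per UTF-8 byte) for the rest."""
    return quote_plus(text)


def escape_text(text: str) -> str:
    """Text form: &#dddd; numeric character references for non-ASCII."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


def escape_public(text: str) -> str:
    """Hypothetical ?dddd? form for non-ASCII or forbidden characters.

    How a literal '?' would be written is left open, as in the message.
    """
    return "".join(
        f"?{ord(c)}?" if ord(c) > 127 or c in FORBIDDEN_IN_PUBLIC else c
        for c in text
    )
```

Calling all three functions on the same string, say "caf\u00e9 menu", gives three different spellings of one name, which is the interoperability cost being complained about.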

Here are six ways of including a DTD fragment:

[1] <!DOCTYPE xx % PUBLIC "yy">

[2] <!DOCTYPE xx % SYSTEM "how to get yy">

[3] <!DOCTYPE xx [
        <!Entity yy % PUBLIC "yy">
        %yy;
    ]>

[4] <!DOCTYPE xx [
        <!Entity yy % SYSTEM "how to get yy">
        %yy;
    ]>

[5] <!DOCTYPE xx [
        <!Entity catalog % SYSTEM "how to get catalog.xml">
        <!--* catalog.xml defines the yy entity *-->
        %yy;
    ]>

[6] <!DOCTYPE xx [
        <!Entity catalog % PUBLIC "catalog.xml">
        <!--* catalog.xml defines the yy entity
            * but this relies on external PUBLIC resolution to
            * get our real XML catalog
            *-->
        %yy;
    ]>

Do we have enough features yet?

Why isn't the catalog file itself in XML? We now have three languages:

[1] XML, in the body of a document
[2] The baroque SGML DTD syntax
[3] the CATALOG file

Well, with DSSSL another one is coming, I expect.

It's all just too much, especially when the gains of using PUBLIC seem so small. With [5] above you get almost all the benefits anyway, with a lower implementation cost, one fewer language, and heck, if we removed SYSTEM xxx from DOCTYPE, a smaller language!

The missing benefit is that it's harder to do distributed resource mirroring without standardised names, as per Michael's TEI example. But we haven't solved that problem, and if the URN group solves it, you can put URNs in your SYSTEM identifiers, and you _still_ don't need PUBLIC.

Lee

Received on Monday, 31 March 1997 01:56:09 UTC