- From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
- Date: Wed, 02 Apr 1997 10:17:21 GMT
- To: w3c-sgml-wg@w3.org
(in response to: April Fools RANT about Catalogs)

In message <3341DDE1.73C4@csclub.uwaterloo.ca> Paul Prescod writes:

> Generic markup is also the source of SGML's power -- the power to define
> your own, perhaps non-interoperable documents. XML will not change this.
> I will not be able to download one of Peter M-R's chemical models and
                                        ^^^^^^^^^^^^^^^^^^ [See below]
> spin around molecule models in an arbitrary browser (unless he delivers
> his code as a Java applet). XML gives him the power to define something
> that is mostly non-interoperable with my browser, because that is what
> he needs to do to get his job done. When I define a 3D scene in XML

[It is automatically interoperable if certain directions are taken.]

<ISSUES>
I have tried to keep quiet about the PUBLIC debate, partly because I am not
knowledgeable about the relevant RFCs and the use of catalogs, but since it
raises serious implementation issues I think it's important. [I'm also
slightly worried that it's diverting attention from some other issues. I
know we are all working flat out for WWW6, but there are some issues which
I'd like guidance on before then :-) (which *are* in the drafts).]

Since I'm mentioned in the quoted message, I'll take the opportunity to
reply.

The PUBLIC discussion seems to have two main threads. One is TTLG, Chapter
8: "the name of the DTD is called", "the DTD is called", etc. This is
important in its own right, just like the distinction between Element,
ElementType and GI, but it's not my main concern at present. The other is
implementation.

The hidden problem is that some people see XML as an opportunity for:

a. black-box installation
b. totally reliable transfer of information
c. application independence (i.e. can do molecules, Beethoven, etc.)
d. site independence
e. platform independence
f. training-free use
g. inter- and intranet compatibility
h. totally automatic operation (i.e. human-free)
i. a simple method of distributing complex hyper-resources

It's clearly not all of these at present.
Some of the check-boxes above have been demonstrated in some applications,
but not all at once. If we aim for all of them we are being unrealistic, so
some of the boxes have to go. SGML has been built (as far as I can gather)
with (b) at the top of the list and (h) close behind; (i) doesn't figure.
HTML has been built on (i), (a), (e), (d), (f), (g) in some order; (h) and
(b) have little priority. XML has yet to work out what its priorities are,
though in principle it can offer many of these in a year's time or so (but
probably not all). The PUBLIC debate (which is only one of several areas in
XML where conflict can arise) is at least in part due to the conflict
between (h) and other factors.

My priority for CML is (b) - the whole point is that the information is
precisely captured, described and maintained. I don't *have* to use XML,
but it's ideal in its present state. [I started CML, before XML was
announced, as an SGML-based approach to chemical and technical
information.] Anything else is a bonus. However, I have been seduced by XML
and Java into thinking that I can also manage (a), (c), (d), (g), (e) and a
bit of (i). I shall also (I hope) produce some simple material so that
people will get sufficiently enthusiastic that they will put in the effort
to overcome the others.
</ISSUES>

<IMPLEMENTATION>
From past experience I have come to realise that *I* cannot design a
language without implementing it at the same time. It's easy to add
seemingly simple things that have major consequences. That's one reason why
I ask dumb questions some time after they have been discussed - because I'm
trying to code them. [I've only just got as far as starting to implement
XML-LINK, for example, and asked some simple questions on xml-dev.
Clarification still awaited :-)]

In the past most of my efforts have failed on (a). Indeed it's only in 1997
that we have any chance of addressing this, which for my purposes is the
major concern about PUBLIC and SYSTEM.
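[For readers who haven't been following the thread: the two competing forms
of external DTD reference look roughly like this. The document type name and
identifiers below are invented for illustration, not taken from any real
catalog:]

```xml
<!-- SYSTEM: a direct (possibly relative) URL to the DTD.
     Breaks if the file moves or the user fingerpokes it. -->
<!DOCTYPE MOL SYSTEM "http://www.example.org/cml/cml.dtd">

<!-- PUBLIC: a location-independent public identifier, here with a
     fallback system identifier; a resolver (e.g. a catalog) maps
     the public identifier to a local or remote copy. -->
<!DOCTYPE MOL PUBLIC "-//EXAMPLE//DTD Chemical Markup Language//EN"
                     "http://www.example.org/cml/cml.dtd">
```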
Essentially, what some of the WG are aiming at is to:

* automagically deliver a complete, working, infallible, maintenance-free
  XML system to a user, without the user even being aware that
  XML/DSSSL/Java/lots_of_other_things even exist.

On good days I share most of this vision :-) I'd also like it to be shared
and developed by the community.

<CASESTUDY>
IMO Mosaic and httpd were what made the WWW take off, because they were:

- platform-independent
- simple
- installation-free (fairly)
- free

In their first incarnations they (particularly Mosaic) were NOT robust, and
this paralleled the fact that the hyperdocuments weren't particularly
robust either. I believe that appropriate tools give XML the same
opportunity in 1997 as Mosaic/httpd did for HTML in 1993 - if we get the
distribution right.
</CASESTUDY>

<CASESTUDY>
My first experience of trying to distribute SGML was with CoST (Joe
English's version). To deliver what I wanted I had to deliver:

- sgmls
- my (medium-complex) DTDs
- some entity files
- CoST (in tcl/tk)
- tcl/tk
- a browser (also in tcl/tk)
- molecular add-ons
- and some documents :-)

to users who had never heard of any of these things. Not surprisingly it
didn't fly - the thing that finally stopped me was the difficulty of
porting costwish to tcl/tk under Windows (let alone the Mac).
</CASESTUDY>

<CASESTUDY>
Last year I was sent a free copy of PanoramaPro (thanks, SoftQuad) and was
impressed by the way it delivered documents over the WWW. When I pointed it
at a URL it whirred and clicked like a Heath Robinson machine, loading
entities, DTDs and stylesheets as well as the document. I suspect that an
SGML-illiterate could have done this as well as me, though they wouldn't
have understood what was happening. The whole set of files delivered over
the WWW essentially represents a single hyperdocument, and internal
self-consistency is critical. However, PP also has its own local files
(entities, DTDs, etc.) and the user can modify what is in these
directories.
In that way it seems possible for the user to foul up the self-consistency
quite easily, by replacing (say) one entity set with a different one under
the same filename. The robustness is predicated on the user not
fingerpoking in the wrong places.
</CASESTUDY>

<CASESTUDY>
JUMBO. JUMBO aims to solve some of the problems that costwish couldn't.
Being in Java it is:

- platform-independent
- installation-simple (trivial where Netsplorer provides a JVM)
- training-free (relatively, since the technology is widespread)
- inter- and intranet friendly

It is not yet robust. The key question is:

** how do I package/deliver all the components of the installation so that
they are installation-trivial, robust, and self-consistent? **

This seems to be at the root of some of the PUBLIC debate - is either
SYSTEM or PUBLIC robust enough to ensure that the correct document set is
used?

I have spent the w/e rewriting JUMBO so that it uses URLs throughout. This
may seem trivial to many of you, but having started [JUMBO] off as an
application (i.e. not an applet) which uses files, it was a revelation to
me. URLs maintain consistency of addressing for any JUMBO document, whether
XML, DTD or class, and allow (in principle) any of these to be downloaded
over the WWW. Java is particularly supportive of the consistency of a set
of documents when used in a restrictive environment (e.g. a JVM in a
browser), since only one site can be visited; it's fairly easy to make sure
that only the 'right' set is accessed.

As an example, the current distribution for JUMBO includes:

- dtd.classes (a list of the DTDs in the distribution and their Java
  classes)
- the classes for each DTD
- (if required) the DTDs and their entity sets

Since the whole lot can be downloaded from a server, the consistency ought
to be manageable. So - in answer to Paul's query - any browser will be able
to manage arbitrary DTDs so long as the classes can be located.
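[The dtd.classes lookup-and-load mechanism might be sketched roughly as
below. This is my illustration of the idea, not JUMBO's actual code: the
registry contents are stand-ins (java.util classes, so the sketch is
self-contained and compilable), where JUMBO would map a DOCTYPE name like
PLAY to a renderer class such as PLAY.class, possibly fetched from the same
server as the document:]

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of DOCTYPE-to-handler dispatch via a dtd.classes-style registry.
public class DtdDispatch {

    // In a real browser this table would be parsed from the dtd.classes
    // file distributed (or downloaded) alongside the documents.
    static final Map<String, String> DTD_CLASSES = new HashMap<>();
    static {
        DTD_CLASSES.put("PLAY", "java.util.ArrayList");  // stand-in class
        DTD_CLASSES.put("MOL",  "java.util.LinkedList"); // stand-in class
    }

    // Look up the DOCTYPE name and dynamically load its handler class.
    // Returns null if no handler is registered or the class cannot be found.
    static Class<?> handlerFor(String doctype) {
        String className = DTD_CLASSES.get(doctype);
        if (className == null) {
            return null;
        }
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A registered DOCTYPE loads its handler class dynamically.
        System.out.println(handlerFor("PLAY").getName());
        // An unregistered DOCTYPE yields no handler.
        System.out.println(handlerFor("FOO"));
    }
}
```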
For example, when JUMBO detects a DOCTYPE of PLAY, it looks through its
local dtd.classes, finds that it has some classes which understand
Shakespeare rather than molecules, and dynamically loads these. [I hope
this can be shown at WWW6.] If the browser encounters a DOCTYPE of FOO,
then so long as it can locate FOO.class it can render/transform/whatever.
If I hadn't got PLAY.java locally, then it could (potentially) be
downloadable from the same site as the *.sgm.
</CASESTUDY>
</IMPLEMENTATION>

<SUMMARY>
We need to be able to locate the ancillary documents consistently and
robustly. Personally I don't mind whether this is done through SYSTEM or
PUBLIC or both, but we have to have a clearly defined mechanism for the
various types of environment in which it will be done.

If we know that documents are ONLY going to be delivered into a JVM, so
that relative addresses cannot break, then SYSTEM would seem to work. If we
are expecting users to configure their resources (e.g. to minimise
bandwidth usage), and if we expect them to do some fingerpoking, then
relative addresses will break. PUBLIC would detect the break, even if it
couldn't mend it. If we don't mind URLs decaying, then the integrity of the
information will be maintained although the system may not work. I can live
with that; others may not be able to :-).
</SUMMARY>

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Wednesday, 2 April 1997 09:05:47 UTC