Java-based XML browser and implementation comments from Peter Murray-Rust on 1997-02-19 (w3c-sgml-wg@w3.org from February 1997)

From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
Date: Wed, 19 Feb 1997 20:53:22 GMT
To: w3c-sgml-wg@w3.org
Message-Id: <3617@ursus.demon.co.uk>
I have just completed a preliminary distribution kit for my XML/CML
browser JUMBO (Java Universal Markup Browser for Objects) and you are 
welcome to have a play.  The points relevant to the WG are:
	- it is completely portable and installation-free (thanks to 
		Java/Netscape).  All you need is Netscape 3 or MSIE
		equivalent.  You don't even need to know that Java 
		exists, just how to click.
	- it makes use of 'links' in two examples.  These are:
		(a) 1->many unidirectional link (the endpoint is a
		set of tokens in an Element (actually atoms in a molecule))
		(b) a link that points to endpoints in two different elements
		(Obviously I am waiting with great enthusiasm to see the XML
		way of doing it properly and will start recoding as soon
		as some white smoke emerges).
	- it has a prototype of an element-based graphical editor (i.e. 
		one element can be dragged to be a child or sibling of
		a non-ancestor.  Or an element can be dragged to the trash.
		Similarly elements can be dragged to the clipboard and 
		legacy files imported to the clipboard.  There is a crude
		check on content validation, but I would hope to bolt
		NXP in to do the validation.
	- there are some examples of XML.
	- there are legacy files converted to XML-in-memory which can
		then be written out as XML.

I would be very grateful for feedback and bug reporting.  (I don't need
to know that my skill as a GUI builder could be improved - I didn't intend
to develop one :-) and it just happened.).

The system takes about 30 seconds to install after download from:

http:://www.venus.co.uk/omf/cml/moltest.tar.gz (please don't announce this)

You then have to gunzip and untar - Winzip will do this for you.  Then
you have to click on index.html.  (BTW you don't need to know any
chemistry since I have also included scene1.sgm from Julius Caesar :-)
(The linking demos are functgroup.html and vamp.html).

[Note.  I have run this on Netscape 3 on SUN, SGI, Mac, W95 and it works
surprisingly well.  (It was this lack of portaibility that meant I had to
abandon costwish.)  However there still are a number of problems on 
some platforms.  The window handling is especially variable
as are the speeds - you may be surprised which machine I rank bottom. So
it may happen that you get crashes or poor performance - some of this
may be due to client-side settings which are beyond me.  the upside, I hope
is that 5000 copies will be distributed shortly.]

Installation experiences
------------------------

These XML files are WF (and *were* valid against cml.dtd and sgmls).  I have 
not yet managed to validate them against NXP as there is a lot of editing 
to do.  The CML distribution included:
	- CML.DTD which 'includes' TecML.DTD which 'includes' HTML20.DTD
		'inclusion' is through parameter entities.
	- an SGML.DECL (at least that can be junked :-)
	- a Catalog using PUBLIC entities for:
		= the DTDs
		= various EntitySets (e.g. IsoLatin1).
	- about 10 files of attribute values and content models which are
		'included' (this is so I can edit these files rather than 
		the DTD.)  Why do it this way?  Because I need their 
		contents for the postprocessor.

I have only been converting for a few hours, and I'm still struggling with
bits of the spec, using NXP as an arbiter.  The things I have discovered are:
	- all the comments need changing.  This cannot always be done
		automatically.
	- the conditional sections require no white space round the 
		PEReference (trivial now that I know, but it took a 
		while to spot it).
	- the tag model (the '- -' or '- O' - whatever it's called) needs 
		excising.
	- the loss of PUBLIC is a total pain.  When I set up the stuff about
		12-18 months ago, I decided to use the catalog because it 
		was the 'proper way', and I came to believe in this.  (My
		own view about PUBLIC is very simple and naive - it tells
		you *what document to use* (not where to get it).  I think
		this is important.)  So all the references to Entity sets, etc
		are through PUBLIC.  They all have to be changed.
	- the CATALOG needs changing (no ENTITY allowed, no '--')
	- the entity sets need changing (no '--', no SDATA).

The main point is that I thought that XML was a subset of SGML and 5 
minutes with vi would fix the files.  Not so.  I think people will not
be very happy to have to have two version of files for entity sets.  I 
suspect there will be a real need for tools that 'convert SGML to XML'.
Because I need my DTD construction to be flexible, it's hairy and I
would prefer to keep it in 'SGML', converting to XML when necessary or 
on the fly.  Is there a tool which 'normalises' DTDs (i.e. resolves 
all the parameter entities) and outputs a 'processed' DTD?  (While I'm
on that, is there a tool which can parse the DTD so it can be used by a 
program?).

A general conclusion is that you will need to give a good deal of help
to people who want to convert from SGML.  There are bits which aren't obvious
and bits which you might expect to be there and aren't.  This conversion is
not just the document instances, it's the whole lot.  (This isn't meant
to sound negative... but it will be even harder for PhaseII).

NB. I am not sure where the decision process is in Phase II, but I am right in
assuming that it will contain most of Appendix A, and that the syntax will 
be essentially the same??  I particularly don't want to lose anything
in A.1.1.2.  If so, I can start hacking some primitive internal search tool.
It may be premature, but are *groves* almost certainly a part of XML-phaseII?
(4.5) or is it simply allowed to build a query tool based on Appendix A?
If not, someone is going to need to give me a tutorial as 10179 is not 
a Webhacker-friendly document.


	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Wednesday, 19 February 1997 16:48:51 UTC