KISS again from Peter Murray-Rust on 1997-06-20 (w3c-sgml-wg@w3.org from June 1997)

From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
Date: Fri, 20 Jun 1997 22:16:19 GMT
To: w3c-sgml-wg@w3.org
Message-Id: <8300@ursus.demon.co.uk>
<WARNING>
I hope that the views in this posting won't upset too many people - they are
meant to re-emphasise the unknown areas that XML will be used in and what it 
needs to cater for.
</WARNING>

I am assuming that XML will be a success.  In that case, most of the people 
using it will be completely new to SGML, its traditions, its terminology and
its syntax.  I suspect that most of the people on the WG - who are battle-
hardened SGML professionals - have forgotten how hard it is to learn SGML.

I don't know whether there are other webhackers on this list (i.e. people who
have never had to hack SGML as part of their raison-d'etre) - if so I think
that their views should be listened to.  If not, then please treat mine as a
sample of 1 from a very large population and scale accordingly.  Having
been called the WG's bellwether (sic) - a 'ringleader, or loud turbulent
fellow' according to my dictionary - I present in the rest of this my view of
the webhacker and what they want (not necessarily what they need) from XML.

<FACT>
It took me three years to learn SGML, its terminology and its syntax (which I
find the most difficult I have come across).
</FACT>

<SUPPOSITION>
90% of the XML-users in a year's time will have had no contact with SGML.  Most
will not be CS graduates - most will be Dirty Perl Hackers and HTML hackers.
</SUPPOSITION>
I base this on the likelihood that they will encounter it through browsers, 
through webzine hype, etc.  When Java came out it was seen as a way of 
making images dance across the screen, rather than the (good) OO language that
it is.  XML will similarly be seen as a roll-your-own tag factory.  The 
real power will be ignored (at least at first).

<FACT>
Most webhackers need quick fixes.  They have little time to go into the 
difficult parts of languages until they have to.
</FACT>

<FACT>
DTDs take a great deal of time to create well. 
</FACT>
I know this is true because every time someone posts on c.t.s the universal
reply is 'you really have to take time to understand your data model, don't
rush into creating a DTD quickly, you have to keep revising it till you've
got it right.'  I don't quarrel with this :-)
<COROLLARY>
Most of the next generation of XML-techies are not going to create DTDs.  They
don't know how, haven't been apprenticed, and don't see the point.  So the 
number of DTDs created will be quite small.  Most of these will come from the
currently SGML-literate.  IMO this represents quite a small proportion of the
XML community - if it doesn't then XML will not be a universal success.
</COROLLARY>

To make XML a success initially it has to be simple.  I hope that whatever 
the ERB decides on the last 3-4 weeks' discussions, they recognise that XML
must have an entry point that entices non-SGML-literates to start playing,
**without help from the SGML community.**

Please also accept that some of the things labelled 'simple' in the XML spec
are not necessarily simple for newcomers.  PEs may now be simple, but their
implementation is not easy to determine from the spec.  If they really are
as simple as DEFINE, then highlight the rules, please :-)

<CONTROVERSY>
Personally I believe that DTD validation will be a fairly small part of what 
people will require from the validation process.  IMO a serious lack in SGML
is any requirement to add any machine or textual semantics to the DTD.  There
is no mechanism for extracting DTD information into processing code, no
WEB-like (in the DEK sense) way of documenting it.  How do I find out what
FOO actually means to a human?  How do you extract the human semantics from
(say) HTML 2.0?  It's in a completely separate document.  What most people
will want is intimate linking of tags to semantic information.
</CONTROVERSY>
This isn't to say that DTDs aren't important.  I've written them for CML, but
they have to be so flexible that they have few constraints.  What is far more
important is what semantics attach to each element, and I've put a lot of 
(probably very amateur) work into that with glossaries, Java methods, etc.

<ANECDOTE>
I have just come back from two days about Objects in Bioinformatics - the 
human genome, etc.  Our community are addressing information sharing, and they 
don't use SGML. They currently use FORTRAN-formats.  Now they are looking for 
the next generation - and the embryonic universal solution is CORBA and
Java/C++.  This will manage most of what they want to do.
</ANECDOTE>
So I am trying to sell them XML.  Why, and how? 
- it is simpler (unless it gets made too complicated).
- it can be read by humans.  They can touch and feel the data.  They aren't
	frightened of angle brackets any more.  It can be hacked.  It's the 
	obvious replacement for their FORTRAN files.
- it will be in the browsers.  (BUT so are ORBs - the latest Netscape has an
	ORB, VisiBroker, I believe.)  This means that they will discover this
	wonderful thing XML and start hacking.  Not the slightest chance that
	they will use DTDs.

So IMO this posting represents something moderately close to 90% of the likely
XML community.  If you want XML to be widely used you have to take note of it.
You may not like it, but it's reality. 

	I am really looking forward to July 1.  I will do my best to implement
what emerges. :-)

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Friday, 20 June 1997 18:12:07 UTC