Re: C.4 Undeclared entities? from lee@sq.com on 1996-10-25 (w3c-sgml-wg@w3.org from October 1996)

From: <lee@sq.com>
Date: Fri, 25 Oct 96 11:39:52 EDT
To: bill.smith@eng.sun.com, w3c-sgml-wg@w3.org
Message-Id: <9610251539.AA20646@sqrex.sq.com>
> I suspect I'm alone in this opinion but I take "as simple as possible" quite 
> literally.

No, you're not alone.

> We can make XML a clean, simple, extensible markup language, HTML++ 
> or we can turn it into something else, SGML--.

Well, I for one would vote for the clean, simple extensible markup language
rather than HTML++ or SGML--, out of the three options you give.
Obviously HTML++ would have to build on the long-defunct HTML+ draft.

I've just been working on a small CGI-based "database" (it uses perl and
grep to do things like an address book).
The internal files are all text files, and use things like
<Schema>
    <Database>
	<name>contacts</name>
	<description>People in the SGML industry</description>
    </Database>
    <Fields>
	<Field>
	    <name>address</name>
	    <type>multiline</type>
	    <search>full</search>
	</Field>
    </Fields>
</Schema>

There are no attributes, empty elements or entities, so that "parsing"
in perl was a matter of
    if (/<Database>/ .. /<\/Database>/) {
	if (/<name>(.*)<\/name>/) {
	    print "database name: $1\n";
	}
    }

Now, this isn't a major engineering project, and perhaps this isn't
even a valid use of XML, and I should go back to the IAFA Template
or RFC822 format that I would normally use:

Database: {
    Name: contacts
    Description:
	People in the
	SGML industry
}
Fields: {
    Field: {
	Name: address
	type: multiline
	search: full
    }
}

which is just about as easy to parse, allows spaces in field names,
and I'm less likely to be accused of abusing SGML :-)
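
To back up "just about as easy to parse", here is a rough sketch of a reader for the brace format above. It is in Python rather than the perl I actually used, and it rests on assumptions the example does not settle: a line of the form "name: {" opens a nested block, a lone "}" closes it, and any line with no "name:" prefix continues the previous field's value. The names parse_block and SAMPLE are made up for illustration.

```python
import re

# "name: value" or "name: {"; names may contain spaces but not ':' or braces.
FIELD = re.compile(r"^([^:{}]+):\s*(.*)$")

def parse_block(lines, pos=0):
    """Parse lines[pos:] until a closing '}' or the end of input.

    Returns (items, next_pos); items is a list of (name, value) pairs where
    value is either a string or a nested list of pairs.
    """
    items = []
    while pos < len(lines):
        line = lines[pos].strip()
        pos += 1
        if not line:
            continue
        if line == "}":
            break
        match = FIELD.match(line)
        if match is None:
            # Assumption: no "name:" prefix means a continuation line.
            if not items or not isinstance(items[-1][1], str):
                raise ValueError(f"unexpected line: {line!r}")
            name, value = items[-1]
            items[-1] = (name, (value + " " + line).strip())
            continue
        name, rest = match.group(1).strip(), match.group(2)
        if rest == "{":
            value, pos = parse_block(lines, pos)
        else:
            value = rest
        items.append((name, value))
    return items, pos

SAMPLE = """\
Database: {
    Name: contacts
    Description:
        People in the
        SGML industry
}
Fields: {
    Field: {
        Name: address
        type: multiline
        search: full
    }
}
"""

items, _ = parse_block(SAMPLE.splitlines())
```

Note that this sketch does not notice a missing or extra "}", which is part of why I say below that this format does not recover well from errors.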

For me, that which I have just shown is as simple as possible.
It is not internationalised.

    File:	element_sequence
		;

element_sequence: element_sequence element
		| /* empty */
		;

element		: start_tag contents end_tag
		;

contents	: contents element
		| contents text
		| /* allowed to contain nothing */
		;

This idea in itself is powerful.  It is also sufficiently powerful to
represent the same set of structures represented by SGML (with the
exception of CDATA marked sections, unless you allow \< to escape a <).
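
To give a feel for how little code the grammar needs, here is a minimal recursive-descent sketch in Python. It is only an illustration under assumptions of my own: tag names are plain ASCII words, whitespace-only text is dropped, surrounding whitespace is trimmed from text, and the \< escape is not handled. The function names are hypothetical and simply mirror the grammar rules.

```python
import re

# A start tag, an end tag, or a run of text containing no '<'.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")

def tokenize(source):
    """Yield ("start", name), ("end", name) or ("text", data) tokens."""
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"stray '<' at offset {pos}")
        closing, name, text = match.groups()
        if text is not None:
            if text.strip():  # assumption: ignore whitespace-only text
                yield ("text", text.strip())
        elif closing:
            yield ("end", name)
        else:
            yield ("start", name)
        pos = match.end()

def parse_element(tokens, i):
    """element : start_tag contents end_tag"""
    kind, name = tokens[i]
    if kind != "start":
        raise ValueError(f"expected a start tag, got {tokens[i]!r}")
    children, i = parse_contents(tokens, i + 1)
    if i >= len(tokens) or tokens[i] != ("end", name):
        raise ValueError(f"missing </{name}>")
    return (name, children), i + 1

def parse_contents(tokens, i):
    """contents : any mix of elements and text, possibly nothing"""
    children = []
    while i < len(tokens) and tokens[i][0] != "end":
        if tokens[i][0] == "text":
            children.append(tokens[i][1])
            i += 1
        else:
            child, i = parse_element(tokens, i)
            children.append(child)
    return children, i

def parse_file(source):
    """File : element_sequence"""
    tokens = list(tokenize(source))
    elements, i = [], 0
    while i < len(tokens):
        element, i = parse_element(tokens, i)
        elements.append(element)
    return elements
```

Because every end tag carries its name, the parser can report exactly which element was left open, for example "missing </B>" for the input <A><B>x</A>.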

Anything else is syntactic sugar.
    <A HREF="xxx">yyy</A>
can be written as
    <A>
	<HREF>xxx</HREF>
	yyy
    </A>
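
The rewrite is mechanical. As a hedged sketch (Python's standard xml.etree.ElementTree, with a hypothetical helper name attributes_to_children), each attribute becomes a leading child element and the element's original leading text moves after the last new child:

```python
import xml.etree.ElementTree as ET

def attributes_to_children(element):
    """Rewrite element in place so each attribute becomes a leading child."""
    # Handle the existing children first, before any new ones are inserted.
    for child in list(element):
        attributes_to_children(child)

    new_children = []
    for name, value in element.attrib.items():
        child = ET.Element(name)
        child.text = value
        new_children.append(child)

    if new_children:
        # Text that used to open the element now follows the last new child.
        new_children[-1].tail = element.text
        element.text = None
        for index, child in enumerate(new_children):
            element.insert(index, child)
        element.attrib.clear()
    return element

root = attributes_to_children(ET.fromstring('<A HREF="xxx">yyy</A>'))
result = ET.tostring(root, encoding="unicode")
```

Going the other way is not automatic: once HREF is a child, nothing in the syntax says it used to be an attribute, so that distinction has to live in a schema or in convention.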

So the question is, how much sugar do we need?

One step simpler for me is either the Fields: {} structure or a
LISP-like syntax, but neither of those allows robust error-recovery,
and since I consider that essential, what I have shown is for me minimal.

The people who think more is needed probably aren't implementing it,
or they already have 10,000+ lines of code to parse SGML, and don't
mind if the language is the same.  I'd really like to see a world in
which most or all tools interoperated.   I think we had an opportunity
to start that here.  But it won't happen with the XML I'm hearing about.

Lee
Received on Friday, 25 October 1996 11:39:57 UTC