Re: SD4 - Schema Format [fmt] from Rick Jelliffe on 1997-05-17 (w3c-sgml-wg@w3.org from May 1997)

From: Rick Jelliffe <ricko@allette.com.au>
Date: Sat, 17 May 1997 19:08:23 +1000
To: <w3c-sgml-wg@w3.org>, "James Clark" <jjc@jclark.com>
Message-Id: <199705170908.TAA28722@jawa.chilli.net.au>
> From: James Clark <jjc@jclark.com>

> > Proposal: All machine-readable schemata, whatever their other
> > characteristics, are structured data, and so XML itself is a good
> > carrier syntax for schema expression. We should design a general
> > structure for writing schemata in XML.
 
> This was discussed *very* fully when we decided to stick with SGML's syntax
> for DTDs.  Calling it a schema rather than a DTD doesn't change the basic
> issue.  Reaching agreement on an instance syntax would require a lot of
> time, which I don't think we can afford at the moment, given the enormous
> amount we have still to do.

Two comments:

1) We have very clear goals and a timetable for XML 1.0. When the draft
provokes responses, like some of these (& I am certainly not saying they are
not good ideas), that conflict with our goals and timetable, we should
push ahead to finish XML 1.0, though certainly with the addition of namespaces.

(By the way, I would be interested to know how, in the Microsoft namespace model,
they intend the various content models from various DTDs to intertwine and still
retain validity. Does everything have an (implicit) declared content type of ANY
in their scheme?)

You cannot have a single markup syntax that is optimal for every purpose.
So maybe XML 1.0 (1997) is optimised for Perl hackers and XML 2.0 (1998)
will be optimised for database transfers. But people who need XML 2.0 can
use XML 1.0 in the interim, with only a slight speed penalty.

In other words, I think XML is simple enough that it can be widely used in industry,
but that wide deployment will be an enormous generator of new requirements and
goals. This is why I think we

* need to be disciplined enough to get at least XML 1.0 finished,
* need to keep SGML compatibility, because otherwise XML will fragment into a million
incompatible pieces: SGML provides at least a base by which we can say
"this wheel has already been invented" or "if you need this extra feature,
bump yourself up to some XML-like form of simple SGML",
* need to make sure that XML doesn't grow into SGML, but keeps its identity as a small
language (using the 20-page rule).


2) There is an idea in the background that you cannot use DTDs to specify
enough useful information about element types. I hope HTML people realise that
in SGML (& XML?) you can add fixed attributes to element types that reference:

* PI entities, which let you pass any instructions that will help
process the element;
* NOTATIONs, which let you specify the format of the contents of the
element.

So there is no need to extend or replace DTDs with any new tags to
tell you what is in an element, or how it should be processed.

So the infrastructure is already there. We are just missing agreed-upon
notations and PIs.

For example:
<!NOTATION comma-delimited-data
	PUBLIC "IDN//W3.ORG//NOTATION XML comma delimited data//EN">
<!NOTATION pipe-delimited-data
	PUBLIC "IDN//W3.ORG//NOTATION XML pipe delimited data//EN">
<!ELEMENT database - - (#PCDATA)>
<!ATTLIST database
	xml-notation  
		NOTATION  (comma-delimited-data | pipe-delimited-data)  
			#IMPLIED>

lets you have documents like:

<?XML version="1.0" ?>
<DATABASE xml-notation="comma-delimited-data">
dog,rover,stinks
cat,happy,scratches
</DATABASE>

In other words, with nice simple XML as it now stands, you can embed your
own particular, highly efficient data. XML becomes more like a wrapper. You
get the best of both worlds, maybe: a standard syntax for the metadata,
and a format as efficient (and proprietary) as you want for the contents.
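A minimal sketch of how a receiving application might honour such a declaration: read the xml-notation attribute from the wrapper element, then hand the element's character content to a decoder for that notation. Python is used only for illustration; the names DELIMITERS, parse_records and read_database are hypothetical, no DTD validation is done, and the XML declaration is written in lowercase (<?xml ...?>) because XML 1.0 as finally published requires that form. Splitting on the delimiter is delegated to the standard csv module, which is an assumption about how these notations would be defined, not part of the proposal.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical mapping from the notation names declared in the DTD
# to the field delimiter each notation uses.
DELIMITERS = {
    "comma-delimited-data": ",",
    "pipe-delimited-data": "|",
}


def parse_records(text, notation):
    """Decode delimited element content into a list of records (lists of fields)."""
    try:
        delimiter = DELIMITERS[notation]
    except KeyError:
        raise ValueError(f"unknown notation: {notation!r}") from None
    # Ignore blank lines, such as the line breaks right after the start tag.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return list(csv.reader(lines, delimiter=delimiter))


def read_database(document):
    """Parse the XML wrapper, then decode its contents according to xml-notation."""
    root = ET.fromstring(document)
    notation = root.get("xml-notation")
    if notation is None:
        # The attribute is #IMPLIED: with no declared notation,
        # return the raw text and leave interpretation to the caller.
        return root.text or ""
    return parse_records(root.text or "", notation)


DOC = """<?xml version="1.0"?>
<DATABASE xml-notation="comma-delimited-data">
dog,rover,stinks
cat,happy,scratches
</DATABASE>"""

print(read_database(DOC))
# [['dog', 'rover', 'stinks'], ['cat', 'happy', 'scratches']]
```

The markup parser never looks inside the content; only the notation-specific decoder does, which is the "wrapper" division of labour described above. Supporting pipe-delimited-data, or any other agreed notation, means adding a decoder, not changing the DTD syntax.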

SGML's main strength (and HTML's) is that it lets you embed other notations:
you can use it for what it is good for. I don't think we should expect XML to
be otherwise.

Rick Jelliffe
Received on Saturday, 17 May 1997 05:07:59 UTC