Re: SD4 - Schema format

At 12:15 16/5/97 BST, Henry S. Thompson wrote:

>Although I realise the schema proposal could be seen as reopening the
>'DTD in XML vs. DTD as per SGML' debate which we settled 8 months ago,
>I think there is a lot of merit to doing so ...

At last week's SGML revision meeting a number of techniques for providing
name space control within SGML were discussed. A suggested formula for this
was submitted from Japan some time ago. I proposed another solution, which I
promised I would write up as soon as I got back. While I had intended the
full description to be a task for next week it looks like I better rough out
my idea now to see if it would meet the needs of the XML community as well.

The basic unit of inclusion of structures into a DTD is a parameter entity
call to an external entity. What I have proposed is a simple extension to an
external parameter entity to allow it to set up a separate name space for
any elements declared within it. Consider, for example, the following
simplified DTD:

<!DOCTYPE X [
<!ENTITY % maths PUBLIC "....." NAMESPACE-ID="12083-maths">
<!ENTITY % tables PUBLIC "......" NAMESPACE-ID="CALS-table">
<!ENTIT% %table-text PUBLIC "....." NAMESPACE-ID="CALS-table">
<!ENTITY % HTML-body    PUBLIC "....." NAMESPACE-ID="HTML">
<!ELEMENT X (heading, p+)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT p (#PCDATA)>
%maths;
%tables;
%table-text;
%HTML-body;
]>

Given that CALS table entries can have paragraphs called <p>, HTML text can
have paragraphs called <p> and this DTD has its own definition of <p> we can
simply differentiate them from the elements they are contained within. For
example

<x><heading>Nesting DTDs</heading>
<p>This is an unnested paragraph. It has an unnamed name space (or could be
considered part of the X namespace).</p>
<p>
<body><p>This text is within the body text defined in the
<code>HTML-body</code> parameter entity. All elements within the 
<code>&lt;body&gt;</code> tags belong to the <code>HTML</code> namespace.
</p>
<p>
Note that paragraphs in this namespace contain embedded elements, whose
definition comes from the <code>HTML-body</code> element set.</p></body>
<table>......
<entry><p>Paragraphs in CALS Tables</p></entry>
<entry><p>You can embedded any CALS specific text elements within table 
paragraphs, provided they are defined in the tables-text parameter entity
set.</p>
<p>Note that embedded table-text elements share the same name space as those
defined in the element set defined in the tables entity.</p>
</table>
<p>This last paragraph, like the first one, belongs to the unnamed name space
associated with DOCTYPE X.</p></x>

What we would need to do if this proposal was accepted would be to tidy up
the definitions of the keywords used in public identifiers so that ELEMENT
and NOTATION could contain ATTLISTS and associated comment declarations.
(This is already on the list of things to do for SGML.) We may also need to
think about other types of sets (e.g. PI notation sets separately!)

Would such a scheme go need to meeting the needs expressed by Jean Paoli and
Henry Thompson?


----
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK 
Phone/Fax: +44 1452 714029   WWW home page: http://www.sgml.u-net.com/

Received on Friday, 16 May 1997 11:56:45 UTC