TITLE: | Assigning Name Spaces to Element Sets and DTDs |
SOURCE: | Martin Bryan, The SGML Centre |
PROJECT: | JTC1.18.15.1 |
PROJECT EDITOR: | Charles F. Goldfarb |
STATUS: | Suggested Extension for 2nd Edition of ISO 8879 |
ACTION: | For information |
SUMMARY OF MAJOR POINTS: | This paper suggests how DTD fragments referenced using parameter entities that refer to element sets identified using formal public identifiers could be assigned their own name spaces that would not conflict with names associated with other elements declared in the same document type definition. |
DATE: | 21st May 1997 |
DISTRIBUTION: | WG8 and Liaisons |
REFER TO: | ISO 8879 |
REPLY TO: | Dr. James D. Mason
(ISO/IEC JTC1/SC18/WG8 Convenor) Oak Ridge National Laboratory Information Management Services Bldg. 2506, M.S. 6302, P.O. Box 2008 Oak Ridge, TN 37831-6302 U.S.A. Telephone: +1 423 574-6973 Facsimile: +1 423 574-6983 Network: masonjd@ornl.gov http://www.ornl.gov/sgml/wg8/wg8home.htm ftp://ftp.ornl.gov/pub/sgml/wg8/ |
More and more SGML document type definitions (DTDs) are being generated by bringing together components from different sources. In particular, specialist data types such as mathematics, chemical formulae and industry-specific constructs are typically handled by importing the relevant definitions from industry-agreed sources.
When a DTD is constructed from a number of sources there is a danger of the same element or entity name being used more than once. Whilst SGML's subdocument facility was designed to allow the inclusion of data conforming to different models while preventing name clashes, the restrictions that this placed on cross-referencing between document components have made this solution unacceptable to users. Therefore a more flexible means of avoiding name space clashes while at the same time allowing cross-referencing between name spaces within the same document is required.
The basic unit of inclusion of sub-structures into a DTD is a parameter entity reference to an external entity whose formal public identifier declares it, through its public text class, to be an element set (ELEMENTS) or a DTD in its own right. What is proposed is a simple extension that allows such external parameter entities to set up a separate name space for all elements and entities declared within the element set.
Consider, for example, the following simplified DTD:
<!DOCTYPE X [
<!ENTITY % maths PUBLIC "ISO 12083:1997//ELEMENTS ISO 12083 Mathematical Formulae//EN"
         NAMESPACE-ID="12083-maths" ROOT="MATHS">
<!ENTITY % tables PUBLIC "-//USA-DOD//ELEMENTS SUP MIL-M-28001 CALS Table Model//EN"
         NAMESPACE-ID="CALS-table" ROOT="TABLE">
<!ENTITY % table-text PUBLIC "-//USA-DOD//ELEMENTS SUP MIL-M-28001 CALS Table Contents//EN"
         NAMESPACE-ID="CALS-table" ROOT="(CALS-table)ENTRY">
<!ENTITY % HTML-body PUBLIC "-//IETF//ELEMENTS HTML Version 3.2 Body Elements//EN"
         NAMESPACE-ID="HTML" ROOT="BODY">
<!ENTITY % ISOlat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN">
%ISOlat1;
<!ELEMENT X       (heading, p+)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT p       (#PCDATA) +(xref)>
<!ELEMENT xref    (#PCDATA)>
<!ATTLIST xref    to IDREF #REQUIRED>
%maths;
%tables;
%table-text;
%HTML-body;
]>
Two specialist properties have been added to the entity declarations whose public text class is ELEMENTS:
NAMESPACE-ID
indicates the qualifier by which names associated with elements that form part of this element set will be differentiated from other references to the same names.
ROOT
indicates the name of the element in the current element set, or in a parent element set, which activates the setting up of a new name space.
In the example shown above, the name space known as 12083-maths will be activated whenever an element with the name MATHS is encountered. Similarly, an element called BODY must be entered to activate the HTML name space declared by the HTML-body entity.
For CALS tables the situation is more complicated. Two parameter entities have been used to incorporate the relevant element declarations. The first defines the elements that are specific to tables. The second identifies textual elements that can occur within tables as well as elsewhere in a CALS file. All elements within a CALS table share the same name space. The points at which the two element sets are triggered, however, can be considered to differ.
Whilst the outermost TABLE element can be used to trigger the name space for the elements in the %tables element set, elements defined within the %table-text entity can only be part of the CALS-table name space when they occur within an ENTRY element that is part of a CALS table. This is indicated by qualifying the name of the root element with the NAMESPACE-ID of the element set in which that root element is declared, as in ROOT="(CALS-table)ENTRY".
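The activation rules described above can be illustrated with a minimal Python sketch. It assumes that an unqualified ROOT opens a fresh name space when its element is entered, that a qualified ROOT such as "(CALS-table)ENTRY" adds its element set to the name space that is already active, and that leaving an element restores the parent's name space. The names ElementSet, split_root and assign_namespaces are hypothetical, the element lists are heavily abbreviated stand-ins for the real CALS and HTML sets, and treating an undeclared element inside a named name space as an error is an assumption the paper does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementSet:
    entity: str          # parameter entity name, e.g. "tables"
    ns_id: str           # value of the proposed NAMESPACE-ID property
    root: str            # value of the proposed ROOT property
    elements: frozenset  # element names declared in the set (upper case, abbreviated)


def split_root(root):
    """Split '(NS)NAME' into ('NS', 'NAME'); an unqualified ROOT gives (None, 'NAME')."""
    if root.startswith("("):
        ns, _, name = root[1:].partition(")")
        return ns, name
    return None, root


def assign_namespaces(tree, sets, base_ns):
    """Return [(ns_id, element_name), ...] in document order for a (name, children) tree.

    The recursion acts as the element stack: leaving an element restores
    the name space that was active in its parent.
    """
    result = []

    def visit(node, ns, available):
        name, children = node
        name = name.upper()
        for es in sets:
            root_ns, root_name = split_root(es.root)
            if root_name.upper() != name:
                continue
            if root_ns is None and ns != es.ns_id:
                # Unqualified ROOT: entering this element opens a new name space.
                ns, available = es.ns_id, es.elements
                break
            if root_ns is not None and root_ns == ns:
                # Qualified ROOT: the set joins the name space that is already active.
                available = available | es.elements
                break
        if ns != base_ns and name not in available:
            raise ValueError(f"{name} is not declared in name space {ns}")
        result.append((ns, name))
        for child in children:
            visit(child, ns, available)

    visit(tree, base_ns, frozenset())
    return result


# Hypothetical, abbreviated element sets modelled on the example DTD (maths omitted).
sets = [
    ElementSet("tables", "CALS-table", "TABLE",
               frozenset({"TABLE", "TGROUP", "TBODY", "ROW", "ENTRY"})),
    ElementSet("table-text", "CALS-table", "(CALS-table)ENTRY", frozenset({"P"})),
    ElementSet("HTML-body", "HTML", "BODY", frozenset({"BODY", "P", "CODE"})),
]
doc = ("X", [
    ("HEADING", []),
    ("P", []),
    ("BODY", [("P", [("CODE", [])])]),
    ("TABLE", [("TGROUP", [("TBODY", [("ROW", [("ENTRY", [("P", [])])])])])]),
    ("P", [("XREF", [])]),
])
for ns, name in assign_namespaces(doc, sets, "X"):
    print(f"({ns}){name}")
```

In this sketch the three <p> elements resolve to (X)P, (HTML)P and (CALS-table)P purely from their context, while a P placed directly inside TABLE, outside any ENTRY, is rejected because the table-text set has not yet been activated.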
Given that CALS table entries can have paragraphs called <p>, HTML text can have paragraphs called <p>, and this DTD has its own definition of <p>, we can now differentiate them by reference to the elements they are contained within. For example, the following document instance conforming to the above DTD could be envisaged:
<x><heading>Nesting DTDs</heading>
<p>This is an unnested paragraph. It has an unnamed name space (or could be considered part of the X namespace).</p>
<body><p>This text is within the body text defined in the <code>HTML-body</code> parameter entity. All elements within the <code>&lt;body&gt;</code> tags belong to the <code>HTML</code> namespace.</p>
<p>Note that paragraphs in this namespace contain embedded elements, whose definition comes from the <code>HTML-body</code> element set.</p></body>
<table id="table1">......
<entry><p>Paragraphs in CALS Tables</p></entry>
<entry><p>You can embed any CALS-specific text elements within table paragraphs, provided they are defined in the table-text parameter entity set.</p>
<p>Note that embedded table-text elements share the same name space as those defined in the element set defined in the tables entity.</p>
</table>
<p>This last paragraph, like the first one, belongs to the unnamed name space associated with DOCTYPE X. It contains a reference to <xref to="table1">Table 1</xref></p></x>
This SGML document instance could also have been coded as:
<(X)x><(X)heading>Nesting DTDs</(X)heading>
<(X)p>This is an unnested paragraph. It has an unnamed name space (or could be considered part of the X namespace).</(X)p>
<(HTML)body><(HTML)p>This text is within the body text defined in the <(HTML)code>HTML-body</(HTML)code> parameter entity. All elements within the <(HTML)code>&lt;body&gt;</(HTML)code> tags belong to the <(HTML)code>HTML</(HTML)code> namespace.</(HTML)p>
<(HTML)p>Note that paragraphs in this namespace contain embedded elements, whose definition comes from the <(HTML)code>HTML-body</(HTML)code> element set.</(HTML)p></(HTML)body>
<(CALS-table)table id="table1">......
<(CALS-table)entry><(CALS-table)p>Paragraphs in CALS Tables</(CALS-table)p></(CALS-table)entry>
<(CALS-table)entry><(CALS-table)p>You can embed any CALS-specific text elements within table paragraphs, provided they are defined in the table-text parameter entity set.</(CALS-table)p>
<(CALS-table)p>Note that embedded table-text elements share the same name space as those defined in the element set defined in the tables entity.</(CALS-table)p>
</(CALS-table)table>
<(X)p>This last paragraph, like the first one, belongs to the unnamed name space associated with DOCTYPE X. It contains a reference to <(X)xref to="CALS-table+table1">Table 1</(X)xref></(X)p></(X)x>
Note that it is the NAMESPACE-ID name that acts as the document type name for the qualified elements, rather than the name of the root element used to identify when a particular name space has been activated.
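A processor reading the fully qualified form has to separate the qualifier from the generic identifier in each tag. The following is a minimal Python sketch of that step; the pattern TAG and the function parse_tag are hypothetical, since the paper does not define a formal production for qualified tags. The qualifier is accepted as any run of characters other than parentheses or white space, so that a NAMESPACE-ID such as 12083-maths, which begins with a digit, still parses. Generic identifiers are folded to upper case, as under the reference concrete syntax's NAMECASE GENERAL YES; keeping the qualifier exactly as written is an assumption of this sketch.

```python
import re

# Hypothetical pattern for the proposed qualified tags "<(NS)gi ...>" and "</(NS)gi>".
# Group 1: "/" for an end tag; group 2: optional qualifier; group 3: generic identifier
# (a letter, then letters, digits, "." or "-").
TAG = re.compile(r"<(/?)(?:\(([^()\s]+)\))?([A-Za-z][A-Za-z0-9.-]*)")


def parse_tag(tag):
    """Return (is_end_tag, ns_id, generic_identifier) for a single start or end tag.

    ns_id is None for an unqualified tag, whose name space must then be
    inferred from the ROOT element that is currently open.
    """
    m = TAG.match(tag)
    if m is None:
        raise ValueError(f"not a start or end tag: {tag!r}")
    slash, ns_id, gi = m.groups()
    return slash == "/", ns_id, gi.upper()


for tag in ['<(CALS-table)entry>', '</(X)xref>', '<(X)xref to="CALS-table+table1">', '<p>']:
    print(parse_tag(tag))
# (False, 'CALS-table', 'ENTRY')
# (True, 'X', 'XREF')
# (False, 'X', 'XREF')
# (False, None, 'P')
```

Because the qualifier plays the role of a document type name, a fuller implementation would also check it against the declared NAMESPACE-IDs and the document type name (X here) rather than accepting any string.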
Note also that the NAMESPACE-ID is used to qualify the unique identifier pointed to by the cross-reference element, xref (to="CALS-table+table1"). This is because, unlike references to IDs in other documents, an internal cross-reference cannot name the entity that uniquely identifies the (sub)document.
This latter point has an interesting consequence. There must be no clash between NAMESPACE-IDs and the names assigned to document types or (sub)documents within the same document set.
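The qualified ID reference and the resulting clash rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: resolve_idref and clashing_names are hypothetical helpers, the "+" separator is taken from the example instance, and looking up an unqualified value in the document type's own name space is an assumption the paper does not spell out.

```python
def resolve_idref(value, ids_by_ns, base_ns):
    """Resolve a proposed qualified ID reference such as "CALS-table+table1".

    ids_by_ns maps each name-space id to the set of ID values declared in it.
    Assumption: a value without a "+" qualifier is looked up in base_ns, the
    name space of the document type itself.
    """
    ns, sep, local = value.partition("+")
    if not sep:
        ns, local = base_ns, value
    if local not in ids_by_ns.get(ns, set()):
        raise KeyError(f"no ID {local!r} in name space {ns!r}")
    return ns, local


def clashing_names(namespace_ids, document_names):
    """Return names used both as a NAMESPACE-ID and as a document type or
    (sub)document name; such a name would make a qualifier like "NAME+id"
    ambiguous."""
    return sorted(set(namespace_ids) & set(document_names))


ids = {"X": {"intro"}, "CALS-table": {"table1"}}
print(resolve_idref("CALS-table+table1", ids, "X"))
print(resolve_idref("intro", ids, "X"))
print(clashing_names(["12083-maths", "CALS-table", "HTML"], ["X", "HTML"]))
```

Running clashing_names over every NAMESPACE-ID and every document type or subdocument name in a document set is one way a validator could enforce the rule stated above.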
Note: How these constructs could be incorporated into the formal productions of SGML is unclear. At present there seems to be no clear way to restrict keywords of the type proposed to entity declarations that use public identifiers. There is also the problem of restricting them to public identifiers whose public text class is ELEMENTS or DTD. A further question is whether similar mechanisms could be useful for public text classes such as ENTITIES or NOTATION. It should be pointed out, however, that the above solution is probably the simplest of those proposed to date.