Seybold on XML

[Courtesy of a correspondent.]


Volume 2, No. 7
November 20, 1996

On the tenth anniversary of the adoption of SGML as an ISO standard, a band
of SGML experts announced that they have drafted a simplified subset of the
language, which they hope will spur the use of SGML on the Internet. The new
language, Extensible Markup Language, or XML, was prepared by a World Wide
Web Consortium working group consisting of about 80 members, primarily
representing vendors. The announcement was made at SGML '96, being held in
Boston this week. The first published draft is available on the Web at

XML, like SGML, is a meta-language for describing the markup of different
types of documents. It is simpler than SGML, reducing a 500-page reference
to 26 pages. 

Unlike HTML, which has a fixed (albeit changing) set of tags, XML lets you
define your own tags and attributes. Support for XML by the Internet
community would open up vast new possibilities for Internet publishing.
Instead of shoehorning all documents into HTML, or having to invent a
browser to handle non-HTML documents, XML would enable a wide array of
user-defined documents to be handled by generic Web application software.
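
To make the idea of generic software handling user-defined documents
concrete, here is a minimal sketch in Python. It assumes the standard
library's xml.etree.ElementTree parser, which implements the later XML 1.0
recommendation rather than the 1996 draft described here. The recipe
vocabulary and the outline() helper are hypothetical, invented only for
illustration; the point is that the code never needs to know the tag names in
advance.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with made-up tags and attributes; none of it is HTML.
RECIPE = """<recipe serves="4">
  <title>Pancakes</title>
  <ingredient quantity="2" unit="cup">flour</ingredient>
  <ingredient quantity="1" unit="cup">milk</ingredient>
</recipe>"""


def outline(xml_text):
    """Return (depth, tag, attributes) for every element, in document order."""
    rows = []

    def walk(elem, depth):
        rows.append((depth, elem.tag, dict(elem.attrib)))
        for child in elem:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return rows


if __name__ == "__main__":
    # Print an indented outline of whatever vocabulary the document uses.
    for depth, tag, attrs in outline(RECIPE):
        print("  " * depth + tag, attrs)
```

The same outline() call would work unchanged on a document with an entirely
different set of tags, which is the sense in which the software is generic.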

Users of SGML can easily make use of XML. XML is a valid subset of SGML, so
translation from SGML to XML is straightforward. 

To simplify SGML, the W3C working group dropped support for certain features
that required heavy processing by SGML client software. For example, a
well-formed XML document is unambiguous, so a browser or editor can read the
tags and build a tree of the hierarchical structure without having to read a
document type definition. XML also disallows markup minimization, requires
that empty elements be self-identifying, and omits several of the complex
optional features of SGML.
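
The following sketch illustrates the two points above: a tree can be built
from the tags alone, with no document type definition, and omitted end tags
are rejected. It assumes Python's standard xml.etree.ElementTree parser
(which follows the later XML 1.0 recommendation, not the 1996 draft), and the
memo documents and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed: every element is closed, and the empty element <logo/>
# identifies itself as empty with the trailing slash.
WELL_FORMED = '<memo><to>Staff</to><logo src="seal.gif"/><body>Hi</body></memo>'

# SGML-style minimization: the end tags for <to> and <body> are omitted.
MINIMIZED = "<memo><to>Staff<body>Hi</memo>"


def tree_shape(elem):
    """Return nested (tag, [children]) tuples describing the hierarchy."""
    return (elem.tag, [tree_shape(child) for child in elem])


def is_well_formed(text):
    """Return True if the parser accepts the text without any DTD."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(tree_shape(ET.fromstring(WELL_FORMED)))
    print(is_well_formed(MINIMIZED))
```

An SGML parser could recover the structure of the second document only by
consulting its document type definition; a parser working from the tags
alone has to reject it.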

At least two vendors were demonstrating XML support at SGML '96. Neither
Microsoft nor Netscape has disclosed whether it will support the standard.

XML uses 8-bit ASCII and Unicode as its primary character sets.

Having rocked the SGML community with the most radical SGML development in a
decade, the W3C working group plans to continue with two more phases of XML.
According to Jon Bosak, chair of the W3C SGML Editorial Review Board, the
next phase will add more complex hyperlinking, and the third phase will
address style sheets, using either an improved version of CSS or an online
version of DSSSL.