- From: Jon Bosak <bosak@atlantic-83.Eng.Sun.COM>
- Date: Sun, 4 May 1997 10:56:45 -0700
- To: w3c-sgml-wg@w3.org
Here are a couple of articles that ought to wash out the bad taste of
that clueless Dvorak piece.
Jon
========================================================================
XML will take the Web to the next level.
PC Week, April 28, 1997 v14 n17 p46(1)
Author
Sullivan, Eamonn
Summary
The new Extensible Markup Language (XML) attempts to overcome the
limitations of HTML by providing the flexibility needed to deploy more
sophisticated documents and exchange very complex data over the Web.
XML is a simplified version of Standard Generalized Markup Language
(SGML), a 'metalanguage' that can describe other markup languages such
as HTML. SGML could theoretically be used for Web browsers because its
Document Type Definitions (DTDs) make it very extensible, but its
complexity would add too much overhead to everyday Web software and
DTDs are difficult to write. XML is a compromise between HTML's
simplicity and SGML's power. It creates simplified DTDs and lets
authors create new tags, some of them very complex, at will. The new
language is fully SGML-compatible and can be read by advanced
document-management software, although XML tools cannot read all SGML
documents. XML adds bidirectional hypertext and other features
currently missing from the Web.
Full Text
Labs explores enabling technologies of next-generation markup language
Many companies have jumped wholeheartedly into the Web, only to find
that deploying a large Web site is as complex as developing a large
application--and that HTML is not up to the task. It's akin to trying
to develop an operating system in BASIC.
The Extensible Markup Language, or XML, is the World Wide Web
Consortium's answer to the limitations of HTML. It is an extremely
flexible language that will enable organizations to deploy more
sophisticated documents and exchange complex data via the Web.
The XML specification was released at the Sixth International World
Wide Web Conference in Santa Clara, Calif., earlier this month.
Several software vendors, including Microsoft Corp. and Netscape
Communications Corp., have already endorsed it.
What is XML?
XML is a simplified version of SGML (Standard Generalized Markup
Language). To understand what XML is, and what it's good for, it's
necessary to understand SGML.
SGML, an international standard that predates the Web, is actually a
"metalanguage," a language for describing document markup languages.
For example, HTML is a markup language that can be described by SGML.
To use an SGML editor to create Web pages, an author would first have
to supply the editor with a description of HTML. That description
(written in SGML) is called a DTD (Document Type Definition).
SGML also enables organizations to exchange data. For example, an auto
parts manufacturer could use SGML to create a markup language for its
parts documentation. The language might include tags such as <make>,
<model> and <year>.
The manufacturer could then distribute the DTD to its distributors,
who would use it to create applications that search for the custom
tags and extract the information automatically.
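As a rough illustration of the extraction step described above, the
following Python sketch parses a small parts document and pulls out the
custom tags. The <parts> and <part> wrapper elements, the sample data and
the function name extract_parts are assumptions made for this example;
the article names only <make>, <model> and <year>.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts document; only <make>, <model> and <year> come from
# the article. The <parts>/<part> wrappers and the values are invented.
PARTS_XML = """
<parts>
  <part>
    <make>Ford</make>
    <model>Taurus</model>
    <year>1996</year>
  </part>
  <part>
    <make>Honda</make>
    <model>Civic</model>
    <year>1994</year>
  </part>
</parts>
"""

def extract_parts(xml_text):
    """Return one dict per <part>, keyed by the custom tag names."""
    root = ET.fromstring(xml_text)
    return [
        {field: part.findtext(field) for field in ("make", "model", "year")}
        for part in root.iter("part")
    ]

parts = extract_parts(PARTS_XML)
for part in parts:
    print(part["make"], part["model"], part["year"])
```

The distributor's code never looks at layout or position; it asks for
each field by name, which is the point of agreeing on a shared DTD.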
Theoretically, we could be using SGML browsers to surf the Web. This
would give Web authors tremendous flexibility: If HTML didn't have the
features needed for a given set of documents, authors could create
extensions to HTML and attach a DTD to their documents.
But Web browsers are not designed that way, because SGML is simply too
complex. Writing DTDs is difficult, as is writing applications that
can accurately decipher them. In other words, the complexity of fully
implementing SGML outweighs its benefits.
XML to the rescue
XML was designed as a compromise between the simplicity of HTML and
the flexibility of SGML. Like SGML, XML is a metalanguage, but it's
easier to use and creates simpler DTDs.
Using XML, authors can create new tags at will, even very complex
ones. They also can use XML DTDs to validate the structure of large
numbers of documents, which is important when importing the data from
those documents into other applications.
XML is also fully SGML-compatible. Because XML documents are readable
by SGML software, organizations with an investment in SGML can use XML
right away.
However, since XML is a subset of SGML, XML software can't read all SGML
documents. Ironically, one important SGML language that is not
XML-compatible is HTML. Fortunately, only minor changes are needed to
make an HTML document compatible with XML.
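To give a sense of what such minor changes can look like, here is a small
Python sketch that feeds two versions of the same fragment to an XML
parser. The fragment and the helper name is_well_formed are made up for
illustration, and the two fixes shown (quoting attribute values and
closing empty elements) are examples of XML's stricter syntax, not a
complete list of the changes the article has in mind.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Typical HTML: the attribute value is unquoted and <br> is never closed.
html_style = "<p align=center>Line one<br>Line two</p>"

# The same content with the attribute quoted and the empty element closed.
xml_style = '<p align="center">Line one<br/>Line two</p>'

html_ok = is_well_formed(html_style)
xml_ok = is_well_formed(xml_style)
print("HTML-style fragment parses as XML:", html_ok)
print("XML-style fragment parses as XML:", xml_ok)
```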
Organizations can use XML to ease the exchange of information between
disparate applications. For example, the Chemical Markup Language is
an XML-compatible markup language with specific extensions for
describing molecules and compounds. Using the DTD for that language, a
developer could create a filter to import data points from a Web page
into a proprietary chemical modeling application.
Developers also will be able to create clients that are more
intelligent. An XML client, for example, could sort the parts
manufacturer's data by make, model or year--or show the user only the
portion of the data pertaining to his or her model of car.
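A client of that kind might be sketched in Python as follows. The sample
document, its <parts> and <part> wrappers, and the function names are
hypothetical; only the <make>, <model> and <year> tags come from the
article.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; only the three field tags are taken from the article.
PARTS_XML = """
<parts>
  <part><make>Ford</make><model>Taurus</model><year>1996</year></part>
  <part><make>Honda</make><model>Civic</model><year>1994</year></part>
  <part><make>Ford</make><model>Escort</model><year>1995</year></part>
</parts>
"""

def load_parts(xml_text):
    """Turn each <part> into a dict mapping child tag name to its text."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in part}
            for part in root.iter("part")]

def sort_parts(parts, key):
    """Sort by 'make', 'model' or 'year' (string order, which is enough
    for four-digit years)."""
    return sorted(parts, key=lambda part: part[key])

def parts_for_model(parts, model):
    """Keep only the entries that match the user's model of car."""
    return [part for part in parts if part["model"] == model]

parts = load_parts(PARTS_XML)
by_year = sort_parts(parts, "year")
civic_only = parts_for_model(parts, "Civic")
print([part["model"] for part in by_year])
print(civic_only)
```

Sorting and filtering happen on the client because the tags tell it which
value is which; with plain HTML the same data would arrive as anonymous
table cells.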
In addition, XML will make intelligent agents easier to design and
deploy. Today, agent software has to jump through hoops to recognize
the right data points on constantly changing Web pages. With XML,
relevant data points can be marked with their own tags (such as
<price>, for example), so they're easy to find.
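The idea can be sketched in a few lines of Python. Both sample pages, the
<offer>, <item>, <details> and <note> elements, and the function name
find_price are assumptions for this example; only <price> is taken from
the article. The second page rearranges the first, yet the same lookup
still finds the price.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same page; the layout changes between
# them but the <price> tag stays.
PAGE_V1 = "<offer><item>Brake pad</item><price>24.95</price></offer>"
PAGE_V2 = ("<offer><note>New layout</note>"
           "<details><price>24.95</price><item>Brake pad</item></details>"
           "</offer>")

def find_price(xml_text):
    """Return the first <price> anywhere in the document as a float,
    or None if the page carries no price."""
    root = ET.fromstring(xml_text)
    node = root.find(".//price")
    if node is None or node.text is None:
        return None
    return float(node.text)

price_v1 = find_price(PAGE_V1)
price_v2 = find_price(PAGE_V2)
print(price_v1, price_v2)
```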
Finally, XML includes hypertext features that are currently missing
from the Web. Such features as bidirectional and location-independent
links and "transclusion" (where a linked document appears as part of
the current page) will be possible using XML.
For more information on XML, visit the W3C's Web site at
www.w3.org/pub/WWW/MarkUp/SGML/Activity.
Copyright © 1997, Information Access Company. All rights reserved.
========================================================================
Designing Web sites for non-human audiences.
PC Week, April 28, 1997 v14 n17 p38(1)
Author
Sullivan, Eamonn
Summary
Web pages can be used not only as a direct end-user interface but to
link one application with another. Future Web sites will be browsed by
intelligent software agents, which provide automatic information
retrieval, as much as or more than by human beings. Such electronic
conduits are sensible when there is a lot of information to retrieve
or it changes frequently because Web pages can be generated on the fly
and impose few compatibility issues. The inherent limitations of HTML,
which can only represent certain types of data, are problematic, and
overcoming the fact that HTML focuses almost exclusively on visual
information is the focus of numerous development efforts. There are
already several products that bring sophisticated parsing engines to
the Web and can find and automatically recognize data in fast-changing
pages. The upcoming Extensible Markup Language (XML) standard lets
content providers make their intentions far more explicit.
Full Text
In the last couple of years, Web site developers have focused,
rightly, on identifying and serving their audience. But that task is
going to get more difficult in the next year or so as the definition
of an "audience" gets stretched out of recognition.
Some of your most important readers, for example, won't even be human.
They'll be automated agents or simpler programs designed to import
information from Web pages into another application. Even now, some
Web sites are serving pages in which the needs of a human reader are
secondary, at best: large tables or lists that are read almost
exclusively by an application running at a customer or partner site.
There are now several applications that automate information retrieval
from Web pages, including WebMethods Inc.'s Web Automation Toolkit
(www.webmethods.com), AgentSoft Ltd.'s LiveAgent (www.agentsoft.com)
and OnDisplay Inc.'s CenterStage (www.ondisplay.com). On the provider
side, one of the main motivations behind the development of XML was to
make Web pages easier for programs to verify and parse.
Using Web pages as the conduit between applications makes a lot of
sense, especially in an extranet between you and your customers and
partners. For the small stuff, using FTP to transfer spreadsheets in
some readily importable format is good enough, but that method quickly
becomes impractical when the data becomes voluminous or if it changes
frequently. Web pages, in contrast, can be generated on the fly,
secured using any of several methods, and impose very few application
compatibility requirements.
One problem with using HTML, however, is its limited ability to
represent data. HTML is an application of SGML, and was designed to
communicate a document's structure and semantic content, but its
evolution since then has focused almost exclusively on communicating a
document's visual appearance. If you want to communicate anything
beyond "this is a headline" or "this is a list," you're on your own.
Much effort is focused on overcoming that limitation. Products such as
CenterStage include sophisticated parsing engines, designed to find
and automatically recognize data in frequently changing pages that are
designed for the human eye. The parsing engines also try to recognize
the data type.
XML takes a more logical approach. The information provider, after
all, knows the most about the data. Rather than designing complex
programs that try to figure out what the provider meant, XML lets the
provider make its intention explicit. For example, CenterStage goes to
great lengths to find a price on an HTML page. With XML, the provider
can just create and use a tag called "<price>."
Looked at this way, XML is simply a natural extension of current
practice. Many of you probably mark off critical data with HTML
comments. The comments are invisible to browsers but can be used to
mark critical sections and data so that they can be more efficiently
updated programmatically. What XML adds to this practice is
documentation, using DTDs (Document Type Definitions) that other
sites can read and use as the basis for automating information
retrieval.
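For readers who have not seen the comment-marker practice, a minimal
Python sketch follows. The marker convention (price:start and price:end),
the sample page and the function names are invented for illustration; the
article does not prescribe any particular comment format.

```python
import re

# Hypothetical page in which a pair of HTML comments brackets the data.
PAGE = """<html><body>
<h1>Brake pads</h1>
<p>Our price: <!-- price:start -->$24.95<!-- price:end --></p>
</body></html>"""

def _marker_pattern(name):
    """Build a regex matching <!-- name:start -->...<!-- name:end -->."""
    n = re.escape(name)
    return re.compile(
        r"(<!--\s*%s:start\s*-->)(.*?)(<!--\s*%s:end\s*-->)" % (n, n),
        re.DOTALL,
    )

def read_marked(html, name):
    """Return the text between the two markers, or None if absent."""
    match = _marker_pattern(name).search(html)
    return match.group(2).strip() if match else None

def update_marked(html, name, new_value):
    """Replace the text between the markers, leaving the markers intact."""
    return _marker_pattern(name).sub(
        lambda m: m.group(1) + new_value + m.group(3), html, count=1
    )

current = read_marked(PAGE, "price")
updated_page = update_marked(PAGE, "price", "$22.50")
print(current, "->", read_marked(updated_page, "price"))
```

The weakness of this approach is that the marker names exist only as a
private convention; nothing published tells another site which markers
to look for.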
XML also makes it easier to design sites that are useful for human
readers and applications. With explicit tags, you can rearrange your
pages without worrying about breaking programs on the other end. But
whether you end up using XML or just document your HTML comments, a
well-designed, machine-readable Web site is easier to use--and that's
always the ultimate goal.
What do you think the chances are that XML will catch on? Contact me
at esullivan@zd.com.
Copyright © 1997, Information Access Company. All rights reserved.
Received on Sunday, 4 May 1997 13:57:14 UTC