Articles in PC Week from Jon Bosak on 1997-05-04 (w3c-sgml-wg@w3.org from May 1997)

From: Jon Bosak <bosak@atlantic-83.Eng.Sun.COM>
Date: Sun, 4 May 1997 10:56:45 -0700
To: w3c-sgml-wg@w3.org
Message-Id: <199705041756.KAA08854@boethius.eng.sun.com>

Here are a couple of articles that ought to wash out the bad taste of
that clueless Dvorak piece.

Jon

========================================================================

   XML will take the Web to the next level.
   
   PC Week, April 28, 1997 v14 n17 p46(1)
   
   Author
          
   Sullivan, Eamonn
          
   Summary
   
   The new Extensible Markup Language (XML) attempts to overcome the
   limitations of HTML by providing the flexibility needed to deploy more
   sophisticated documents and exchange very complex data over the Web.
   XML is a simplified version of Standard Generalized Markup Language
   (SGML), a 'metalanguage' that can describe other markup languages such
   as HTML. SGML could theoretically be used for Web browsers because its
   Document Type Definitions (DTDs) make it very extensible, but its
   complexity would add too much overhead to everyday Web software and
   DTDs are difficult to write. XML is a compromise between HTML's
   simplicity and SGML's power. It creates simplified DTDs and lets
   authors create new tags, some of them very complex, at will. The new
   language is fully SGML-compatible and can be read by advanced
   document-management software, although XML tools cannot read all SGML
   documents. XML adds bidirectional hypertext links and other features
   currently missing from the Web.
   
   Full Text
   
   Labs explores enabling technologies of next-generation markup language
   
   Many companies have jumped wholeheartedly into the Web, only to find
   that deploying a large Web site is as complex as developing a large
   application--and that HTML is not up to the task. It's akin to trying
   to develop an operating system in BASIC.
   
   The Extensible Markup Language, or XML, is the World Wide Web
   Consortium's answer to the limitations of HTML. It is an extremely
   flexible language that will enable organizations to deploy more
   sophisticated documents and exchange complex data via the Web.
   
   The XML specification was released at the Sixth International World
   Wide Web Conference in Santa Clara, Calif., earlier this month.
   Several software vendors, including Microsoft Corp. and Netscape
   Communications Corp., have already endorsed it.
   
   What is XML?
   
   XML is a simplified version of SGML (Standard Generalized Markup
   Language). To understand what XML is, and what it's good for, it's
   necessary to understand SGML.
   
   SGML, an international standard that predates the Web, is actually a
   "metalanguage," a language for describing document markup languages.
   
   For example, HTML is a markup language that can be described by SGML.
   To use an SGML editor to create Web pages, an author would first have
   to supply the editor with a description of HTML. That description
   (written in SGML) is called a DTD (Document Type Definition).
   
   SGML also enables organizations to exchange data. For example, an auto
   parts manufacturer could use SGML to create a markup language for its
   parts documentation. The language might include tags such as <make>,
   <model> and <year>.
   
   The manufacturer could then distribute the DTD to its distributors,
   who would use it to create applications that search for the custom
   tags and extract the information automatically.
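
   As a rough illustration of that extraction step, the sketch below
   reads a small parts document and pulls out the custom tags. It uses
   XML syntax and Python's standard-library parser as a stand-in for an
   SGML toolchain; the <parts>, <part> and <name> elements and the
   sample data are assumptions made for the example, and only <make>,
   <model> and <year> come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts document. Only <make>, <model> and <year> are named
# in the article; the wrapper elements and the data are invented.
PARTS_XML = """
<parts>
  <part>
    <name>Brake pad</name>
    <make>Ford</make>
    <model>Taurus</model>
    <year>1996</year>
  </part>
  <part>
    <name>Oil filter</name>
    <make>Honda</make>
    <model>Civic</model>
    <year>1997</year>
  </part>
</parts>
"""

FIELDS = ("name", "make", "model", "year")


def extract_parts(xml_text):
    """Return one dict per <part>, keyed by the custom tag names."""
    root = ET.fromstring(xml_text.strip())
    return [
        {field: part.findtext(field) for field in FIELDS}
        for part in root.iter("part")
    ]


parts = extract_parts(PARTS_XML)
for p in parts:
    print(p["make"], p["model"], p["year"], "-", p["name"])
```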
   
   Theoretically, we could be using SGML browsers to surf the Web. This
   would give Web authors tremendous flexibility: If HTML didn't have the
   features needed for a given set of documents, authors could create
   extensions to HTML and attach a DTD to their documents.
   
   But Web browsers are not designed that way, because SGML is simply too
   complex. Writing DTDs is difficult, as is writing applications that
   can accurately decipher them. In other words, the complexity of fully
   implementing SGML outweighs its benefits.
   
   XML to the rescue
   
   XML was designed as a compromise between the simplicity of HTML and
   the flexibility of SGML. Like SGML, XML is a metalanguage, but it's
   easier to use and its DTDs are simpler.
   
   Using XML, authors can create new tags at will, even very complex
   ones. They also can use XML DTDs to validate the structure of large
   numbers of documents, which is important when importing the data from
   those documents into other applications.
   
   XML is also fully SGML-compatible. Because XML documents are readable
   by SGML software, organizations with an investment in SGML can use XML
   right away.
   
   However, since XML is a subset of SGML, XML software can't read all
   SGML documents. Ironically, one important SGML language that is not
   XML-compatible is HTML. Fortunately, only minor changes are needed to
   make an HTML document compatible with XML.
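
   The article does not list those changes. One plausible example,
   offered here as an assumption, is that HTML lets an empty element
   such as <br> go unclosed, while an XML parser requires every element
   to be closed. The sketch below checks both forms with Python's
   standard-library XML parser.

```python
import xml.etree.ElementTree as ET

# Typical HTML: the <br> element is never closed.
html_style = "<p>First line<br>Second line</p>"
# The same fragment with the empty element closed explicitly.
xml_style = "<p>First line<br/>Second line</p>"


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```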
   
   Organizations can use XML to ease the exchange of information between
   disparate applications. For example, the Chemical Markup Language is
   an XML-compatible markup language with specific extensions for
   describing molecules and compounds. Using the DTD for that language, a
   developer could create a filter to import data points from a Web page
   into a proprietary chemical modeling application.
   
   Developers also will be able to create clients that are more
   intelligent. An XML client, for example, could sort the parts
   manufacturer's data by make, model or year--or show the user only the
   portion of the data pertaining to his or her model of car.
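
   A minimal sketch of such a client, assuming the parts data arrives
   as XML with <make>, <model> and <year> inside a hypothetical <part>
   element. The function names and sample data are invented for
   illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the <parts> and <part> wrappers are assumptions.
PARTS_XML = """<parts>
  <part><make>Honda</make><model>Civic</model><year>1997</year></part>
  <part><make>Ford</make><model>Taurus</model><year>1996</year></part>
  <part><make>Ford</make><model>Escort</model><year>1995</year></part>
</parts>"""


def load_parts(xml_text):
    """Turn each <part> into a dict mapping child tag name to its text."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in part} for part in root.iter("part")]


def sort_parts(parts, key):
    """Sort by any tagged field, e.g. 'make', 'model' or 'year'."""
    return sorted(parts, key=lambda p: p[key])


def parts_for_model(parts, model):
    """Keep only the entries that match the user's model of car."""
    return [p for p in parts if p["model"] == model]


parts = load_parts(PARTS_XML)
by_year = sort_parts(parts, "year")
taurus_only = parts_for_model(parts, "Taurus")

print([p["model"] for p in by_year])  # ['Escort', 'Taurus', 'Civic']
print(taurus_only)
```

   Because each value is labeled by its tag, the client needs no
   knowledge of how the page is laid out.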
   
   In addition, XML will make intelligent agents easier to design and
   deploy. Today, agent software has to jump through hoops to recognize
   the right data points on constantly changing Web pages. With XML,
   relevant data points can be marked with their own tags (such as
   <price>, for example), so they're easy to find.
   
   Finally, XML includes hypertext features that are currently missing
   from the Web. Such features as bidirectional and location-independent
   links and "transclusion" (where a linked document appears as part of
   the current page) will be possible using XML.
   
   For more information on XML, visit the W3C's Web site at
   www.w3.org/pub/WWW/MarkUp/SGML/Activity.
   
   Copyright © 1997, Information Access Company. All rights reserved.

========================================================================

   Designing Web sites for non-human audiences.
   
   PC Week, April 28, 1997 v14 n17 p38(1)
   
   Author
          
   Sullivan, Eamonn
          
   Summary
   
   Web pages can be used not only as a direct end-user interface but to
   link one application with another. Future Web sites will be browsed by
   intelligent software agents, which provide automatic information
   retrieval, as much as or more than by human beings. Such electronic
   conduits are sensible when there is a lot of information to retrieve
   or it changes frequently because Web pages can be generated on the fly
   and impose few compatibility issues. The inherent limitations of HTML,
   which can only represent certain types of data, are problematic, and
   overcoming the fact that HTML focuses almost exclusively on visual
   information is the focus of numerous development efforts. There are
   already several products that bring sophisticated parsing engines to
   the Web and can find and automatically recognize data in fast-changing
   pages. The upcoming Extensible Markup Language (XML) standard lets
   content providers make their intentions far more explicit.
   
   Full Text
   
   In the last couple of years, Web site developers have focused,
   rightly, on identifying and serving their audience. But that task is
   going to get more difficult in the next year or so as the definition
   of an "audience" gets stretched out of recognition.
   
   Some of your most important readers, for example, won't even be human.
   They'll be automated agents or simpler programs designed to import
   information from Web pages into another application. Even now, some
   Web sites are serving pages in which the needs of a human reader are
   secondary, at best: large tables or lists that are read almost
   exclusively by an application running at a customer or partner site.
   
   There are now several applications that automate information retrieval
   from Web pages, including WebMethods Inc.'s Web Automation Toolkit
   (www.webmethods.com), AgentSoft Ltd.'s LiveAgent (www.agentsoft.com)
   and OnDisplay Inc.'s CenterStage (www.ondisplay.com). On the provider
   side, one of the main motivations behind the development of XML was to
   make Web pages easier for programs to verify and parse.
   
   Using Web pages as the conduit between applications makes a lot of
   sense, especially in an extranet between you and your customers and
   partners. For the small stuff, using FTP to transfer spreadsheets in
   some readily importable format is good enough, but that method
   quickly becomes impractical when the data grows voluminous or changes
   frequently. Web pages, in contrast, can be generated on the fly,
   secured using any of several methods, and impose very few application
   compatibility requirements.
   
   One problem with using HTML, however, is its limited ability to
   represent data. HTML is an application of SGML and was designed to
   communicate a document's structure and semantic content, but its
   evolution since then has focused almost exclusively on communicating a
   document's visual appearance. If you want to communicate anything
   beyond "this is a headline" or "this is a list," you're on your own.
   
   Much effort is focused on overcoming that limitation. Products such as
   CenterStage include sophisticated parsing engines, designed to find
   and automatically recognize data in frequently changing pages that are
   designed for the human eye. The parsing engines also try to recognize
   the data type.
   
   XML takes a more logical approach. The information provider, after
   all, knows the most about the data. Rather than designing complex
   programs that try to figure out what the provider meant, XML lets the
   provider make its intention explicit. For example, CenterStage goes to
   great lengths to find a price on an HTML page. With XML, the provider
   can just create and use a tag called "<price>."
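
   The contrast can be sketched as follows. The article does not
   describe how CenterStage actually works, so the regular expression
   below is only a crude stand-in for a heuristic parser; the sample
   pages, the <item> and <name> elements, and the function names are
   likewise assumptions.

```python
import re
import xml.etree.ElementTree as ET

# A page laid out for the human eye: the price is buried in prose.
HTML_PAGE = "<tr><td><b>Widget</b></td><td>Only $19.95 while supplies last</td></tr>"
# The same data with the provider's intent made explicit.
XML_PAGE = "<item><name>Widget</name><price>19.95</price></item>"


def guess_price_from_html(html):
    """Heuristic: take the first thing that looks like a dollar amount."""
    match = re.search(r"\$(\d+(?:\.\d{2})?)", html)
    return float(match.group(1)) if match else None


def read_price_from_xml(xml_text):
    """Explicit: read the value the provider tagged as <price>."""
    text = ET.fromstring(xml_text).findtext("price")
    return float(text) if text is not None else None


print(guess_price_from_html(HTML_PAGE))
print(read_price_from_xml(XML_PAGE))
```

   The heuristic breaks as soon as the page shows a second dollar
   amount or drops the dollar sign; the tagged version does not depend
   on either.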
   
   Looked at this way, XML is simply a natural extension of current
   practice. Many of you probably mark off critical data with HTML
   comments. The comments are invisible to browsers but can be used to
   mark critical sections and data so that they can be more efficiently
   updated programmatically. What XML adds to this practice is
   documentation, using DTDs (Document Type Definitions) that other
   sites can read and use as the basis for automating information
   retrieval.
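
   For readers unfamiliar with the comment-marker practice, here is a
   small sketch. The BEGIN/END comment convention, the field names and
   the helper functions are hypothetical; each site that does this
   tends to invent its own scheme, which is the gap a shared DTD is
   meant to fill.

```python
import re

# Hypothetical page using an invented BEGIN/END comment convention.
PAGE = """<html><body>
<h1>Today's specials</h1>
<!-- BEGIN price -->19.95<!-- END price -->
<!-- BEGIN stock -->42<!-- END stock -->
</body></html>"""

# Matches a named pair of marker comments and captures what lies between.
MARKED = re.compile(r"<!-- BEGIN (\w+) -->(.*?)<!-- END \1 -->", re.DOTALL)


def read_marked_fields(html):
    """Return a dict of field name to the text between its markers."""
    return {name: value.strip() for name, value in MARKED.findall(html)}


def update_marked_field(html, name, new_value):
    """Replace the text between one pair of markers, leaving the rest alone."""
    tag = re.escape(name)
    pattern = re.compile(rf"(<!-- BEGIN {tag} -->).*?(<!-- END {tag} -->)", re.DOTALL)
    return pattern.sub(lambda m: m.group(1) + new_value + m.group(2), html)


fields = read_marked_fields(PAGE)
updated_page = update_marked_field(PAGE, "price", "17.50")
print(fields)
print(read_marked_fields(updated_page))
```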
   
   XML also makes it easier to design sites that are useful for human
   readers and applications. With explicit tags, you can rearrange your
   pages without worrying about breaking programs on the other end. But
   whether you end up using XML or just document your HTML comments, a
   well-designed, machine-readable Web site is easier to use--and that's
   always the ultimate goal.
   
   What do you think the chances are that XML will catch on? Contact me
   at esullivan@zd.com.
   
   Copyright © 1997, Information Access Company. All rights reserved.
Received on Sunday, 4 May 1997 13:57:14 UTC