- From: Jon Bosak <bosak@atlantic-83.Eng.Sun.COM>
- Date: Sun, 4 May 1997 10:56:45 -0700
- To: w3c-sgml-wg@w3.org
Here are a couple of articles that ought to wash out the bad taste of that clueless Dvorak piece.

Jon

========================================================================
XML will take the Web to the next level.

PC Week, April 28, 1997 v14 n17 p46(1)

Author
Sullivan, Eamonn

Summary
The new Extensible Markup Language (XML) attempts to overcome the limitations of HTML by providing the flexibility needed to deploy more sophisticated documents and exchange very complex data over the Web. XML is a simplified version of Standard Generalized Markup Language (SGML), a 'metalanguage' that can describe other markup languages such as HTML. SGML could theoretically be used for Web browsers because its Document Type Definitions (DTDs) make it very extensible, but its complexity would add too much overhead to everyday Web software, and DTDs are difficult to write. XML is a compromise between HTML's simplicity and SGML's power. It creates simplified DTDs and lets authors create new tags, some of them very complex, at will. The new language is fully SGML-compatible and can be read by advanced document-management software, although XML software cannot read all SGML documents. XML adds bidirectional hypertext and other features currently missing from the Web.

Full Text

Labs explores enabling technologies of next-generation markup language

Many companies have jumped wholeheartedly into the Web, only to find that deploying a large Web site is as complex as developing a large application--and that HTML is not up to the task. It's akin to trying to develop an operating system in BASIC.

The Extensible Markup Language, or XML, is the World Wide Web Consortium's answer to the limitations of HTML. It is an extremely flexible language that will enable organizations to deploy more sophisticated documents and exchange complex data via the Web.

The XML specification was released at the Sixth International World Wide Web Conference in Santa Clara, Calif., earlier this month.
Several software vendors, including Microsoft Corp. and Netscape Communications Corp., have already endorsed it.

What is XML?

XML is a simplified version of SGML (Standard Generalized Markup Language). To understand what XML is, and what it's good for, it's necessary to understand SGML.

SGML, an international standard that predates the Web, is actually a "metalanguage," a language for describing document markup languages. For example, HTML is a markup language that can be described by SGML. To use an SGML editor to create Web pages, an author would first have to supply the editor with a description of HTML. That description (written in SGML) is called a DTD (Document Type Definition).

SGML also enables organizations to exchange data. For example, an auto parts manufacturer could use SGML to create a markup language for its parts documentation. The language might include tags such as <make>, <model> and <year>. The manufacturer could then distribute the DTD to its distributors, who would use it to create applications that search for the custom tags and extract the information automatically.

Theoretically, we could be using SGML browsers to surf the Web. This would give Web authors tremendous flexibility: If HTML didn't have the features needed for a given set of documents, authors could create extensions to HTML and attach a DTD to their documents.

But Web browsers are not designed that way, because SGML is simply too complex. DTDs are difficult to write, and so are applications that can accurately decipher them. In other words, the complexity of fully implementing SGML outweighs its benefits.

XML to the rescue

XML was designed as a compromise between the simplicity of HTML and the flexibility of SGML. Like SGML, XML is a metalanguage, but it's easier to use and creates simpler DTDs. Using XML, authors can create new tags at will, even very complex ones.
They also can use XML DTDs to validate the structure of large numbers of documents, which is important when importing the data from those documents into other applications.

XML is also fully SGML-compatible. Because XML documents are readable by SGML software, organizations with an investment in SGML can use XML right away. However, since XML is a subset of SGML, XML software can't read all SGML documents. Ironically, one important SGML language that is not XML-compatible is HTML. Fortunately, only minor changes are needed to make an HTML document compatible with XML.

Organizations can use XML to ease the exchange of information between disparate applications. For example, the Chemical Markup Language is an XML-compatible markup language with specific extensions for describing molecules and compounds. Using the DTD for that language, a developer could create a filter to import data points from a Web page into a proprietary chemical modeling application.

Developers also will be able to create clients that are more intelligent. An XML client, for example, could sort the parts manufacturer's data by make, model or year--or show the user only the portion of the data pertaining to his or her model of car.

In addition, XML will make intelligent agents easier to design and deploy. Today, agent software has to jump through hoops to recognize the right data points on constantly changing Web pages. With XML, relevant data points can be marked with their own tags (such as <price>, for example), so they're easy to find.

Finally, XML includes hypertext features that are currently missing from the Web. Such features as bidirectional and location-independent links and "transclusion" (where a linked document appears as part of the current page) will be possible using XML.

For more information on XML, visit the W3C's Web site at www.w3.org/pub/WWW/MarkUp/SGML/Activity.

Copyright © 1997, Information Access Company. All rights reserved.
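
As an aside on the auto parts example in the article above, here is a minimal Python sketch of what such an "intelligent client" might look like. Only the <make>, <model> and <year> tags come from the article; the <parts> and <part> wrappers, the <name> tag, the sample data and all function names are assumptions made up for illustration. The structure check is a hand-rolled stand-in for DTD validation, not real DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts document. Only <make>, <model> and <year> are taken
# from the article; the wrappers, <name> and the data are invented.
PARTS_XML = """\
<parts>
  <part><make>Ford</make><model>Taurus</model><year>1995</year><name>Oil filter</name></part>
  <part><make>Honda</make><model>Civic</model><year>1992</year><name>Brake pad</name></part>
  <part><make>Ford</make><model>Escort</model><year>1990</year><name>Air filter</name></part>
</parts>
"""

REQUIRED_FIELDS = ("make", "model", "year")


def load_parts(xml_text):
    """Parse the document into a list of {tag: text} dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in part}
        for part in root.findall("part")
    ]


def missing_fields(parts):
    """Crude structure check: list (index, missing tags) for incomplete parts."""
    problems = []
    for index, part in enumerate(parts):
        missing = [field for field in REQUIRED_FIELDS if field not in part]
        if missing:
            problems.append((index, missing))
    return problems


def sort_parts(parts, key):
    """Sort by make, model or year (four-digit years sort correctly as text)."""
    return sorted(parts, key=lambda part: part[key])


def parts_for_model(parts, model):
    """Keep only the parts that apply to one model of car."""
    return [part for part in parts if part["model"] == model]


if __name__ == "__main__":
    parts = load_parts(PARTS_XML)
    for part in sort_parts(parts, "year"):
        print(part["year"], part["make"], part["model"], part["name"])
```

Because the data carries its own tags, the client needs no knowledge of page layout; it only needs to agree with the provider on tag names, which is the role the article assigns to the DTD.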
========================================================================
Designing Web sites for non-human audiences.

PC Week, April 28, 1997 v14 n17 p38(1)

Author
Sullivan, Eamonn

Summary
Web pages can be used not only as a direct end-user interface but also to link one application with another. Future Web sites will be browsed by intelligent software agents, which provide automatic information retrieval, as much as, or more than, by human beings. Such electronic conduits are sensible when there is a lot of information to retrieve or when it changes frequently, because Web pages can be generated on the fly and impose few compatibility issues. The inherent limitations of HTML, which can only represent certain types of data, are problematic, and overcoming the fact that HTML focuses almost exclusively on visual information is the focus of numerous development efforts. There are already several products that bring sophisticated parsing engines to the Web and can find and automatically recognize data in fast-changing pages. The upcoming Extensible Markup Language (XML) standard lets content providers make their intentions far more explicit.

Full Text

In the last couple of years, Web site developers have focused, rightly, on identifying and serving their audience. But that task is going to get more difficult in the next year or so as the definition of an "audience" gets stretched out of recognition.

Some of your most important readers, for example, won't even be human. They'll be automated agents or simpler programs designed to import information from Web pages into another application. Even now, some Web sites are serving pages in which the needs of a human reader are secondary, at best: large tables or lists that are read almost exclusively by an application running at a customer or partner site.
There are now several applications that automate information retrieval from Web pages, including WebMethods Inc.'s Web Automation Toolkit (www.webmethods.com), AgentSoft Ltd.'s LiveAgent (www.agentsoft.com) and OnDisplay Inc.'s CenterStage (www.ondisplay.com). On the provider side, one of the main motivations behind the development of XML was to make Web pages easier for programs to verify and parse.

Using Web pages as the conduit between applications makes a lot of sense, especially in an extranet between you and your customers and partners. For the small stuff, using FTP to transfer spreadsheets in some readily importable format is good enough, but that method quickly becomes impractical when the data grows voluminous or changes frequently. Web pages, in contrast, can be generated on the fly, secured using any of several methods, and impose very few application compatibility requirements.

One problem with using HTML, however, is its limited ability to represent data. HTML is an implementation of SGML, and was designed to communicate a document's structure and semantic content, but its evolution since then has focused almost exclusively on communicating a document's visual appearance. If you want to communicate anything beyond "this is a headline" or "this is a list," you're on your own.

Much effort is focused on overcoming that limitation. Products such as CenterStage include sophisticated parsing engines, designed to find and automatically recognize data in frequently changing pages that are designed for the human eye. The parsing engines also try to recognize the data type.

XML takes a more logical approach. The information provider, after all, knows the most about the data. Rather than designing complex programs that try to figure out what the provider meant, XML lets the provider make its intention explicit. For example, CenterStage goes to great lengths to find a price on an HTML page.
With XML, the provider can just create and use a tag called "<price>."

Looked at this way, XML is simply a natural extension of current practice. Many of you probably mark off critical data with HTML comments. The comments are invisible to browsers but can be used to mark critical sections and data so that they can be more efficiently updated programmatically. What XML adds to this practice is documentation, using DTDs (Document Type Definitions) that other sites can read and use as the basis for automating information retrieval.

XML also makes it easier to design sites that are useful for both human readers and applications. With explicit tags, you can rearrange your pages without worrying about breaking programs on the other end.

But whether you end up using XML or just document your HTML comments, a well-designed, machine-readable Web site is easier to use--and that's always the ultimate goal.

What do you think the chances are that XML will catch on? Contact me at esullivan@zd.com.

Copyright © 1997, Information Access Company. All rights reserved.
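
As an aside on the two practices the article above compares, the following Python sketch extracts prices first from HTML that marks data with comments and then from XML that uses an explicit <price> tag. Only the <price> tag name comes from the article; the comment markers (price-start, price-end), the <listing>, <item> and <name> tags, the sample pages and the function names are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical HTML page that marks critical data with comments.
HTML_PAGE = """\
<html><body>
<h1>Today's prices</h1>
<p>Widget: <!-- price-start -->19.95<!-- price-end --></p>
<p>Gadget: <!-- price-start -->4.50<!-- price-end --></p>
</body></html>
"""

# Hypothetical XML version of the same data with an explicit <price> tag.
XML_PAGE = """\
<listing>
  <item><name>Widget</name><price>19.95</price></item>
  <item><name>Gadget</name><price>4.50</price></item>
</listing>
"""

# The marker convention is private to this one site, so every consumer
# has to be told about it separately.
COMMENT_PRICE = re.compile(
    r"<!--\s*price-start\s*-->(.*?)<!--\s*price-end\s*-->", re.DOTALL
)


def prices_from_html_comments(html_text):
    """Pull out the text between the (hypothetical) comment markers."""
    return [float(match.strip()) for match in COMMENT_PRICE.findall(html_text)]


def prices_from_xml(xml_text):
    """Find every <price> element, wherever it sits in the document."""
    root = ET.fromstring(xml_text)
    return [float(element.text) for element in root.iter("price")]


if __name__ == "__main__":
    print(prices_from_html_comments(HTML_PAGE))
    print(prices_from_xml(XML_PAGE))
```

Both functions recover the same numbers here. The difference the article points to is that the comment convention has to be communicated out of band, while the tag names could be documented in a DTD that other sites can read.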
Received on Sunday, 4 May 1997 13:57:14 UTC