- From: Jean Paoli <jeanpa@microsoft.com>
- Date: Sun, 22 Jun 1997 22:37:56 -0700
- To: "'w3c-sgml-wg@w3.org'" <w3c-sgml-wg@w3.org>, "'xml-dev@ic.ac.uk'" <xml-dev@ic.ac.uk>, "'w3c-sgml-erb@hpsgml.fc.hp.com'" <w3c-sgml-erb@hpsgml.fc.hp.com>
- Cc: Andrew Layman <andrewl@microsoft.com>, Thomas Reardon <thomasre@microsoft.com>, Adam Bosworth <adamb@microsoft.com>, Hadi Partovi <hadip@microsoft.com>
I am pleased to present XML-Data, a Position Paper from Microsoft. XML-Data is an application of XML for exchanging structured data and metadata on the Internet. This position paper is sent to multiple working groups in the W3C dealing with this subject (XML, meta-data) and we expect this paper to be discussed and improved by these working groups. The current proposal needs namespaces and uses the Layman/Bray proposal. The URL of this paper (on the Microsoft site) will be posted tomorrow. -Jean Paoli ---------------- <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <meta name="Template" content="C:\MSOffice\Templates\Letters & Faxes\VFPSPEC97.dot"> <meta name="GENERATOR" content="Microsoft FrontPage 2.0"> <title>XML-Data</title> </head> <body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink="#FF0000"> <p align="right"><font size="4"><b>XML-Data.html</b></font> </p> <p><font size="4"><b>Position Paper from Microsoft<br>20 June 1997 </b></font> </p> <h1 align="center">XML-Data</h1> <dl> <dt>Authors: </dt> <dd><a href="mailto:andrewl@microsoft.com">Andrew Layman</a>, Microsoft Corporation<br> <a href="mailto:jeanpa@microsoft.com">Jean Paoli</a>, Microsoft Corporation<br> <a href="mailto:sjd@eps.inso.com"><font size="3">Steve De Rose</font></a><font size="3">, Inso Corporation</font><br> <a href="mailto:ht@cogsci.ed.ac.uk">Henry S. 
Thompson</a>, University of Edinburgh <br> </dd> <dt>Acknowledgements:</dt> <dd><font size="3">We thank </font><a href="mailto:paul@arbortext.com"><font size="3">Paul Grosso</font></a><font size="3"> (Arbortext), </font><a href="mailto:sca@eps.inso.com"><font size="3">Sharon Adler</font></a><font size="3"> (Inso Corporation), </font><a href="mailto:alb@eps.inso.com"><font size="3">Anders Berglund</font></a><font size="3"> (Inso Corporation), </font><a href="mailto:fcha@ais.Berger-Levrault.fr">François Chahuneau</a> (AIS/Berger-Levrault),<font color="#0000FF" size="2" face="Arial"> </font><font size="3">and </font><a href="mailto:edwardj@microsoft.com"><font size="3">Edward Jung</font></a><font size="3"> (Microsoft) for their help and contributions to this proposal.</font></dd> </dl> <p>Copyright (c) 1997 Microsoft Corp. <br> </p> <hr> <h2 align="left">Abstract</h2> <p align="left">This document provides the specification for exchanging structured and networked data on the Web. This specification uses XML, the Extensible Markup Language for describing data as well as data about data. We expect this specification to be useful for a wide range of applications such as describing database transfers, digital signatures or remotely-located web resources.</p> <h2 align="left">1. Introduction</h2> <p><font color="#000000" size="3">The Internet holds the potential to integrate all information in a global network (with many private but integrated domains). The Internet promises access to information any time and, with wireless technology, anywhere. Today, however, the Internet is merely an <i>access medium </i>to text and pictures. To actualize the Internet's potential, we need to add intelligent search, data exchange, adaptive presentation, and personalization. 
The Internet must go beyond setting an information <em>access</em> standard, and must set an information <i>understanding </i>standard, which means: a standard way of representing data so that software can better search, move, display, and otherwise manipulate information currently hidden in contextual obscurity.</font></p> <p><font color="#000000" size="3">XML is an important step in this direction. It offers a standard syntax for textual structure of tagged data, based on extensive industry and theoretical experience. Its lexical format easily depicts a tree structure. A tree is a natural format that is richer than a simple flat list, yet (compared to a generalized graph) also respectful of cognitive and data processing requirements for economy and simplicity. </font></p> <p><font color="#000000" size="3">Looking at this point in more detail, there are several ways of structuring data. One is a flat tagging system. In this system, sets of keywords are applied to data elements. This is a simple form of data structure, but it does not capture any relationships between the keywords.</font></p> <p><font color="#000000" size="3">A more advanced means of structuring information is a tree. A tree allows expression of subsumption, containment, or any other single (contextual) relationship such as "manages." Trees correspond to object-oriented class hierarchies, file system hierarchies, organizational hierarchies and so forth. Trees are relatively easy to understand and to construct. Trees are efficient to process, and there is a linear (<em>e.g.</em> textual) structure that a program can parse incrementally, and determine when it is finished. 
This makes trees particularly useful as a transmission format for asynchronous, distributed systems such as the Internet, and also for display purposes where the single relationship (usually visual containment) enables incremental display.</font></p> <p><font color="#000000" size="3">A still more elaborate structure is a directed graph. A graph allows expression of arbitrary binary relationships, that is, many relationships between two things. A graph can express subsumption, containment, and any number of other relationships simultaneously. It is therefore a superset of a tree. This makes graphs very expressive for real-world semantics, but it also makes them harder to understand, more difficult to construct, and less efficient to process than trees. There is no efficient linear (<em>e.g.</em> textual) structure of a graph that can be incrementally processed. Therefore, while they are particularly useful for representing (and instrumenting) the complete semantics of a system, they are typically not suitable for transmission, display, or immediate processing.</font></p> <p><font color="#000000" size="3">The tree structure is proved broadly implementable and easy to deploy, not just in theory but also widely in practice. Industrial implementations, in the SGML community and elsewhere, demonstrate its intrinsic quality and industrial strength, e.g. aircraft (ATA), automotive (J2008), banking (OFX), and semiconductors (Pinnacles PCIS).</font></p> <p><font color="#000000" size="3">This proposal shows how to add a single convention to XML so that graph arcs are easily added into a lexical tree structure, without requiring decomposition of tree format into a "lowest common denominator" nodes-and-arcs structure. 
(For a quick look at the difference, see the </font><a href="#XML-Data-vs-MCF"><font color="#000000" size="3">XML-Data versus MCF in XML comparison</font></a><font color="#000000" size="3">.)</font></p> <p><font color="#000000" size="3">XML-Data consists of a collection of related technologies. First, it unifies lexical trees with graph structures. Second, it builds on this to define a representation for schemata based on XML instance syntax. It offers a mechanism to organize element types into a hierarchy, and proposes a small set of basic types. Finally, it adds facilities for lexical typing and proposes a small collection of lexical types.</font></p> <p><font color="#000000" size="3">XML-Data can encode the content, semantics and schemata for a gamut of cases, from simple and prosaic to complex and sophisticated:</font></p> <ul> <li><font color="#000000" size="3">An ordinary document</font></li> <li><font color="#000000" size="3">A structured record, such as an appointment record or purchase order</font></li> <li><font color="#000000" size="3">An object, with data and methods</font></li> <li><font color="#000000" size="3">A data record, such as the result set of a query</font></li> <li><font color="#000000" size="3">Information in a database or a web site (<em>e.g. </em>CDF)</font></li> <li><font color="#000000" size="3">Graphical presentation (<em>e.g.</em> an application user interface)</font></li> <li><font color="#000000" size="3">Upper ontology (standard schema entities and types)</font></li> <li><font color="#000000" size="3">UberWeb (all the links between information and people on the web)</font></li> </ul> <p><font color="#000000" size="3">The resulting flexibility of a single homogeneous data representation system allows any reader to uniformly determine the structural semantics of a data element. Information can then be reused for new purposes and in novel contexts. 
For example, a record from a database of restaurants and a record from a client contact database might be reused in the context of an appointment, say in setting a lunch date with a client. The relationships between the restaurant and contact data do not reside in the schema data described by either database individually, but are extensions defined by the instance of the appointment.</font></p> <p><font color="#000000" size="3">This proposal, building on the earlier <em>Web Collections in XML </em>proposal, shows how to use a single syntax for a broad range of data, using that syntax for data and schemata, permitting the expressiveness of graph data when such power is required, but retaining the benefits of lexical trees.</font></p> <h2 align="left">2. Examples of XML-Data</h2> <h3><font size="4" face="Times New Roman"><code>Data</code></font></h3> <p><font size="4" face="Times New Roman"><code>The following example shows a simple order from a bookstore for several books, a record, and a cup of coffee.</code></font></p> <pre><code><ORDER> <SOLD-TO> <PERSON><LASTNAME><strong>Layman</strong></LASTNAME> <FIRSTNAME><strong>Andrew</strong></FIRSTNAME> </PERSON> </SOLD-TO> <SOLD-ON><strong>19970317</strong></SOLD-ON> <ITEM> <PRICE><strong>5.95</strong></PRICE> <BOOK> <TITLE><strong>Number, the Language of Science</strong></TITLE> <AUTHOR><strong>Dantzig, Tobias</strong></AUTHOR> </BOOK> </ITEM> <ITEM> <PRICE><strong>12.95</strong></PRICE> <BOOK> <TITLE><strong>Introduction to Objectivist Epistemology</strong></TITLE> <AUTHOR><strong>Rand, Ayn</strong></AUTHOR> </BOOK> </ITEM> <ITEM> <PRICE><strong>12.95</strong></PRICE> <RECORD> <TITLE><COMPOSER><strong>Tchaikovsky's</strong></COMPOSER><strong> First Piano Concerto</strong></TITLE> <ARTIST><strong>Janos</strong></ARTIST> </RECORD> </ITEM> <ITEM> <PRICE><strong>1.50</strong></PRICE> <COFFEE> <SIZE><strong>small</strong></SIZE> <STYLE><strong>cafe macchiato</strong></STYLE> </COFFEE> </ITEM> </ORDER></code></pre> <p><font 
size="4" face="Times New Roman"><code>XML-Data is flexible enough to encode heterogeneous structures, for example books, records and coffee all within one sales order. These different kinds of items do not need to all have the same internal parts. For example, books have titles, coffee generally doesn't. XML-Data allows values to be expressed as element content (for example the book titles shown) or with a <em>value</em> attribute (for example the author and artist elements). Properties of elements can be expressed as attributes (e.g. size and style of coffee) or as sub-elements (e.g. author, artist). XML-Data can appear in separate documents or within other documents (such as HTML pages).</code></font></p> <h3><font size="4" face="Times New Roman"><code>Data about Other Data</code></font></h3> <p><font size="4" face="Times New Roman"><code>XML-Data is suitable for complex, self-contained data structures such as the book order, and also for information such as the </code></font><a href="http://www.microsoft.com/standards/cdf-f.htm"><code>Channel Definition Format</code></a><code>, </code><font size="4" face="Times New Roman"><code>which describes remotely-located web resources, many of which are themselves data:</code></font></p> <pre><code><CHANNEL> <ITEM HREF="<strong>http://www.zoosports.com/intro.htm</strong>" level="<strong>2</strong>" precache="<strong>NO</strong>"> <A HREF="<strong>http://www.zoosports.com/page1.htm</strong>"> <strong>This is a link to page 1.</strong></A> <TITLE><strong>Welcome to ZooSports!</strong></TITLE> <ABSTRACT><strong>ZooSports articles, news, and promotional offers</strong></ABSTRACT> </ITEM> <SCHEDULE ENDDATE="<strong>1994-11-05</strong>"> <INTERVALTIME DAY="<strong>1</strong>"/> <EARLIESTTIME HOUR="<strong>12</strong>"/> <LATESTTIME HOUR="<strong>18</strong>"/> </SCHEDULE> </CHANNEL></code></pre> <h3><font size="4" face="Times New Roman"><code>PICS-NG Labels</code></font></h3> <p><font size="4" face="Times New 
Roman"><code>XML-Data can express PICS-NG Labels</code></font><font size="5" face="Times New Roman"><code>:</code></font></p> <p><font size="4" face="Times New Roman"><code>(This uses the </code></font><a href="http://www.w3.org/XML/Group/9705/namespace.htm"><font size="4" face="Times New Roman"><code>Layman-Bray proposal for namespaces</code></font></a><font size="4" face="Times New Roman"><code>.)</code></font></p> <pre><code><xml> <xml:schema> <namespaceDcl href="<strong>http://purl.org/Schemas</strong>" name="<strong>purl</strong>"/> <namespaceDcl href="<strong>http://www.foo.com</strong>" name="<strong>foo</strong>"/> </xml:schema> <xml:data> <purl:description1 href="<strong>http://purl.color.org/document.html</strong>"> <title><strong>Light and Dark: A study of color</strong></title> <subject><LCSH> <for><strong>Color and Color Palettes</strong></for></LCSH> </subject> <author> <foo:author> <name><strong>John Smith</strong></name> <affiliation><strong>thedarkside</strong></affiliation> <email><strong>john@thedarkside</strong></email></foo:author> <foo:author> <name><strong>Smith, Jane Q.</strong></name> <affiliation><strong>thelightregion</strong></affiliation> <email><strong>jane@thelightregion</strong></email></foo:author> </author></purl:description1> </xml:data> </xml></code></pre> <h3><font size="4" face="Times New Roman"><code>Digital Signatures, Security & Authentication</code></font></h3> <p><font size="4" face="Times New Roman"><code>Returning to the bookstore example, this is the same order with a digital signature added. 
The structured nature of XML-Data makes it easy to sign whole elements or parts of them.</code></font></p> <pre><code><ORDER> <dsig:DSIG> <MANIFEST><strong>80183589575795589189518915</strong></MANIFEST> <SIG href="<strong>http://XYX/Joe@company.com</strong>"/> </dsig:DSIG> <SOLD-TO> <PERSON><LASTNAME><strong>Layman</strong></LASTNAME> <FIRSTNAME><strong>Andrew</strong></FIRSTNAME> </PERSON> </SOLD-TO> <SOLD-ON><strong>19970317</strong></SOLD-ON> <ITEM> <PRICE><strong>5.95</strong></PRICE> <BOOK> <TITLE><strong>Number, the Language of Science</strong></TITLE> <AUTHOR><strong>Dantzig, Tobias</strong></AUTHOR> </BOOK> </ITEM> <ITEM> <PRICE><strong>12.95</strong></PRICE> <BOOK> <TITLE><strong>Introduction to Objectivist Epistemology</strong></TITLE> <AUTHOR><strong>Rand, Ayn</strong></AUTHOR> </BOOK> </ITEM> <ITEM> <PRICE><strong>12.95</strong></PRICE> <RECORD> <TITLE><COMPOSER><strong>Tchaikovsky's</strong></COMPOSER><strong> First Piano Concerto</strong></TITLE> <ARTIST><strong>Janos</strong></ARTIST> </RECORD> </ITEM> <ITEM> <PRICE><strong>1.50</strong></PRICE> <COFFEE> <SIZE><strong>small</strong></SIZE> <STYLE><strong>cafe macchiato</strong></STYLE> </COFFEE> </ITEM> </ORDER></code></pre> <h3><font size="4" face="Times New Roman"><code>Database Information</code></font></h3> <p><font size="4" face="Times New Roman"><code>While XML-Data can represent complex structures, it can also represent simple ones, for example a simple list of database records:</code></font></p> <pre><code><BOOK-MASTER-LIST> <BOOK id="book1"> <TITLE><strong>Number, the Language of Science</strong></TITLE> <AUTHOR><strong>Dantzig, Tobias</strong></AUTHOR> </BOOK> <BOOK id="book2"> <TITLE><strong>Introduction to Objectivist Epistemology</strong></TITLE> <AUTHOR><strong>Rand, Ayn</strong></AUTHOR> </BOOK> <BOOK id="book3"> <TITLE><strong>I, The Jury</strong></TITLE> <AUTHOR><strong>Spillane, Mickey</strong></AUTHOR> </BOOK> <BOOK id="book4"> <TITLE><strong>Half Magic</strong></TITLE> 
<AUTHOR><strong>Eager, Edward</strong></AUTHOR> </BOOK> <BOOK id="book5"> <TITLE><strong>QED</strong></TITLE> <AUTHOR><strong>Feynman, Richard P.</strong></AUTHOR> </BOOK> </BOOK-MASTER-LIST></code></pre> <h3><font size="4" face="Times New Roman"><code>Graph Structures</code></font></h3> <p><font size="4" face="Times New Roman"><code>An XML-Data element may include links to resources outside the immediate tree. When it meets application needs, this <em>href</em> facility can be used to break up a single structure into multiple parts, with relations among them indicated by Universal Resource Identifier (URI) links. The references can be local or remote. In this example, they are inventory records from the database table we just looked at.</code></font></p> <pre><code><ORDER id="order1"> <dsig:DSIG> <MANIFEST><strong>80183589575795589189518915</strong></MANIFEST> <SIG href="<strong>http://XYX/Joe@company.com</strong>"/> </dsig:DSIG> <SOLD-TO> <PERSON><LASTNAME><strong>Layman</strong></LASTNAME> <FIRSTNAME><strong>Andrew</strong></FIRSTNAME> </PERSON> </SOLD-TO> <SOLD-ON><strong>19970317</strong></SOLD-ON> <ITEM href="<strong>http://bigbookstore.com/data/bookmaster?XML-XPTR=book1</strong>"> <PRICE>5.95</PRICE> </ITEM> <ITEM href="<strong>http://bigbookstore.com/data/bookmaster?XML-XPTR=book2</strong>"> <PRICE>12.95</PRICE> </ITEM> <ITEM href="<strong>http://bigbookstore.com/data/musicmaster?XML-XPTR=cd1</strong>"> <PRICE>12.95</PRICE> </ITEM> <ITEM> <PRICE>1.50</PRICE> <COFFEE> <SIZE><strong>small</strong></SIZE> <STYLE><strong>cafe macchiato</strong></STYLE> </COFFEE> </ITEM> </ORDER></code></pre> <p><font size="4" face="Times New Roman"><code>Notice that each of the ITEM elements establishes a relationship between the ORDER and a BOOK, and that the <em>relationship itself</em> has attributes, in this case the price at which the book was sold. 
Relations can have attributes and can contain elements, and the process can be carried to any needed level of detail.</code></font></p> <h3><font size="4" face="Times New Roman"><code>Discontiguous Information (propertyOf)</code></font></h3> <p><font size="4" face="Times New Roman"><code>Information about an element can be contained in the element, but also can sit outside it. For example, the following applies a digital signature to a sales order without actually modifying the order:</code></font></p> <pre><code><dsig:DSIG> <xml:propertyOf href="<strong>http://bigbookstore.com/data/orders?XML-XPTR=order1</strong>"/> <MANIFEST><strong>80183589575795589189518915</strong></MANIFEST> <SIG href="<strong>http://XYX/Joe@company.com</strong>"/> </dsig:DSIG></code></pre> <h3><font size="4" face="Times New Roman"><code>Schema</code></font></h3> <p><font size="4" face="Times New Roman"><code>Every data object, such as a purchase order, contains certain parts, such as sold-to, sold-on date, items, etc. We can write a formal description of what these parts are and which are allowed where. This is called a "schema" and is written using a form of XML-Data:</code></font></p> <pre><code><xml:schema ID="BookOrderSchema"> <!-- This schema is digitally signed. Schemas are a form of data, so they, too, can be signed. --> <dsig:DSIG> <MANIFEST><strong>*(&#&$&@*$&%*&@*$&$*@</strong></MANIFEST> <SIG href="<strong>http://XYX/Jane@company.com</strong>"/> </dsig:DSIG> <!-- Here are all the element types, their contents, attributes and relations. 
--> <elementType id="<strong>ORDER</strong>"> <relation href="<strong>#SOLD-TO</strong>"/> <relation href="<strong>#SOLD-ON</strong>"/> <relation href="<strong>#ITEM</strong>" occurs="<strong>STAR</strong>"/> </elementType> <relationType id="<strong>SOLD-TO</strong>"> <elt href="<strong>#PERSON</strong>"/> </relationType> <relationType id="<strong>SOLD-ON</strong>"> <pcdata/> <!-- Date is YYYYMMDD --> <attribute name="<strong>lextype</strong>" default="<strong>DATE.ISO8061</strong>" presence="<strong>fixed</strong>"/> </relationType> <elementType id="<strong>PERSON</strong>"> <relation href="<strong>#LASTNAME</strong>"/> <relation href="<strong>#FIRSTNAME</strong>"/> </elementType> <elementType id="<strong>LASTNAME</strong>"> <pcdata/> </elementType> <elementType id="<strong>FIRSTNAME</strong>"> <pcdata/> </elementType> <relationType id="<strong>PRICE</strong>"> <pcdata/> </relationType> <relationType id="<strong>ITEM</strong>"> <any/> <relation href="<strong>#PRICE</strong>"/> <range href="<strong>#BOOK</strong>"/> <range href="<strong>#RECORD</strong>"/> <range href="<strong>#COFFEE</strong>"/> </relationType> <elementType id="<strong>BOOK</strong>"> <relation href="<strong>#TITLE</strong>"/> <relation href="<strong>#AUTHOR</strong>"/> </elementType> <elementType id="<strong>RECORD</strong>"> <relation href="<strong>#TITLE</strong>"/> <relation href="<strong>#ARTIST</strong>"/> </elementType> <relationType id="<strong>SIZE</strong>"> <pcdata/> </relationType> <relationType id="<strong>STYLE</strong>"> <pcdata/> </relationType> <elementType id="<strong>COFFEE</strong>"> <relation href="<strong>#SIZE</strong>"/> <relation href="<strong>#STYLE</strong>"/> </elementType> <elementType id="<strong>TITLE</strong>"> <mixed><elt href="<strong>#COMPOSER</strong>"/></mixed> </elementType> <relationType id="<strong>AUTHOR</strong>"> <pcdata/> </relationType> <relationType id="<strong>ARTIST</strong>"> <pcdata/> </relationType> <relationType id="<strong>COMPOSER</strong>"> 
<pcdata/> </relationType> </xml:schema></code></pre> <h3><font size="4" face="Times New Roman"><code>Type Extension</code></font></h3> <p><font size="4" face="Times New Roman"><code>Some elements are variants of others, in which case we can organize the element types into a genus-species hierarchy using the <em>extends</em> attribute:</code></font></p> <pre><code><xml:schema ID="<strong>ArtSchema</strong>"> <elementType id="<strong>artistic-work</strong>"> <relation href="<strong>#TITLE</strong>"/> </elementType> <elementType id="<strong>BOOK</strong>" extends="<strong>#artistic-work</strong>"> <relation href="<strong>#AUTHOR</strong>"/> </elementType> <elementType id="<strong>RECORD</strong>" extends="<strong>#artistic-work</strong>"> <relation href="<strong>#ARTIST</strong>"/> <relation href="<strong>#COMPOSER</strong>" occurs="<strong>OPTIONAL</strong>"/> </elementType> <relationType id="<strong>AUTHOR</strong>"> <pcdata/> </relationType> <relationType id="<strong>COMPOSER</strong>" extends="<strong>#AUTHOR</strong>"/> <relationType id="<strong>ARTIST</strong>"> <pcdata/> </relationType> </xml:schema></code></pre> <p><font size="4" face="Times New Roman"><code>Here we see that books and records are both types of artistic work, and that a composer is a type of author.</code></font></p> <h3><font size="4" face="Times New Roman"><code>Schema Extension</code></font></h3> <p><font size="4" face="Times New Roman"><code>We can also use this ability to customize a schema that has useful features, but which is too general. 
In this example, we show a general schema for orders, then another one that is customized for our bookstore:</code></font></p> <pre><code><xml:schema ID="<strong>GenericOrderSchema</strong>"> <elementType id="<strong>ORDER</strong>"> <relation href="<strong>#SOLD-TO</strong>"/> <relation href="<strong>#SOLD-ON</strong>"/> </elementType> <relationType id="<strong>SOLD-TO</strong>"> <elt href="<strong>#PERSON</strong>"/> </relationType> <elementType id="<strong>PERSON</strong>"> <relation href="<strong>#LASTNAME</strong>"/> <relation href="<strong>#FIRSTNAME</strong>"/> </elementType> <relationType id="<strong>LASTNAME</strong>"> <pcdata/> </relationType> <relationType id="<strong>FIRSTNAME</strong>"> <pcdata/> </relationType> </xml:schema> <xml:schema id="BookOrderSchema"> <elementType id="<strong>ORDER</strong>" extends="<strong>http://generic.com/genericOrder?XML-XPTR=ID(ORDER)</strong>"> <relation href="<strong>#ITEM</strong>" occurs="<strong>STAR</strong>"/> </elementType> <relationType id="<strong>ITEM</strong>"> <any/> <relation href="<strong>http://generic.com/genericOrder?XML-XPTR=ID(ORDER)</strong>"/> <range href="<strong>http://art.com/schemata?XML-XPTR=ID(BOOK)</strong>"/> <range href="<strong>http://art.com/schemata?XML-XPTR=ID(RECORD)</strong>"/> <range href="<strong>#COFFEE</strong>"/> </relationType> <relationType id="<strong>SIZE</strong>"> <pcdata/> </relationType> <relationType id="<strong>STYLE</strong>"> <pcdata/> </relationType> <elementType id="<strong>COFFEE</strong>"> <relation href="<strong>#SIZE</strong>"/> <relation href="<strong>#STYLE</strong>"/> </elementType> </xml:schema></code></pre> <h2 align="left">3. XML-Data Schema</h2> <p align="left">The XML-Data schema language defines element types, attributes, relations, and which of these can be used in which combinations with others. 
It also provides features for organizing element types into a genus-species hierarchy, a basic set of element types, and a small set of lexical types. The schema language also includes other features of the XML Document Type Definition (DTD) language, such as entity and notation declarations. An XML-Data schema can express the same structural information and constraints as an XML DTD and covers all DTD features, so an XML DTD can be mechanically converted to an XML-Data schema. </p> <p>Schemata are composed principally of declarations for: </p> <ul> <li>element types, represented by <i>elementType</i></li> <li>attributes of elements, represented by <i>attribute</i></li> <li>relations<em> </em>among elements, represented by <em>relationType</em></li> <li>rules governing the valid combinations of the above, represented by <em>any, mixed </em>and<em> pcdata; </em>also by<em> elt</em>, <em>group</em>, <em>relation, </em>and<em> range</em>.</li> <li>internal and external entities, represented by <i>intEntityDcl</i> and <i>extEntityDcl</i></li> <li>notations, represented by <i>notationDcl</i></li> </ul> <p>Comments can be interspersed as usual in XML, and there is provision for using references to external schemata or schema fragments.</p> <h3><b>3.1. The schema document element type: </b><b><i>schema</i></b> </h3> <p>All schema elements are contained within a schema element, like this:</p> <pre><code><?XML version='1.0' rmd='all'?> <!doctype schema SYSTEM "http://www.w3c.org/pub/sotr/schema.dtd"> <xml:schema id='ExampleSchema'> <!-- schema goes here. --> </xml:schema></code></pre> <h3><b>3.2. 
The element type declaration element type: elementType</b> </h3> <p><em>Key terms used here:</em> <strong>element, elementType, empty, any, mixed, pcdata</strong>, <strong>content model.</strong></p> <p>The heart of an XML-Data schema is the <strong>elementType</strong> declaration which defines a class of elements, gives them attributes, establishes a grammar of which other element types and character data are allowed in their contents and defines their allowable relationships to elements of other classes. (The allowable content, including relations, is called "content model.")</p> <pre><code><elementType id="example"> <!-- element example (p*) --> <elt href="#p" occurs="STAR"/> </elementType> <elementType id="p"> <!-- element p ((#PCDATA|p)*) --> <mixed><elt href="#p"/></mixed> </elementType></code></pre> <p>The name attribute is optional if id is present, in which case the id is used as the name.</p> <p>Within an elementType, <em>elt</em> indicates that instances are permitted to only have a single element type in their content. The <em>occurs</em> attribute of <em>elt</em> specifies whether this content is optional, and gives its cardinality. </p> <p><em>Empty</em> and <em>any</em> content are expressed using predefined elements <em>empty</em> and <em>any</em>. (<em>Empty</em> may be omitted. <em>Any</em> signals that any mixture of elements and parsed character data is legal.) Parsed character data content is similarly expressed with a <em>pcdata</em> item. <em>Mixed</em> content (a mixture of parsed character data and one or more element types), is identified by a <em>mixed</em> element, whose content identifies the element types allowed in addition to parsed character data (see below). 
</p> <pre><code><elementType id="ARTIST"> <pcdata/> </elementType></code></pre> <p>More complex content models are created using <em>group</em>:</p> <pre><code><elementType id="animalFriends" > <group groupType="OR" occurs="STAR"> <group groupType="OR" occurs="PLUS"> <elt href="#cat"/> <elt href="#dog"/> </group> <elt href="#bird"/> <elt href="#rabbit"/> <elt href="#pig"/> <elt href="#fish"/> </group> </elementType></code></pre> <h3>3.3. Relations</h3> <p><em>Key terms used here:</em> <strong>relationType, relation, XML-Link locator, href.</strong></p> <p><em>Relation</em> element types express a relationship between one element (usually the relation's parent) and either another element or an atomic value (such as a simple number, string or date). Relations use the XML-Link <em>locator</em> without implying navigation. The target of a relation is the element referenced by the <em>href</em> attribute if one is present; otherwise it is the element's contents. This single convention unifies graphs and trees.</p> <p>Including a relation in an elementType makes it an implicit part of that element's content model, with the default for occurs being OPTIONAL. Relations must occur (in a valid document instance) after any other content. RelationTypes are elements, and the full content model is as if there were a sequential group containing first the explicitly provided content model, then the relations in a <em>starred</em> <em>or</em> group with all the relations as content. </p> <p>Two element types are used in the schema to effect a relation: the <em>relationType</em> is a specialized kind of <em>elementType</em>, while <em>relation</em> has the same function as <em>elt</em> (but validates that it refers to a relationType). </p> <p>If a <em>default</em> attribute is specified for a relation, it becomes the default of the <em>value</em> attribute of the relation element. The <em>range</em> element, if present, declares a restriction on the valid target of a relation. 
Each range element references one elementType, and an instance of any of the referenced types is a valid target. </p> <pre><code><relationType id="favoriteFood" ><mixed/></relationType> <relationType id="chases" ><any/></relationType> <elementType id="dog" > <any/> <attribute name="name"/> <relation href="#favoriteFood"/> <relation href="#chases"/> </elementType></code></pre> <h3>3.4. Attributes</h3> <p><em>Key terms used here:</em> <strong>attribute, values, default. </strong></p> <p>After the content model, attribute declarations may occur, which are divided into attributes with enumerated or notation values, and all other kinds.</p> <pre><code><elementType id="p1"> <!-- element p1 ((#PCDATA|p1)*) --> <mixed><elt href="#p1"/></mixed> <attribute name='id' type='ID'/> <!-- attlist p1 id ID=#IMPLIED exm (a|b|c) 'c' x CDATA FIXED 'y' --> <attribute name='exm' type='ENUMERATION' values='a b c' default='c'/> <attribute name='x' defType='FIXED' default='y'/> </elementType></code></pre> <p>An attribute may be given a <em>default</em> value. Whether it is required or optional is signaled by <i>presence</i>. (Presence ordinarily defaults to IMPLIED, but if omitted and there is an explicit default, <i>presence</i> is set to SPECIFIED.)</p> <p>Attributes with enumerated (and notation) values permit a <em>values</em> attribute, a space-separated list of legal values. The <em>values</em> attribute is required when the <em>type</em> is ENUMERATION or NOTATION; otherwise it is forbidden. In these cases, if a default is specified it must be one of the specified values.</p> <p>Similar to the facility of multiple ATTLISTs, we sometimes need to have <em>attributesDcls</em> declared separately from the elementType they refer to. 
We can do this with the <em>propertyOf</em> element, discussed later.</p> <h3><b>3.5 The internal and external entity declaration element types: </b><b><i>intEntityDcl</i></b> and <b><i>extEntityDcl</i></b></h3> <p><em>Key terms used here:</em> <strong>entity, internal entity, external entity, notation.</strong></p> <p>This and the next two declarations cover <em>entities</em> in general. Entities are a powerful shorthand mechanism, similar to macros in a programming language.</p> <pre><code><intEntityDcl name="LTG"> <entityDef>Language Technology Group</entityDef> </intEntityDcl></code></pre> <pre><code><extEntityDcl name="dilbert"> <notation href="#gif"/> <systemId href="http://www.ltg.ed.ac.uk/~ht/dilb.gif"/> </extEntityDcl></code></pre> <p>Here as elsewhere, following XML, <em>systemId</em> must be a URL, absolute or relative, and <em>publicId</em>, if present, must be a Public Identifier as defined in ISO/IEC 9070:1991, Information technology -- SGML support facilities -- Registration procedures for public text owner identifiers. If a <em>notation</em> is given, it must be declared (see below) and the entity will be treated as binary, i.e., not substituted directly in place of references.</p> <pre><code><notationDcl name="gif"> <systemId href='http://who.knows.where/'/> </notationDcl></code></pre> <h3><b>3.6. The external declarations element type: </b><b><i>extDcls</i></b> </h3> <p><em>Key terms used here:</em> <strong>external entity with declarations.</strong></p> <p>Although we allow an external entity with declarations to be included, we recommend a different declaration for schema modularization. The <em>extDcls</em> declaration gives a clean mechanism for importing (fragments of) other schemata. 
It replaces the common SGML idiom of declaring an external parameter entity and then immediately referring to it, and has the same import, namely, that the text referred to by the combination of <b>systemId</b> and <b>publicId</b> is included in the schema in place of the <b>extDcls</b> element, and that replacement text is then subject to the same validity constraints and interpretation as the rest of the schema.</p> <h3>3.7. Type Extension</h3> <p><em>Key terms used here:</em> <strong>type (class), extension (inheritance, subclassing), implements, extends, typeOf (genus).</strong></p> <p>Schemata of all kinds can benefit from a subtyping mechanism, that is, a way of indicating that one class of object is a specialization of another, more general class. For example, cat and dog both have the type <em>pet</em> as their more general category. To make more effective use of such classes, we introduce one new schema attribute, <em>extends</em>, which can be used to declare explicitly that an element type is a subclass of another: </p> <pre><code><xml:schema> <elementType id="animalFriends" > <elt href="#pet" occurs="PLUS" /> </elementType> <elementType id="pet" > <any/> </elementType> <elementType id="cat" extends="#pet"/> <elementType id="dog" extends="#pet"/> </xml:schema></code></pre> <p>This schema says that the <em>animalFriends</em> element class can contain one or more elements from the <em>pet</em> class, such as a <em>cat</em> or a <em>dog</em>. It also says that each cat and dog instance is a pet (<font size="3">that is, any cat is semantically a pet, and any valid cat is also a valid pet</font>). So the following data is now valid under this schema: </p> <pre><code><animalFriends> <cat/> <dog/> <cat/> </animalFriends></code></pre> <h4>Type Extension</h4> <p>It is frequently necessary to <em>add</em> new attributes to a subclass. This requires no extra machinery, because XML already permits multiple attribute list declarations, which cumulatively add attributes to element types. 
So each subclass may easily add any new attributes desired, as shown here: </p> <pre><code><elementType id="dog" extends="#pet"> <attribute name="age"/> </elementType></code></pre> <p>If the super type has a content model, attributes, and so on, these are inherited; that is, they are also declared implicitly for the derived class. In the following example, we give an <em>owner</em> attribute to <em>pet</em>. This is inherited, so both <em>cat</em> and <em>dog</em> now also have an <em>owner</em> attribute.</p> <pre><code><xml:schema> <elementType id="animalFriends" > <elt href="#pet" occurs="PLUS" /> </elementType> <elementType id="pet"> <any/> <attribute id='name'/> <attribute id='owner'/> </elementType> <elementType id="cat" extends="#pet"> <elt href='#kittens'/> <attribute id='lives' type='NMTOKEN'/> </elementType> <elementType id="dog" extends="#pet"> <elt href='#puppies'/> <attribute id='breed'/> </elementType> </xml:schema></code></pre> <p>This schema says that the animalFriends element class can contain one or more <em>pet</em> elements. Because <em>cat</em> and <em>dog</em> are subtypes of <em>pet</em>, they can occur as well. So the following instance fragment is now valid under this schema: </p> <pre><code><animalFriends> <cat name="Fluffy" lives='9'/> <pet name="Diego"/> <dog name="Gromit" owner='Wallace' breed='mutt'/> </animalFriends></code></pre> <p>Additional relations can also be added, but only if the content model of the superType consists of a single list of optional, repeatable element types.</p> <p>When defining a derived element class, one can also override existing attributes and relations. The following example adds a <em>height</em> relation and overrides the <em>favoriteFood</em> relation, giving it a default value of "Fish." (We also do something fancy here: making the overriding element type extend favoriteFood ensures that the derived element is in all other respects identical.) 
</p> <pre><code><relationType id="height"> <any/> </relationType> <relationType id="favoriteCatFood" extends="#favoriteFood"/> <elementType id="cat" extends="#pet"> <relation href="#height"/> <relation href="#favoriteCatFood" default="Fish"/> </elementType></code></pre> <h4>Schema Extension</h4> <p>We can also use subtyping to extend an existing schema without editing it. Suppose that we cannot edit the schema defining pet, cat or dog, but want to use elements with those names and semantics in our document. The following adds the "eyeColor" property to <em>cat</em>.</p> <pre><code><relationType id="eyeColor" extends="http://whereever.org/#eyeColor"> <pcdata/> </relationType> <elementType id="cat" extends="http://whereever.org/#cat"> <relation href="#eyeColor"/> </elementType></code></pre> <p>The rules for allowable subtyping must enforce certain constraints. In principle, a subtype can have additional relations and attributes (provided this is consistent with the super type's content model) but never fewer, and it can add restrictions but never relax them. In practice, this principle leads to rules such as the following: a default value can be added if there is none, changed, or converted to FIXED if it is DEFAULT.</p> <h4>Implements</h4> <p>Subtyping as we have described it here is actually a combination of two effects. First, we assert that an element of one type is also of another (as in "a cat is a pet").</p> <p>Second, we achieve economies and maintainability in the declarations to make sure that the first is true. That is, the derived element class is automatically provided with all the properties of the super type. Sometimes it is valuable to have the first effect without the second. (This is equivalent to the Java <em>implements</em> facility.) 
We indicate this by using the <em>implements</em> element, as in: </p> <pre><code><relationType id="favoriteFood" > <mixed/> </relationType> <relationType id="weight" > <mixed/> </relationType> <elementType id="cat" > <implements href="http://whereever.org/#pet" /> <attribute name="name"/> <relation href="#favoriteFood" /> <relation href="#weight" /> </elementType></code></pre> <p><font size="3">This has no effect on the attributes or relations of instances of cat, but asserts in the schema that every cat is also a pet (that is, any cat is semantically a pet, and any valid cat is also a valid pet).</font></p> <h4>Relation of Type Extension to Parameter Entities</h4> <p>Sophisticated DTDs often make complex use of <em>parameter entities</em> in an attempt to consolidate common structures in one reusable place. Such parameter entities often represent implicit classes.</p> <p>The need is real, but the approach often leads to obscurity and reduced maintainability. Further, expansion of entities loses all connection with their source: once expanded, the fact that some set of element types was a co-declared set, re-used in multiple places, is lost. </p> <h3>3.8 Lexical Data Types</h3> <p>Information such as dates and numbers is often expressed in a format that requires some further parsing. For example, the same date can be written "October 22, 1954" or "19541022". (And from what I've seen, about 300 other ways.) The <em>lextype</em> attribute discriminates among formats. Appearing on instance elements, it describes the format of the remainder of the element. The value of the lextype attribute is always a reference to a URI identifying the parsing rules. XML-Data should define a small number of these. 
We propose NUMBER, INTEGER, REAL and DATE.ISO8061.</p> <pre><code><birthday lextype="<strong>DATE.ISO8061</strong>"><strong>19541022</strong></birthday></code></pre> <p><font size="4" face="Times New Roman"><code>These are declared in the schema as follows:</code></font></p> <pre><code><relationType id="<strong>birthday</strong>"> <attribute name="<strong>lextype</strong>" default="<strong>DATE.ISO8061</strong>" presence="<strong>fixed</strong>"/> </relationType></code></pre> <p><font size="4" face="Times New Roman"><code>When giving the lexical type of an <em>attribute</em> in the schema, <em>lextypeIs</em> is used, as in:</code></font></p> <pre><code><attribute name="<strong>price</strong>" presence="<strong>REQUIRED</strong>" lextypeIs="<strong>number</strong>"/></code></pre> <p>Some patterns will indicate that several properties or attributes should be used in combination to arrive at a value. For example, a custom pattern could indicate a date expressed as follows: </p> <pre><code><relationType id="<strong>birthday</strong>"> <attribute name="lextype" default="<strong>DATE.ATTR-YMD</strong>" presence="<strong>specified</strong>"/> </relationType> ... <birthday year="<strong>1954</strong>" month="<strong>10</strong>" day="<strong>22</strong>" /> </code></pre> <h3>3.9. Basic Semantic Data Types</h3> <p>We need to define here a small number of basic types and their hierarchy, corresponding to simple data types such as Number and Date. (Dates are a subtype of numbers.) </p> <p>We also need to define the expression of each of the basic Java and SQL data types in terms of these basic ones, plus additional properties giving units, precision, min, max, default pattern, and other properties. For example, an INTEGER is typically a number with certain min and max property values. 
Note that units should be an element type with possible structure, so that things like "miles/hour" or "feet/(sec*sec)" can be represented and used for automatic conversions.</p> <h2 align="left">4. Standard Vocabulary</h2> <p align="left">We expect standard libraries of vocabulary to be developed to capture the common semantics used in vertical applications, particularly in industry and application domains. Dublin Core and CDF are two examples of such standard libraries.</p> <h2 align="left">5. Relations to other proposed standards</h2> <p align="left"><font size="3">The W3C site at</font><font size="4"> </font><a href="http://www.w3.org/PICS/Member/NG/"><font color="#0000FF" size="3"><u>http://www.w3.org/PICS/Member/NG</u></font></a><font color="#0000FF" size="3"><u> </u></font><font color="#000000" size="3">contains links to several related papers, including Ora Lassila's </font><a href="http://www.w3.org/pub/WWW/Member/9705/WD-pics-ng-metadata-970514.html"><font color="#000000" size="3">PICS-NG document</font></a><font color="#000000" size="3">, Renato Iannella's small PICS extension proposal, CDF, MCF in XML, and the </font><a href="http://www.w3.org/pub/WWW/Member/9703/XMLsubmit.html"><font color="#000000" size="3">Web Collections using XML</font></a><font color="#000000" size="3"> proposal. Specific notes on some of these follow:</font></p> <h3>5.1 XML-LINK</h3> <p>All relations use <em>href</em> in a manner consistent with the <a href="http://www.w3.org/pub/WWW/TR/WD-xml-link-970406.html">XML-LINK</a> working draft dated April 6, 1997 (the most recent as of the time of this writing). XML-Links are a type of <em>relation</em> (with extra attributes, elements, and semantics indicating traversal).</p> <h3>5.2 PICS-NG</h3> <p><a href="http://www.w3.org/pub/WWW/Member/9705/WD-pics-ng-metadata-970514.html#intro">PICS-NG Metadata Model and Label Syntax</a> describes a set of requirements for structured data to be used on the Internet. 
XML-Data is an application of XML concepts to those requirements.</p> <h3>5.3 CDF</h3> <p><font size="3">The </font><a href="http://www.microsoft.com/standards/cdf-f.htm"><font size="3">Channel Definition Format</font></a><font size="3"> (CDF) is a natural application of XML-Data and is fully compatible with the syntax and the ideas presented in this document</font>. Its format is a validatable grammar given a proper schema. The existing use of href in CDF is consistent with XML-LINK and XML-Data usage. CDF defines a number of basic element types that would be appropriate for a standard library.</p> <h3>5.4 MCF in XML</h3> <p><a href="http://www.w3.org/Member/9706/xmlmcf.htm">MCF in XML</a> has two principal components: the ability to represent a "directed labeled graph" and a set of predefined element types. The first of these is effected by a convention on use of the <em>href</em> attribute (the same convention used in XML-Data <em>relations</em>, with the same effect). Of the second, some element types are genuinely necessary to represent schemata and a type system (these are also present in XML-Data) while others would be appropriate for a standard library.</p> <p>XML-Data has a number of features not in MCF: </p> <ul> <li>Principally, XML-Data permits <strong>tree structures</strong> in cases where MCF only permits a graph. (MCF requires that the target of every relation be out-of-line when it is an element. XML-Data allows in-line targets.) </li> <li>XML-Data hrefs are explicitly <strong>URI</strong>s. (Though MCF <em>unit</em>s can be URIs, it is not clear from the current document when they are and when they are not.)</li> <li>Names in XML-Data were chosen for greater compatibility with <strong>existing XML usage</strong> (or at least that is the intention).</li> <li>XML-Data schemata can represent all the information in an XML <strong>DTD</strong>, while it is not clear that MCF can do this. 
</li> <li>XML-Data has additional capabilities for expressing <strong>relationships in the schema</strong> (relation, relationType, extends, implements). </li> <li>XML-Data proposes <em><strong>lextypes</strong></em> as a basic element type, a feature not discussed in MCF. </li> </ul> <p>This chart tabulates the MCF "bootstrap" element types and describes their equivalents in XML-Data:</p> <dl> <dt>Category</dt> <dd>"elementType" in XML-Data.</dd> <dt>typeOf</dt> <dd>"typeOf" relation in XML-Data. Also, "extends" and "implements" in XML-Data assert the relationship in the schema. </dd> <dt>Unit</dt> <dd>"href" in XML-Data.</dd> <dt>domain</dt> <dd>"propertyOf" in XML-Data.</dd> <dt>range</dt> <dd>"range" in XML-Data. This gives the allowed type of the target of a property.</dd> <dt>superType</dt> <dd>This may correspond to "implements" in XML-Data. However, the MCF document is not clear on this point.</dd> <dt>Property</dt> <dd>This corresponds to the abstract concept of a link class expressed in schemata by <em>relation</em> and <em>relationType</em>. </dd> <dt>FunctionalProperty</dt> <dd>This appears to be a <em>relation</em> with <em>occurs</em> = OPTIONAL or REQUIRED (that is, occurs at most once).</dd> <dt>mutuallyDisjoint</dt> <dd>This is a relationship asserted among the members of an enumeration. XML-Data does not contain a predefined propertyType for this. It could be added easily if this is useful. </dd> <dt>parent</dt> <dd>A generic property, whose meaning appears to be contextual. XML-Data does not contain a predefined elementType for this. It is unneeded because parentage is expressed by containment; when the target is out-of-line, specific meanings are conveyed by more precise relationship types such as <em>propertyOf</em>.</dd> <dt>name</dt> <dd>"name" in XML-Data. However, note that, like parent, the interpretation of name in MCF seems to be contextual.</dd> <dt>description</dt> <dd>XML-Data does not contain a predefined elementType for this. 
We think that this belongs in a standard library and not in this specification.</dd> <dt>Sequence</dt> <dd>This is a special arc type in MCF that expresses the same fact as lexical order in XML.</dd> <dt>ord</dt> <dd>This is an MCF helper element type for Sequence.</dd> </dl> <p><a name="XML-Data-vs-MCF">Comparative examples of XML-Data and MCF in XML</a> representations of an order for several items follow. (All persons in this example are assumed to be not in the document, but elsewhere.) The <em>id</em> attribute is on all elements representing real-world objects, in both models. In the MCF model <em>id</em> also appears on elements needed artificially for reference. </p> <table border="0"> <tr> <td><font size="4">MCF in XML</font></td> <td><font size="4">XML-Data</font></td> </tr> <tr> <td valign="top"><pre><code> <ORDER id="order1"> <SOLD-TO unit="<strong>http:/people#person1</strong>"/> <SOLD-ON value="<strong>19970317</strong>"/> <ITEMS unit="<strong>sequence1</strong>"/> </ORDER> <BOOK id="book1"> <TITLE value="<strong>Number, the Language of Science</strong>"/> <AUTHOR unit="<strong>http:/people#person2</strong>"/> </BOOK> <SEQUENCE id="sequence1"> <ORD unit="book1"> <PRICE value="<strong>5.95</strong>"/> </ORD> <ORD unit="cd1"> <PRICE value="<strong>12.95</strong>"/> </ORD> <ORD unit="book2"> <PRICE value="<strong>6.95</strong>"/> </ORD> <ORD unit="food1"> <PRICE value="<strong>1.50</strong>"/> </ORD> </SEQUENCE> <COFFEE id="food1"> <SIZE value="<strong>small</strong>"/> <STYLE value="<strong>cafe macchiato</strong>"/> </COFFEE> <RECORD id="cd1"> <TITLE value="<strong>Rachmaninoff's Second Piano Concerto</strong>"/> <ARTIST unit="<strong>http:/people#person3</strong>"/> </RECORD> <BOOK id="book2"> <TITLE value="<strong>The Evolution of Complexity</strong>"/> <AUTHOR unit="<strong>http:/people#person4</strong>"/> </BOOK></code></pre> </td> <td valign="top"><pre> <code><ORDER id="order1"> <SOLD-TO href="<strong>http:/people#person1</strong>"/> <SOLD-ON 
value="<strong>19970317</strong>"/> <ITEM> <PRICE><strong>5.95</strong></PRICE> <BOOK id="book1"> <TITLE ><strong>Number, the Language of Science</strong></TITLE> <AUTHOR href="<strong>http:/people#person2</strong>"/> </BOOK> </ITEM> <ITEM> <PRICE><strong>12.95</strong></PRICE> <RECORD id="cd1"> <TITLE ><strong>Rachmaninoff's Second Piano Concerto</strong></TITLE> <ARTIST href="<strong>http:/people#person3</strong>"/> </RECORD> </ITEM> <ITEM> <PRICE><strong>6.95</strong></PRICE> <BOOK id="book2"> <TITLE ><strong>The Evolution of Complexity</strong></TITLE> <AUTHOR href="<strong>http:/people#person4</strong>"/> </BOOK> </ITEM> <ITEM> <PRICE><strong>1.50</strong></PRICE> <COFFEE> <SIZE><strong>small</strong></SIZE> <STYLE><strong>cafe macchiato</strong></STYLE> </COFFEE> </ITEM> </ORDER></code></pre> </td> </tr> </table> <p> </p> <h2 align="left">6. Conclusion</h2> <p><font color="#000000" size="3">Future applications of the Internet will focus on adding user value to information through semantic annotation. Semantics will permit information to be discovered, targeted, reused, and integrated. Not only does this make the content more usable, but it opens up opportunities for software developers to build components that exploit these semantics. Such components could include applications as prosaic as application or user logging, or as futuristic as user agents that assist in finding or organizing content, World-Wide Web "surf buddies" that accompany a user's browsing and add valuable or entertaining comments, or natural language query systems. Semantic annotation turns the Internet into a platform for programming powerful and valuable applications.</font></p> <p><font color="#000000" size="3">This proposal lays the foundation for how applications can annotate their information content. 
The proposal adds powerful new constructs for representing semantics, sufficiently advanced for use in artificial intelligence and natural language systems, yet retains the architecture and investment of existing XML and the efficiency of its representation.</font></p> <hr> <h2 align="left">Appendix A - The XML DTD for a schema</h2> <pre><code> <!ENTITY % nodeattrs 'id ID #IMPLIED' > <!-- href is as per XML-LINK, but is not required unless there is no content --> <!ENTITY % exattrs 'extends CDATA #IMPLIED' > <!ENTITY % linkattrs 'id ID #IMPLIED href CDATA #IMPLIED' > <!-- The shared content model of elementType, linkType and relationType --> <!-- Omitted element type same as "empty." --> <!ENTITY % extendedmodel 'implements*, (elt|group|empty|any|pcdata|mixed)?, (relation|attribute)*'> <!-- The top-level container --> <!element schema ((elementType|propertyOf|linkType| relationType|extendType|augmentElementType| intEntityDcl|extEntityDcl| notationDcl|extDcls|c)*)> <!attlist schema %nodeattrs;> <!-- Element Type Declarations --> <!element elementType (%extendedmodel;)> <!-- Either name or id must be present - - absent name defaults to id --> <!attlist elementType %nodeattrs; %exattrs; name CDATA #IMPLIED> <!-- Element types allowed in content model --> <!-- Note this is just short for a model group with only one elt in it --> <!element elt EMPTY> <!-- Elements can have exponents as well as groups --> <!-- The href is required --> <!attlist elt %linkattrs; occurs (required|optional|star|plus) 'required'> <!-- A group in a content model, sequential or disjunctive --> <!element group ((group|elt)+)> <!attlist group %nodeattrs; groupType (seq|or) 'seq' occurs (required|optional|star|plus) 'required'> <!element any EMPTY> <!element empty EMPTY> <!element pcdata EMPTY> <!-- mixed content is just a flat, non-empty list of elts --> <!-- We don't need to say anything about #pcdata, it's implied --> <!element mixed (elt+)> <!attlist mixed %nodeattrs;> <!-- Attributes --> <!-- 
default value must be present iff presence is specified or fixed --> <!-- presence defaults to specified if default is present, else implied --> <!-- name attribute is locally unique, defaults to id if absent --> <!element attribute EMPTY> <!attlist attribute %linkattrs; name CDATA #IMPLIED type (id|idref|idrefs|entity|entities|nmtoken|nmtokens| enumeration|notation|cdata) 'cdata' default CDATA #IMPLIED values NMTOKENS #IMPLIED presence (implied|specified|required|fixed) #IMPLIED lextypeIs CDATA #IMPLIED> <!-- Relations - - relationTypes are pointed to from relations, just as elementTypes are pointed to from elts --> <!element relationType (%extendedmodel;, range*)> <!attlist relationType %nodeattrs; %exattrs; name CDATA #IMPLIED > <!element range EMPTY > <!attlist range %linkattrs; > <!element relation EMPTY> <!attlist relation %linkattrs; default CDATA #IMPLIED occurs (required|optional|star|plus) 'optional'> <!-- For adding attributes to existing element types --> <!element propertyOf EMPTY> <!attlist propertyOf href CDATA #REQUIRED> <!element augmentElementType ((relation|attribute)*)> <!attlist augmentElementType %linkattrs; %exattrs;> <!-- Shorthand for simple XML-LINKs --> <!element linkType (%extendedmodel;)> <!attlist linkType %nodeattrs; %exattrs; name CDATA #IMPLIED role CDATA #IMPLIED title CDATA #IMPLIED show (embed|replace|new) #IMPLIED actuate (auto|user) #IMPLIED behaviour CDATA #IMPLIED > <!element implements EMPTY> <!attlist implements href CDATA #REQUIRED> <!-- Entity Declarations --> <!-- Note as this is written only external entities can have structure without escaping it --> <!-- Name defaults to id if absent --> <!element intEntityDcl (#PCDATA)> <!attlist intEntityDcl %nodeattrs; name CDATA #IMPLIED> <!-- The entity will be treated as binary if a notation is present --> <!-- systemId and 
publicId (if present) must have the required syntax --> <!element extEntityDcl ( systemId, publicId?)> <!attlist extEntityDcl %nodeattrs; name CDATA #IMPLIED notation CDATA #IMPLIED> <!-- Pointers for above --> <!element systemId EMPTY> <!attlist systemId %linkattrs;> <!-- Must be empty if href is used --> <!element publicId (#PCDATA) > <!attlist publicId %linkattrs;> <!-- Notation Declarations --> <!-- systemId and publicId (if present) must have the required syntax --> <!element notationDcl (systemId, publicId?)> <!attlist notationDcl %linkattrs; name CDATA #IMPLIED> <!-- External entity with declarations to be included --> <!-- systemId and publicId (if present) must have the required syntax --> <!element extDcls EMPTY> <!attlist extDcls systemId CDATA #REQUIRED publicId CDATA #IMPLIED> <!-- Namespace Declarations --> <!-- systemId and publicId (if present) must have the required syntax --> <!element namespaceDcl EMPTY> <!attlist namespaceDcl %linkattrs; name CDATA #IMPLIED> </code></pre> </body> </html>
Received on Monday, 23 June 1997 01:38:05 UTC