- From: Murray Maloney <murray@muzmo.com>
- Date: Wed, 04 Oct 2006 19:38:36 -0400
- To: public-grddl-wg <public-grddl-wg@w3.org>
- Message-Id: <5.1.1.6.2.20061004175925.02ca4f18@mail.muzmo.com>
Here is an updated version of a proposed Cross-set Introduction to GRDDL. I intend to continue working on this until next week's call. Feel free to offer suggestions for changes in the interim. I have tried to capture as much as possible from previous iterations of introductory material from all the WDs. If you feel that I have failed to address anything that should be mentioned in an introduction, please let me know. Please note that I ask for help with an example of source XHTML and a potential result RDF. I don't think that we have to demonstrate an actual transformation, because we are just trying to illustrate dialects of languages in the source and a nice RDF encoding of the same information.
<div>
<h2 id="intro">Introduction: Data and Documents</h2>
<p>Many dialects of languages are in use among the XML documents on the web. There are dialects of XHTML, XML and RDF that are used to represent everything from poetry to prose, purchase orders to invoices, spreadsheets to databases, schemas to scripts, and linked lists to ontologies. Some offer more formally defined semantics, and others more loosely coupled semantics. Recently, two progressive encoding techniques have emerged to overlay additional semantics onto valid XHTML documents: RDFa and microformats offer simple, open data formats built upon existing and widely adopted standards. </p>
<p>While this breadth of expression is quite liberating, inspiring new dialects to codify both common and customized meanings, it can prove to be a barrier to understanding across different domains or fields. How, for example, does software discover the author of a poem, a spreadsheet and an ontology? And how can software determine whether the authors of each are in fact the same person?</p>
<h3>Resource Descriptions</h3>
<p>The Resource Description Framework <a href="#RDFC04">[RDFC04]</a> provides a standard for making statements about resources in the form of a subject-predicate-object expression.
One way to represent the fact "<i>The Stand</i>'s author is Stephen King" in RDF would be as a triple whose subject is "The Stand," whose predicate is "has the author," and whose object is "Stephen King." The predicate, "has the author," expresses a relationship between the subject (The Stand) and the object (Stephen King). Using URIs to uniquely identify the book, the author and even the relationship would facilitate software design, because not everyone knows Stephen King or even spells his name consistently. </p>
<PRE> [Here, I would like someone to create an example of a source and a result: Source XHTML includes META and numerous LINK elements that all somehow cite authorship, including dc:author and others. Result RDF includes a tidy package of person/author/book triples which includes one that asserts that "Stephen King"/author/"The Stand" and another which misspells it as Steven King.] </PRE>
<p>GRDDL is a mechanism for <b>G</b>leaning <b>R</b>esource <b>D</b>escriptions from <b>D</b>ialects of <b>L</b>anguages. That is, GRDDL provides a relatively inexpensive mechanism for bootstrapping RDF content from uniform XML dialects, shifting the burden from formulating RDF to creating transformation algorithms specifically for each dialect. XML transformation languages such as XSLT are quite versatile in their ability to process, manipulate, and generate XML. The use of XSLT to generate XHTML from single-purpose XML vocabularies is historically celebrated as a powerful idiom for separating structured content from presentation.</p>
<p>GRDDL shifts this idiom to a different end: separating structured content from its authoritative meaning (or semantics). GRDDL works by associating transformations with an individual document, either through direct inclusion of references or indirectly through profile and namespace documents. Content authors can nominate the transformations for producing RDF from their content and use GRDDL to refer to them.
</p>
<h3>For example:</h3>
<p>Dublin Core metadata can be written in an HTML dialect <a href="#RFC2731">[RFC2731]</a> that has a clear correspondence to an encoding in RDF/XML <a href="#DCRDF">[DCRDF]</a>. The following HTML and RDF excerpts illustrate the correspondence:</p>
<pre class="example">&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
  &lt;head&gt;
    &lt;title&gt;Some Document&lt;/title&gt;
    &lt;meta name="DC.Subject" content="ADAM; Simple Search; Index+; prototype" /&gt;
  &lt;/head&gt;
&lt;/html&gt;</pre>
<pre class="example">&lt;rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" &gt;
  &lt;rdf:Description rdf:about=""&gt;
    &lt;dc:subject&gt;ADAM; Simple Search; Index+; prototype&lt;/dc:subject&gt;
  &lt;/rdf:Description&gt;
&lt;/rdf:RDF&gt;</pre>
<p>The correspondence between the source and result forms of this example is expressed as an algorithm in an XSLT transformation, <a href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl">dc-extract.xsl</a>.</p>
<h3>Transformations</h3>
<p>Transformations are currently most often expressed using XSLT 1.0, although other methods are permissible. Generally, if the transformation can be fully expressed in XSLT 1.0, then it is preferable to use that format, since all GRDDL processors should be capable of interpreting an XSLT 1.0 document.</p>
<p><a href="http://www.w3.org/TR/xproc/">XProc: An XML Pipeline Language,</a> <i>a language for describing operations to be performed on XML documents,</i> has recently been published as a W3C Working Draft. It merits consideration for expressing more complex or sophisticated transformations which require control over the flow of processing through a variety of XML processing tools. Using XProc, one could apply a sequence of operations such as XInclude, validation, and transformation to a document, aborting if the result or an intermediate stage is not valid.</p>
<h3>GRDDL WD</h3>
<p> This GRDDL Working Draft is a concise technical specification of the GRDDL mechanism and its XML syntax.
It specifies the GRDDL syntax to use in valid XHTML and well-formed XML documents, as well as how to encode GRDDL into namespaces and HTML profiles. It also discusses the GRDDL transformation link and security issues. Appendices provide links to extended examples and to existing software and services that employ GRDDL. </p>
<h3>GRDDL Primer</h3>
<p> A Primer on Gleaning Resource Descriptions from Dialects of Languages (GRDDL) is a progressive tutorial on the GRDDL mechanism. It builds on a number of examples from the GRDDL Use Cases document to illustrate GRDDL techniques for associating documents with transformations for extracting RDF. </p>
<h3>GRDDL Use Cases</h3>
<p>This document collects a number of use cases, together with their goals and requirements, for GRDDL (Gleaning Resource Descriptions from Dialects of Languages), a mechanism for getting <a href="#RDF">RDF</a> data out of XML documents, and XHTML pages in particular, using explicitly associated transformation algorithms. These use cases also illustrate how XML and XHTML documents can be decorated with <a href="#microformats">microformat</a>, <a href="#EmbeddedRDF">Embedded RDF</a> or <a href="#RDFa">RDFa</a> statements to support <a href="#GRDDLTransformation">GRDDL transformations</a> that extract valuable data, which can then be used to automate a variety of tasks.</p>
<PRE>[The annotated Table of Use Cases would appear here in the Use Cases WD.]</PRE>
</div>
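For anyone who wants to experiment with the Dublin Core correspondence shown in the draft above, here is a rough Python sketch of the same idea. It is not dc-extract.xsl and not a GRDDL processor; it only uses the standard library to read the DC.Subject META element from the draft's XHTML excerpt and write the matching RDF/XML. The function names glean_dublin_core and to_rdf_xml are made up for illustration, and the handling of names is an assumed simplification (qualified names such as those with a second dot are skipped).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The source excerpt from the draft, unchanged.
SOURCE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Some Document</title>
    <meta name="DC.Subject" content="ADAM; Simple Search; Index+; prototype" />
  </head>
</html>"""


def glean_dublin_core(xhtml_text):
    """Return (property, value) pairs for each simple DC.* meta element.

    Hypothetical helper: it covers only the plain "DC.Name" form and
    skips qualified names that contain a further dot.
    """
    root = ET.fromstring(xhtml_text)
    pairs = []
    for meta in root.iter("{%s}meta" % XHTML_NS):
        name = meta.get("name", "")
        if not name.startswith("DC."):
            continue
        local = name[len("DC."):]
        if not local or "." in local:
            continue  # qualified names are out of scope for this sketch
        # "DC.Subject" in the HTML dialect corresponds to dc:subject in RDF.
        pairs.append((local.lower(), meta.get("content", "")))
    return pairs


def to_rdf_xml(pairs):
    """Serialize the pairs as one rdf:Description about the document itself."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    rdf = ET.Element("{%s}RDF" % RDF_NS)
    description = ET.SubElement(
        rdf, "{%s}Description" % RDF_NS, {"{%s}about" % RDF_NS: ""}
    )
    for prop, value in pairs:
        ET.SubElement(description, "{%s}%s" % (DC_NS, prop)).text = value
    return ET.tostring(rdf, encoding="unicode")


pairs = glean_dublin_core(SOURCE)
print(to_rdf_xml(pairs))
```

The point of the sketch is only that the mapping is mechanical once the dialect is known; in GRDDL the document author would nominate a transformation such as the XSLT above rather than ship code like this.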
Received on Wednesday, 4 October 2006 23:39:07 UTC