- From: Dan Connolly <connolly@w3.org>
- Date: Tue, 21 Mar 2000 12:39:40 -0600
- To: www-rdf-interest@w3.org
- Message-ID: <38D7C1EC.2062AB7E@w3.org>
Summary: I believe that one of the best ways to transition into RDF, if not a long-term deployment strategy for RDF, is to manage the information in human-consumable form (XHTML) annotated with just enough info to extract the RDF statements that the human info is intended to convey.

In other words: using a relational database or some sort of native RDF data store, and spitting out HTML dynamically, is a lot of infrastructure to operate and probably not worth it for lots of interesting cases. We all know that we have to produce a human-readable version of the thing... why not use that as the primary source?

Details...

During the design of the key/keyref/unique stuff in the XML Schema WG[1,2] (based on the match/use stuff in XSLT[3]), a lightbulb went on in my head about the relationship of XML trees to relational tables: the idiom:

  <key name="pNumKey">
    <selector>..per-row xpath..</selector>
    <field>..xpath to find 1st field in the row..</field>
    <field>..xpath for 2nd field..</field>
    <field>..xpath for 3rd field..</field>
  </key>

extracts a relational table from an XML tree: you get one row in the table for each node matched by the selector, and one field in that row for each node matched by a field xpath, using the row node as the context.

Cool, huh?

Then, exploiting the fact that RDF and relational tables are pretty much isomorphic[4], it occurred to me that we can use this idiom to extract RDF data from "real world"[5] stuff: meeting records (attendee lists, decisions, actions), issue lists, maybe even hypermail archive indexes, etc. And, of course, to take this home to where I live, the W3C tech reports index[6].

Last night, I finally managed to get enough development tools installed etc. to do some XSLT hacking. I developed a transformation from the /TR/ page[6] into RDF statements about Dublin Core metadata.
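To make the selector/field idiom described above concrete, here is a minimal sketch in Python rather than XML Schema. It assumes a simplified document shape: the `<reports>`/`<report>` element names, the `extract_table` helper, and the second sample row are all hypothetical, invented for illustration. It relies only on the limited XPath subset in the standard library's ElementTree, so it is not a key/keyref implementation, just the "one row per selector match, one column per field path" idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are assumptions, and only the
# first row reuses values quoted in the message.
DOC = """<reports>
  <report>
    <id>ATAG10</id>
    <title>Authoring Tool Accessibility Guidelines 1.0</title>
    <date>3 February 2000</date>
  </report>
  <report>
    <id>EXAMPLE2</id>
    <title>A Second Report</title>
    <date>1 March 2000</date>
  </report>
</reports>"""


def extract_table(root, selector, fields):
    """Return one tuple per node matched by `selector`.

    Each field path is evaluated with the row node as its context,
    mirroring the <selector>/<field> idiom. A field that matches
    nothing yields None.
    """
    rows = []
    for row_node in root.findall(selector):
        rows.append(tuple(row_node.findtext(path) for path in fields))
    return rows


root = ET.fromstring(DOC)
table = extract_table(root, "./report", ["id", "title", "date"])
for row in table:
    print(row)
```

Each tuple is a relational row. Reading the first column as a subject and the remaining column names as predicates gives the RDF view of the same data.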
The transformation is attached in full, but the gist of it is:

=====
<template match="h:dl/h:dt[./h:b/h:i]">
  <element name="rdf:Description">
    <attribute name="about"><value-of select=".//h:a/@href"/></attribute>
    <dc:title><value-of select=".//h:a"/></dc:title>
    <dc:date><value-of select="substring-before(following-sibling::h:dd, ',')"/></dc:date>
  </element>
</template>
=====

i.e. find all the dt's in dl's that have b and i in them, and spit out an RDF description of the tech report, giving the Dublin Core title and date.

(oh... I had to xhtml-ize the /TR/ page first... tidy[7] to the rescue!)

The result is:

===
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:h="http://www.w3.org/1999/xhtml"
         xmlns:dc="http://purl.org/DC"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description about="ATAG10">
    <dc:title>Authoring Tool Accessibility Guidelines 1.0</dc:title>
    <dc:date>3 February 2000</dc:date>
  </rdf:Description>
  ...
</rdf:RDF>
===

I need some XSLT functions for relativizing and absolutizing URIs... but that should be an easy hack.

Anyway... SIRPAC seemed to agree that it was conforming RDF.

There are a couple of ideas I want to pursue:

Idea 1: semantic HTML

Take the information in my XSLT stylesheet, which says something about the semantics implied by the XHTML stuff, and put it in the XHTML in the first place. Something like:

<html xmlns="http://www.w3.org/1999/xhtml">
<head> ... </head>
<body>
  ...
  <dl>
    <dt><b><i>spec title...</i></b></dt>
    <dd>1 Mar 2000, Fred and Bob</dd>
    ...
  </dl>
</body>
<legend xmlns="http://www.w3.org/2000/03/xml-kb/#"
        xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:dc="http://purl.org/DC/">
  <each select="h:dl/h:dt[./h:b/h:i]">
    <asserts subjectRef=".//h:a/@href"
             predicateName="dc:title"
             objectLit=".//h:a" />
    <asserts subjectRef=".//h:a/@href"
             predicateName="dc:date"
             objectLit="substring-before(following-sibling::h:dd, ',')" />
  </each>
</legend>
</html>

The <legend/> stuff might seem more natural inside the <head>, but for performance reasons, I think it's better to put it at the end.

hmm... I guess there are syntactic details... pointing to resources by QName vs. URI, literals vs. URIs, anonymous nodes, etc. But I hope you get the idea.

The beauty of it is: you only need an RDF priest to set up the <legend> in the first place. After that, anybody with basic word processing skills (well... ok... a little more than that) can maintain the <dl> with regular old HTML tools (well... ok... XHTML tools that I expect are just around the corner -- all they have to do is (a) maintain XML well-formedness and (b) leave foreign elements alone. I think something like hotmetal or XED or emacs/vi/notepad would work fine).

And it's not a matter of post-hoc, 3rd party interpretation of the HTML, the way most screen-scraping is done. These semantics are 1st party assertions. They can be digitally signed, managed, copied around, versioned, etc. without jumping through hoops.

XSLT has its warts, but it works and it's getting deployed. It almost makes me wonder "why bother?" regarding a new RDF syntax.

I think I can generalize my explicit template-matching XSLT script into a general purpose <legend> processing script. Hmm... maybe not... maybe I would need to do two XSLT transforms: one from <legend> into a concrete XSLT transform, then another one to extract the RDF data. I'll have to noodle on it some more...

Idea 2: the paper trail

Have any of you seen timbl's notes on the paper trail[8]?
The idea is, for example: given last month's credit card statement and a set of transaction receipts, produce a new statement.

Or: given the current version of the W3C tech reports index, and an approved request to publish, produce the new version of the tech reports index... if this publication replaces an existing one, elide the old one.

Or: given a calendar and an appointment request, (a) check for existing conflicts, and (b) generate the new calendar.

In general: given state N and a transaction log, produce state N+1. Reminds me of the M3 stableDB thingy[9], or qddb[10], or lots of other similar hacks.

The general idea here is that I think it's more efficient and robust to store the XHTML representation of the W3C tech reports index and serve it out of the filesystem than to generate it out of a database dynamically. But I do want database-ish integrity.

I suppose this could be done just with XSLT, but I suspect you'll get more bang-for-your-buck if you

  -- extract RDF using XSLT (which allows you to merge from multiple sources without thinking hard)
  -- manipulate the RDF in a Prolog-ish way (hmm... XSLT implementations in Java tend to be easily extensible... and I'm sure there are plenty of logic programming libraries in Java... I suppose we could write these manipulations right into (extended) XSLT scripts!)
  -- convert the result back to HTML

[1] 3.9 Identity-constraint Definition Details in the structures spec
    http://www.w3.org/TR/xmlschema-1/#Identity-constraint_Definition_details

[2] 4.2 Defining Keys and their References in the primer
    http://www.w3.org/TR/xmlschema-0/#specifyingKeys&theirRefs

[3] 12.2 Keys in XSLT
    http://www.w3.org/TR/xslt#key

[4] Yang, Thu, 9 Mar 2000 11:59:12 -0800
    http://lists.w3.org/Archives/Public/www-rdf-interest/2000Mar/0074.html

[5] RDF in the real world
    Stallion, Jason (Cahners) (Mon, Mar 13 2000)
    http://lists.w3.org/Archives/Public/www-rdf-interest/2000Mar/thread.html

[6] W3C Technical Reports and Publications
    http://www.w3.org/TR/

[7] Clean up your Web pages with HTML TIDY
    http://www.w3.org/People/Raggett/tidy/

[8] TimBL, Feb 1999
    http://www.w3.org/DesignIssues/PaperTrail.html

[9] Stable.ig in the Modula 3 library source
    http://www.research.digital.com/SRC/m3sources/html/stable/src/Stable.ig.html

[10] The Official Qddb Home Page
    http://www.hsdi.com/qddb/

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform"
            version="1.0"
            xmlns:h="http://www.w3.org/1999/xhtml"
            xmlns:dc="http://purl.org/DC"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

  <output method="xml" indent="yes"/>

  <template match="h:html">
    <rdf:RDF>
      <apply-templates/>
    </rdf:RDF>
  </template>

  <template match="h:dl/h:dt[./h:b/h:i]">
    <element name="rdf:Description">
      <attribute name="about"><value-of select=".//h:a/@href"/></attribute>
      <dc:title><value-of select=".//h:a"/></dc:title>
      <dc:date><value-of select="substring-before(following-sibling::h:dd, ',')"/></dc:date>
    </element>
  </template>

  <!-- don't pass text thru -->
  <template match="text()|@*">
  </template>

</stylesheet>
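
For readers without an XSLT processor at hand, the following is a rough Python approximation of what the stylesheet above extracts. It is a sketch under stated assumptions, not a port: `extract_statements` and `SAMPLE` are hypothetical names, the sample markup is invented to match the shape the template expects, and the function returns (subject, predicate, object) tuples instead of serializing RDF/XML. Unlike the XSLT, it skips an entry that has no link rather than emitting an empty `about`.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical, tidy-style XHTML shaped like the entries the template matches.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<dl>
<dt><b><i><a href="ATAG10">Authoring Tool Accessibility Guidelines 1.0</a></i></b></dt>
<dd>3 February 2000, editors elided in this sample</dd>
<dt>An entry without b/i markup, which the match pattern skips</dt>
<dd>no date here</dd>
</dl>
</body>
</html>"""


def extract_statements(xhtml_text):
    """Approximate match="h:dl/h:dt[./h:b/h:i]" and its three value-of calls."""
    root = ET.fromstring(xhtml_text)
    statements = []
    for dl in root.iter(XHTML + "dl"):
        children = list(dl)
        for index, dt in enumerate(children):
            if dt.tag != XHTML + "dt":
                continue
            # Predicate [./h:b/h:i]: a b child that has an i child.
            if dt.find(XHTML + "b/" + XHTML + "i") is None:
                continue
            link = dt.find(".//" + XHTML + "a")
            if link is None:
                continue  # assumption: skip entries with no link
            subject = link.get("href")
            title = "".join(link.itertext())
            # following-sibling::h:dd, taking the first one in document order.
            dd = next((c for c in children[index + 1:] if c.tag == XHTML + "dd"), None)
            dd_text = "".join(dd.itertext()) if dd is not None else ""
            # substring-before(..., ',') yields "" when there is no comma.
            head, sep, _ = dd_text.partition(",")
            date = head if sep else ""
            statements.append((subject, "dc:title", title))
            statements.append((subject, "dc:date", date))
    return statements


statements = extract_statements(SAMPLE)
for statement in statements:
    print(statement)
```

Keeping the output as plain tuples makes the "RDF statements are rows" point easy to see. Serializing them back to RDF/XML, or merging tuples from several pages, would be separate steps.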
Received on Tuesday, 21 March 2000 13:42:25 UTC