- From: Sean B. Palmer <sean@mysterylights.com>
- Date: Sat, 2 Dec 2000 17:00:52 -0000
- To: <xml-dev@lists.xml.org>
- Cc: <swi-dev@egroups.com>, <www-rdf-interest@w3.org>, "William Loughborough" <love26@gorge.net>, "Tapio Markula" <tapio1@gamma.nic.fi>
Documents are here to stay, and so is data. Roughly put, we have HTML/WWW for documents, and XML/RDF/SW for data. The problem we all face is that it is very rare to have either a pure document or pure data. Documents always have data to back them up, and conversely data always needs some kind of prose explanation. Look upon this as "explicit reification" if you must: everything needs a prose definition at some level.

Does this mean the SW has failed before it has started? Of course not! It will work for pure data models, but there aren't all that many pure data models out there... the information we mainly deal with is simply annotated data.

At the moment it appears that we have a mini formatting war [1] going on for documents vs. data, and the ongoing battles about XML Schema vs. XML DTDs (or, put a bit more rationally, XML vs. XHTML). But why can't we just come to a sort of half-document, half-data consensus?

[[[
I believe that one of the best ways to transition into RDF, if not a long-term deployment strategy for RDF, is to manage the information in human-consumable form (XHTML) annotated with just enough info to extract the RDF statements that the human info is intended to convey. [...] We all know that we have to produce a human-readable version of the thing... why not use that as the primary source?
]]] - [2]

Or in other words, using XHTML [3] as a repository for data, but one that can still be marked up with annotations, explanations, and summaries... aha!

The key concept we have here is the following: data can be stored somehow in XHTML, and annotated with two different types of further data: annotation intended to facilitate the machine transformation and extraction of that data into machine (RDF?) form, and annotation to assist humans in the interpretation of that data [4].
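To make the idea a little more concrete, here is a rough sketch in Python of what "extracting statements from annotated XHTML" might look like. Everything about the convention is an assumption made up for illustration: the class values ("person", "name", "homepage"), the rule that an element carrying both id and class is a subject, and the function name extract_statements. It is not a proposal for the actual vocabulary, and it produces plain tuples rather than real RDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the class values are an illustrative convention only.
SAMPLE = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="sbp" class="person">
      <p>This note was written by <span class="name">Sean B. Palmer</span>,
      whose home page is at
      <a class="homepage" href="http://www.mysterylights.com/sbp/">mysterylights.com</a>.</p>
    </div>
  </body>
</html>
"""


def extract_statements(xhtml_text, base="#"):
    """Return (subject, predicate, object) tuples from class-annotated XHTML.

    Assumed convention (not a standard): an element with both an id and a
    class is a subject, and each descendant with a class is one of its
    properties. Nested subjects are not treated specially in this sketch.
    """
    root = ET.fromstring(xhtml_text)
    statements = []
    for subject_el in root.iter():
        subject_id = subject_el.get("id")
        if subject_id is None or subject_el.get("class") is None:
            continue
        subject = base + subject_id
        statements.append((subject, "type", subject_el.get("class")))
        for el in subject_el.iter():
            if el is subject_el or el.get("class") is None:
                continue
            # Prefer a link target over the visible text when there is one.
            value = el.get("href") or "".join(el.itertext()).strip()
            statements.append((subject, el.get("class"), value))
    return statements


for statement in extract_statements(SAMPLE):
    print(statement)
```

The human-readable sentence stays the primary source, and the machine-readable statements fall out of it; the prose around the annotated spans is simply ignored by the extractor.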
The two most important building blocks for this conversation will be these simple little tags and attributes (their meanings are self-explanatory):-

   <annotation xmlns="[TBD]">
   <inverseOf xmlns="http://www.daml.org/2000/10/daml-ont.daml">
   @annotation
   @class
   @type

If we added those simple tags etc. to a kind of XHTML slurry, then we would have a lot more power to walk through the mire 'twixt documents and data.

But this is all an abstract conversation, isn't it? Not really. Browsers worldwide grok XHTML, and a few can use CSS to style other forms of XML. At the moment, to cleanly extract data from XHTML, we have to pepper it (i.e. annotate it) with hundreds of "classes" - class attributes [5] - to imply our meaning, for example as discussed in the semantic design principles [6]. Instead, we could just add a few custom annotation and logic tags (like the ones above) to (e.g.) m12n, and create a transformable form of XHTML to bridge the gap.

Strangely enough, the W3C's Amaya already has an annotation system [7], and an annotation server [8]. But it doesn't tie into the document at all, and therefore I doubt it has any usage at all (sorry!). However, the principle of using annotations with data is a great idea, and one that surely should be pursued.

Summary:-
We need some kind of "lingua franca" to annotate data in such a form that it is human readable, and transformable into machine readable format. (And yes, this does have smackings of SDF [9]).

There aren't many examples of semantically annotated XHTML out there (in fact, I can't find a single satisfactory one...) so I urge people to create examples.

References:-
[1] http://doctypes.org/ - Doctypes.org, M. Altheim
[2] http://lists.w3.org/Archives/Public/www-rdf-interest/2000Mar/0103.html - XSLT for screen-scraping RDF out of real-world data, Dan Connolly
[3] http://www.w3.org/TR/xhtml1/ - XHTML 1.0, Steven Pemberton et al.
[4] http://www.mysterylights.com/sbp/#docordata - Documents vs. Data, Sean B. Palmer
[5] http://www.w3.org/TR/html401/struct/global.html#adef-class - The class Attribute - HTML 4.01, Dave Raggett et al.
[6] http://www.mysterylights.com/sbp/#semanticprinciples - Design Principles to Aid Semantics, Sean B. Palmer
[7] http://www.w3.org/2000/02/collaboration/annotation/AmayaDocs/Annotation.html - Annotations in Amaya
[8] http://annotest.w3.org/ - The W3C's Annotea project
[9] http://lists.w3.org/Archives/Public/www-rdf-interest/2000Nov/0033.html - Semantic Document Frameworks, Sean B. Palmer

P.S. Apologies for the cross post: this note (i.e. rant) covers quite a few topics...

Kindest Regards,
Sean B. Palmer
http://www.mysterylights.com/sbp/
http://www.w3.org/WAI/ [ERT/GL/PF]
"Perhaps, but let's not get bogged down in semantics."
   - Homer J. Simpson, BABF07.
Received on Saturday, 2 December 2000 12:01:02 UTC