Screen Scraping SW Logic from XHTML

From: Sean B. Palmer <sean@mysterylights.com>
Date: Sat, 30 Dec 2000 00:26:16 -0000
Message-ID: <029a01c071f7$206318c0$b7d993c3@z5n9x1>
To: <www-rdf-interest@w3.org>, <www-rdf-logic@w3.org>

Screen scraping RDF classes and properties from XHTML is great, but it got
me wondering whether you could scrape actual logical assertions from it. In
other words, can we write inference rules in XHTML and then transform them
into useful RDF, and if so, how? O.K., the first question is how we
represent this (for example):-

     "it is not true that soap.com is a member and IBM is
     not a member", or:
     <not>
         <w3c:member>http://www.ibm.com/</w3c:member>
         <not>
            <w3c:member>http://www.soap.com/</w3c:member>
         </not>
     </not> - TimBL, DesignIssues, Semantic Web Toolbox [1]

in XHTML? We could say most of it in prose, but prose alone doesn't make
for transformable code:-

<h1>The Following is False</h1>
<p class="false">IBM is a member of W3C and Soap isn't</p>

Ugh. So if we define two simple class values, "false" for the <not> and
"members" for the <w3c:member>, we end up with:-

<h1>False Assertion</h1>
<div class="false">
 <h2>W3C Members</h2>
 <dl class="members">
  <dt>Members: </dt>
  <dd>http://www.ibm.com/ </dd>
  <dt>Non-members</dt>
  <div class="false">
   <dd>http://www.soap.com/ </dd>
  </div>
 </dl>
</div>

Right, that says that "it is not true that soap.com is a member and IBM is
not a member" in both prose and transformable markup. I put this up at
http://infomesh.net/2000/12/swhacking/logic.html for people to run tests
on. Note: each <dd> that is a child of dl[@class='members'] is treated as
carrying that same class attribute, for ease of transformation. Anyway,
here is my own stab at transforming this into RDF via XSLT:-

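<stylesheet
    xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:logic="http://www.w3.org/DesignIssues/Toolbox.html"
    xmlns:w3c="http://www.w3.org/Member/"
    xmlns:html="http://www.w3.org/1999/xhtml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
<output method="xml" indent="yes"/>

<!-- Wrap everything scraped from the page in a single rdf:Description. -->
<template match="html:html">
 <rdf:RDF>
  <rdf:Description rdf:ID="logic">
   <apply-templates/>
  </rdf:Description>
 </rdf:RDF>
</template>

<!-- A div of class "false" negates the member list it contains.
     apply-templates chooses its nodes with select. -->
<template match="html:div[@class='false']">
 <logic:not>
  <apply-templates select="html:dl[@class='members']"/>
 </logic:not>
</template>

<template match="html:dl[@class='members']">
 <!-- dd children directly under the dl are plain member assertions;
      normalize-space() trims the stray whitespace around each URI. -->
 <for-each select="html:dd">
  <w3c:member>
   <value-of select="normalize-space(.)"/>
  </w3c:member>
 </for-each>
 <!-- dd elements inside a nested div of class "false" are negated. -->
 <for-each select="html:div[@class='false']/html:dd">
  <logic:not>
   <w3c:member>
    <value-of select="normalize-space(.)"/>
   </w3c:member>
  </logic:not>
 </for-each>
</template>

<!-- Suppress text and attributes that no other template handles. -->
<template match="text()|@*"/>
</stylesheet>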

I tried running it through both of the W3C XSLT transformation services
that I know of [2], [3], but neither seemed to like it. I'm not sure if
that's because my code is buggy (i.e. complete rubbish), or because the
servers are down (more likely than not: both). If someone could have a look
at it, that would be great.
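
For anyone who wants to check the mapping without relying on an online
XSLT service, here is a minimal sketch of the same scrape using only
Python's standard library. It is a plain tree walk, not XSLT, so treat it
as a cross-check of the idea rather than a test of the stylesheet itself.
The PAGE wrapper around the fragment and the helper names q, scrape and
to_rdf are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LOGIC = "http://www.w3.org/DesignIssues/Toolbox.html"
W3C = "http://www.w3.org/Member/"

# Assumption: the fragment from this message, wrapped in a minimal XHTML page.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Logic test</title></head>
<body>
<h1>False Assertion</h1>
<div class="false">
 <h2>W3C Members</h2>
 <dl class="members">
  <dt>Members: </dt>
  <dd>http://www.ibm.com/ </dd>
  <dt>Non-members</dt>
  <div class="false">
   <dd>http://www.soap.com/ </dd>
  </div>
 </dl>
</div>
</body>
</html>"""


def q(ns, local):
    """Build the '{namespace}local' name that ElementTree uses (hypothetical helper)."""
    return "{%s}%s" % (ns, local)


def scrape(node, out, in_members=False):
    """Walk the XHTML tree and append logic:not / w3c:member elements to out."""
    for child in node:
        cls = child.get("class")
        if child.tag == q(XHTML, "div") and cls == "false":
            # class="false" opens a negation around whatever it contains
            scrape(child, ET.SubElement(out, q(LOGIC, "not")), in_members)
        elif child.tag == q(XHTML, "dl") and cls == "members":
            scrape(child, out, True)
        elif child.tag == q(XHTML, "dd") and in_members:
            member = ET.SubElement(out, q(W3C, "member"))
            member.text = "".join(child.itertext()).strip()
        else:
            # headings, dt, body and so on: keep walking, emit nothing
            scrape(child, out, in_members)


def to_rdf(page):
    """Parse the page and return an rdf:RDF element holding the scraped logic."""
    rdf = ET.Element(q(RDF, "RDF"))
    desc = ET.SubElement(rdf, q(RDF, "Description"), {q(RDF, "ID"): "logic"})
    scrape(ET.fromstring(page), desc)
    return rdf


# Use readable prefixes when serialising instead of ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("logic", LOGIC), ("w3c", W3C)):
    ET.register_namespace(prefix, uri)

rdf = to_rdf(PAGE)
print(ET.tostring(rdf, encoding="unicode"))
```

The walk keeps the nesting of the source: the outer class="false" div
becomes the outer logic:not, and the inner one wraps only the soap.com
entry, which is the shape of the TimBL example in [1].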

Anyway, the point is that because all logic is reified at some point, there
are prose descriptions for all basic levels of machine-processable
ontologies. Therefore, it must be possible to write an annotated form of
these in XHTML that can then be transformed into its equivalent RDF. I
think I've shown that it *is* possible, but is it useful?

[1] http://www.w3.org/DesignIssues/Toolbox.html
[2] http://www.w3.org/2000/06/dc-extract/form.html
[3] http://www23.w3.org/servlet/org.w3c.test.XSLTapache

Kindest Regards,
Sean B. Palmer
http://infomesh.net/sbp/
http://www.w3.org/WAI/ [ERT/GL/PF]
"Perhaps, but let's not get bogged down in semantics."
   - Homer J. Simpson, BABF07.
Received on Friday, 29 December 2000 19:25:42 GMT
