[BioRDF] some of our experiences converting text/RDB/XML to RDF

Following up on yesterday's conference call, here are some examples of our
format conversion efforts, which include text, RDB, and XML to RDF.

DADS (http://www.nist.gov/dads/) is a scholarly work by our Paul Black. It
began in 1995, before we got involved with the semantic web. Its HTML pages
are derived from a set of tagged text files, one for each term (e.g.,
http://xlinux.sdct.nist.gov/jb/temp/dads-fibonacciTree.txt). Perl scripts
are used to do the conversion. From the DADS pages (e.g.,
http://www.nist.gov/dads/HTML/fibonacciTree.html), you can see that DADS has
a taxonomy of algorithms/data structures, and it also has a "component-of"
hierarchy, i.e., "Aggregate Child" means that the entry is a component of
some other algorithm or data structure.

In collaboration with Lockheed Martin, we created two semantic web forms,
one OWL DL and one not (see http://www.w3.org/TR/swbp-classes-as-values/ for
a discussion of the issue which led to the two forms). Perl scripts were
also used to create the RDF/OWL representation. The semantic web form of
DADS is still a work in progress.
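
To illustrate the kind of tagged-text-to-RDF conversion described above, here
is a minimal sketch in Python (the actual conversion used Perl scripts, which
are not shown in this message). The "TAG: value" layout, the tag names NAME,
IS_A and AGGREGATE_CHILD_OF, the example.org namespace, and the
aggregateChildOf property name are all assumptions made for illustration, not
the real DADS file format or ontology. Output is N-Triples.

```python
import re

# Hypothetical namespace; the real DADS ontology URI is not given here.
NS = "http://example.org/dads#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def parse_entry(text):
    """Parse 'TAG: value' lines (an assumed layout) into tag -> list of values."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s*(.+)$", line.strip())
        if match:
            fields.setdefault(match.group(1), []).append(match.group(2))
    return fields


def local_name(term):
    """Turn a term such as 'Fibonacci tree' into 'FibonacciTree'."""
    words = [w for w in re.split(r"\W+", term) if w]
    return "".join(w[:1].upper() + w[1:] for w in words)


def entry_to_ntriples(text):
    """Emit one label triple plus taxonomy and component-of triples."""
    fields = parse_entry(text)
    name = fields["NAME"][0]
    subject = f"<{NS}{local_name(name)}>"
    # Assumes the name contains no quotes or backslashes needing escaping.
    triples = [f'{subject} <{RDFS_LABEL}> "{name}" .']
    for parent in fields.get("IS_A", []):
        triples.append(f"{subject} <{RDFS_SUBCLASS}> <{NS}{local_name(parent)}> .")
    for whole in fields.get("AGGREGATE_CHILD_OF", []):
        triples.append(f"{subject} <{NS}aggregateChildOf> <{NS}{local_name(whole)}> .")
    return triples


if __name__ == "__main__":
    sample = "NAME: Fibonacci tree\nIS_A: binary tree\n"
    for triple in entry_to_ntriples(sample):
        print(triple)
```

Treating each term as a class related by rdfs:subClassOf is only one of the
modelling options; the classes-as-values note cited above discusses the
trade-offs.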

We have experimented with a web interface for the semantic web form of DADS.
This is illustrated in the following screenshots.
http://xlinux.sdct.nist.gov/jb/temp/dads-search-display-tree.jpg shows the
initial page. A keyword for a term name is entered, in this case "Tree". The
search engine responds with all terms that have "tree" as part of the name,
yielding:
http://xlinux.sdct.nist.gov/jb/temp/dads-search-display-tree-response.jpg
Selecting "FibonacciTree" results in:
http://xlinux.sdct.nist.gov/jb/temp/dads-fib-tree-display.jpg
The treeviews of the taxonomy and the AggregateChild property have been
synched to that term. The web interface was created by saving the Protege
representation of the knowledge base in OWL Database form and querying that
using SQL via a PHP script.
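
The keyword search and the synchronized treeview described above can be
sketched roughly as follows. This is an illustration in Python with an
in-memory SQLite database, not the PHP script itself. The single "terms" table
with "name" and "parent" columns is a hypothetical simplification and does not
reflect the schema Protege actually writes, and the rows are demo data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (name TEXT PRIMARY KEY, parent TEXT)")
    conn.executemany(
        "INSERT INTO terms VALUES (?, ?)",
        [
            ("Tree", None),
            ("BinaryTree", "Tree"),
            ("FibonacciTree", "BinaryTree"),
            ("Heap", None),
        ],
    )
    return conn


def search_terms(conn, keyword):
    """Return all term names containing the keyword (LIKE ignores ASCII case)."""
    rows = conn.execute(
        "SELECT name FROM terms WHERE name LIKE ? ORDER BY name",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


def ancestors(conn, name):
    """Walk parent links upward; this is the path a treeview would expand."""
    path = []
    query = "SELECT parent FROM terms WHERE name = ?"
    row = conn.execute(query, (name,)).fetchone()
    while row and row[0] is not None:
        path.append(row[0])
        row = conn.execute(query, (row[0],)).fetchone()
    return path


if __name__ == "__main__":
    conn = build_demo_db()
    print(search_terms(conn, "Tree"))
    print(ancestors(conn, "FibonacciTree"))
```

The keyword is passed as a bound parameter rather than concatenated into the
SQL string, which matters for any search box exposed on the web.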

From text files, we have also converted a number of software flaw taxonomies
to RDF/OWL
(http://samate.nist.gov/SSATTM_Content/ReadmeFiles/taxonomy.html) also using
Perl. The source document for one of these taxonomies is:
http://www.cve.mitre.org/docs/plover/plover-text.txt
You can see that this document has special "markers" in the text to indicate
the structure of the taxonomy. In another case, we used the
sections/subsections in a Word document saved as text to identify the
taxonomy. Once the four taxonomies were in RDF/OWL, we used Prompt to
integrate them into an illustration of an integrated software flaw taxonomy:
http://samate.nist.gov/SSATTM_Content/all-with-plover_v017/taxframe.html
Additional conversion work in this area includes converting an XML
representation of a taxonomy using XSLT.
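
As a rough illustration of recovering a taxonomy from markers in a text file,
as described above, the Python sketch below derives (term, parent) pairs from
marked lines. The marker convention (one leading "*" per nesting level) and
the sample entries are hypothetical; they are not the markers PLOVER uses, and
the real conversion was done in Perl.

```python
def parse_marked_outline(lines, marker="*"):
    """Return (name, parent) pairs; depth is the count of leading markers.

    Lines that do not start with the marker are treated as prose and skipped.
    """
    pairs = []
    stack = []  # stack[i] holds the most recent node seen at depth i + 1
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith(marker):
            continue
        depth = len(stripped) - len(stripped.lstrip(marker))
        name = stripped[depth:].strip()
        del stack[depth - 1:]  # drop nodes at this depth or deeper
        parent = stack[-1] if stack else None
        pairs.append((name, parent))
        stack.append(name)
    return pairs


if __name__ == "__main__":
    sample = [
        "* Flaws",
        "Some descriptive prose that is ignored.",
        "** Buffer errors",
        "*** Stack overflow",
        "** Injection",
    ]
    for name, parent in parse_marked_outline(sample):
        print(name, "<-", parent)
```

Each (name, parent) pair maps directly onto a subclass statement, so the same
pairs could feed an RDF/OWL writer.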

We have also converted a relational database representation of taxonomies to
the treeview representation created by Marcelino Alves Martins
(http://www.treeview.net) using both Java and XSLT. This effort is part of
our hosting of the HL7 Experimental Registry:
http://hcxw2k1.nist.gov:8080/hlxregdoc/docindex.html
The output of the conversion could have just as easily been RDF/OWL. The
relational representation is that specified by the OASIS ebXML
Registry/Repository Technical Committee:
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=regrep
The result of conversion can be seen at:
http://hcxw2k1.nist.gov:8080/hl7services/index.jsp (click on the "HL7
Vocabulary" button)
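
The relational-to-tree step described above can be sketched as below, in
Python rather than the Java and XSLT actually used. The (node_id, parent_id,
label) row shape and the sample labels are assumptions for illustration; they
are not the ebXML Registry schema.

```python
def rows_to_tree(rows):
    """Build nested nodes from a list of (node_id, parent_id, label) rows.

    Rows whose parent is None or unknown become roots.
    """
    nodes = {node_id: {"label": label, "children": []} for node_id, _, label in rows}
    roots = []
    for node_id, parent_id, _ in rows:
        if parent_id is None or parent_id not in nodes:
            roots.append(nodes[node_id])
        else:
            nodes[parent_id]["children"].append(nodes[node_id])
    return roots


def render(nodes, indent=0):
    """Return the tree as indented text lines, one line per node."""
    lines = []
    for node in nodes:
        lines.append(" " * indent + node["label"])
        lines.extend(render(node["children"], indent + 2))
    return lines


if __name__ == "__main__":
    # Hypothetical rows standing in for a query result.
    rows = [(1, None, "Vocabulary"), (2, 1, "Domain A"), (3, 2, "Code X"), (4, 1, "Domain B")]
    print("\n".join(render(rows_to_tree(rows))))
```

The same node dictionary could just as well be serialized as RDF/OWL instead
of an indented tree.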

This is a brief summary of some of our conversion activities. If you would
like more details, please let me know.

thanks,

jb

Received on Tuesday, 28 February 2006 16:17:52 UTC