Re: genealogical data (What work is being done using RDF?) from Lars Marius Garshol on 2002-05-04 (www-rdf-interest@w3.org from May 2002)

From: Lars Marius Garshol <larsga@garshol.priv.no>
Date: 04 May 2002 14:24:49 +0200
To: <www-rdf-interest@w3.org>
Message-ID: <m3r8ksnlri.fsf@pc36.avidiaasen.online.no>

* Danny Ayers
|
| I'm rather intrigued by the genealogy - TM mapping, if you can pass
| on any of your experiences I for one would be most grateful.

I'd be happy to, but I'm not sure what kinds of things you would like
to know.

An older version of the syntax used to describe the RDF->TM mapping
part is described at
<URL: http://www.ontopia.net/topicmaps/materials/tmrdfoildaml.html#N1053 >

if that's the part you are interested in. It's turned out to be a
little too simplistic (not very surprising), but I haven't yet come up
with a satisfactory way to solve that. 

The strength of this solution is that it can be event-based, so it can
easily process huge amounts of data into a topic map. I've used Jena's
RDF parser to turn all of MusicBrainz into an RDBMS topic map, for
example. 
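
To illustrate what "event-based" buys you, here is a minimal Python
sketch. It is not the mapping syntax from the URL above and not Jena's
API: the MAPPING table, the TopicMapWriter class, the parse() stand-in
and the single-table schema are all assumptions made up for this
example. The point is only that each statement is written to the
database as it arrives, so the RDF model never has to fit in memory.

```python
import sqlite3

# Hypothetical mapping: RDF predicate -> kind of topic map construct.
MAPPING = {
    "http://example.org/name": "name",
    "http://example.org/homepage": "occurrence",
    "http://example.org/memberOf": "association",
}


class TopicMapWriter:
    """Receives one statement at a time and stores it immediately."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tm "
            "(topic TEXT, kind TEXT, type TEXT, value TEXT)"
        )

    def statement(self, subject, predicate, obj):
        kind = MAPPING.get(predicate)
        if kind is None:
            return  # predicates without a mapping are skipped
        self.conn.execute(
            "INSERT INTO tm VALUES (?, ?, ?, ?)",
            (subject, kind, predicate, obj),
        )


def parse(lines, handler):
    """Stand-in for a streaming RDF parser: one callback per statement."""
    for line in lines:
        subject, predicate, obj = line.split(" ", 2)
        handler.statement(subject, predicate, obj.strip())
```

A real setup would plug the handler into the parser's own callback
interface and use a proper topic map schema instead of one flat table.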
 
* Lars Marius Garshol
|
| So my current approach is GEDCOM -> RDF, chew and mine RDF, RDF ->
| TM.  This works pretty well.
 
* Danny Ayers
|
| Not entirely relevant, but what do you use for chewing & mining?

I'm developing a topic map auto-generation toolkit that will become an
Ontopia product. The GEDCOM work was part of a Jython prototype that
was done to try out the ideas.

Basically the system uses an XML configuration file to set up a chain
of modules that process data. So there are modules that turn things
into RDF (GEDCOM, XML, CSV files, JDBC data, emails, etc), and then
there are modules that do things like simple regexp string processing,
turning literals into URIs, restructuring the RDF model etc.
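
A rough Python sketch of the chain idea follows. It is not the actual
toolkit: the XML element and attribute names, the module names, and the
representation of RDF as (subject, predicate, object) tuples are all
assumptions invented for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical config format; element and attribute names are invented.
CONFIG = r"""
<chain>
  <module name="csv-to-rdf" base="http://example.org/person/"/>
  <module name="regexp" predicate="name" pattern="\s+" replace=" "/>
  <module name="literal-to-uri" predicate="born-in"
          base="http://example.org/place/"/>
</chain>
"""


class URI(str):
    """Marks a value as a resource rather than a literal."""


def csv_to_rdf(data, base):
    # Source module: CSV text in, one triple per non-empty cell out.
    triples = []
    for row in csv.DictReader(io.StringIO(data)):
        subject = URI(base + row["id"])
        for key, value in row.items():
            if key != "id" and value:
                triples.append((subject, key, value))
    return triples


def regexp(triples, predicate, pattern, replace):
    # Simple regexp string processing on the literals of one predicate.
    rx = re.compile(pattern)
    return [
        (s, p, rx.sub(replace, o))
        if p == predicate and not isinstance(o, URI) else (s, p, o)
        for s, p, o in triples
    ]


def literal_to_uri(triples, predicate, base):
    # Turn the literal values of one predicate into URIs.
    return [
        (s, p, URI(base + o.lower().replace(" ", "-")))
        if p == predicate and not isinstance(o, URI) else (s, p, o)
        for s, p, o in triples
    ]


MODULES = {
    "csv-to-rdf": csv_to_rdf,
    "regexp": regexp,
    "literal-to-uri": literal_to_uri,
}


def run_chain(config_xml, data):
    # Each <module> element names a step; its other attributes are
    # passed to that step as keyword arguments.
    for module in ET.fromstring(config_xml):
        params = dict(module.attrib)
        step = MODULES[params.pop("name")]
        data = step(data, **params)
    return data
```

Calling run_chain(CONFIG, csv_text) feeds the CSV text through the three
steps in the order the config lists them; adding a new source or a new
clean-up step means registering one more function in MODULES.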

There is a module for turning the RDF into topic map data, and further
modules for loading/merging/exporting topic map data. Part of what's
good about this bit is that information integration becomes dead easy
because of the merging rules of topic maps. (Well, the mechanical part
of it, anyway. Establishing identity is as hard as it ever was, of
course.)
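
The mechanical part can be sketched in a few lines of Python. This is a
simplified illustration, not Ontopia's code: topics are plain dicts with
hypothetical "identifiers" and "names" keys, and the only merging rule
shown is "topics that share a subject identifier are the same topic".

```python
def merge_topics(topics):
    """Merge topics that share at least one subject identifier."""
    merged = []  # topics seen so far; their identifier sets stay disjoint
    for topic in topics:
        ids = set(topic["identifiers"])
        names = set(topic["names"])
        rest = []
        for other in merged:
            if other["identifiers"] & ids:
                # Same subject: absorb the earlier topic into this one.
                ids |= other["identifiers"]
                names |= other["names"]
            else:
                rest.append(other)
        rest.append({"identifiers": ids, "names": names})
        merged = rest
    return merged
```

Topics loaded from two different sources collapse into one as soon as
they carry a common identifier. Deciding which identifier each topic
should get is the hard part, and this sketch does nothing to help there.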

What I want to do once the stuff has been turned into a topic map is
to go back to the unstructured RDF fields and use the topic map to
mine more information out of those. Once you have a topic map you know
the name of every thing/resource/subject in your system, and so you
can more easily spot them in strings. I've done some experiments with
this and it seems to work quite well, but more effort is still needed.
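
A minimal Python sketch of the name-spotting idea, assuming the topic
map can be flattened into a dict from topic name to topic identifier.
The function name build_spotter and the example identifiers are
hypothetical, and plain exact matching is used here; the experiments
mentioned above may well have worked differently.

```python
import re


def build_spotter(names):
    """names: dict mapping a topic name to a topic identifier."""
    if not names:
        return lambda text: []
    # Try longer names first so "Ola Nordmann" wins over "Ola".
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:%s)\b" % "|".join(re.escape(name) for name in ordered)
    )

    def spot(text):
        # Return (matched name, topic identifier) pairs in text order.
        return [(m.group(0), names[m.group(0)]) for m in pattern.finditer(text)]

    return spot
```

For example, with the names "Ola Nordmann" and "Oslo" known, running the
spotter over a free-text note field returns both names together with
the topics they belong to, which can then be turned into new
associations.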
 
| The things that go on behind closed doors ;-)

Yeah. :-)  This stuff will eventually emerge from behind those doors,
though. We just need to come up with a design we're happy with first.
 
| The latest stuff is if anything a bit more lumpy thanks to the CLS
| putting the church-related stuff in, but for a straight genealogical
| schema, I concur, GEDCOM more or less says it all.

I like the design of the format quite a lot. It's very easy to
understand, parse, and work with, even if the data model could do with
some work.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >

Received on Saturday, 4 May 2002 08:25:04 UTC