Re: Tools for transforming data to RDF

As Irene said, http://esw.w3.org/topic/ConverterToRdf is the best place 
to start, but I thought I'd ramble a bit about some of the broader issues.

If the data to convert is in a file, as opposed to being delivered from 
a server with an interface that you can write to (as D2RQ and OpenLink 
do for relational data), then the first step is to parse the input, so 
tools will be built around parsers for each input format.

Any modern programming language can parse CSV easily, and most tools 
that advertise the ability to convert spreadsheets to RDF actually 
expect CSV input. (TopQuadrant's tools can read binary Excel files. Full 
disclosure: I work for them.)
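
To give a feel for how little is involved in the CSV case, here is a minimal sketch using only the Python standard library. It treats one column as the row identifier and turns every other non-empty cell into a triple with a plain literal object, written as N-Triples. The `http://example.org/` namespace, the function names, and the sample data are all made up for illustration; this isn't how any particular tool does it.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(text, key_column):
    """Return one N-Triples line per non-empty cell outside the key column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}id/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            # Skip the key itself, empty cells, and cells beyond the header.
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{BASE}prop/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


sample = "id,name,city\n1,Alice,Boston\n2,Bob,\n"
for triple in csv_to_ntriples(sample, "id"):
    print(triple)
```

Every object comes out as an untyped string here. Deciding which columns should become URIs, dates, or numbers is where the real modeling work starts.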

When your input is XML (which can include HTML if you use TagSoup or 
Tidy to clean it up), XSLT is a popular way to create triples. This is 
the principle behind GRDDL 
(http://www.w3.org/2004/01/rdxh/spec). 
TopQuadrant also has a more general-purpose XML-to-RDF converter that 
takes the structure of the input document into account so that it can 
round-trip the RDF back to XML.
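
The mapping idea itself is simple enough to sketch without XSLT. The snippet below is not GRDDL and not XSLT; it is the same kind of element-to-triple mapping done with Python's standard `xml.etree.ElementTree`, assuming a flat input where each record element carries an identifying attribute and its child elements hold the values. The namespace, the element names, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def xml_to_triples(xml_text, record_tag, id_attr):
    """Map each child element of a record to a (subject, predicate, object) tuple."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.iter(record_tag):
        record_id = record.get(id_attr)
        if record_id is None:
            continue  # nothing to build a subject URI from
        subject = BASE + "id/" + record_id
        for child in record:
            text = (child.text or "").strip()
            if text:
                triples.append((subject, BASE + "prop/" + child.tag, text))
    return triples


sample = """<books>
  <book id="b1"><title>First Book</title><year>2010</year></book>
  <book id="b2"><title>Second Book</title></book>
</books>"""

for triple in xml_to_triples(sample, "book", "id"):
    print(triple)
```

Nested elements, attributes on the children, and mixed content are all ignored here, which is exactly the structure a converter has to track if it wants to round-trip.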

With plain text, something needs to identify structure within the text 
so that it can work out what the subjects, predicates, and objects are, 
and that structure depends on the needs of the application. (That 
actually applies to CSV and XML as well, but commas and tags give you 
more to go on if you understand the purpose of the input data.) Semweb 
meetups are seeing more interest from the Natural Language Processing 
community--I think the NYC semweb meetup actually has a subgroup of 
people dedicated to NLP issues--so there could be more interesting work 
coming from them in the future. Thomson Reuters Calais is the 
best-known example that comes to mind of a tool that takes plain text as 
input and returns it with embedded RDF.
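
As a toy illustration of "identifying structure within the text", the sketch below uses one hand-written regular expression to pull subject/predicate/object tuples out of sentences of the form "X works for Y". The pattern, the `worksFor` predicate name, and the sample sentences are invented for the example. Real NLP-based tools do far more than this, and nothing here reflects how Calais works.

```python
import re

# One capitalized name (possibly several words), the phrase "works for",
# then another capitalized name. Anything else in the text is ignored.
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) works for "
    r"(?P<org>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)


def extract_triples(text):
    """Return (subject, predicate, object) tuples for each pattern match."""
    return [
        (match.group("person"), "worksFor", match.group("org"))
        for match in PATTERN.finditer(text)
    ]


text = "Alice Smith works for Example Corp. She likes tea. Carol works for Acme."
for triple in extract_triples(text):
    print(triple)
```

The pattern encodes one application's idea of what counts as a fact, which is the point: with plain text, the structure comes from what you decide to look for.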

Bob


Alasdair Logan wrote:
> Hey all,
>
> I was wondering if anyone is familiar with tools to convert data into RDF triples and Linked Data. They can be for any data format, e.g. XML, CSV, plain text etc.
>
> I'm doing this as part of a pilot study for my Master's project so I'm just trying to get a general view of any tools used.
>
> Thanks in advance
>
> Ally
>
>   

Received on Wednesday, 10 March 2010 17:23:32 UTC