
Re: Best way for exposing Linked Open Data. Wrapper vs scrape

From: Jürgen Jakobitsch SWC <j.jakobitsch@semantic-web.at>
Date: Tue, 28 May 2013 11:07:20 +0200
Message-ID: <1369732040.1590.15.camel@linux-1rgw.site>
To: Luca Matteis <lmatteis@gmail.com>
Cc: Linked Data community <public-lod@w3.org>
:-) experience shows that the technical aspect of your endeavor is
probably the simplest part, and you'll have a lot of time to think about
it until every group settles on a uri pattern and the vocabularies to be
used, unless you go north-korean and impose such things...
when you have a couple of datasets, the probability that one single
solution fits all parties is very low.
such decisions depend on a lot of non-technical factors, like the
willingness to move to the rdf/semweb/linkeddata world and whether there
are current workflows that groups of people already rely on.

technically it depends on things like dataset size and use cases: is it
enough to simply make this data dereferenceable, or is there a need to
make the data queryable? if so, what kinds of queries? certain parts are
quite difficult to implement with sparql-to-sql rewriting, e.g. limit and
top in certain cases.

i guess the => fastest <= (not necessarily the best) way would be to
create dumps (custom scripts, rdb2rdf) and put these into a virtuoso or
a triple store of your choice in combination with tools like
"pubby" [2]. then use "limes" [1] or another tool to create links to
other lod sources. that way, changing people's behaviour is not a
requirement for success.
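
to make the "custom scripts" route a bit more concrete, here is a minimal sketch of turning a csv dump into n-triples that could then be loaded into a triple store. everything specific in it is an assumption for illustration only: the http://example.org/ base uri, the "vocab/" predicate pattern, and the function names are made up, and a real setup would use whatever uri pattern and vocabularies the groups agree on.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI; replace with the agreed URI pattern.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str, dataset: str, key_column: str) -> str:
    """Emit one triple per non-empty cell, using the key column for the subject URI."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        subject = f"<{BASE}{dataset}/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            # Skip the key itself, surplus unnamed cells, and empty cells.
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{BASE}vocab/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "id,name,city\n1,Alice,Wien\n2,Bob,\n"
    print(csv_to_ntriples(sample, "people", "id"))
```

this only produces plain string literals with ad-hoc predicates; mapping columns to shared vocabularies and adding datatypes is exactly the part that needs agreement between the groups.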

wkr jürgen

[1] http://aksw.org/Projects/LIMES.html
[2] http://wifo5-03.informatik.uni-mannheim.de/pubby/

On Tue, 2013-05-28 at 10:18 +0200, Luca Matteis wrote:
> Here's my scenario: I have several different datasets. Most in MySQL
> databases. Some in PostgreSQL. Others in MS Access. Many in CSV. Each
> one of these datasets is maintained by its own group of people.
> 
> 
> Now, my end goal is to have all these datasets published as 5-star
> Linked Open Data. But I am in doubt between these two solutions:
> 
> 
> 1) Give a generic wrapper tool to each of these groups of people, that
> would basically convert their datasets to RDF, and allow them to
> publish this data as LOD automatically. This tool would allow them to
> publish LOD on their own, using their own server (does such a generic
> tool even exist? Can it even be built?).
> 
> 
> 2) Scrape these datasets, which are at times simply published on the
> Web as HTML paginated tables, or published as dumps on their server,
> for example a .CSV dump of their entire database. Then I would
> aggregate all these various datasets myself, and publish them as
> Linked Data.
> 
> 
> Pros and cons for each of these methods? Any other ideas?
> 
> 
> Thanks!

-- 
| Jürgen Jakobitsch, 
| Software Developer
| Semantic Web Company GmbH
| Mariahilfer Straße 70 / Neubaugasse 1, Top 8
| A - 1070 Wien, Austria
| Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22

COMPANY INFORMATION
| web       : http://www.semantic-web.at/
| foaf      : http://company.semantic-web.at/person/juergen_jakobitsch
PERSONAL INFORMATION
| web       : http://www.turnguard.com
| foaf      : http://www.turnguard.com/turnguard
| g+        : https://plus.google.com/111233759991616358206/posts
| skype     : jakobitsch-punkt
| xmlns:tg  = "http://www.turnguard.com/turnguard#"
Received on Tuesday, 28 May 2013 09:07:57 UTC
