Re: Best way for exposing Linked Open Data. Wrapper vs scrape

From: Alfredo Serafini <seralf@gmail.com>
Date: Tue, 28 May 2013 11:41:40 +0200
Message-ID: <CADawF4MC2N17=yqC7BV9QfwgXR8e6_OVasxZWQd+u1DCuwA3gA@mail.gmail.com>
To: Luca Matteis <lmatteis@gmail.com>
Cc: j.jakobitsch@semantic-web.at, Linked Data community <public-lod@w3.org>
Given your scenario, I'd rather go with the first one. In particular, I suggest
proceeding in two steps:
1) map your db with D2RQ, so you have a SPARQL endpoint (basically
read-only, which is what I'd suggest anyway; expose a separate API if you
want write capabilities). D2RQ also gives you a simple Pubby-like
visualization
2) configure a Linked Data front end, such as Pubby or Elda (the latter
implements the Linked Data API), and there you can offer a more
HTML-oriented visualization/navigation
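
To make the mapping step a bit more concrete, here is roughly what a
relational-to-RDF mapping does to a single row. This is only a hand-rolled
sketch in Python, not D2RQ's code or its mapping language: the BASE and VOCAB
namespaces, the "table_column" predicate naming, and the row_to_ntriples
helper are all assumptions made up for illustration.

```python
from urllib.parse import quote

BASE = "http://example.org/resource/"   # hypothetical URI base for resources
VOCAB = "http://example.org/vocab/"     # hypothetical vocabulary namespace


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(table, key_column, row):
    """Turn one relational row (a dict) into a list of N-Triples lines."""
    # The subject URI is minted from the table name and the primary key.
    subject = f"<{BASE}{quote(table)}/{quote(str(row[key_column]))}>"
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the URI; NULLs yield no triple
        predicate = f"<{VOCAB}{quote(table)}_{quote(column)}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


row = {"id": 7, "name": "Barolo", "region": None}
for line in row_to_ntriples("wine", "id", row):
    print(line)
```

A real mapping tool adds datatypes, foreign keys turned into links between
resources, and on-the-fly SPARQL-to-SQL rewriting, but the URI pattern above
is the part each group would have to agree on.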


2013/5/28 Luca Matteis <lmatteis@gmail.com>

> Thanks, Jürgen. Are you at #eswc2013? Maybe we can talk about this face
> to face :-)
> But anyway my two points were related to (i) letting my users do the work
> of publishing LOD or (ii) doing the work myself by aggregating their data.
>
> Cheers,
> Luca
>
>
> On Tue, May 28, 2013 at 11:07 AM, Jürgen Jakobitsch SWC <
> j.jakobitsch@semantic-web.at> wrote:
>
>> :-) experience shows that the technical aspect of your endeavor is
>> probably the simplest and you'll have a lot of time to think about it
>> until every group settles on a uri pattern and the vocabularies to be
>> used unless you go north-korean and impose such things...
>> when you have a couple of datasets the probability of one single
>> solution that fits all parties is very low.
>> such decisions depend on a lot of non-technical factors, like willingness
>> to move to the rdf/semweb/linkeddata world and whether there are current
>> workflows that groups of people are using.
>>
>> technically it depends on things like dataset size and use cases (is it
>> enough to simply make this data dereferenceable, or is there a need to
>> make the data queryable? what kinds of queries? certain parts are quite
>> difficult to implement with sparql-to-sql rewriting, e.g. limit and top
>> in certain cases)
>>
>> i guess the => fastest <= (not necessarily the best) way would be to
>> create dumps (custom scripts, rdb2rdf) and put these into a virtuoso or
>> a triple store of your choice in combination with tools like
>> "pubby" [2]. then use "limes" [1] or another tool to create links to other
>> lod sources. that way a change in people's behaviour is not a
>> requirement for success.
>>
>> wkr jürgen
>>
>> [1] http://aksw.org/Projects/LIMES.html
>> [2] http://wifo5-03.informatik.uni-mannheim.de/pubby/
>>
>> On Tue, 2013-05-28 at 10:18 +0200, Luca Matteis wrote:
>> > Here's my scenario: I have several different datasets. Most in MySQL
>> > databases. Some in PostgreSQL. Others in MS Access. Many in CSV. Each
>> > one of these datasets is maintained by its own group of people.
>> >
>> >
>> > Now, my end goal is to have all these datasets published as 5 stars
>> > Linked Open Data. But I am in doubt between these two solutions:
>> >
>> >
>> > 1) Give a generic wrapper tool to each of these groups of people, that
>> > would basically convert their datasets to RDF, and allow them to
>> > publish this data as LOD automatically. This tool would allow them to
>> > publish LOD on their own, using their own server (does such a generic
>> > tool even exist? Can it even be built?).
>> >
>> >
>> > 2) Scrape these datasets, which are at times simply published on the
>> > Web as HTML paginated tables, or published as dumps on their server,
>> > for example a .CSV dump of their entire database. Then I would
>> > aggregate all these various datasets myself, and publish them as
>> > Linked Data.
>> >
>> >
>> > Pros and cons for each of these methods? Any other ideas?
>> >
>> >
>> > Thanks!
>>
>> --
>> | Jürgen Jakobitsch,
>> | Software Developer
>> | Semantic Web Company GmbH
>> | Mariahilfer Straße 70 / Neubaugasse 1, Top 8
>> | A - 1070 Wien, Austria
>> | Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22
>>
>> COMPANY INFORMATION
>> | web       : http://www.semantic-web.at/
>> | foaf      : http://company.semantic-web.at/person/juergen_jakobitsch
>> PERSONAL INFORMATION
>> | web       : http://www.turnguard.com
>> | foaf      : http://www.turnguard.com/turnguard
>> | g+        : https://plus.google.com/111233759991616358206/posts
>> | skype     : jakobitsch-punkt
>> | xmlns:tg  = "http://www.turnguard.com/turnguard#"
>>
>>
>
Received on Tuesday, 28 May 2013 09:42:13 UTC
