Re: Best way for exposing Linked Open Data. Wrapper vs scrape

From: Richard Light <richard@light.demon.co.uk>
Date: Tue, 28 May 2013 10:37:07 +0100
Message-ID: <51A47AC3.5030108@light.demon.co.uk>
To: public-lod@w3.org
Luca,

If there is a community of interest amongst your users, i.e. a shared 
domain, then someone will need to do the work of expressing the concepts 
and structures of that domain in Linked Data form.  I suspect that is a 
job which will fall to you, whatever technique you decide on to publish 
the data.  Without such a shared framework/ontology there will be little 
merit in bringing all this data together.
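As a rough illustration of what a shared framework buys you, here is a minimal sketch that renames each group's local column names onto one set of shared property IRIs. The Dublin Core terms IRIs are real; the group identifiers and column names are hypothetical examples, and a real ontology would need far more than a lookup table.

```python
# Shared vocabulary: concept key -> property IRI (Dublin Core terms, real IRIs).
SHARED_VOCAB = {
    "title": "http://purl.org/dc/terms/title",
    "created": "http://purl.org/dc/terms/created",
}

# Hypothetical per-group mappings: local column name -> shared concept key.
GROUP_MAPPINGS = {
    "group_a": {"book_title": "title", "year": "created"},
    "group_b": {"Titolo": "title", "Anno": "created"},
}

def to_shared_properties(group: str, row: dict) -> dict:
    """Rename a row's columns to shared property IRIs, dropping unmapped columns."""
    mapping = GROUP_MAPPINGS[group]
    return {SHARED_VOCAB[mapping[col]]: value
            for col, value in row.items() if col in mapping}

a = to_shared_properties("group_a", {"book_title": "Dune", "year": "1965", "isbn": "x"})
b = to_shared_properties("group_b", {"Titolo": "Il nome della rosa", "Anno": "1980"})
print(sorted(a) == sorted(b))  # True: both rows now use the same property IRIs
```

The hard part Richard describes is agreeing on the contents of `SHARED_VOCAB`; the code itself is trivial once that agreement exists.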

Richard

On 28/05/2013 10:22, Luca Matteis wrote:
> Thanks, Jürgen. Are you at #eswc2013? Maybe we can talk about this 
> face to face :-)
> But anyway my two points were related to (i) letting my users do the 
> work of publishing LOD or (ii) doing the work myself by aggregating 
> their data.
>
> Cheers,
> Luca
>
>
> On Tue, May 28, 2013 at 11:07 AM, Jürgen Jakobitsch SWC
> <j.jakobitsch@semantic-web.at> wrote:
>
>     :-) experience shows that the technical aspect of your endeavor is
>     probably the simplest, and you'll have a lot of time to think about it
>     until every group settles on a uri pattern and the vocabularies to be
>     used, unless you go north-korean and impose such things...
>     when you have a couple of datasets, the probability of one single
>     solution that fits all parties is very low.
>     such decisions depend on a lot of non-technical factors, like the
>     willingness to move to the rdf/semweb/linkeddata world and whether
>     there are existing workflows that groups of people are using.
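To make "settling on a uri pattern" concrete, here is a minimal sketch of a URI-minting function that every group could share. The base URI and the {dataset}/{type}/{id} layout are hypothetical choices, not a recommendation from the thread.

```python
from urllib.parse import quote

# Hypothetical base URI that all groups would have to agree on.
BASE = "http://data.example.org"

def mint_uri(dataset: str, entity_type: str, local_id: str) -> str:
    """Build a resource URI following the pattern {BASE}/{dataset}/{type}/{id}."""
    # Percent-encode each segment so spaces or slashes in IDs cannot break the path.
    parts = [quote(str(p), safe="") for p in (dataset, entity_type, local_id)]
    return "/".join([BASE, *parts])

print(mint_uri("herbarium", "specimen", "A 12/3"))
# http://data.example.org/herbarium/specimen/A%2012%2F3
```

Encoding every segment keeps URIs stable even when local IDs contain awkward characters, which matters once other datasets start linking to them.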
>
>     technically it depends on things like dataset size and use cases: is it
>     enough to simply make this data dereferenceable, or is there a need to
>     make the data queryable? (and if so, with what kinds of queries? some
>     parts are quite difficult to implement with sparql-to-sql rewriting,
>     e.g. limit and top in certain cases.)
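One plausible reading of the "limit and top" remark: a SPARQL LIMIT/OFFSET has to become different SQL depending on the backend, and some backends cannot express it at all. This is a simplified sketch under stated assumptions (the query starts with "SELECT " and already has an ORDER BY when an offset is used); the function name and dialect labels are hypothetical.

```python
def paginate(sql: str, dialect: str, limit: int, offset: int = 0) -> str:
    """Append a row limit to a SELECT in the syntax of the target SQL dialect.

    Simplified: assumes `sql` starts with 'SELECT ' and already contains an
    ORDER BY clause whenever an offset is requested.
    """
    if dialect in ("mysql", "postgresql"):
        return f"{sql} LIMIT {limit} OFFSET {offset}"
    if dialect == "sqlserver":
        if offset:
            # SQL Server 2012+ syntax; OFFSET ... FETCH requires an ORDER BY.
            return f"{sql} OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY"
        return sql.replace("SELECT ", f"SELECT TOP {limit} ", 1)
    if dialect == "access":
        if offset:
            # MS Access supports TOP but has no OFFSET clause.
            raise NotImplementedError("MS Access has TOP but no OFFSET")
        return sql.replace("SELECT ", f"SELECT TOP {limit} ", 1)
    raise ValueError(f"unknown dialect: {dialect}")

query = "SELECT name FROM plants ORDER BY name"
for d in ("mysql", "sqlserver"):
    print(paginate(query, d, 10, 20))
print(paginate(query, "access", 10))
```

A SPARQL-to-SQL layer sitting on the MS Access sources from the scenario would have to emulate OFFSET some other way, which is exactly the kind of corner case that makes "queryable" harder than "dereferenceable".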
>
>     i guess the => fastest <= (not necessarily the best) way would be to
>     create dumps (custom scripts, rdb2rdf) and put these into virtuoso or
>     a triple store of your choice, in combination with tools like
>     "pubby" [2]. then use "limes" [1] or another tool to create links to
>     other lod sources. that way, changing people's behaviour is not a
>     requirement for success.
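A minimal sketch of the "custom scripts" route for the CSV datasets: each row becomes N-Triples statements, a line-based format that triple stores generally accept for bulk loading. The URI base, the property namespace, and the one-property-per-column scheme are all hypothetical assumptions; a real script would map columns onto a shared vocabulary instead.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://data.example.org/plants/"   # hypothetical resource namespace
VOCAB = "http://data.example.org/vocab#"   # hypothetical property namespace

def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))

def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Emit one N-Triples statement per non-empty, non-ID cell of each row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the ID is already in the subject URI; skip empty cells
            predicate = f"<{VOCAB}{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines

dump = "id,name,family\n1,Quercus robur,Fagaceae\n2,Olea europaea,\n"
for line in csv_to_ntriples(dump, "id"):
    print(line)
```

The output file can then be loaded into the triple store and served by a Linked Data front end, without the groups changing how they maintain their CSV files.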
>
>     wkr jürgen
>
>     [1] http://aksw.org/Projects/LIMES.html
>     [2] http://wifo5-03.informatik.uni-mannheim.de/pubby/
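To show what the linking step produces, here is a deliberately naive stand-in for a tool like LIMES: it compares every source label with every target label using Python's difflib and emits owl:sameAs links above a threshold. LIMES is designed to avoid exactly this all-pairs comparison, so this is not its algorithm; the URIs, labels, and 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(label.lower().split())

def discover_links(source: dict, target: dict,
                   threshold: float = 0.9) -> list[tuple[str, str, str]]:
    """Brute-force O(n*m) label comparison; returns (s, owl:sameAs, t) triples."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalise(s_label), normalise(t_label)).ratio()
            if score >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri))
    return links

source = {"http://data.example.org/plants/1": "Quercus robur"}
target = {
    "http://other.example.org/taxon/42": "Quercus  Robur",
    "http://other.example.org/taxon/77": "Olea europaea",
}
print(discover_links(source, target))
```

Label similarity alone produces false positives on real data, which is one reason dedicated link-discovery tools let you combine several metrics.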
>
>     On Tue, 2013-05-28 at 10:18 +0200, Luca Matteis wrote:
>     > Here's my scenario: I have several different datasets. Most are in
>     > MySQL databases, some in PostgreSQL, others in MS Access, and many in
>     > CSV. Each one of these datasets is maintained by its own group of
>     > people.
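For the relational sources, a generic wrapper could start from a simplified version of the W3C Direct Mapping idea (part of the RDB2RDF work): every row becomes a resource, every column a property. This sketch uses an in-memory sqlite3 database as a stand-in so it runs without a server; the base IRI and table are hypothetical, it assumes a single-column primary key, ignores foreign keys and datatypes, and the identifier quoting would differ for MySQL or MS Access drivers.

```python
import sqlite3
from urllib.parse import quote

BASE = "http://data.example.org/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def enc(s) -> str:
    """Percent-encode one IRI component."""
    return quote(str(s), safe="")

def direct_map(conn: sqlite3.Connection, table: str, pk: str) -> list[tuple]:
    """Map every row of `table` to (subject, predicate, object) triples.

    Simplified: single-column primary key, no foreign keys, all literals as str.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')  # table name assumed trusted
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{enc(table)}/{enc(pk)}={enc(record[pk])}"
        triples.append((subject, RDF_TYPE, f"{BASE}{enc(table)}"))
        for col, value in record.items():
            if value is not None:  # NULLs produce no triple
                triples.append((subject, f"{BASE}{enc(table)}#{enc(col)}", str(value)))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, taxon TEXT, country TEXT)")
conn.execute("INSERT INTO specimen VALUES (1, 'Quercus robur', 'IT'), (2, 'Olea europaea', NULL)")
for t in direct_map(conn, "specimen", "id"):
    print(t)
```

A mapping this mechanical is easy to automate, but the resulting properties are table-specific; it does not by itself give the groups a shared vocabulary.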
>     >
>     >
>     > Now, my end goal is to have all these datasets published as 5-star
>     > Linked Open Data. But I am torn between these two solutions:
>     >
>     >
>     > 1) Give each of these groups of people a generic wrapper tool that
>     > would basically convert their datasets to RDF and allow them to
>     > publish this data as LOD automatically. This tool would let them
>     > publish LOD on their own, using their own server (does such a generic
>     > tool even exist? Can it even be built?).
>     >
>     >
>     > 2) Scrape these datasets, which are at times simply published on the
>     > Web as paginated HTML tables, or published as dumps on their server,
>     > for example a .CSV dump of their entire database. Then I would
>     > aggregate all these various datasets myself, and publish them as
>     > Linked Data.
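For option 2, the scraping step for paginated HTML tables can be done with Python's standard html.parser. This minimal sketch assumes each page's HTML has already been fetched (for example with urllib.request.urlopen, omitted here), that the data sits in plain tr/td/th markup, and that every page repeats the header row; the sample pages are hypothetical.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Tolerates omitted </td> and </tr> end tags, which HTML allows.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def _flush_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()
            self._row = []
        elif tag in ("td", "th"):
            self._flush_cell()
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def close(self):
        super().close()
        self._flush_row()

def scrape_pages(pages: list[str]) -> list[list[str]]:
    """Concatenate rows from all pages, keeping only the first page's header."""
    all_rows = []
    for i, html in enumerate(pages):
        parser = TableScraper()
        parser.feed(html)
        parser.close()
        all_rows.extend(parser.rows if i == 0 else parser.rows[1:])
    return all_rows

page1 = "<table><tr><th>Name</th><th>Family</th></tr><tr><td>Quercus robur</td><td>Fagaceae</td></tr></table>"
page2 = "<table><tr><th>Name</th><th>Family</th></tr><tr><td>Olea europaea</td><td>Oleaceae</td></tr></table>"
print(scrape_pages([page1, page2]))
```

Scraping works without the groups' cooperation, but it breaks whenever their page layout changes, so it trades the groups' effort for ongoing maintenance on the aggregator's side.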
>     >
>     >
>     > Pros and cons of each of these methods? Any other ideas?
>     >
>     >
>     > Thanks!
>
>     --
>     | Jürgen Jakobitsch,
>     | Software Developer
>     | Semantic Web Company GmbH
>     | Mariahilfer Straße 70 / Neubaugasse 1, Top 8
>     | A - 1070 Wien, Austria
>     | Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22
>
>     COMPANY INFORMATION
>     | web       : http://www.semantic-web.at/
>     | foaf      : http://company.semantic-web.at/person/juergen_jakobitsch
>     PERSONAL INFORMATION
>     | web       : http://www.turnguard.com
>     | foaf      : http://www.turnguard.com/turnguard
>     | g+        : https://plus.google.com/111233759991616358206/posts
>     | skype     : jakobitsch-punkt
>     | xmlns:tg  = "http://www.turnguard.com/turnguard#"
>
>

-- 
*Richard Light*
Received on Tuesday, 28 May 2013 09:37:35 UTC
