Re: Best way for exposing Linked Open Data. Wrapper vs scrape from Kingsley Idehen on 2013-05-28 (public-lod@w3.org from May 2013)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 28 May 2013 07:31:05 -0400
To: public-lod@w3.org
Message-ID: <51A49579.7070005@openlinksw.com>
On 5/28/13 5:22 AM, Luca Matteis wrote:
> Thanks, Jürgen. Are you at #eswc2013? Maybe we can talk about this 
> face to face :-)
> But anyway my two points were related to (i) letting my users do the 
> work of publishing LOD or (ii) doing the work myself by aggregating 
> their data.

Luca,

For end-users and integrators the general flow would be as follows, if 
using Virtuoso [1]:

1. If the data is in a SQL RDBMS then you attach to the RDBMS via an 
ODBC or JDBC connection
2. If the data is in CSV then you can import it into Virtuoso which 
results in a SQL RDBMS table being derived from the import
3. You then use the Linked Data Views wizard to generate and publish 
5-Star Linked Data from any combination of the sources above.
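
To make the idea behind steps 2 and 3 concrete, here is a minimal Python sketch of turning CSV rows into RDF triples, with each primary key becoming a URI. It is only an illustration of the row-to-triple mapping, not how Virtuoso or its Linked Data Views wizard works internally. The http://example.org/ namespace, the table name, the column names and the function names are all made up for the example.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from this thread


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(csv_text, table, key_column):
    """Yield one N-Triples line per non-key, non-empty cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # The primary key value is minted into the subject URI.
        subject = f"<{BASE}{quote(table)}/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # skip the key itself and empty cells
            predicate = f"<{BASE}{quote(table)}#{quote(column, safe='')}>"
            yield f'{subject} {predicate} "{escape_literal(value)}" .'


# Made-up sample data standing in for an imported CSV file.
sample = "id,name,city\n1,Alice,Vienna\n2,Bob,\n"
triples = list(rows_to_ntriples(sample, "employees", "id"))
for line in triples:
    print(line)
```

A real mapping would also need datatypes, foreign keys turned into links between URIs, and a shared vocabulary rather than per-table predicates.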

With the above in place, you end up with two data access routes, i.e., 
Virtuoso becomes a conduit to both SQL RDBMS data (local or remote, via 
different SQL data access protocols) and RDF based Linked Data (local or 
remote, e.g., out to the LOD Cloud):

1. Linked Data URIs and/or SPARQL
2. ODBC, JDBC, ADO.NET, OLE DB, XMLA, etc.
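
As a rough illustration of the first route, the Python sketch below builds (but does not send) an HTTP GET request in the style of the SPARQL Protocol, where the query travels in a "query" parameter. The endpoint URL is a placeholder, not an address from this thread, and the function name is invented for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request without sending it."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

Passing the request to urllib.request.urlopen would actually run the query against a live endpoint.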

To get a feel for the end product, take a look at:

1. http://bit.ly/YcWb6r -- HTML5 based SQL, SPARQL, and Linked Data 
query tool (note: just click on connect since the instance is set up to 
work with a default 'vdb' user)
2. http://bit.ly/18pGTFd -- 'Employees' table query via Virtuoso's 
virtual DBMS layer (note: the query is redirected to Oracle via ODBC and 
the green links indicate primary keys which have been turned into urn: 
based URIs that resolve locally)
3. http://bit.ly/115dPeD -- Super Keys tab that bridges Linked Data 
Views with the SQL Tables (the blue links, unlike the green links, showcase 
the super key effect of Linked Data URIs, i.e., you can just copy and 
paste them into any browser and then explore from there onwards, i.e., they 
are no longer in a SQL RDBMS silo!).

The process is explained and demonstrated in a two-part screencast I 
published circa 2009 at:

1. http://bit.ly/16mnJjN -- generating RDF based Linked Data views atop 
SQL data sources (Part 1)
2. http://bit.ly/171PHkt -- generating RDF based Linked Data views atop 
SQL data sources (Part 2).

Note: the default browser views have changed significantly since 2009; 
you now have faceted browsing views as description documents for the RDF 
Views generated by Virtuoso.

Links:

1. http://bit.ly/10xg51K -- RDF based Linked Data views generator guide.

Kingsley

>
> Cheers,
> Luca
>
>
> On Tue, May 28, 2013 at 11:07 AM, Jürgen Jakobitsch SWC 
> <j.jakobitsch@semantic-web.at> wrote:
>
>     :-) experience shows that the technical aspect of your endeavor is
>     probably the simplest and you'll have a lot of time to think about it
>     until every group settles on a uri pattern and the vocabularies to be
>     used unless you go north-korean and impose such things...
>     when you have a couple of datasets the probability of one single
>     solution that fits all parties is very low.
>     such decisions depend on a lot of non-technical factors like
>     willingness to move to the rdf/semweb/linkeddata world, are there
>     current workflows that groups of people are using.
>
>     technically it depends on things like dataset size, use cases (is it
>     enough to simply make this data dereferenceable, is there need to make
>     the data queryable (what kinds of queries, there are certain parts
>     that are quite difficult to implement with sparql to sql, limit
>     and top in certain cases))
>
>     i guess the => fastest <= (not necessarily the best) way would be to
>     create dumps (custom scripts, rdb2rdf) and put these into a
>     virtuoso or a triple store of your choice in combination with tools
>     like "pubby" [2]. then use "limes" or another tool to create links to
>     other lod sources. that way the change of people's behaviour is not a
>     requirement for success.
>
>     wkr jürgen
>
>     [1] http://aksw.org/Projects/LIMES.html
>     [2] http://wifo5-03.informatik.uni-mannheim.de/pubby/
>
>     On Tue, 2013-05-28 at 10:18 +0200, Luca Matteis wrote:
>     > Here's my scenario: I have several different datasets. Most in MySQL
>     > databases. Some in PostgreSQL. Others in MS Access. Many in CSV. Each
>     > one of these datasets is maintained by its own group of people.
>     >
>     >
>     > Now, my end goal is to have all these datasets published as 5 stars
>     > Linked Open Data. But I am in doubt between these two solutions:
>     >
>     >
>     > 1) Give a generic wrapper tool to each of these groups of people, that
>     > would basically convert their datasets to RDF, and allow them to
>     > publish this data as LOD automatically. This tool would allow them to
>     > publish LOD on their own, using their own server (does such a generic
>     > tool even exist? Can it even be built?).
>     >
>     >
>     > 2) Scrape these datasets, which are at times simply published on the
>     > Web as HTML paginated tables, or published as dumps on their server,
>     > for example a .CSV dump of their entire database. Then I would
>     > aggregate all these various datasets myself, and publish them as
>     > Linked Data.
>     >
>     >
>     > Pros and cons for each of these methods? Any other ideas?
>     >
>     >
>     > Thanks!
>
>     --
>     | Jürgen Jakobitsch,
>     | Software Developer
>     | Semantic Web Company GmbH
>     | Mariahilfer Straße 70 / Neubaugasse 1, Top 8
>     | A - 1070 Wien, Austria
>     | Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22
>
>     COMPANY INFORMATION
>     | web       : http://www.semantic-web.at/
>     | foaf      : http://company.semantic-web.at/person/juergen_jakobitsch
>     PERSONAL INFORMATION
>     | web       : http://www.turnguard.com
>     | foaf      : http://www.turnguard.com/turnguard
>     | g+        : https://plus.google.com/111233759991616358206/posts
>     | skype     : jakobitsch-punkt
>     | xmlns:tg  = "http://www.turnguard.com/turnguard#"
>
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Tuesday, 28 May 2013 11:31:31 UTC