- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Fri, 14 Feb 2014 09:21:39 -0600
- To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
- Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
- Message-ID: <CAMVTWDwjzSQY0q6nJ2ZqEENA2HyZhSrL0UZzd+uCox408K-Vng@mail.gmail.com>
Jeremy, all,

I'll try to find some time to do it. However, the use case would express the general notion of "why I need to convert to RDF". Basically any RDF use case can be applied here.

danbri, do you have any particular use cases in mind?

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com

On Fri, Feb 14, 2014 at 4:31 AM, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:

> Hi Juan - thanks for your thoughts. I note that currently we don't have
> any use cases on the wiki <https://www.w3.org/2013/csvw/wiki/Use_Cases>
> that discuss publication of (semantically enabled) CSV from relational
> database tables. We need use cases, expressed as a user-driven /
> outcome-driven narrative (e.g. tell a story about what someone is trying to
> achieve rather than use abstract functional requirements) with real data
> examples, in order to establish our requirements for the eventual spec.
>
> Can you add something to the wiki please so that I can incorporate it into
> the "Use cases and requirements" documentation?
>
> Many thanks, Jeremy
>
> *From:* Juan Sequeda [mailto:juanfederico@gmail.com]
> *Sent:* 12 February 2014 16:58
> *To:* public-csv-wg@w3.org
> *Subject:* Intro and Thoughts on CSV2RDF
>
> All,
>
> Quick intro (even though I did make an intro on the first call): I'm
> finishing my PhD in CS at UT Austin. My research focuses on the integration
> of relational databases with the semantic web. A result of my research is
> Ultrawrap [1], a Relational Database to RDF (RDB2RDF) system capable of
> running SPARQL queries as fast as SQL queries. Ultrawrap has been
> productized and is compliant with the W3C Direct Mapping and R2RML
> standards for RDB2RDF. Ultrawrap is currently being commercialized by my
> startup, Capsenta [2]. Ultrawrap is being used to generate the RDF dumps of
> Musicbrainz. Additionally, the data behind Constitute Project [3] comes
> from hundreds of CSVs converted to RDF using Ultrawrap. I've been
> involved in the RDB2RDF space since the first workshop in 2007, XG, WG,
> editor of the Direct Mapping spec and implementor of both standards.
>
> So... I believe I can bring some thoughts to the table wrt CSV to RDF.
> Part of these thoughts come from conversations that I have had previously
> with danbri.
>
> I saw in today's minutes that the RDB2RDF topic came up. I agree with Axel
> that "CSV2RDF should be just a "dialect/small modification" of the existing
> RDB2RDF spec". I actually encourage that there exists both a Direct Mapping
> (completely automated mapping) and a modification of R2RML.
>
> The following issues arise:
>
> - How do you know if the first column is a header or not?
> - How do you know if there exists an id attribute/field which acts as a
>   unique identifier for the tuple (i.e. primary key)?
>
> Therefore, there needs to be a way to state this in a standard way. I'm
> assuming this is going to go somewhere. Given this information, the Direct
> Mapping standard should apply transparently (or so I believe at this
> moment).
>
> Now with R2RML, I believe some changes need to be made. R2RML was made to
> take advantage of SQL as much as possible; that is why you can define a
> mapping on a table or on a SQL query. Take for example the following R2RML
> mappings for Musicbrainz [4]. You can see that the tuples from "SELECT *
> FROM artist WHERE artist.type = 1" are mapped to instances of
> mo:SoloMusicArtist while tuples from "SELECT * FROM artist WHERE
> artist.type = 2" are mapped to instances of mo:MusicGroup. I'm not sure how
> to do this without a SQL engine. Therefore, should SQL engines be involved
> in the CSV to RDF transformation?
>
> Another instance where R2RML relies heavily on SQL is when you want to
> translate database codes into IRIs [5]. For example, you may have a code
> value "eng" which should be mapped to some URI
> http://example.com/engineering, which is part of a well-defined
> thesaurus/vocabulary.
>
> We have implemented a CSV2RDF in Ultrawrap which uses the Direct Mapping
> and R2RML standards as-is. The only assumption we have at the moment is
> that the first column is a header and the first attribute acts as a primary
> key.
>
> Another topic I've discussed with danbri is the case where you have a set
> of CSVs which are basically the CSV dumps of all the relational tables of a
> database. Therefore, implicitly there are foreign keys. There should be a
> way to describe the relationships (foreign keys) between different CSVs.
>
> These are my initial thoughts. Looking forward to hearing what others have
> to say.
>
> [1] http://www.sciencedirect.com/science/article/pii/S1570826813000383
> [2] http://www.capsenta.com/
> [3] https://www.constituteproject.org/
> [4] https://github.com/LinkedBrainz/MusicBrainz-R2RML/blob/master/mappings/artist.ttl
> [5] http://www.w3.org/TR/r2rml/#example-translationtable
>
> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com
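
For illustration only, here is a minimal Python sketch of what a Direct-Mapping-style conversion of a single CSV could look like under the assumption stated in the quoted message. It reads "the first column is a header" as meaning that the first row holds the column names, which is an interpretation rather than something the message confirms, and it treats the first field as the primary key. The base IRI, the function name `csv_to_triples`, and the choice to skip empty cells are all hypothetical; the IRI shapes only loosely follow the Direct Mapping pattern, and this is not how Ultrawrap is implemented.

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.com/base/"  # hypothetical base IRI, not from the thread


def csv_to_triples(text, table, base=BASE):
    """Yield N-Triples-style lines for one CSV.

    Assumptions (labeled, not from any spec for CSV):
    the first row is the header and the first field is the primary key.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    key = header[0]
    table_iri = f"{base}{quote(table)}"
    for row in reader:
        # Row IRI in the style table/key=value
        subject = f"<{table_iri}/{quote(key)}={quote(row[0])}>"
        yield f"{subject} <{RDF_TYPE}> <{table_iri}> ."
        for column, value in zip(header, row):
            if value == "":
                continue  # assumption: an empty cell is treated like NULL
            predicate = f"<{table_iri}#{quote(column)}>"
            literal = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            yield f'{subject} {predicate} "{literal}" .'


if __name__ == "__main__":
    sample = "id,name,type\n1,Alice,1\n2,,2\n"
    for line in csv_to_triples(sample, "artist"):
        print(line)
```

The sketch produces plain literals only. Distinguishing rows by a value such as `type`, translating codes into IRIs, and linking rows across CSVs are exactly the open points raised in the message and are not addressed here.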
Received on Friday, 14 February 2014 15:22:28 UTC