Re: Call for Editors! from Andy Seaborne on 2014-03-21 (public-csv-wg@w3.org from March 2014)

From: Andy Seaborne <andy@apache.org>
Date: Fri, 21 Mar 2014 10:51:07 +0000
To: Ivan Herman <ivan@w3.org>, Juan Sequeda <juanfederico@gmail.com>
CC: Gregg Kellogg <gregg@greggkellogg.net>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>, "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
Message-ID: <532C199B.6040403@apache.org>
On 21/03/14 09:30, Ivan Herman wrote:
> Hi Juan,
>
> thanks.
>
> (Just to make it clear: I do not try to be picky here, I am just trying to articulate the issues for future reference...)
>
> - This is a clear use case for the usage of RDF, i.e., to convert CSV data into RDF.
>
> - It also reflects the need to use a common vocabulary for all the converted CSV files from a particular domain, thereby binding the data to Linked Data. That means using, somewhere in the workflow, a tool that goes beyond direct mapping.
>
> I wonder whether this should not be added to a next version of the UCR document (I let Jeremy look into this). Mainly in terms of the Direct mapping vs. more, it is a good practical example!
>
> But (and here is where I am picky:-(
>
> I understand you used the existing toolkit around R2RML and, in your case, being one of the co-editors of the Direct Mapping document, this was the absolutely natural thing to do. However... I am looking at use cases that contrast an R2RML approach to the JSON-centric approach. Because, at some point, we may have to choose, right? Would your work have been equally possible (try to forget about your personal experience) using CSV-JSON? Would it have been easier/more difficult/equal?

The mapped case gets harder when/if we cover the Grouped Data Model.

We have a baseline - the logical equivalent of loading the CSV files into 
an SQL database and using R2RML.
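
As a rough illustration of that baseline, here is a minimal sketch, not anyone's actual tooling. It assumes SQLite as the SQL database, simple header names that are safe to use in IRIs, and a hypothetical base IRI; the table name, sample rows, and IRI shapes are made up for the example and only loosely follow the style of a direct mapping (no R2RML engine is involved).

```python
import csv
import io
import sqlite3

BASE = "http://example.org/"  # hypothetical base IRI, not from the thread


def load_csv(conn, table, text):
    """Load CSV text into a new SQLite table, one TEXT column per header."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{h}" TEXT' for h in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)
    return header


def direct_triples(conn, table, header):
    """Yield one N-Triples line per non-empty cell (row IRI, column IRI, literal)."""
    cur = conn.execute(f'SELECT rowid, * FROM "{table}" ORDER BY rowid')
    for rowid, *values in cur:
        subject = f"<{BASE}{table}/row-{rowid}>"
        for col, value in zip(header, values):
            if value is None or value == "":
                continue  # empty cells produce no triple
            # Escape backslashes, quotes, and newlines for a plain literal.
            literal = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            yield f'{subject} <{BASE}{table}#{col}> "{literal}" .'


# Made-up sample data, only to exercise the sketch.
SAMPLE = "id,name\n1,alpha\n2,beta\n"

conn = sqlite3.connect(":memory:")
header = load_csv(conn, "t", SAMPLE)
triples = list(direct_triples(conn, "t", header))
for line in triples:
    print(line)
```

Anything beyond this (a shared vocabulary, typed values, links to other datasets) is where a mapping language such as R2RML would come in.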

We first need to establish that it is worth the WG's time and effort to 
create a different technology, however much that reuses the existing work.

I propose that our first work is focused on the one-CSV case, for 
RDF/JSON/XML, because that informs the metadata for the annotations.  We 
can align the work on different formats better if we sync early and often.

	Andy

PS Were there particular parts of R2RML that took particularly long / 
caused the most debate?

> Thanks
>
> Ivan
>
> On 21 Mar 2014, at 24:25 , Juan Sequeda <juanfederico@gmail.com> wrote:
>
>> Ivan, all,
>>
>> This is our use-case:
>>
>> Constitute Project [1] is a search engine for the world's constitutions. This is a project funded by Google Ideas [2]. We, Capsenta, did the mapping of the constitution data to RDF and OWL. All of the data was originally in Excel spreadsheets (i.e. CSV files). What we did was to import the spreadsheets into SQL Server, and then use Direct Mapping, R2RML and Ultrawrap to map the data to RDF. Why did we want to use RDF/OWL? Several reasons:
>>
>> 1) RDF (graph data model) is flexible. We don't know what is going to happen to constitutional data later, so we need to be ready for change.
>> 2) We currently have 189 constitutions, each in its own spreadsheet. We need to integrate this data.
>> 3) We created an ontology about constitutional topics. Naturally, we want to represent this in OWL.
>> 4) We want to link to other datasets, such as DBpedia
>> 5) RDF is becoming the standard to publish open data.
>>
>> These reasons are not specific to Constitute. They can apply to any CSV dataset that needs search or integration with other datasets.  More info can be found in our 2013 Semantic Web Challenge submission [3]. We won 2nd prize :)
>>
>> Constitute is having a lot of impact. We know for a fact that constitutional drafters of Tunisia, Egypt and now Mongolia have been using Constitute.
>>
>> Btw, interesting fact: On average, 5 constitutions are written from scratch every year. A constitution lasts on average for 20 years. People who write constitutions have never done that before and will never do that again; that is why they want to search through existing constitutions.
>>
>> [1] https://www.constituteproject.org/#/
>> [2] https://www.google.com/ideas/projects/constitute/
>> [3] http://challenge.semanticweb.org/2013/submissions/swc2013_submission_12.pdf
>>
>>
>> Juan Sequeda
>> +1-575-SEQ-UEDA
>> www.juansequeda.com
>>
>>
>> On Thu, Mar 20, 2014 at 12:53 PM, Ivan Herman <ivan@w3.org> wrote:
>> Sorry if I sound like a broken record, but I would really like to see and understand the CSV->RDF use cases, also in terms of the people who are likely to use that. Learning CSV-LD or R2RML-CSV involves a learning curve. The question is which of the two is steeper for the envisaged user base.
>>
>> (I do not have anything against any of the two, but we may have to make a choice at some point if we go down that route...)
>>
>> Ivan
>>
>> On 20 Mar 2014, at 18:47 , Gregg Kellogg <gregg@greggkellogg.net> wrote:
>>
>>> On Mar 20, 2014, at 10:39 AM, Juan Sequeda <juanfederico@gmail.com> wrote:
>>>
>>>> If there is going to be a CSV to RDF mapping, shouldn't it be relatively close (if not almost equal) to R2RML? I foresee users doing RDB2RDF mappings with R2RML and having a few (or many) CSV files that they would like to map to RDF too. They would want to continue using the same tool.
>>>>
>>>> What we do is import the CSVs to a RDB, and then use R2RML. So as a user who needs to transform to RDF, I would want to have something almost equivalent to R2RML.
>>>
>>> This certainly is a valid use case. I was considering what the impact on developers using these tools might be. If there is a single tool (and spec) which handles the relevant use cases, then it might simplify the life of developers. Nothing against R2RML, and if that's the chain a developer's working with, the same logic would indicate that having to use something like CSV-LD would be a burden.
>>>
>>> Gregg
>>>
>>>> Juan Sequeda
>>>> +1-575-SEQ-UEDA
>>>> www.juansequeda.com
Received on Friday, 21 March 2014 10:51:37 UTC