Re: Uploaded CSV of Chinook sample database

On 18 October 2014 09:20, Stasinos Konstantopoulos
<konstant@iit.demokritos.gr> wrote:
> Dan, all,
>
> there's also dbpedia, which comes as a set of 685 CSVs in addition to
> RDF [1]. So that gives us both the CSV and the RDF that is intended as
> equivalent. I will describe it in more detail, if you find this use case interesting.

Definitely interesting - although I would say ambitious too. I don't
think the current tool I've been using (RMLProcessor) is designed to
handle datasets of that scale. My suggestion would be that we start by
getting our story straight at the smaller end of the scale:

1. For a single CSV file, e.g. the events example, how do we cite the
(R2)RML mapping file in its metadata.json?
(see scenarios/events/attempts/attempt-1 on GitHub)

2. How does that extend to the case of multiple CSVs (i.e.
scenarios/chinook/)? e.g. Chinook: Album.csv, Artist.csv, Customer.csv,
Employee.csv, Genre.csv, Invoice.csv, InvoiceLine.csv, MediaType.csv,
Playlist.csv, PlaylistTrack.csv, Track.csv

3. How does that extend to the case of multiple mappings? e.g. you
might have a mapping into schema.org and another into CIDOC or
SKOS/FOAF/DC, or another into Wikidata triples etc. What would the
metadata.json look like there?
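
To make questions 2 and 3 concrete, here is one purely hypothetical
shape for a package-level metadata.json, generated from Python so it is
easy to tweak. Every key name ("resources", "url", "mappings", "target",
"mappingFile") is an assumption made up for discussion, not something
from a CSVW draft, and the second mapping file does not exist; only the
CSV names and the attempt-1 mapping path are real.

```python
import json

# The eleven Chinook CSVs from scenarios/chinook/csv.
CHINOOK_CSVS = [
    "Album.csv", "Artist.csv", "Customer.csv", "Employee.csv", "Genre.csv",
    "Invoice.csv", "InvoiceLine.csv", "MediaType.csv", "Playlist.csv",
    "PlaylistTrack.csv", "Track.csv",
]


def build_metadata(csv_names, mappings):
    """Return a dict describing several CSVs and several mappings.

    All key names here are hypothetical placeholders for discussion.
    """
    return {
        "resources": [{"url": "csv/" + name} for name in csv_names],
        "mappings": [
            {"target": target, "mappingFile": path}
            for target, path in mappings
        ],
    }


metadata = build_metadata(
    CHINOOK_CSVS,
    [
        # Existing RML mapping uploaded under attempt-1.
        ("rml-attempt-1", "attempts/attempt-1/chinook.rml.ttl"),
        # Hypothetical second mapping (e.g. SKOS/FOAF/DC); no such file yet.
        ("skos-foaf-dc", "attempts/attempt-2/chinook-skos.rml.ttl"),
    ],
)

print(json.dumps(metadata, indent=2))
```

The single-CSV case (question 1) would then just be the degenerate form:
one entry under "resources" and one under "mappings".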

Having a processor for these kinds of mapping that is parallelized (on
Hadoop or whatever) would, I'm sure, be interesting for large datasets
like DBpedia. That is presumably easier for a single CSV than for
multiple CSVs, since partitioning is easier (assuming each row's mapping
to triples is independent).
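
A minimal sketch of that partitioning idea, assuming each row really can
be mapped on its own. This is not how RMLProcessor works; it is a
hand-written stand-in in Python, with a thread pool standing in for
Hadoop workers. The column names "Name" and "StartDate" come from the
events mapping; the sample rows and all function names are invented.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"

# Invented sample rows; only the column names come from the events mapping.
SAMPLE_CSV = """Name,StartDate
Example Concert A,2014-11-01
Example Concert B,2014-11-02
Example Concert C,2014-11-03
"""


def literal(value):
    """Quote a string as an N-Triples literal (escapes backslash and quote only)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def row_to_triples(index, row):
    """Map one row to N-Triples lines using nothing but that row."""
    # The global row index keeps blank node labels unique across partitions.
    subject = f"_:event{index}"
    return [
        f"{subject} <{RDF_TYPE}> <{SCHEMA}MusicEvent> .",
        f"{subject} <{SCHEMA}name> {literal(row['Name'])} .",
        f"{subject} <{SCHEMA}startDate> {literal(row['StartDate'])}^^<{SCHEMA}Date> .",
    ]


def map_partition(partition):
    """Map a list of (index, row) pairs; this is the unit of parallel work."""
    triples = []
    for index, row in partition:
        triples.extend(row_to_triples(index, row))
    return triples


def make_partitions(rows, size):
    """Split rows into chunks of at most `size`, keeping global row indexes."""
    indexed = list(enumerate(rows))
    return [indexed[i:i + size] for i in range(0, len(indexed), size)]


def parallel_map(rows, size=2, workers=2):
    """Map partitions concurrently and concatenate results in row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(map_partition, make_partitions(rows, size))
        return [triple for part in results for triple in part]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for triple in parallel_map(rows):
    print(triple)
```

Because no row looks at any other row, the partitioned run gives the
same triples as a sequential pass. Mappings that join across several
CSVs (as Chinook would need) break that assumption, which is why the
multi-CSV case looks harder.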

For our work the metadata.json ought to be a natural entry point, so I
hope template/mapping demos will start to use that rather than their
own configurations/parameters. I have also started looking into
getting RMLProcessor to accept other RDF formats besides Turtle, even
though R2RML is officially Turtle-only. In some cases a simple mapping
could then be fully embedded inside the JSON-LD metadata file...

Dan

> Best,
> s
>
>
> [1] http://wiki.dbpedia.org/DBpediaAsTables
>
>
>
>
> On 16 October 2014 17:37, Dan Brickley <danbri@google.com> wrote:
>> On 16 October 2014 12:49, Anastasia Dimou <anastasia.dimou@ugent.be> wrote:
>>>
>>> On 10/15/2014 03:44 PM, Dan Brickley wrote:
>>>>
>>>> http://chinookdatabase.codeplex.com/
>>>>
>>>> I've uploaded a shell script that converts from the Sqlite3 dump into
>>>> CSV, alongside the CSVs.
>>>>
>>>> See
>>>> https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/chinook
>>>> for overview of schema / contents and links to the data files,
>>>>
>>>> https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/chinook/csv
>>>>
>>>> Intent is to use this as a well known dataset for exploring multi-CSV
>>>> issues with mappings/templates.
>>>
>>>
>>> And I uploaded an RML mapping document mapping the chinook CSV files to RDF.
>>> See
>>> https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/chinook/attempts/attempt-1.
>>>
>>> Remark: I changed the public RMLProcessor (development branch) to make it
>>> even simpler to run the mappings just by providing the mapping document (URL
>>> or local file) and the output file, for instance mvn exec:java
>>> -Dexec.args="https://raw.githubusercontent.com/w3c/csvw/gh-pages/examples/tests/scenarios/chinook/attempts/attempt-1/chinook.rml.ttl
>>> /path/to/local/output/file/chinook.rml.nt"
>>
>> This is super great - thanks Anastasia! We now have an example of at
>> least one way of mapping a package of multiple CSVs into a common
>> graph.
>>
>> Are there particular bundles of CSVs from other use cases we should
>> look at next?
>>
>> The events demo (single csv) is now updated to use the new simpler
>> tooling / interface.
>>
>> https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/events
>>
>> From https://github.com/w3c/csvw/blob/gh-pages/examples/tests/scenarios/events/attempts/attempt-1/mapping-events.rml.ttl
>> here is the mapping.
>>
>> <#myCSV>
>>   rml:source "https://raw.githubusercontent.com/w3c/csvw/gh-pages/examples/tests/scenarios/events/source/events-listing.csv";
>>   rml:referenceFormulation ql:CSV .
>>
>> <#MusicEvent>
>>   rml:logicalSource <#myCSV>;
>>   rr:subjectMap [ rr:termType rr:BlankNode; rr:class schema:MusicEvent; ];
>>   rr:predicateObjectMap [ rr:predicate schema:name; rr:objectMap [ rml:reference "Name"; ] ];
>>   rr:predicateObjectMap [ rr:predicate schema:startDate; rr:objectMap [ rml:reference "StartDate"; rr:datatype schema:Date; ] ];
>>   rr:predicateObjectMap [ rr:predicate schema:location; rr:objectMap [ rr:parentTriplesMap <#Place> ] ];
>>   rr:predicateObjectMap [ rr:predicate schema:offers; rr:objectMap [ rr:parentTriplesMap <#Offer> ] ] .
>>
>> <#Place>
>>   rml:logicalSource <#myCSV>;
>>   rr:subjectMap [ rr:termType rr:BlankNode; rr:class schema:Place; ];
>>   rr:predicateObjectMap [ rr:predicate schema:address; rr:objectMap [ rml:reference "location_address"; ] ];
>>   rr:predicateObjectMap [ rr:predicate schema:name; rr:objectMap [ rml:reference "location_name"; ] ] .
>>
>> <#Offer>
>>   rml:logicalSource <#myCSV>;
>>   rr:subjectMap [ rr:termType rr:BlankNode; rr:class schema:Offer; ];
>>   rr:predicateObjectMap [ rr:predicate schema:url; rr:objectMap [ rml:reference "ticket_url"; rr:termType rr:IRI; ] ] .
>>
>>
>>
>> The next natural question is: how should this be cited from the CSV's
>> JSON-LD metadata description?
>>
>> Can anyone help make
>> https://github.com/w3c/csvw/blob/gh-pages/examples/tests/scenarios/events/attempts/attempt-1/metadata.json
>> look more plausible?
>>
>> Dan
>>

Received on Saturday, 18 October 2014 09:20:59 UTC