Re: questions on Generating RDF from Tabular Data on the Web

> On Jan 28, 2015, at 4:13 PM, dave.lewis@cs.tcd.ie wrote:
> 
> Hi,
> I have a use case in the publishing and processing of language resources where we are interested in converting the meta-data of a CSV resource into RDF without the actual data.
> 
> This would allow us to use SPARQL to search for tables by their meta-data attributes without loading all the data into an RDF store. As the tables can be large, loading them is an overhead, and unnecessary for our use case, as there are well-developed tools to process or filter tabular language data once the required tables are located.
> 
> My reading of the mapping algorithm at:
> http://w3c.github.io/csvw/csv2rdf/#map-annotated-tab-table
> 
> is that it doesn't permit the mapping of data to be omitted, i.e. step 10 specifies 'SHALL'.

Good to have this datapoint. We're actually tracking that issue here: https://github.com/w3c/csvw/issues/64. There are also cases where it might be useful to have a column description where the column doesn't actually exist in the CSV (say, for constructing values from multiple columns).
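
To make the metadata-only use case concrete, here is a rough sketch of what such a conversion could look like. It is not the csv2rdf algorithm: the prefix map, the `metadata_triples` helper, the example keys and the choice to emit only string-valued, prefixed top-level properties are all assumptions for illustration. No row data is read at all.

```python
import json

# Assumed prefix map for illustration; a real converter would take
# these from the metadata document's context instead.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}


def expand(term):
    """Expand a prefixed name such as 'dc:title'; return None if it has no known prefix."""
    prefix, _, local = term.partition(":")
    if prefix in PREFIXES and local:
        return PREFIXES[prefix] + local
    return None


def metadata_triples(metadata, table_url):
    """Emit N-Triples-style lines for string-valued, prefixed top-level properties.

    The CSV rows are never touched; only the metadata dict is inspected.
    """
    triples = []
    for key, value in metadata.items():
        predicate = expand(key)
        if predicate is None or not isinstance(value, str):
            continue  # skip unprefixed keys and nested structures
        # json.dumps is used here as a shortcut for quoting and escaping the literal.
        literal = json.dumps(value, ensure_ascii=False)
        triples.append(f"<{table_url}> <{predicate}> {literal} .")
    return triples


# Hypothetical metadata document for a table of terms.
example = {
    "url": "table.csv",
    "dc:title": "Term list",
    "dcat:keyword": "terminology",
}
for line in metadata_triples(example, "http://example.org/table.csv"):
    print(line)
```

The resulting lines could be loaded into an RDF store and queried with SPARQL while the table itself stays outside the store.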

> Had you considered such a meta-data only mapping use case?
> 
> Further, such a mapping would also raise the possibility of having an RDF equivalent of the .csvm file accompanying a CSV file. This might be valuable in use cases where RDF/DCAT crawlers are already in use, which could then pick up the meta-data without themselves having to implement the JSON-RDF mapping. Am I correct that currently the JSON .csvm format is the only valid format for meta-data?

Yes, but we dropped using the ".csvm" suffix in favor of simple ".json", so this might make it hard for non-aware crawlers to find such data without performing introspection into the metadata file. However, a crawler encountering a ".csv" file might find a describedby link relation to the metadata file and figure it out that way (see http://w3c.github.io/csvw/syntax/#link-header).
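
As a rough illustration of that discovery path, a crawler could inspect the Link header returned with the CSV file and resolve any describedby target against the CSV's URL. The sketch below assumes the header value has already been fetched; the function name `find_describedby` and the example URLs are made up for this message, and the regular expressions handle only simple header values rather than the full Link header grammar.

```python
import re
from urllib.parse import urljoin


def find_describedby(link_header, base_url):
    """Return absolute URLs of all rel="describedby" targets in a Link header value."""
    targets = []
    # Each link-value looks like: <target>; param=value; param="value"
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)', link_header):
        target, params = match.group(1), match.group(2)
        rel = re.search(r';\s*rel\s*=\s*"?([^";,]+)"?', params, re.IGNORECASE)
        # rel may hold several space-separated relation types.
        if rel and "describedby" in rel.group(1).lower().split():
            targets.append(urljoin(base_url, target))
    return targets


# Hypothetical response header for http://example.org/data/table.csv
header = '<metadata.json>; rel="describedby", <table.html>; rel="alternate"'
print(find_describedby(header, "http://example.org/data/table.csv"))
```

A crawler that finds such a target still has to fetch and interpret the JSON metadata itself; the header only tells it where to look.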

> I haven't been tracking the WG very closely, so my apologies if this has already been discussed.

All input is appreciated!

Gregg

> Kind Regards,
> Dave
> 
> -- 
> Director - Knowledge and Data Engineering Group
> The CNGL Centre for Global Intelligent Content
> School of Computer Science and Statistics
> Trinity College Dublin
> 
> 

Received on Thursday, 29 January 2015 00:30:16 UTC