Re: Standard Path in "Model for Tabular Data and Metadata on the Web"

Interesting discussion.

I can see two patterns of use.

1. The DSPL / datapackage.json pattern is for a group of CSV files each to be explicitly described within a metadata file, including the relationships between the different CSV files (e.g. to link code list values in one CSV file to their definitions in another). In this case, the metadata file has to point explicitly to each CSV file because the files have different schemas. I think this makes sense in a package, because by definition you know what files are present in the package.
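To make pattern 1 concrete, here is a rough Python sketch of a package whose metadata names each file explicitly, gives each file its own schema, and links a code column in one file to its definitions in another. The property names are loosely modelled on datapackage.json but are illustrative only, and the file contents are inlined as strings so the example runs on its own.

```python
import csv
import io

# Hypothetical package contents, inlined so the sketch is self-contained;
# a real package would read these files from disk.
FILES = {
    "spending.csv": "date,category,amount\n2014-01-03,TRV,120.00\n2014-01-09,IT,950.00\n",
    "categories.csv": "code,label\nTRV,Travel\nIT,Information technology\n",
}

# Metadata loosely modelled on datapackage.json; key names are illustrative.
# Each file is listed by name because each has a different schema.
METADATA = {
    "resources": [
        {"path": "categories.csv", "schema": {"fields": ["code", "label"]}},
        {
            "path": "spending.csv",
            "schema": {
                "fields": ["date", "category", "amount"],
                # Single-column key: spending.category -> categories.code
                "foreignKeys": [
                    {"fields": "category",
                     "reference": {"resource": "categories.csv", "fields": "code"}}
                ],
            },
        },
    ]
}


def read_rows(path):
    """Parse one packaged CSV file into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(FILES[path])))


def unresolved_codes(metadata):
    """Return (path, column, value) for every code value with no definition."""
    problems = []
    for res in metadata["resources"]:
        for fk in res["schema"].get("foreignKeys", []):
            ref = fk["reference"]
            # Collect the set of codes defined in the referenced file.
            defined = {row[ref["fields"]] for row in read_rows(ref["resource"])}
            for row in read_rows(res["path"]):
                value = row[fk["fields"]]
                if value not in defined:
                    problems.append((res["path"], fk["fields"], value))
    return problems
```

Here `unresolved_codes(METADATA)` returns an empty list; adding a spending row with an undefined category such as `XX` would make it report that row.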

2. I have a directory of spending data and all the CSV files in that directory follow exactly the same schema. I want to be able simply to add new files to that directory and have them all covered by the existing metadata document. Note that from the metadata file alone I couldn’t list all the CSV files it covers, but I could (at least potentially) tell whether a given CSV file, whose URL I knew, was covered by it.
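A minimal sketch of that "is this CSV file covered?" check, in Python. It assumes a hypothetical `urlPattern` property holding a filename glob that is resolved against the metadata file’s own URL, so `*.csv` means "any CSV file in the same directory"; neither the property name nor these glob semantics come from any specification.

```python
from fnmatch import fnmatchcase
from urllib.parse import urljoin

# Hypothetical metadata document and its location; "urlPattern" is an
# assumed property name used only for illustration.
METADATA_URL = "http://example.org/spending/metadata.json"
metadata = {
    "urlPattern": "*.csv",
    "schema": {"columns": ["date", "supplier", "amount"]},
}


def is_covered(csv_url, metadata_url, meta):
    """Return True if csv_url falls under the metadata's URL pattern.

    The pattern is resolved relative to the metadata URL, then split into a
    directory part (which must match exactly) and a filename glob, so files
    in subdirectories are not covered.
    """
    pattern = urljoin(metadata_url, meta["urlPattern"])
    pattern_dir, _, name_glob = pattern.rpartition("/")
    url_dir, _, name = csv_url.rpartition("/")
    return url_dir == pattern_dir and fnmatchcase(name, name_glob)
```

The check needs only the candidate URL and the metadata, which is why coverage can be tested without being able to enumerate every file in the directory.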

We certainly need to cater, with low overhead, for multiple CSV files that share exactly the same table schema. A middle ground of using URL patterns to point from the metadata file to CSV files seems like the right approach to me, but we should also support a syntax in which the metadata file can indicate that multiple named CSV files share the same table schema. A packager, for example, might turn a URL pattern into a list of files.
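The packager step could look something like this Python sketch: given a directory listing, it expands a filename glob into explicit named files that all refer to one shared schema. The `schemas`/`tables` layout and the function name are hypothetical, invented for illustration.

```python
from fnmatch import fnmatchcase


def expand_pattern(listing, pattern, schema_name, schema):
    """Turn a filename glob into an explicit list of tables sharing one schema.

    `listing` is any iterable of file names (e.g. the result of os.listdir);
    the output layout is hypothetical.
    """
    matches = sorted(name for name in listing if fnmatchcase(name, pattern))
    return {
        # The schema is defined once...
        "schemas": {schema_name: schema},
        # ...and each named file refers to it by name instead of repeating it.
        "tables": [{"url": name, "schema": schema_name} for name in matches],
    }
```

For example, `expand_pattern(["2014-02.csv", "notes.txt", "2014-01.csv"], "*.csv", "spending", {...})` lists `2014-01.csv` and `2014-02.csv`, both pointing at the single `spending` schema, so the shared-schema case stays cheap in either form.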

Jeni
--  
Jeni Tennison
http://www.jenitennison.com/

Received on Tuesday, 8 April 2014 19:54:55 UTC