Re: Metadata document v0.0.1

Hi Jeni

I submitted a use case today (Darwin Core Archives) and I hope it’s under consideration by Eric and the use case team as I think it had a couple of new requirements / sub-requirements.

I’ll justify them briefly inline here, just for your context.

> IIRC, we don’t have, in our use cases document, use cases for a couple of these requirements. It would be great if you could put some together. Specifically examples where it’s useful to have:
>   * default values for a column

If a CSV is produced from a DB with a NULLABLE field that is mapped to a well-defined vocabulary (e.g. R-AssociationOfCodeValuesWithExternalDefinitions [1]), the external definitions might have an explicit UNKNOWN category, but the CSV will simply have the value missing because it was NULL. It is sometimes useful to express that explicitly when mapping the source data.
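
To make that concrete, here is a rough sketch in Python of how a consumer might apply a declared column default. The column name "evidence", the COLUMN_DEFAULTS mapping and the apply_defaults helper are all made up for illustration; none of it comes from the metadata draft.

```python
import csv
import io

# Hypothetical metadata: column name -> default to use when the cell is empty.
COLUMN_DEFAULTS = {"evidence": "UNKNOWN"}


def apply_defaults(rows, defaults):
    """Yield rows with empty cells replaced by the column's declared default."""
    for row in rows:
        yield {
            name: (defaults[name] if value == "" and name in defaults else value)
            for name, value in row.items()
        }


# A NULL in the source DB typically ends up as an empty cell in the CSV.
SAMPLE = "id,evidence\n1,OBSERVATION\n2,\n"

rows = list(apply_defaults(csv.DictReader(io.StringIO(SAMPLE)), COLUMN_DEFAULTS))
```

Here the second row's empty "evidence" cell is reported as UNKNOWN, while the first row keeps its own value.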

>   * fixed-value fields applicable to all rows

Imagine integrating 1000s of heterogeneous CSVs, some of which are missing important values.
Suppose there are “latitude” and “longitude” columns in all of them, but you’re creating maps by category and some files are missing the category field. When annotating the CSV, you know it comes from a dataset where all records are “category X”. It’s nice to capture that.

Hint: http://api.gbif.org/v0.9/map/index.html (the bottom-left toggle will pop up and you can select “unknown evidence”; we’re slowly getting through annotating all those unknown categories, but there are still millions of records unknown)
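
The fixed-value case could be sketched along these lines in Python. The FIXED_FIELDS annotation and the add_fixed_fields helper are hypothetical names for illustration only, and letting a value present in the CSV win over the annotation is just one possible choice.

```python
import csv
import io

# Hypothetical per-file annotation: fields absent from the CSV but known
# to hold for every row in the dataset.
FIXED_FIELDS = {"category": "X"}


def add_fixed_fields(rows, fixed):
    """Yield each row enriched with the fixed fields declared for the file."""
    for row in rows:
        enriched = dict(fixed)
        # Assumption: a column actually present in the CSV takes precedence.
        enriched.update(row)
        yield enriched


# This file has coordinates but no category column.
SAMPLE = "latitude,longitude\n55.70,12.56\n"

enriched_rows = list(
    add_fixed_fields(csv.DictReader(io.StringIO(SAMPLE)), FIXED_FIELDS)
)
```

Every row then carries category "X" alongside its coordinates, so the file can be mapped by category like the others.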

> (This is just so that we have it all justified, and are able to cull the use cases document for examples; I’m not at all questioning whether these are real requirements.)
> 
> We do have a requirement to support missing values: R-MissingValueDefinition [1], so we will get around to making sure that’s included in the metadata definition :)
> 
> What format are you converting your CSVs to?

We are integrating 1000s of databases by mapping them to a standard called Darwin Core Archive [2] and then loading them all into central indexes (Hadoop-based).

Thanks for taking the time to consider the suggestions.

Cheers,
Tim


[1] http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-AssociationOfCodeValuesWithExternalDefinitions
[2] Darwin Core Archive based on http://rs.tdwg.org/dwc/terms/guides/text/index.htm

> 
> Thanks,
> 
> Jeni
> 
> [1] http://w3c.github.io/csvw/use-cases-and-requirements/#R-MissingValueDefinition
> 
> ------------------------------------------------------
> From: Tim Robertson [GBIF] trobertson@gbif.org
> Reply: Tim Robertson [GBIF] trobertson@gbif.org
> Date: 7 May 2014 at 15:23:03
> To: Jeni Tennison jeni@jenitennison.com
> Cc: public-csv-wg@w3.org public-csv-wg@w3.org, rufus.pollock@okfn.org rufus.pollock@okfn.org
> Subject:  Re: Metadata document v0.0.1
> 
>> Hi Jeni,
>> 
>> Looks nice - quick observations for your consideration / dismissal:
>> 
>> - should it support the ability to offer a default value for fields where the value is missing?  
>>> if so, this impacts the constraints section
>> 
>> - should it be possible to declare a NULL value to support e.g. \N from MySQL dumps.
>>> if so, impacts constraints as well
>> 
>> - should it support the ability to offer a fixed field applicable to all rows, but not present  
>> in the CSV to allow data enrichment?
>>> probably outside current scope, but we find very useful to annotate data in the use case  
>> I proposed today
>> 
>> HTH,
>> Tim
>> 
>> 
>> On 07 May 2014, at 14:00, Jeni Tennison wrote:
>> 
>>> A highly draft version of the spec for metadata vocabulary is available at
>>> 
>>> http://w3c.github.io/csvw/metadata/
>>> 
>>> There’s lots of work still to do. Rufus and I are concentrating on the single CSV file,  
>> single metadata file case to start with, and trying to create a JSON format that can be  
>> interpreted as JSON-LD into RDF (rather than defining an RDF vocabulary which is then  
>> described as JSON).
>>> 
>>> Jeni
>>> --
>>> Jeni Tennison
>>> http://www.jenitennison.com/
>> 
>> 
>> 
>> 
> 
> --  
> Jeni Tennison
> http://www.jenitennison.com/

----------------------------------------------------------------------------------------
Tim Robertson - GBIF Head of Informatics - trobertson@gbif.org
Global Biodiversity Information Facility http://www.gbif.org/
GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
Tel: +45 3532 1487  Mob: +45 2826 1487  Fax: +45 2875 1480
----------------------------------------------------------------------------------------

Received on Wednesday, 7 May 2014 15:57:44 UTC