Re: Metadata document v0.0.1

Thanks Jeni 

> Thanks Tim,
> 
> Yes, I saw the use case that you put forward, I just didn’t see anything explicitly about default values or fixed values in it. What I’ve done is add an issue here:
> 
>   https://github.com/w3c/csvw/issues/6
> 
> which references out to your use case, and a comment to highlight the need to address default and fixed values.
> 
> It would be great if you could extend your use case to provide:
> 
>   1. an actual DwC-A format file

There was one attached to the use case I submitted, but maybe it got lost on the list. It has comments in the meta.xml (analogous to your JSON annotation spec). The example is tiny, just to highlight the concepts; the ideas in the meta.xml are the important bit here.

>   2. some more detailed description of what is done with the files (eg loading into Hadoop) that highlights the requirements for default and fixed values
> Would that be possible? You could add notes on to the GitHub issue as a way of supplying that, if that’s helpful.

I’ll comment on the issue.

Also - the use case was a proposal with some ideas we use in practice, but I think it is premature to consider them accepted requirements.
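
As a rough illustration only (not part of any DwC-A tooling or of the metadata draft), the three ideas from my earlier mail (a declared NULL marker, a default value for a column, and a fixed value applicable to all rows) could be sketched like this in Python. The function name annotate_rows, its parameters, and the column names are all made up for the example:

```python
import csv
import io


def annotate_rows(csv_text, null_marker=None, defaults=None, fixed=None):
    """Yield one dict per CSV row with the three hypothetical annotations applied."""
    defaults = defaults or {}
    fixed = fixed or {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        out = {}
        for column, value in row.items():
            # Treat an empty cell or the declared NULL marker as missing.
            if value == "" or (null_marker is not None and value == null_marker):
                value = defaults.get(column)  # None when no default is declared
            out[column] = value
        # Fixed values fill in columns that are absent from the file itself.
        for column, value in fixed.items():
            out.setdefault(column, value)
        yield out


# Illustrative data: "\N" is the NULL marker used in MySQL dumps.
sample = "latitude,longitude,evidence\n55.7,12.6,\\N\n10.0,20.0,Observation\n"
rows = list(
    annotate_rows(
        sample,
        null_marker="\\N",
        defaults={"evidence": "UNKNOWN"},
        fixed={"category": "category X"},
    )
)
```

Here the first row gets the default for its NULL "evidence" cell, the second row keeps its own value, and every row gains the fixed "category" column that the file never contained.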

Thanks,
Tim
> 
> ------------------------------------------------------
> From: Tim Robertson [GBIF] trobertson@gbif.org
> Reply: Tim Robertson [GBIF] trobertson@gbif.org
> Date: 7 May 2014 at 16:58:30
> To: Jeni Tennison jeni@jenitennison.com
> Cc: public-csv-wg@w3.org public-csv-wg@w3.org, rufus.pollock@okfn.org rufus.pollock@okfn.org
> Subject:  Re: Metadata document v0.0.1
> 
>> Hi Jeni
>> 
>> I submitted a use case today (Darwin Core Archives) and I hope it’s under consideration  
>> by Eric and the use case team as I think it had a couple of new requirements / sub-requirements.  
>> 
>> I’ll justify them a bit inline here, just for your context.
>> 
>>> IIRC, we don’t have, in our use cases document, use cases for a couple of these requirements.
>>> It would be great if you could put some together. Specifically examples where it’s useful
>>> to have:
>>> * default values for a column
>> 
>> If a CSV is produced from a DB with a NULLABLE field, but mapped to a well-defined vocabulary
>> (e.g. R-AssociationOfCodeValuesWithExternalDefinitions [1]), the external definitions
>> might have an explicit UNKNOWN category, but the CSV will have the value missing because it is NULL.
>> It is sometimes useful to express that explicitly when mapping the source data.
>> 
>>> * fixed-value fields applicable to all rows
>> 
>> Imagine integrating 1000s of heterogeneous CSVs, some of which are missing important
>> values.
>> Suppose there are “latitude” and “longitude” columns on all, but you’re creating maps
>> by category and some files are missing the category field, but when annotating the CSV
>> you know it comes from a dataset where they are all “category X”. It’s nice to capture that.
>> 
>> Hint: http://api.gbif.org/v0.9/map/index.html (bottom left toggle will popup and  
>> you can select “unknown evidence” - we’re slowly getting through annotating all those  
>> unknown categories… but there are still millions of records unknown)
>> 
>>> (This is just so that we have it all justified, and are able to cull the use cases document
>>> for examples; I’m not at all questioning whether these are real requirements.)
>>>
>>> We do have a requirement to support missing values: R-MissingValueDefinition [1],
>>> so we will get around to making sure that’s included in the metadata definition :)
>>> 
>>> What format are you converting your CSVs to?
>> 
>> We are integrating 1000s of databases by mapping them to a standard called Darwin Core
>> Archive [2] and then loading them all into central indexes (Hadoop based).
>> 
>> Thanks for taking the time to consider the suggestions.
>> 
>> Cheers,
>> Tim
>> 
>> 
>> [1] http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-AssociationOfCodeValuesWithExternalDefinitions  
>> [2] Darwin Core Archive based on http://rs.tdwg.org/dwc/terms/guides/text/index.htm  
>> 
>>> 
>>> Thanks,
>>> 
>>> Jeni
>>> 
>>> [1] http://w3c.github.io/csvw/use-cases-and-requirements/#R-MissingValueDefinition  
>>> 
>>> ------------------------------------------------------
>>> From: Tim Robertson [GBIF] trobertson@gbif.org
>>> Reply: Tim Robertson [GBIF] trobertson@gbif.org
>>> Date: 7 May 2014 at 15:23:03
>>> To: Jeni Tennison jeni@jenitennison.com
>>> Cc: public-csv-wg@w3.org public-csv-wg@w3.org, rufus.pollock@okfn.org rufus.pollock@okfn.org  
>>> Subject: Re: Metadata document v0.0.1
>>> 
>>>> Hi Jeni,
>>>> 
>>>> Looks nice - quick observations for your consideration / dismissal:
>>>> 
>>>> - should it support the ability to offer a default value for fields where the value is
>>>> missing?
>>>>> if so, this impacts the constraints section
>>>>
>>>> - should it be possible to declare a NULL value to support e.g. \N from MySQL dumps?
>>>>> if so, this impacts constraints as well
>>>>
>>>> - should it support the ability to offer a fixed field applicable to all rows, but not
>>>> present in the CSV, to allow data enrichment?
>>>>> probably outside current scope, but we find it very useful for annotating data in the use case I proposed today
>>>> 
>>>> HTH,
>>>> Tim
>>>> 
>>>> 
>>>> On 07 May 2014, at 14:00, Jeni Tennison wrote:
>>>> 
>>>>> A highly draft version of the spec for metadata vocabulary is available at
>>>>> 
>>>>> http://w3c.github.io/csvw/metadata/
>>>>> 
>>>>> There’s lots of work still to do. Rufus and I are concentrating on the single CSV file,
>>>>> single metadata file case to start with, and trying to create a JSON format that can be
>>>>> interpreted as JSON-LD into RDF (rather than defining an RDF vocabulary which is then
>>>>> described as JSON).
>>>>> 
>>>>> Jeni
>>>>> --
>>>>> Jeni Tennison
>>>>> http://www.jenitennison.com/
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> --
>>> Jeni Tennison
>>> http://www.jenitennison.com/
>> 
>> ----------------------------------------------------------------------------------------  
>> Tim Robertson - GBIF Head of Informatics - trobertson@gbif.org
>> Global Biodiversity Information Facility http://www.gbif.org/
>> GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
>> Tel: +45 3532 1487 Mob: +45 2826 1487 Fax: +45 2875 1480
>> ----------------------------------------------------------------------------------------  
>> 
>> 
> 
> --  
> Jeni Tennison
> http://www.jenitennison.com/

Received on Wednesday, 7 May 2014 16:17:54 UTC