Re: Metadata document v0.0.1

Thanks Tim,

Yes, I saw the use case that you put forward; I just didn’t see anything explicitly about default values or fixed values in it. What I’ve done is add an issue here:

  https://github.com/w3c/csvw/issues/6

which references out to your use case, and a comment to highlight the need to address default and fixed values.

It would be great if you could extend your use case to provide:

  1. an actual DwC-A format file
  2. a more detailed description of what is done with the files (e.g. loading into Hadoop) that highlights the requirements for default and fixed values

Would that be possible? You could add notes on to the GitHub issue as a way of supplying that, if that’s helpful.
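
As a rough illustration of the behaviour under discussion, here is a minimal Python sketch of default values, fixed values and a declared NULL marker applied while reading a CSV. It is only a sketch under stated assumptions, not a proposal for the metadata format: the function read_with_metadata, its parameters, and the "evidence" and "category" column names are all hypothetical.

```python
import csv
import io


def read_with_metadata(text, defaults=None, fixed=None, null_token=None):
    # Hypothetical helper, for illustration only.
    #   defaults:   column name -> value used when a cell is empty or NULL
    #   fixed:      column name -> value added to every row (the column is
    #               not present in the CSV itself)
    #   null_token: string that marks a NULL cell, e.g. \N in MySQL dumps
    defaults = defaults or {}
    fixed = fixed or {}
    for row in csv.DictReader(io.StringIO(text)):
        out = {}
        for column, value in row.items():
            if value is None or value == "" or value == null_token:
                # Missing value: fall back to the declared default, if any.
                value = defaults.get(column)
            out[column] = value
        # Fixed values apply to all rows, whatever the CSV contains.
        out.update(fixed)
        yield out


# Made-up sample data: the first row has a NULL marker in "evidence".
sample = "latitude,longitude,evidence\n55.7,12.6,\\N\n10.0,20.0,OBSERVATION\n"

rows = list(
    read_with_metadata(
        sample,
        defaults={"evidence": "UNKNOWN"},
        fixed={"category": "category X"},
        null_token="\\N",
    )
)
```

In this sketch a default only fills in cells that are empty or NULL, whereas a fixed value is attached to every row regardless of what the file contains, which is the distinction the use case would need to motivate.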

Thanks,

Jeni

------------------------------------------------------
From: Tim Robertson [GBIF] trobertson@gbif.org
Reply: Tim Robertson [GBIF] trobertson@gbif.org
Date: 7 May 2014 at 16:58:30
To: Jeni Tennison jeni@jenitennison.com
Cc: public-csv-wg@w3.org public-csv-wg@w3.org, rufus.pollock@okfn.org rufus.pollock@okfn.org
Subject:  Re: Metadata document v0.0.1

> Hi Jeni
>  
> I submitted a use case today (Darwin Core Archives) and I hope it’s under consideration  
> by Eric and the use case team as I think it had a couple of new requirements / sub-requirements.  
>  
> I’ll justify them a bit inline here, just for your context.
>  
> > IIRC, we don’t have, in our use cases document, use cases for a couple of these requirements.
> > It would be great if you could put some together. Specifically examples where it’s useful
> > to have:
> > * default values for a column
>  
> If a CSV is produced from a DB with a NULLABLE field, but mapped to a well-defined vocabulary
> (e.g. R-AssociationOfCodeValuesWithExternalDefinitions [1]), the external definitions
> might have an explicit UNKNOWN category, but the CSV will simply have the value missing
> because it was NULL. It is sometimes useful to express that explicitly when mapping the source data.
>  
> > * fixed-value fields applicable to all rows
>  
> Imagine integrating 1000s of heterogeneous CSVs, some of which are missing important
> values.
> Suppose there are “latitude” and “longitude” columns on all, but you’re creating maps
> by category and some files are missing the category field, but when annotating the CSV
> you know it comes from a dataset where they are all “category X”. It’s nice to capture that.
>  
> Hint: http://api.gbif.org/v0.9/map/index.html (bottom left toggle will pop up and
> you can select “unknown evidence” - we’re slowly getting through annotating all those
> unknown categories… but there are still millions of records unknown)
>  
> > (This is just so that we have it all justified, and are able to cull the use cases document
> > for examples; I’m not at all questioning whether these are real requirements.)
> >
> > We do have a requirement to support missing values: R-MissingValueDefinition [1],
> > so we will get around to making sure that’s included in the metadata definition :)
> >
> > What format are you converting your CSVs to?
>  
> We are integrating 1000s of databases by mapping them to a standard called Darwin Core
> Archive [2] and then loading them all into central indexes (Hadoop based).
>  
> Thanks for taking the time to consider the suggestions.
>  
> Cheers,
> Tim
>  
>  
> [1] http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-AssociationOfCodeValuesWithExternalDefinitions  
> [2] Darwin Core Archive based on http://rs.tdwg.org/dwc/terms/guides/text/index.htm  
>  
> >
> > Thanks,
> >
> > Jeni
> >
> > [1] http://w3c.github.io/csvw/use-cases-and-requirements/#R-MissingValueDefinition  
> >
> > ------------------------------------------------------
> > From: Tim Robertson [GBIF] trobertson@gbif.org
> > Reply: Tim Robertson [GBIF] trobertson@gbif.org
> > Date: 7 May 2014 at 15:23:03
> > To: Jeni Tennison jeni@jenitennison.com
> > Cc: public-csv-wg@w3.org public-csv-wg@w3.org, rufus.pollock@okfn.org rufus.pollock@okfn.org  
> > Subject: Re: Metadata document v0.0.1
> >
> >> Hi Jeni,
> >>
> >> Looks nice - quick observations for your consideration / dismissal:
> >>
> >> - should it support the ability to offer a default value for fields where the value is
> >> missing?
> >>> if so, this impacts the constraints section
> >>
> >> - should it be possible to declare a NULL value to support e.g. \N from MySQL dumps.
> >>> if so, impacts constraints as well
> >>
> >> - should it support the ability to offer a fixed field applicable to all rows, but not
> >> present in the CSV to allow data enrichment?
> >>> probably outside current scope, but we find it very useful to annotate data in the use
> >>> case I proposed today
> >>
> >> HTH,
> >> Tim
> >>
> >>
> >> On 07 May 2014, at 14:00, Jeni Tennison wrote:
> >>
> >>> A highly draft version of the spec for metadata vocabulary is available at
> >>>
> >>> http://w3c.github.io/csvw/metadata/
> >>>
> >>> There’s lots of work still to do. Rufus and I are concentrating on the single CSV file,
> >>> single metadata file case to start with, and trying to create a JSON format that can be
> >>> interpreted as JSON-LD into RDF (rather than defining an RDF vocabulary which is then
> >>> described as JSON).
> >>>
> >>> Jeni
> >>> --
> >>> Jeni Tennison
> >>> http://www.jenitennison.com/
> >>
> >>
> >>
> >>
> >
> > --
> > Jeni Tennison
> > http://www.jenitennison.com/
>  
> ----------------------------------------------------------------------------------------  
> Tim Robertson - GBIF Head of Informatics - trobertson@gbif.org
> Global Biodiversity Information Facility http://www.gbif.org/
> GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
> Tel: +45 3532 1487 Mob: +45 2826 1487 Fax: +45 2875 1480
> ----------------------------------------------------------------------------------------  
>  
>  

--  
Jeni Tennison
http://www.jenitennison.com/

Received on Wednesday, 7 May 2014 16:09:52 UTC