- From: Tim Robertson [GBIF] <trobertson@gbif.org>
- Date: Wed, 7 May 2014 18:17:27 +0200
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: public-csv-wg@w3.org, "rufus.pollock@okfn.org" <rufus.pollock@okfn.org>
- Message-Id: <09BA1CF6-5681-4421-9D73-2642A4020B85@gbif.org>
Thanks Jeni

> Thanks Tim,
>
> Yes, I saw the use case that you put forward, I just didn’t see anything explicitly about default values or fixed values in it. What I’ve done is add an issue here:
>
> https://github.com/w3c/csvw/issues/6
>
> which references out to your use case, and a comment to highlight the need to address default and fixed values.
>
> It would be great if you could extend your use case to provide:
>
> 1. an actual DwC-A format file

There was one attached to the use case I submitted, but maybe it got lost on the list. It has comments in the meta.xml (analogous to your JSON annotation spec), but the example is tiny, just to highlight the concepts; the ideas in the meta.xml are the important bit here.

> 2. some more detailed description of what is done with the files (eg loading into Hadoop) that highlights the requirements for default and fixed values
>
> Would that be possible? You could add notes on to the GitHub issue as a way of supplying that, if that’s helpful.

I’ll comment on the issue.

Also, the use case was a proposal with some ideas we use in practice, but I think it is premature to consider them accepted requirements.

Thanks,
Tim
>
> ------------------------------------------------------
> From: Tim Robertson [GBIF] trobertson@gbif.org
> Reply: Tim Robertson [GBIF] trobertson@gbif.org
> Date: 7 May 2014 at 16:58:30
> To: Jeni Tennison jeni@jenitennison.com
> Cc: public-csv-wg@w3.org, rufus.pollock@okfn.org
> Subject: Re: Metadata document v0.0.1
>
>> Hi Jeni
>>
>> I submitted a use case today (Darwin Core Archives) and I hope it’s under consideration by Eric and the use case team, as I think it had a couple of new requirements / sub-requirements.
>>
>> I’ll justify them a bit inline here, just for your context.
>>
>>> IIRC, we don’t have, in our use cases document, use cases for a couple of these requirements. It would be great if you could put some together. Specifically examples where it’s useful to have:
>>>
>>> * default values for a column
>>
>> If a CSV is produced from a DB with a NULLABLE field, but mapped to a well-defined vocabulary (e.g. R-AssociationOfCodeValuesWithExternalDefinitions [1]), the external definitions might have an explicit UNKNOWN category, while the CSV will simply have the value missing (NULL). It is sometimes useful to express that explicitly when mapping the source data.
>>
>>> * fixed-value fields applicable to all rows
>>
>> Imagine integrating 1000s of heterogeneous CSVs, some of which are missing important values. Suppose there are “latitude” and “longitude” columns in all of them, but you’re creating maps by category and some files are missing the category field; when annotating the CSV, though, you know it comes from a dataset where the records are all “category X”. It’s nice to capture that.
>>
>> Hint: http://api.gbif.org/v0.9/map/index.html (the bottom-left toggle opens a popup where you can select “unknown evidence”; we’re slowly getting through annotating all those unknown categories… but there are still millions of records unknown)
>>
>>> (This is just so that we have it all justified, and are able to cull the use cases document for examples; I’m not at all questioning whether these are real requirements.)
>>>
>>> We do have a requirement to support missing values: R-MissingValueDefinition [1], so we will get around to making sure that’s included in the metadata definition :)
>>>
>>> What format are you converting your CSVs to?
>>
>> We are integrating 1000s of databases by mapping them to a standard called Darwin Core Archive [2] and then loading them all into central indexes (Hadoop based).
>>
>> Thanks for taking the time to consider the suggestions.
>>
>> Cheers,
>> Tim
>>
>> [1] http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-AssociationOfCodeValuesWithExternalDefinitions
>> [2] Darwin Core Archive, based on http://rs.tdwg.org/dwc/terms/guides/text/index.htm
>>
>>>
>>> Thanks,
>>>
>>> Jeni
>>>
>>> [1] http://w3c.github.io/csvw/use-cases-and-requirements/#R-MissingValueDefinition
>>>
>>> ------------------------------------------------------
>>> From: Tim Robertson [GBIF] trobertson@gbif.org
>>> Reply: Tim Robertson [GBIF] trobertson@gbif.org
>>> Date: 7 May 2014 at 15:23:03
>>> To: Jeni Tennison jeni@jenitennison.com
>>> Cc: public-csv-wg@w3.org, rufus.pollock@okfn.org
>>> Subject: Re: Metadata document v0.0.1
>>>
>>>> Hi Jeni,
>>>>
>>>> Looks nice. Quick observations for your consideration / dismissal:
>>>>
>>>> - should it support the ability to offer a default value for fields where the value is missing?
>>>>> if so, this impacts the constraints section
>>>>
>>>> - should it be possible to declare a NULL value to support e.g. \N from MySQL dumps?
>>>>> if so, impacts constraints as well
>>>>
>>>> - should it support the ability to offer a fixed field applicable to all rows, but not present in the CSV, to allow data enrichment?
>>>>> probably outside current scope, but we find it very useful for annotating data in the use case I proposed today
>>>>
>>>> HTH,
>>>> Tim
>>>>
>>>> On 07 May 2014, at 14:00, Jeni Tennison wrote:
>>>>
>>>>> A highly draft version of the spec for metadata vocabulary is available at
>>>>>
>>>>> http://w3c.github.io/csvw/metadata/
>>>>>
>>>>> There’s lots of work still to do. Rufus and I are concentrating on the single CSV file, single metadata file case to start with, and trying to create a JSON format that can be interpreted as JSON-LD into RDF (rather than defining an RDF vocabulary which is then described as JSON).
>>>>>
>>>>> Jeni
>>>>> --
>>>>> Jeni Tennison
>>>>> http://www.jenitennison.com/
>>>
>>> --
>>> Jeni Tennison
>>> http://www.jenitennison.com/
>>
>> ----------------------------------------------------------------------------------------
>> Tim Robertson - GBIF Head of Informatics - trobertson@gbif.org
>> Global Biodiversity Information Facility http://www.gbif.org/
>> GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
>> Tel: +45 3532 1487 Mob: +45 2826 1487 Fax: +45 2875 1480
>> ----------------------------------------------------------------------------------------
>
> --
> Jeni Tennison
> http://www.jenitennison.com/
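To make the three metadata ideas discussed in this thread concrete (a declared NULL token such as MySQL’s `\N`, a per-column default value, and a fixed value applied to every row even though the CSV has no such column), here is a minimal Python sketch. It is only an illustration under stated assumptions: `FieldSpec`, `read_rows` and the sample data are hypothetical and are not part of the CSVW metadata draft or the Darwin Core Archive meta.xml format; it assumes the file has a header row and that both an empty cell and `\N` mean NULL.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterator, List, Optional


# Hypothetical per-column annotation (not a CSVW or DwC-A API):
# - index=None means the column is absent from the CSV, so `default`
#   acts as a fixed value applied to every row;
# - otherwise `default` replaces a missing or NULL cell.
@dataclass
class FieldSpec:
    term: str
    index: Optional[int] = None
    default: Optional[str] = None


# Assumption: an empty cell and MySQL's \N both mean NULL.
NULL_TOKENS: FrozenSet[str] = frozenset({"", "\\N"})


def read_rows(text: str, fields: List[FieldSpec],
              null_tokens: FrozenSet[str] = NULL_TOKENS,
              skip_header: bool = True) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per CSV row, applying NULL tokens, defaults and fixed values."""
    reader = csv.reader(io.StringIO(text))
    if skip_header:
        next(reader, None)  # discard the header line
    for row in reader:
        record: Dict[str, Optional[str]] = {}
        for spec in fields:
            value = None
            if spec.index is not None and spec.index < len(row):
                cell = row[spec.index]
                value = None if cell in null_tokens else cell
            if value is None:
                value = spec.default  # default or fixed value; may itself be None
            record[spec.term] = value
        yield record


# Example: "evidence" is a nullable column; "category" is not in the file at all.
data = "id,lat,lon,evidence\n1,55.7,12.6,\\N\n2,48.1,11.6,observation\n"
fields = [
    FieldSpec("id", index=0),
    FieldSpec("latitude", index=1),
    FieldSpec("longitude", index=2),
    FieldSpec("evidence", index=3, default="UNKNOWN"),  # NULL mapped to UNKNOWN
    FieldSpec("category", default="category X"),        # fixed value for all rows
]
rows = list(read_rows(data, fields))
for r in rows:
    print(r)
```

Routing both "no such column" and "NULL cell" through the same `default` slot is just one possible design; a metadata format could equally keep separate properties for defaults and fixed values, in line with the thread’s distinction between the two.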
Received on Wednesday, 7 May 2014 16:17:54 UTC