RE: Updates to use case #21: biodiversity

Hi Tim - I've amended the use case to include the idea of adding default property value pairs to sparse data. Rather than add a new requirement, I merged it into the http://w3c.github.io/csvw/use-cases-and-requirements/#R-SpecificationOfPropertyValuePairForEachRow requirement.

Regarding the ability to declare NULL fields, we already have requirement http://w3c.github.io/csvw/use-cases-and-requirements/#R-MissingValueDefinition to cover this. I've not included it in the biodiversity use case because there's nothing to actually hang it on there :)

The same applies regarding the ability to document multiple files, and their relationships.

Hope that's OK.

Jeremy

From: Tim Robertson [GBIF] [mailto:trobertson@gbif.org]
Sent: 26 May 2014 08:33
To: Tandy, Jeremy
Cc: public-csv-wg@w3.org
Subject: Re: Updates to use case #21: biodiversity

Thank you very much Jeremy - great improvements which are accurate.

A few brief comments which might be worth adding:

Although not present in this example, the DwC-A supports:

a) The ability to define a default value for declared fields should none be found in sparsely populated tables
  > no requirement exists for this?
b) The ability to document multiple files, and their relationships
  > requirement exists already with foreign key
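
To make (a) concrete, here is a minimal Python sketch of filling sparse rows from declared defaults. It is only an illustration, not how the DwC-A reader does it: the DEFAULTS mapping, the rows_with_defaults helper and the sample data are all hypothetical.

```python
import csv
import io

# Hypothetical defaults, as they might be declared in a metadata file.
DEFAULTS = {"basisOfRecord": "HumanObservation", "country": "Spain"}

def rows_with_defaults(text, defaults):
    """Yield each CSV row as a dict, filling empty or absent fields from defaults."""
    for row in csv.DictReader(io.StringIO(text)):
        filled = dict(defaults)
        # Only non-empty cell values override a declared default.
        filled.update({k: v for k, v in row.items() if v not in (None, "")})
        yield filled

# Made-up sample: the first row leaves "country" empty, and neither row
# carries a "basisOfRecord" column at all.
sample = "scientificName,country\nPuma concolor,\nLynx pardinus,Portugal\n"
records = list(rows_with_defaults(sample, DEFAULTS))
```

The first record picks up both defaults, while the second keeps its own country value and only gains the missing basisOfRecord.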

If I were to rework the DwC-A standard today, I would include the explicit declaration of the NULL value. In our code [1] we have to handle this with guesswork, which is pretty fragile. We make use of the Hadoop import tool Sqoop, which supports this feature, and it is very useful:
  http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_null_string_handling

I believe there is a use case for being able to declare explicitly, in a file produced by (e.g.) MySQL, that \N represents NULL globally, without resorting to guesswork. I'm not sure I've seen a requirement for this though. Perhaps this can be added under this use case for consideration?
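
As a rough sketch of what an explicit declaration buys a consumer, the Python below assumes the NULL marker is handed to the parser rather than guessed. The read_with_null helper and the sample data are hypothetical, and the tab-separated layout is just an assumption about the export.

```python
import csv
import io

def read_with_null(text, null_marker):
    """Parse tab-separated text, mapping cells equal to null_marker to None."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [[None if cell == null_marker else cell for cell in row] for row in reader]

# Made-up MySQL-style export: \N marks NULL, while an empty cell
# remains an empty string.
sample = "1\tPuma concolor\t\\N\n2\t\tSpain\n"
rows = read_with_null(sample, r"\N")
```

With the marker declared, NULL and the empty string stay distinct: the third cell of the first row becomes None, while the empty second cell of the second row is kept as "".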

Thanks again,
Tim

[1] https://github.com/gbif/dwca-reader/blob/master/src/main/java/org/gbif/dwc/record/RecordImpl.java#L17

On 26 May 2014, at 02:07, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:


All - I've updated the biodiversity use case (originally contributed by Tim Robertson of GBIF) so that it is now action-oriented and user-centred ... our protagonist is a citizen scientist who wants to build a web app to show biodiversity information about the Sierra Nevada national park, Spain (because that's what the dataset I picked up as an example from GBIF refers to!).

The use case is renamed "PublicationOfBiodiversityInformation" and is available at <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-PublicationOfBiodiversityInformation>.

Also note the new Requirement <http://w3c.github.io/csvw/use-cases-and-requirements/#R-SpecificationOfPropertyValuePairForEachRow>.

Comments welcome - although I am particularly interested in Tim's perspective as to whether this heavily edited use case still makes the key points he wanted. I think it does - but I'd like confirmation.

Jeremy

Received on Monday, 26 May 2014 14:06:15 UTC