- From: Eric Stephan <ericphb@gmail.com>
- Date: Tue, 13 May 2014 16:16:04 -0700
- To: "Tim Robertson [GBIF]" <trobertson@gbif.org>
- Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>, Jeremy Tandy <jeremy.tandy@metoffice.gov.uk>, "Ceolin, D." <d.ceolin@vu.nl>, Ivan Herman <ivan@w3.org>
Tim,

Thanks so much for your contribution. I've added it to the Use Case document. Let me know if anything was missed.

http://w3c.github.io/csvw/use-cases-and-requirements/index.html

Eric

On Wed, May 7, 2014 at 4:41 AM, Tim Robertson [GBIF] <trobertson@gbif.org> wrote:
> Thanks Eric
> Please can you consider a use case as described below?
> Attached is an example and a separate meta.xml which I propose is used in
> the use case to illustrate the requirements - see below.
>
> I can prepare this as a GitHub pull request as HTML if you prefer, or adjust
> any of this based on your feedback.
>
> Many thanks,
> Tim
>
> Use Case #21 - The Darwin Core Archive standard (GBIF)
> (Contributed by Tim Robertson, trobertson@gbif.org GBIF)
>
> The Darwin Core Archive (DwC-A) standard
> (http://rs.tdwg.org/dwc/terms/guides/text/index.htm) is the primary format
> in use for exchange of evidence-based biodiversity data on the Global
> Biodiversity Information Facility (GBIF http://www.gbif.org) network. The
> GBIF network spans over 600 institutions and has mobilised more than 435
> million records (http://www.gbif.org/occurrence). The DwC-A format is
> embedded in many software platforms, including web-based tools that allow
> mapping of arbitrary database schemas. An online validator exists to verify
> the format (http://tools.gbif.org/dwca-validator/).
> The DwC-A format is effectively a collection of related CSV files
> accompanied by a metafile (meta.xml) that describes the structure and
> content of the CSVs along with their relationships. Together these files
> are zipped to allow transfer in a single HTTP transaction.
>
> The key characteristics of the DwC-A format are:
> - The ability to define the class of content contained within a single row
> - The ability to declare a relationship between files (only many-to-one
> relationships in a star schema are currently supported)
> - The ability to describe remote CSV files through a meta file, without
> modifying the source files
>
> The next evolution of the DwC-A needs to consider the following key uses:
> - More complex arrangements of data relationships (e.g. arbitrary relational
> models)
> - Stronger typing of data formats (only date formats are currently declared)
> [It is the hope of the DwC-A standard authors that the results of the CSV WG
> will mean the DwC-A can be deprecated, and efforts can be spent on
> developing tooling that supports the W3C CSV standard/recommendations]
>
> Example: Suggest using the attached meta.xml to indicate the relationships
>
> Requires: HeadingColumns, CellValueMicroSyntax, NonStandardFieldDelimiter,
> ExternalDataDefinitionResource, AnnotationAndSupplementaryInfo,
> AssociationOfCodeValuesWithExternalDefinitions, SyntacticTypeDefinition,
> PrimaryKey, ForeignKeyReferences, MissingValueDefinition,
> MultipleHeadingRows
>
> On 06 May 2014, at 19:41, Eric Stephan <ericphb@gmail.com> wrote:
>
> Tim,
>
> I agree, I do think it makes sense to include this in the use case
> document. Thank you for sharing, and yes could you please provide
> example(s) to illustrate the use case? Either text or images
> showing snapshots of the examples would be great. I am copying the
> CSV working group distribution list as well.
>
> Thank you,
>
> Eric
>
> On Tue, May 6, 2014 at 2:47 AM, Tim Robertson [GBIF]
> <trobertson@gbif.org> wrote:
>
> Hi Jeremy, Davide, Eric,
>
> Are you still accepting use cases for the CSV WG [1] document you are
> compiling?
>
> If so, I am keen to submit one for the GBIF network [2] and would start
> documenting one along the lines of the existing 20 cases. It is unlikely to
> bring significant new requirements, but would encapsulate pretty much all of
> the existing ones, and the devil is always in the detail with this kind of
> thing (e.g. null handling, micro syntax, default value policies etc.) - our
> use case may well bring in some sub-requirements / ideas. Our case is more
> closely aligned with Google DSPL [3] than the others however (e.g. an XML
> document that serves to define the content found in CSVs and their
> relationships - I assume these to be considered "CSV annotations"). I am a
> little surprised not to see a G-DSPL on the list of use cases - should it be
> one? I would be happy to produce an example for that as well if considered
> useful. My slight worry is that unless cases such as ours and G-DSPL are
> considered, the foreign key / primary key requirements *may* not be
> adequately addressed consistently (e.g. referential integrity with respect
> to well-formedness, expected behaviour on NULLs etc.).
>
> Thanks for the consideration - please do help advise me if my ideas /
> proposals are off topic.
> I should mention that maintaining a standard for CSV handling is part of my
> core job, and fundamental to our infrastructure - this is a group of real
> importance to our work. I'd be happy to help in any way I can.
>
> Best wishes,
> Tim
>
> [1] http://w3c.github.io/csvw/use-cases-and-requirements/index.html
> [2] http://www.gbif.org/
> [3] https://developers.google.com/public-data/
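The DwC-A layout described in the use case (a zip of CSV files plus a meta.xml that names each file's columns and links extension rows back to core rows in a star schema) can be illustrated with a minimal Python sketch. The meta.xml element and attribute names (`core`, `extension`, `files/location`, `id`, `coreid`, `field` with `index`/`term`/`default`, `fieldsTerminatedBy`, `ignoreHeaderLines`) follow a reading of the DwC text guide linked above; the file names, the extension `rowType`, the sample rows, and the rule that an empty cell falls back to a field's `default` are hypothetical and are not taken from the meta.xml attached to the original message. The sketch checks the two constraints raised in the thread: core ids behave as a primary key, and every extension `coreid` must resolve to a core row (referential integrity).

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of meta.xml as given in the DwC text guide (assumed here).
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Hypothetical minimal meta.xml: one Occurrence core and one extension whose
# rows point back to the core via <coreid> (star schema, many-to-one).
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence"
        fieldsTerminatedBy="," ignoreHeaderLines="1">
    <files><location>occurrence.csv</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field term="http://rs.tdwg.org/dwc/terms/basisOfRecord" default="PreservedSpecimen"/>
  </core>
  <extension rowType="http://example.org/terms/Image"
             fieldsTerminatedBy="," ignoreHeaderLines="1">
    <files><location>images.csv</location></files>
    <coreid index="0"/>
    <field index="1" term="http://purl.org/dc/terms/identifier"/>
  </extension>
</archive>"""


def read_table(zf, el, key_tag):
    """Return (key, record) pairs for one <core> or <extension> element."""
    location = el.find("dwc:files/dwc:location", NS).text
    # Tab delimiters are written as the escape sequence "\t" in meta.xml.
    delimiter = el.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
    skip = int(el.get("ignoreHeaderLines", "0"))
    key_index = int(el.find(f"dwc:{key_tag}", NS).get("index"))
    fields = el.findall("dwc:field", NS)
    text = zf.read(location).decode("utf-8")
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[skip:]
    out = []
    for row in rows:
        record = {}
        for f in fields:
            idx = f.get("index")
            value = row[int(idx)] if idx is not None else None
            # Assumption: a field with no column, or an empty cell, uses its default.
            if value in (None, ""):
                value = f.get("default")
            record[f.get("term")] = value
        out.append((row[key_index], record))
    return out


def load_archive(data):
    """Load core and extension tables, enforcing primary/foreign key rules."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
        core = {}
        for key, record in read_table(zf, root.find("dwc:core", NS), "id"):
            # The core id acts as a primary key and must be unique.
            if key in core:
                raise ValueError(f"duplicate core id {key!r}")
            core[key] = record
        extensions = {}
        for ext in root.findall("dwc:extension", NS):
            rows = read_table(zf, ext, "coreid")
            for key, _ in rows:
                # Star schema: every extension row must reference a core row.
                if key not in core:
                    raise ValueError(f"dangling coreid {key!r}")
            extensions[ext.get("rowType")] = rows
    return core, extensions


def build_demo_archive():
    """Zip the hypothetical meta.xml with two small CSV files in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrence.csv",
                    "id,scientificName\nocc1,Puma concolor\nocc2,Bufo bufo\n")
        zf.writestr("images.csv",
                    "coreid,identifier\nocc1,img-a.jpg\nocc1,img-b.jpg\n")
    return buf.getvalue()


if __name__ == "__main__":
    core, ext = load_archive(build_demo_archive())
    print(core["occ1"])
    print(len(ext["http://example.org/terms/Image"]))
```

Here the `basisOfRecord` field has no column index, so every core record receives the declared default, which is one concrete form of the "default value policies" the thread mentions. Whether an empty cell should instead be read as NULL is exactly the kind of detail the thread flags as needing a consistent rule.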
Received on Tuesday, 13 May 2014 23:16:32 UTC