- From: Tim Robertson [GBIF] <trobertson@gbif.org>
- Date: Wed, 7 May 2014 13:41:57 +0200
- To: Eric Stephan <ericphb@gmail.com>
- Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>, Jeremy Tandy <jeremy.tandy@metoffice.gov.uk>, "Ceolin, D." <d.ceolin@vu.nl>, Ivan Herman <ivan@w3.org>
- Message-Id: <B697CC36-69A6-4BB5-B9BF-61621EC1676A@gbif.org>
Thanks Eric

Please can you consider a use case as described below? Attached is an example and a separate meta.xml which I propose is used in the use case to illustrate the requirements - see below.

I can prepare this as a github pull request as HTML if you prefer or adjust any of this based on your feedback.

Many thanks,
Tim

Use Case #21 - The Darwin Core Archive standard (GBIF)
(Contributed by Tim Robertson, trobertson@gbif.org GBIF)

The Darwin Core Archive (DwC-A) standard (http://rs.tdwg.org/dwc/terms/guides/text/index.htm) is the primary format in use for exchange of evidence based biodiversity data on the Global Biodiversity Information Facility (GBIF http://www.gbif.org) network. The GBIF network spans over 600+ institutions, and has mobilised more than 435 million records (http://www.gbif.org/occurrence). The DwC-A format is embedded in many software platforms, including web based tools that allow mapping of arbitrary database schemas. An online validator exists to verify the format (http://tools.gbif.org/dwca-validator/).

The DwC-A format is effectively a collection of related CSV files accompanied by a metafile (meta.xml) that describes the structure and content of the CSVs along with their relationships. Together these files are zipped to allow transfer in a single HTTP transaction.

The key characteristics of the DwC-A format are:
- The ability to define the class of content contained within a single row
- The ability to declare a relationship between files (only many-to-one relationships in a star schema are currently supported)
- The ability to describe remote CSV files through a meta file, without modifying the source files

The next evolution of the DwC-A needs to consider the following key uses:
- More complex arrangements of data relationships (e.g. arbitrary relational models)
- Stronger typing of data formats (only date formats are currently declared)

[It is the hope of the DwC-A standard authors that the results of the CSV WG will mean the DwC-A can be deprecated, and efforts can be spent on developing tooling that supports the W3C CSV standard/recommendations]

Example: Suggest using the attached meta.xml to indicate the relationships

Requires: HeadingColumns, CellValueMicroSyntax, NonStandardFieldDelimiter, ExternalDataDefinitionResource, AnnotationAndSupplementaryInfo, AssociationOfCodeValuesWithExternalDefinitions, SyntacticTypeDefinition, PrimaryKey, ForeignKeyReferences, MissingValueDefinition, MultipleHeadingRows

On 06 May 2014, at 19:41, Eric Stephan <ericphb@gmail.com> wrote:

> Tim,
>
> I agree, I do think it makes sense to include this in the use case
> document. Thank you for sharing, and yes could you please provide
> example(s) to illustrate the use case? Either text or images
> showing snapshots of the examples would be great. I am copying the
> csv working group distribution list as well.
>
> Thank you,
>
> Eric
>
> On Tue, May 6, 2014 at 2:47 AM, Tim Robertson [GBIF]
> <trobertson@gbif.org> wrote:
>> Hi Jeremy, Davide, Eric,
>>
>> Are you still accepting use cases for the CSV WG [1] document you are
>> compiling?
>>
>> If so, I am keen to submit one for the GBIF network [2] and would start
>> documenting one along the lines of the existing 20 cases. It is unlikely to
>> bring significant new requirements, but would encapsulate pretty much all of
>> the existing ones, and the devil is always in the detail with this kind of
>> thing (e.g. null handling, micro syntax, default value policies etc) - our
>> use case may well bring in some sub requirements / ideas. Our case is more
>> closely aligned with Google DSPL [3] than the others however (e.g. an XML
>> document that serves to define the content found in CSVs and their
>> relationships - I assume these to be considered "CSV annotations"). I am a
>> little surprised not to see a G-DSPL on the list of use cases - should it be
>> one? I would be happy to produce an example for that as well if considered
>> useful. My slight worry is that unless cases such as ours and G-DSPL are
>> considered, the foreign key / primary key requirements *may* not be
>> adequately addressed consistently (e.g. referential integrity with respect
>> to well-formedness, expected behaviour on NULLs etc).
>>
>> Thanks for the consideration - please do help advise me if my ideas /
>> proposals are off topic.
>> I should mention that maintaining a standard for CSV handling is part of my
>> core job, and fundamental to our infrastructure - this is a group of real
>> importance to our work. I’d be happy to help in any way I can.
>>
>> Best wishes,
>> Tim
>>
>> [1] http://w3c.github.io/csvw/use-cases-and-requirements/index.html
>> [2] http://www.gbif.org/
>> [3] https://developers.google.com/public-data/
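
Illustration of the DwC-A structure described in Use Case #21 above: the sketch below shows, in Python, how a metafile can declare the class of each row and a many-to-one (star schema) link from an extension file to a core file, without touching the CSVs themselves. The metafile and both CSV files here are invented for illustration and are not the attached meta.xml; the element and attribute names (core, extension, id, coreid, field, rowType) are assumed to follow the Darwin Core text guide. Rejecting extension rows that point at a missing core record is one possible policy chosen for the sketch, not something the message specifies.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Hypothetical metafile, made up for this sketch (not the attached meta.xml).
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon"
        fieldsTerminatedBy="," ignoreHeaderLines="1">
    <files><location>taxon.csv</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/Occurrence"
             fieldsTerminatedBy="," ignoreHeaderLines="1">
    <files><location>occurrence.csv</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
  </extension>
</archive>"""

# Hypothetical CSV contents, held in memory instead of a zip archive.
FILES = {
    "taxon.csv": "taxonID,scientificName\n1,Puma concolor\n2,Panthera leo\n",
    "occurrence.csv": "taxonID,eventDate\n1,2014-05-06\n1,2014-05-07\n2,2013-01-15\n",
}


def describe(table):
    """Turn a <core> or <extension> element into a plain dict."""
    key = table.find("dwc:id", NS)          # primary key column of the core file
    if key is None:
        key = table.find("dwc:coreid", NS)  # foreign key column of an extension
    return {
        "row_type": table.get("rowType"),
        "location": table.find("dwc:files/dwc:location", NS).text,
        # Escaped delimiters such as "\t" would need decoding; skipped here.
        "delimiter": table.get("fieldsTerminatedBy", ","),
        "skip": int(table.get("ignoreHeaderLines", "0")),
        "key_index": int(key.get("index")),
        "fields": {int(f.get("index")): f.get("term")
                   for f in table.findall("dwc:field", NS)},
    }


def read_rows(desc, files):
    """Yield (key, {term: value}) pairs for one described file."""
    reader = csv.reader(io.StringIO(files[desc["location"]]),
                        delimiter=desc["delimiter"])
    for row in list(reader)[desc["skip"]:]:
        yield row[desc["key_index"]], {t: row[i] for i, t in desc["fields"].items()}


def load_star(meta_xml, files):
    """Attach every extension row to the core record it references."""
    root = ET.fromstring(meta_xml)
    core = describe(root.find("dwc:core", NS))
    records = {key: {"rowType": core["row_type"], "fields": rec, "extensions": []}
               for key, rec in read_rows(core, files)}
    for element in root.findall("dwc:extension", NS):
        ext = describe(element)
        for coreid, rec in read_rows(ext, files):
            if coreid not in records:  # assumed policy: dangling key is an error
                raise ValueError(f"{ext['location']}: no core record {coreid!r}")
            records[coreid]["extensions"].append(rec)
    return records
```

Calling `load_star(META_XML, FILES)` returns one entry per core row, each carrying its row class and the list of extension rows that reference it. How NULL or dangling keys should behave is exactly the kind of detail the quoted message raises; the sketch simply raises an error.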
Attachments
Received on Wednesday, 7 May 2014 11:42:26 UTC