Re: CSV Use cases from Eric Stephan on 2014-05-13 (public-csv-wg@w3.org from May 2014)

From: Eric Stephan <ericphb@gmail.com>
Date: Tue, 13 May 2014 16:16:04 -0700
To: "Tim Robertson [GBIF]" <trobertson@gbif.org>
Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>, Jeremy Tandy <jeremy.tandy@metoffice.gov.uk>, "Ceolin, D." <d.ceolin@vu.nl>, Ivan Herman <ivan@w3.org>
Message-ID: <CAMFz4jgcOA=H6jgrxf_PUiLjHsSajf+Dv9TRTgsz1kK9kO3pLQ@mail.gmail.com>

Tim,

Thanks so much for your contribution.  I've added it to the Use Case
document.  Let me know if anything was missed.

http://w3c.github.io/csvw/use-cases-and-requirements/index.html

Eric

On Wed, May 7, 2014 at 4:41 AM, Tim Robertson [GBIF]
<trobertson@gbif.org> wrote:
> Thanks Eric
> Please can you consider a use case as described below?
> Attached is an example and a separate meta.xml which I propose is used in
> the use case to illustrate the requirements - see below.
>
> I can prepare this as a github pull request as HTML if you prefer or adjust
> any of this based on your feedback.
>
> Many thanks,
> Tim
>
>
> Use Case #21 - The Darwin Core Archive standard (GBIF)
> (Contributed by Tim Robertson, trobertson@gbif.org GBIF)
>
> The Darwin Core Archive (DwC-A) standard
> (http://rs.tdwg.org/dwc/terms/guides/text/index.htm) is the primary format
> in use for exchange of evidence-based biodiversity data on the Global
> Biodiversity Information Facility (GBIF http://www.gbif.org) network.  The
> GBIF network spans more than 600 institutions, and has mobilised more than
> 435 million records (http://www.gbif.org/occurrence).  The DwC-A format is
> embedded in many software platforms, including web-based tools that allow
> mapping of arbitrary database schemas.  An online validator exists to verify
> the format (http://tools.gbif.org/dwca-validator/).
> The DwC-A format is effectively a collection of related CSV files
> accompanied by a metafile (meta.xml) that describes the structure and
> content of the CSVs along with their relationships.  Together these files
> are zipped to allow transfer in a single HTTP transaction.
>
> The key characteristics of the DwC-A format are:
> - The ability to define the class of content contained within a single row
> - The ability to declare a relationship between files (only many-to-one
> relationships in a star schema are currently supported)
> - The ability to describe remote CSV files through a meta file, without
> modifying the source files
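
To make the structure described above concrete, here is a minimal Python
sketch.  It is not the meta.xml attached to this thread: the descriptor is a
simplified, hypothetical one whose element and attribute names (archive,
core, extension, id, coreid, field, file) are only loosely modelled on a
DwC-A metafile, and the CSV contents are invented sample data held in memory
rather than read from a zip.  It shows a row class declared per file, a
many-to-one link from extension rows to a core row, and column meanings
supplied by the descriptor without touching the CSV files themselves.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified descriptor (NOT the real DwC-A meta.xml schema).
META_XML = """<archive>
  <core rowType="Taxon" file="taxon.csv">
    <id index="0"/>
    <field index="1" term="scientificName"/>
  </core>
  <extension rowType="VernacularName" file="vernacular.csv">
    <coreid index="0"/>
    <field index="1" term="vernacularName"/>
  </extension>
</archive>"""

# Invented sample data standing in for the zipped, header-less CSV files.
FILES = {
    "taxon.csv": "1,Puma concolor\n2,Panthera leo\n",
    "vernacular.csv": "1,cougar\n1,mountain lion\n2,lion\n",
}


def read_rows(element, key_tag):
    """Return (key, {term: value}) pairs for the CSV file one element describes."""
    key_index = int(element.find(key_tag).get("index"))
    fields = [(int(f.get("index")), f.get("term")) for f in element.findall("field")]
    rows = []
    for record in csv.reader(io.StringIO(FILES[element.get("file")])):
        values = {term: record[index] for index, term in fields}
        rows.append((record[key_index], values))
    return rows


def load_archive(meta_xml):
    """Build core records keyed by id, with extension rows attached to each."""
    root = ET.fromstring(meta_xml)
    core = root.find("core")
    records = {}
    for key, values in read_rows(core, "id"):
        records[key] = {"rowType": core.get("rowType"), "extensions": {}, **values}
    for ext in root.findall("extension"):
        for core_id, values in read_rows(ext, "coreid"):
            # Star schema: every extension row must point at an existing core row.
            if core_id not in records:
                raise ValueError(f"no core row with id {core_id!r}")
            attached = records[core_id]["extensions"]
            attached.setdefault(ext.get("rowType"), []).append(values)
    return records


archive = load_archive(META_XML)
```

The lookup by coreid is the only relationship this sketch understands, which
mirrors the star-schema limitation noted in the list above; arbitrary
relational models would need explicit primary and foreign key declarations.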
>
> The next evolution of the DwC-A needs to consider the following key uses:
> - More complex arrangements of data relationships (e.g. arbitrary relational
> models)
> - Stronger typing of data formats (only date formats are currently declared)
> [It is the hope of the DwC-A standard authors that the results of the CSV WG
> will mean the DwC-A can be deprecated, and efforts can be spent on
> developing tooling that supports the W3C CSV standard/recommendations]
>
> Example: Suggest using the attached meta.xml to indicate the relationships
>
> Requires: HeadingColumns, CellValueMicroSyntax, NonStandardFieldDelimiter,
> ExternalDataDefinitionResource, AnnotationAndSupplementaryInfo,
> AssociationOfCodeValuesWithExternalDefinitions, SyntacticTypeDefinition,
> PrimaryKey, ForeignKeyReferences, MissingValueDefinition,
> MultipleHeadingRows
>
>
>
>
>
> On 06 May 2014, at 19:41, Eric Stephan <ericphb@gmail.com> wrote:
>
> Tim,
>
> I agree, I do think it makes sense to include this in the use case
> document.  Thank you for sharing, and yes, could you please provide
> example(s) to illustrate the use case?  Either text or images
> showing snapshots of the examples would be great.  I am copying the
> csv working group distribution list as well.
>
>
> Thank you,
>
> Eric
>
> On Tue, May 6, 2014 at 2:47 AM, Tim Robertson [GBIF]
> <trobertson@gbif.org> wrote:
>
> Hi Jeremy, Davide, Eric,
>
> Are you still accepting use cases for the CSV WG [1] document you are
> compiling?
>
> If so, I am keen to submit one for the GBIF network [2] and would start
> documenting one along the lines of the existing 20 cases.  It is unlikely to
> bring significant new requirements, but would encapsulate pretty much all of
> the existing ones, and the devil is always in the detail with this kind of
> thing (e.g. null handling, micro syntax, default value policies, etc.) - our
> use case may well bring in some sub-requirements / ideas.  However, our case
> is more closely aligned with Google DSPL [3] than with the others (e.g. an
> XML document that serves to define the content found in CSVs and their
> relationships - I assume these to be considered "CSV annotations").  I am a
> little surprised not to see G-DSPL on the list of use cases - should it be
> one?  I would be happy to produce an example for that as well if considered
> useful.  My slight worry is that unless cases such as ours and G-DSPL are
> considered, the foreign key / primary key requirements *may* not be
> adequately addressed consistently (e.g. referential integrity with respect
> to well-formedness, expected behaviour on NULLs, etc.).
>
> Thanks for the consideration - please do advise me if my ideas /
> proposals are off topic.
> I should mention that maintaining a standard for CSV handling is part of my
> core job, and fundamental to our infrastructure - this is a group of real
> importance to our work.  I’d be happy to help in any way I can.
>
> Best wishes,
> Tim
>
> [1] http://w3c.github.io/csvw/use-cases-and-requirements/index.html
> [2] http://www.gbif.org/
> [3] https://developers.google.com/public-data/
>
>
>
Received on Tuesday, 13 May 2014 23:16:32 UTC