RE: use case: NetCDF data from Tandy, Jeremy on 2014-02-26 (public-csv-wg@w3.org from February 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Wed, 26 Feb 2014 10:54:07 +0000
To: Eric Stephan <ericphb@gmail.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE2B36813@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
OK - now you're talking :-)

At the Met Office (& many of the organisations we collaborate with), NetCDF is in common usage.

I had thought that you wanted to get into working directly with the binary ... in which case I was going to say NetCDF3, NetCDF4-Classic or NetCDF-Extended :-) ... I think I know too much (at least in this corner of the data-sphere).

For what it's worth, our files get pretty large (I suspect yours do too!), so I don't see much use of ncdump to generate a text format. Instead, we encourage people to work directly with the binary format using appropriate tools like Iris <http://scitools.org.uk/iris/> that layer on top of the netCDF libraries to work with data conforming to the CF (Climate and Forecast) metadata conventions <http://cf-pcmdi.llnl.gov/>.

On the mailing list <http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0157.html>, danbri talked about "interesting and useful work that can be done for _all_ tables, at a broad brush level of granularity" without getting stuck into the innards of the file. Just having access to this summary and/or structural metadata would help discovery.

So I think the use case could follow these lines:

- big scientific tabular dataset published in netCDF 
- publish summary and structural metadata about the "tabular data" for LOD discovery purposes
- download manageable subsets of the tabular dataset in tabular-text format - perhaps paginated?
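
A minimal sketch of the second and third steps, assuming the subset of interest has already been read into memory as plain rows. The function names paginate_csv and summarise, the column names, and the sample values are hypothetical illustrations; none of this is part of the netCDF libraries or of anything the WG has specified.

```python
import csv
import io


def paginate_csv(header, rows, page_size):
    """Yield CSV text pages; every page repeats the header row."""
    for start in range(0, len(rows), page_size):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + page_size])
        yield buf.getvalue()


def summarise(header, rows, page_size):
    """Return simple structural metadata that could be published for discovery."""
    return {
        "columns": list(header),
        "row_count": len(rows),
        # Ceiling division: a final partial page still counts as a page.
        "page_count": -(-len(rows) // page_size),
    }


# Hypothetical sample data standing in for a subset of a large dataset.
header = ["time", "lat", "lon", "air_temperature"]
rows = [
    ["2014-02-26T00:00Z", 50.0, -3.5, 281.2],
    ["2014-02-26T00:00Z", 51.0, -3.5, 280.9],
    ["2014-02-26T06:00Z", 50.0, -3.5, 282.0],
    ["2014-02-26T06:00Z", 51.0, -3.5, 281.5],
    ["2014-02-26T12:00Z", 50.0, -3.5, 283.1],
]

pages = list(paginate_csv(header, rows, page_size=2))
summary = summarise(header, rows, page_size=2)
```

How a client would request a particular page is deliberately left out, since that is where protocol design begins.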

This is not dissimilar to use case #7 <https://www.w3.org/2013/csvw/wiki/Use_Cases#A_local_archive_of_metadata_for_a_collection_of_journal_articles> in which a tabular result set is described but presented in manageable "pages" to the user one CSV file at a time. 

As with #7, we need to be wary of getting into PROTOCOL DESIGN for accessing subsets of data ... or at least do so with our eyes open!

What do you think, Eric ... could you turn this into a narrative-style use case?

Jeremy

-----Original Message-----
From: Eric Stephan [mailto:ericphb@gmail.com] 
Sent: 26 February 2014 10:25
To: Stasinos Konstantopoulos
Cc: Ivan Herman; Tandy, Jeremy; W3C CSV on the Web Working Group
Subject: Re: use case: NetCDF data

Sorry, still trying to keep the gates of hell open a bit. :-) I think
I'm on board.

Instead of "tabular text files", I'd prefer to view the constraint as "tabular text data"

From a LOD perspective, a given NetCDF resource could be accessible:
    * in its native format
    * expressed in its text form through the ncdump utility.

The native NetCDF form provides the scientific community with self-describing data.
The text form could be used within the CSVW context as a means for Star 4 and Star 5 discovery.

There are lots of scientific binary formats out there that represent n-dimensional data blocks, but each usually provides a means to dump the data in a textual form, or to express it in an alternative format that can be loaded into a spreadsheet for analysis.

It's a somewhat similar concept to thinking of a relational database table or triple store dumped in a tabular form.
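
To illustrate what "dumped in a tabular form" could mean for an n-dimensional block, here is a rough sketch that flattens a small grid into one row per cell, with a column per dimension plus a value column. This is not the output format of ncdump; the function name flatten_grid, the dimension names and the values are all made up for illustration.

```python
import itertools


def flatten_grid(dims, values):
    """Flatten nested lists into long-form rows.

    dims maps each dimension name to its coordinate values, in the same
    order in which `values` is nested (outermost dimension first).
    """
    names = list(dims)
    rows = []
    index_ranges = (range(len(dims[name])) for name in names)
    for index in itertools.product(*index_ranges):
        # Walk down the nesting to reach the cell at this index.
        cell = values
        for i in index:
            cell = cell[i]
        coords = [dims[name][i] for name, i in zip(names, index)]
        rows.append(coords + [cell])
    return names, rows


# Hypothetical 2 x 2 block: two time steps by two latitudes.
dims = {"time": [0, 6], "lat": [50.0, 51.0]}
values = [[281.2, 280.9], [282.0, 281.5]]

names, rows = flatten_grid(dims, values)
```

Each resulting row reads like a record in a database table, which is what makes the comparison with a table or triple-store dump work.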

Sound okay?

Eric



On Wed, Feb 26, 2014 at 1:05 AM, Stasinos Konstantopoulos <konstant@iit.demokritos.gr> wrote:
> On 26 February 2014 10:40, Ivan Herman <ivan@w3.org> wrote:
>>
>> On 25 Feb 2014, at 23:18 , Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
>>
>>> Hi Stasinos ... thanks for the use cases you've provided so far.
>>>
>>> Looking at your NetCDF data use case, I wonder if non-textual tabular data is in scope. The discussion on the "Scoping Question" thread in the mailing list seemed to suggest that we would focus on textual tabular data.
>>>
>>> Before progressing, I wanted to get your thoughts and gather input from the other WG participants.
>>
>> Well... I did not know NetCDF before, so I peeked around a bit. I may have missed some details, but the impression is that this is, primarily, a set of utilities in various programming languages to handle tabular data that is in some internal format. They do have some ways of dumping data in terms of text:
>>
>> http://www.narccap.ucar.edu/data/ascii-howto.html
>>
>> and, as far as I could see some of the examples there the output is 'simply' CSV (well, probably TSV or 'SSV', ie, 'space separated values').
>>
>> I would support Jeremy's formulation, that we focus on 'textual tabular data'.
>
> That's alright, and it will keep hell's gates slightly less widely 
> open. We still have Eric's examples of metadata headers.
>
> NetCDF also foresees a single metadata description covering multiple 
> data files, but I believe this to be a more general concern as there 
> are many instances of homogeneous CSV data files that can better be 
> described at one shot.
>
> Best,
> s
>
Received on Wednesday, 26 February 2014 10:54:36 UTC