Re: use case: NetCDF data from Ivan Herman on 2014-02-26 (public-csv-wg@w3.org from February 2014)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 26 Feb 2014 13:28:27 +0100
To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
Cc: Eric Stephan <ericphb@gmail.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <F567D960-8CAB-440A-A10A-A009B4B2EC9B@w3.org>
On 26 Feb 2014, at 11:54 , Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:

> OK - now you're talking :-)
> 
> At the Met Office (& many of the organisations we collaborate with), NetCDF is in common usage.
> 
> I had thought that you wanted to get into working directly with the binary ... in which case I was going to say NetCDF3, NetCDF4-Classic or NetCDF-Extended :-) ... I think I know too much (at least in this corner of the data-sphere).
> 
> For what it's worth, our files get pretty large (I suspect yours do too!), so I don't see much use of ncdump to generate a text format. Instead, we encourage people to work directly with the binary format using appropriate tools like Iris <http://scitools.org.uk/iris/> that layer on top of the netCDF libraries to work with data conforming to the CF (Climate and Forecast) metadata conventions <http://cf-pcmdi.llnl.gov/>.
> 
> On the mailing list <http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0157.html>, danbri talked about "interesting and useful work that can be done for _all_ tables, at a broad brush level of granularity" without getting stuck into the innards of the file. Just having access to this summary and/or structural metadata would help discovery.
> 
> So I think the use case could follow these lines:
> 
> - big scientific tabular dataset published in netCDF 
> - publish summary and structural metadata about the "tabular data" for LOD discovery purposes
> - download manageable subsets of the tabular dataset in tabular-text format - perhaps paginated?

I like that. We should be careful, then, to formulate the metadata in a form that does not rely on textual data. But that should not be a major problem...
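
A minimal sketch of that idea, assuming the table is already available in memory as typed rows (however it was read from the native file): the structural metadata is built from column names and declared types only, and the text form appears only when a page is exported. The field names "columns", "datatype" and "rowCount", the helper names and the sample values are hypothetical, chosen purely for illustration.

```python
import csv
import io


def describe(columns, rows):
    """Build structural metadata without looking at any textual serialisation."""
    return {
        "columns": [{"name": name, "datatype": dtype} for name, dtype in columns],
        "rowCount": len(rows),
    }


def pages(columns, rows, page_size):
    """Yield the table as CSV text, one page at a time, each with a header row."""
    names = [name for name, _ in columns]
    for start in range(0, len(rows), page_size):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(names)
        writer.writerows(rows[start:start + page_size])
        yield buf.getvalue()


# Hypothetical sample data, for illustration only.
columns = [("time", "integer"), ("lat", "number"), ("temperature", "number")]
rows = [(0, 50.0, 281.5), (1, 50.0, 282.1), (2, 50.0, 280.9)]

metadata = describe(columns, rows)
chunks = list(pages(columns, rows, page_size=2))
```

Here the metadata would be the same whether the rows came from a binary file or from a text dump, and each element of chunks is a small tabular-text subset that could be served as a separate download.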

Ivan


> 
> This is not dissimilar to use case #7 <https://www.w3.org/2013/csvw/wiki/Use_Cases#A_local_archive_of_metadata_for_a_collection_of_journal_articles> in which a tabular result set is described but presented in manageable "pages" to the user one CSV file at a time. 
> 
> As for #7 we need to be wary of getting into PROTOCOL DESIGN for accessing subsets of data ... or at least do so with our eyes open!
> 
> What do you think, Eric ... could you turn this into a narrative-style use case?
> 
> Jeremy
> 
> -----Original Message-----
> From: Eric Stephan [mailto:ericphb@gmail.com] 
> Sent: 26 February 2014 10:25
> To: Stasinos Konstantopoulos
> Cc: Ivan Herman; Tandy, Jeremy; W3C CSV on the Web Working Group
> Subject: Re: use case: NetCDF data
> 
> Sorry trying to keep the gates of hell open a bit still. :-)   I think
> I'm on board.
> 
> Instead of "tabular text files", I'd prefer to view the constraint as "tabular text data"
> 
> From a LOD perspective, a given NetCDF resource could be accessible:
>    * in its native format
>    * expressed in its text form through the ncdump utility.
> 
> The native NetCDF form provides the scientific community with self-describing data.
> The text form could be used within the CSVW context as a means for Star 4 and Star 5 discovery.
> 
> There are lots of scientific binary formats out there that represent n-dimensional data blocks, but each usually provides a means to dump the data in a textual form or to express it in an alternative format that can be loaded into a spreadsheet for analysis.
> 
> It's a somewhat similar concept to thinking of a relational database table or triple store dumped in a tabular form.
> 
> Sound okay?
> 
> Eric
> 
> 
> 
> On Wed, Feb 26, 2014 at 1:05 AM, Stasinos Konstantopoulos <konstant@iit.demokritos.gr> wrote:
>> On 26 February 2014 10:40, Ivan Herman <ivan@w3.org> wrote:
>>> 
>>> On 25 Feb 2014, at 23:18 , Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
>>> 
>>>> Hi Stasinos ... thanks for the use cases you've provided so far.
>>>> 
>>>> Looking at your NetCDF data use case, I wonder if non-textual tabular data is in scope. The discussion on the "Scoping Question" thread in the mailing list seemed to suggest that we would focus on textual tabular data.
>>>> 
>>>> Before progressing, I wanted to get your thoughts and gather input from the other WG participants.
>>> 
>>> Well... I did not know NetCDF before, so I peeked around a bit. I may have missed some details, but the impression is that this is, primarily, a set of utilities in various programming languages to handle tabular data that is in some internal format. They do have some ways of dumping data in terms of text:
>>> 
>>> http://www.narccap.ucar.edu/data/ascii-howto.html
>>> 
>>> and, as far as I could see from some of the examples there, the output is 'simply' CSV (well, probably TSV or 'SSV', i.e., 'space separated values').
>>> 
>>> I would support Jeremy's formulation, that we focus on 'textual tabular data'.
>> 
>> That's alright, and it will keep hell's gates slightly less widely 
>> open. We still have Eric's examples of metadata headers.
>> 
>> NetCDF also foresees a single metadata description covering multiple 
>> data files, but I believe this to be a more general concern as there 
>> are many instances of homogeneous CSV data files that can better be 
>> described at one shot.
>> 
>> Best,
>> s
>> 
> 


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf
Received on Wednesday, 26 February 2014 12:29:01 UTC