- From: Annette Greiner via GitHub <sysbot+gh@w3.org>
- Date: Mon, 10 Jul 2023 21:05:09 +0000
- To: public-dxwg-wg@w3.org
I agree that both file size and dataset size are useful. In the world of high-performance computing, unfortunately, the need to know whether a download completed has not yet receded into the past. A few gigabytes are not large in this realm. File movements at the terabyte to hundreds-of-terabytes level are common, so special tools are needed, and care must be taken to maximize throughput without causing trouble for others on the network. I often field queries from users about how to move a dataset from one storage tier to another or from one site to another. So size definitely can matter and should be expressible.

Another potentially important piece of the puzzle is the number of inodes (files or directories) involved when the dataset is unpacked, since some storage can be finicky about storing or reading many small files. The number of rows in a data table can also determine whether it fits into a certain type of database or can be manipulated with certain analysis tools. Often the number of rows maps in a general way to the usefulness of a scientific dataset, though depending on the dataset, its size may be better expressed in more domain-specific terms, like degrees of the sky for astronomical data or spatial resolution for climate data.
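To make that concrete, here is a rough sketch of how such measures could ride along in a DCAT description of a distribution. Only dcat:byteSize is an actual DCAT property; ex:inodeCount and ex:rowCount (and the ex namespace itself) are invented placeholders for the kinds of measures described above, not anything DCAT currently defines:

    # Sketch only: dcat:byteSize is real DCAT; ex:* properties are hypothetical.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    DCAT = Namespace("http://www.w3.org/ns/dcat#")
    EX = Namespace("https://example.org/size#")  # hypothetical extension vocabulary

    g = Graph()
    g.bind("dcat", DCAT)
    g.bind("ex", EX)

    dist = URIRef("https://example.org/dataset/climate-sim/dist/tarball")
    g.add((dist, RDF.type, DCAT.Distribution))
    # Total size on disk: 250 TB expressed in bytes (DCAT 2 gives byteSize an
    # xsd:decimal range).
    g.add((dist, DCAT.byteSize, Literal(250 * 10**12, datatype=XSD.decimal)))
    # Hypothetical: number of inodes created when the archive is unpacked.
    g.add((dist, EX.inodeCount, Literal(1_800_000, datatype=XSD.nonNegativeInteger)))
    # Hypothetical: row count for the main data table.
    g.add((dist, EX.rowCount, Literal(4_200_000_000, datatype=XSD.nonNegativeInteger)))

    print(g.serialize(format="turtle"))

Domain-specific measures like sky coverage or spatial resolution would presumably need their own properties or a qualified-measure pattern rather than a single numeric field.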
--
GitHub Notification of comment by agreiner
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1571#issuecomment-1629730867 using your GitHub account
--
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Monday, 10 July 2023 21:05:10 UTC