- From: Annette Greiner <amgreiner@lbl.gov>
- Date: Mon, 10 Jul 2023 11:21:46 -0700
- To: Karen Coyle via GitHub <sysbot+gh@w3.org>
- Cc: public-dxwg-wg@w3.org
I agree that both file size and dataset size are useful. In the world of high-performance computing, unfortunately, the times of needing to know whether a download completed haven’t yet receded into the past. A few gigabytes are not large in this realm. File movements at the terabyte to hundreds of terabytes level are common, so special tools are needed, and care must be taken to maximize throughput without causing trouble for others on the network. I often field queries from users about how to go about moving a dataset from one storage tier to another or from one site to another. So, size definitely can matter and should be expressible.

Another potentially important piece of the puzzle is the number of inodes (files or directories) involved when the dataset is unpacked, since some storage can be finicky about storing many small files. The number of rows in a data table can also matter to whether it can be fit into a certain type of database or can be manipulated with certain analysis tools. Often the number of rows maps in a general way to the usefulness of a scientific dataset, though depending on the dataset its size may be better expressed in more domain-specific terms, like degrees of the sky for astronomical data, or spatial resolution for climate data.

> On Jul 7, 2023, at 9:52 AM, Karen Coyle via GitHub <sysbot+gh@w3.org> wrote:
>
> To my mind, number of records is a human-facing bit of info that gives a person an idea of the scope of the information prior to downloading. Number of bytes is reminiscent of those large software downloads in times past when you needed to know that the download had completed. However, for very large files it is useful to know that they ARE very large - which today means multiple gigabytes. For smaller files I doubt if byte size matters.
>
> Therefore, both measures are needed but are useful under specific circumstances.
>
> --
> GitHub Notification of comment by kcoyle
> Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1571#issuecomment-1625685480 using your GitHub account
>
>
> --
> Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
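
For illustration, the three size measures mentioned in this message (total bytes, inode count when unpacked, and row count) could be gathered roughly as follows. This is only a minimal sketch: the names SizeSummary, summarize_tree, and count_rows are hypothetical, and it assumes the dataset is already unpacked into a local directory and that the table is a CSV file with a single header row. The inode figure is an approximation, since hard-linked files are counted once per directory entry.

```python
import csv
import os
from dataclasses import dataclass


@dataclass
class SizeSummary:
    total_bytes: int  # sum of the sizes of all non-directory entries
    inode_count: int  # files plus directories, including the root


def summarize_tree(root: str) -> SizeSummary:
    """Walk an unpacked dataset and report its byte size and entry count."""
    total_bytes = 0
    inode_count = 1  # count the root directory itself
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        inode_count += len(dirnames) + len(filenames)
        for name in filenames:
            # lstat reports the size of a symlink itself, not its target.
            total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
    return SizeSummary(total_bytes, inode_count)


def count_rows(csv_path: str) -> int:
    """Count data rows in a CSV file, assuming one header row."""
    with open(csv_path, newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)
```

Domain-specific measures such as sky coverage or spatial resolution would need their own, dataset-dependent logic and are not covered by this sketch.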
Received on Monday, 10 July 2023 18:21:54 UTC