Re: [dxwg] How to specify the number of records in a dataset (#1571)

Just out of curiosity, @nichtich, could you provide a use case where users depend on an exact value for size?
I hear this request sometimes, but I have not encountered a user who uses it in their data selection process or data processing.

In my view, there are several reasons why size is not included.

With the introduction of APIs, the need for size becomes very limited. For an API, size is time-dependent, and since most data portals assume that metadata changes slowly (once a week is a quick pace ;-)), the property loses its value: if the data is only harvested once a week, the accuracy of the reported size matters less.

I see it playing more of a role in file downloads, but even then I am not so sure there is a need for it to be exact.
E.g., as @akuckartz mentions, there is the bytesize for a distribution. But normally users do not care about the exact number; they probably care more about the time it takes to download.
In practice, the bytesize does not feature in a human decision process.
Another use case could be guaranteeing that the file has been completely downloaded, but then a checksum is a better basis for an integrity check.
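To illustrate why a checksum is the stronger integrity check, here is a minimal Python sketch. The sample data and function names are hypothetical, and SHA-256 is only one possible choice of checksum algorithm. A size check catches a truncated download, but a same-length corruption slips through; the checksum catches both.

```python
import hashlib

def size_matches(data: bytes, expected_size: int) -> bool:
    # Only detects truncation or padding; same-length corruption passes.
    return len(data) == expected_size

def checksum_matches(data: bytes, expected_sha256: str) -> bool:
    # Detects any change in content, including same-length corruption.
    return hashlib.sha256(data).hexdigest() == expected_sha256

# Hypothetical published metadata for a small CSV distribution.
original = b"id,name\n1,Alice\n2,Bob\n"
published_size = len(original)
published_sha256 = hashlib.sha256(original).hexdigest()

truncated = original[:-4]                           # incomplete download
corrupted = original.replace(b"Alice", b"Alicf")    # same length, wrong content

print(size_matches(truncated, published_size), checksum_matches(truncated, published_sha256))  # False False
print(size_matches(corrupted, published_size), checksum_matches(corrupted, published_sha256))  # True False
```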

Although the need to express size feels very natural, in practice I seldom see publishers providing it, because of the high effort needed to keep track of sizes (both human and technical investment). Therefore I am curious about the use case that would motivate publishers to provide size information.

-- 
GitHub Notification of comment by bertvannuffelen
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1571#issuecomment-1625158029 using your GitHub account



Received on Friday, 7 July 2023 09:52:01 UTC