Re: [dxwg] How to specify the number of records in a dataset (#1571)

Thanks @bertvannuffelen for the summary. Size indeed depends a lot on context. 

> To get a harmonised view the size will be a complex datatype, having properties:
>
> - value: the number
> - unit: what is counted
> - method: the method of counting

This goes beyond the original request. A cardinal number and a unit saying what is being counted are enough. There are several ways to express this in RDF:

1. property with a custom datatype: rarely used and problematic, because custom datatypes need to be mapped to XSD number types
2. unit-specific property with a number as value: easy to use, but a property needs to be defined for each unit
3. generic size property with a string as value: easy for humans, of little use for computation
4. generic size property with a blank node object holding number, unit and optional further details (date, method...): most flexible, but blank nodes are unpleasant and at least three properties need to be defined
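
To make the four options concrete, here is a rough sketch in Python that prints each of them as a Turtle statement. The `ex:` namespace and every `ex:` term (`ex:size`, `ex:records`, `ex:numberOfRecords`, `ex:value`, `ex:unit`, `ex:Record`) are placeholders I made up for illustration; none of them is defined by DCAT or any other vocabulary.

```python
# All ex: terms below are hypothetical placeholders, not existing vocabulary terms.
PREFIXES = (
    "@prefix ex: <http://example.org/ns#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def custom_datatype(dataset: str, count: int) -> str:
    """Option 1: the unit is encoded in a custom literal datatype."""
    return f'{dataset} ex:size "{count}"^^ex:records .'


def unit_specific(dataset: str, count: int) -> str:
    """Option 2: one property per unit, with a plain number as value."""
    return f'{dataset} ex:numberOfRecords "{count}"^^xsd:nonNegativeInteger .'


def plain_string(dataset: str, count: int, unit: str) -> str:
    """Option 3: a generic property with number and unit in one string."""
    return f'{dataset} ex:size "{count} {unit}" .'


def blank_node(dataset: str, count: int, unit_iri: str) -> str:
    """Option 4: a generic property pointing to a blank node with number and unit."""
    return (
        f"{dataset} ex:size [\n"
        f'    ex:value "{count}"^^xsd:nonNegativeInteger ;\n'
        f"    ex:unit {unit_iri}\n"
        f"] ."
    )


if __name__ == "__main__":
    print(PREFIXES)
    print(custom_datatype("ex:dataset", 1200))
    print(unit_specific("ex:dataset", 1200))
    print(plain_string("ex:dataset", 1200, "records"))
    print(blank_node("ex:dataset", 1200, "ex:Record"))
```

Only option 2 gives a consumer a number it can use without first knowing a custom datatype, parsing a string, or following a blank node.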

There already *are* unit-specific properties such as `dcat:byteSize`, `void:triples`, `void:entities`, `wd:P4876`... In my opinion *some* units are frequent and generic enough to justify a DCAT property, e.g. number of files or number of records. At the very least, DCAT should mention that the VoID vocabulary can be used to specify the size of RDF datasets. For units not supported or mentioned by DCAT, the specification should recommend using `dcterms:extent` and explain how to specify number and unit.
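
As a sketch of how a catalog exporter could follow this, the Python function below uses an existing unit-specific property where one is known and falls back to `dcterms:extent` otherwise. The shape of the fallback (a blank node with `rdf:value` plus a made-up `ex:unit` property) is purely my assumption, since how to state number and unit there is exactly what a specification would still have to define. The function name and the mapping table are hypothetical as well.

```python
# Existing unit-specific properties mentioned above, keyed by an informal unit name.
KNOWN_SIZE_PROPERTIES = {
    "bytes": "dcat:byteSize",
    "triples": "void:triples",
    "entities": "void:entities",
}


def size_statement(subject: str, count: int, unit: str) -> str:
    """Return one Turtle statement giving the size of `subject` in `unit`."""
    prop = KNOWN_SIZE_PROPERTIES.get(unit)
    if prop is not None:
        # A bare integer in Turtle is read as an xsd:integer literal.
        return f"{subject} {prop} {count} ."
    # Assumed fallback shape: ex:unit is a hypothetical property,
    # not part of DCAT or Dublin Core.
    return f'{subject} dcterms:extent [ rdf:value {count} ; ex:unit "{unit}" ] .'


if __name__ == "__main__":
    print(size_statement("ex:dataset", 52000, "triples"))
    print(size_statement("ex:dataset", 73031, "landmarks"))
```

The numbers in the usage lines are arbitrary and not taken from any real dataset.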

> I think it would be good to provide evidence from existing data portals and communities where size is a critical

Sizes other than the number of bytes are common; just browse around in any data catalog. I looked at the first topic I thought of (astronomy) and found two examples within a minute:

* https://data.nasa.gov/Space-Science/Mars-orbital-image-HiRISE-labeled-data-set-version/egmv-36wq - number of *landmarks* (very domain-specific unit)
* https://data.nasa.gov/Aerospace/NASA-TechPort/bq5k-hbdz - number of *rows* and *columns* (very generic unit)

Additional examples are listed [in my recent comment](https://github.com/w3c/dxwg/issues/1571#issuecomment-1625986016).

-- 
GitHub Notification of comment by nichtich
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1571#issuecomment-1630750767 using your GitHub account



Received on Tuesday, 11 July 2023 12:39:03 UTC