RE: Datasets and their storage

Hi Nicholas,

There isn’t anything explicit, but speaking personally I hope that this is something we can address in some form. It may be less about the storage and more about the service or API that gives access to that storage.
There are requirements in the UCR that point in that direction:

6.5.2 Distribution schema (https://www.w3.org/TR/dcat-ucr/#RDIS), particularly the first note, which starts talking about service behaviour profiles, and
6.5.3 Distribution service (https://www.w3.org/TR/dcat-ucr/#RDISV), in particular its related use case ID18 (https://www.w3.org/TR/dcat-ucr/#ID18)

These aren’t as explicit as your examples, but I definitely support the idea that an updated DCAT should provide a more complete story than the existing version.

[The use cases/requirements are still only at FPWD – if I remember the discussion correctly there is still scope to add more use cases if these help drive the work.]


· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
David Browning
Platform Technology Architect

Thomson Reuters

Phone: +41(058) 3065054
Mobile: +41(079) 8126123

david.browning@thomsonreuters.com
http://thomsonreuters.com/



From: Nicholas.Car@csiro.au [mailto:Nicholas.Car@csiro.au]
Sent: 02 February 2018 12:40
To: public-dxwg-wg@w3.org
Subject: Datasets and their storage

There don’t appear to be any references in our Use Cases or Reqs to datasets’ storage or any other aspects of their physical presence. Is this entirely out of scope for the WG?

In DCAT 1, there are some trivial Distribution properties (downloadURL/accessURL and byteSize) but no sophisticated handling of dataset physical presence or availability, such as access latency, availability zones and so on. I think it is important to cater for this aspect of datasets in metadata, as catalogues are now dealing with large volumes, cloud storage, Content Delivery Networks, different networks (Internet2?) etc. Certainly, it would be useful in some of the dataset catalogues I’ve encountered to record this information within dataset metadata, as it can be critical for distribution as well as management.

Would it be appropriate for DCAT 2 to cater for this aspect of Datasets? I understand if it’s just too late in the Use Case process to consider this but I think it is important.

Prior work
A quick review of vocabs/ontologies in http://lov.okfn.org for mentions of digital storage doesn’t yield much, so there is room to lead:

Several ontologies, like EBUCore (the Dublin Core for media), just have a few notes on storage. EBUCore only has
http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#StorageType
“The type of storage used for the repository. This is provided as free text in an annotation label or as an identifier pointing to a term in a classification scheme.”

Good Relations has a number of classes in a “Storage Media Vocabulary” (http://www.ebusiness-unibw.org/ontologies/opdm/storagemedia.html), but this is focussed on consumer hardware and likely not of great interest to enterprise catalogue maintainers and other implementers of DCAT.

OBO Foundry only has “A storage service in which a service consumer provides data as input which a service provider stores and returns as output in its original form” (http://purl.obolibrary.org/obo/OBI_0001533), so no axioms other than superclasses and no properties of a storage system/service.

The Nature core ontology (http://www.nature.com/ontologies/core/) has “The :repository property relates an asset to a storage repository” and “The :repositoryId property specifies a (local) repository ID for an asset” but no other details.

Nick

Received on Friday, 2 February 2018 15:22:44 UTC