- From: <david.browning@thomsonreuters.com>
- Date: Thu, 22 Jun 2017 13:38:52 +0000
- To: <public-dxwg-wg@w3.org>
- CC: <marvin.frommhold@eccenca.com>, <martin.bruemmer@eccenca.com>, <makx@makxdekkers.com>, <Peter.Winstanley@gov.scot>, <kcoyle@kcoyle.net>, <andrea.perego@ec.europa.eu>
- Message-ID: <B779901ABC68F84CBDE738D2FC6F00C26E498D6C@C111AZXLMBX02.ERF.thomson.com>
In my experience, this scenario (where a data publisher makes available a sequence of update files that encode incremental changes from some initialising “complete” publication) is the predominant pattern of data exchange in the financial information domain, outside streaming/real-time data delivery, so it’s one that we’re extremely interested in. We’d like to be able to leverage DCAT as much as possible, though clearly we may be extending into a more specialised vocabulary beyond what’s appropriate for inclusion in the DCAT standard itself. For example, we’d like to be able to process the sequence of files automatically, using the metadata as configuration information, so that a data consumer could create a local replica of the latest state of the published data.

Our current thinking actually aims to handle this at the dcat:Distribution level, though this is still active work and we haven’t yet settled on a definite approach. As Andrea says, from a data consumer’s perspective there’s some kind of link here with how service-based access might be modelled, since in my experience that’s often used to give access to a ‘latest state/current value’ copy of the data. [Of course, that might not be the direction the WG takes...] So I’d like to see this considered as a potential use case for discussion until we have a fuller understanding of the landscape.

Last comment: we tend to think of this kind of change (“more data”) as somewhat distinct from version changes. A version change typically indicates something potentially more disruptive to a consumer than an update file. And yes, there is some ambiguity in that distinction, but in practice it’s been a useful distinction in our environment.
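As an illustration of the consumer side of this pattern, here is a minimal sketch of replaying a sequence of update files over an initial complete publication to obtain a local replica of the latest state. The record format (`(operation, key, value)` tuples with `add`/`change`/`delete` operations), the `UpdateFile` class and the `build_replica` function are all hypothetical and are not part of DCAT or of any Thomson Reuters system; in practice the ordering and location of the files would come from the published metadata.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical change record: (operation, record key, record value).
# "add" and "change" carry a value; "delete" ignores it.
Change = Tuple[str, str, object]


@dataclass
class UpdateFile:
    """One published increment (hypothetical structure, not a DCAT class)."""
    sequence: int          # position in the publisher's update sequence
    changes: List[Change]  # records added, changed or deleted since the previous file


def build_replica(snapshot: Dict[str, object],
                  updates: Iterable[UpdateFile]) -> Dict[str, object]:
    """Replay update files, in sequence order, over an initial complete publication."""
    replica = dict(snapshot)  # copy so the original snapshot is left untouched
    expected = None
    for update in sorted(updates, key=lambda u: u.sequence):
        # Refuse to continue if a file is missing from the sequence,
        # since later changes may depend on the missing ones.
        if expected is not None and update.sequence != expected:
            raise ValueError(
                f"gap in update sequence: expected {expected}, got {update.sequence}")
        for op, key, value in update.changes:
            if op in ("add", "change"):
                replica[key] = value
            elif op == "delete":
                replica.pop(key, None)
            else:
                raise ValueError(f"unknown operation: {op!r}")
        expected = update.sequence + 1
    return replica


snapshot = {"r1": "a", "r2": "b"}
updates = [
    UpdateFile(2, [("delete", "r1", None)]),
    UpdateFile(1, [("change", "r2", "b2"), ("add", "r3", "c")]),
]
print(build_replica(snapshot, updates))  # {'r2': 'b2', 'r3': 'c'}
```

The gap check is one possible design choice: an update file only makes sense relative to the state it was computed against, whereas a version change in the sense described above might more naturally be handled by starting again from a new complete publication.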
David Browning
Platform Technology Architect
Thomson Reuters
Phone: +41(058) 3065054
Mobile: +41(079) 8126123
david.browning@thomsonreuters.com
thomsonreuters.com

From: andrea.perego@ec.europa.eu
Sent: 22 June 2017 14:15
To: public-dxwg-wg@w3.org
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; Peter.Winstanley@gov.scot; kcoyle@kcoyle.net
Subject: RE: Question for DCAT "experts"

I think it may be worth considering whether the scenario outlined by Karen ("incremental updates"?) also relates to the notion of dcat:Distribution. In particular, regarding the mechanism used for synchronising data versions, I see some relationship with the use cases concerning the modelling of service-based data access.

Andrea

----
Andrea Perego, Ph.D.
Scientific / Technical Project Officer
European Commission DG JRC
Directorate B - Growth and Innovation
Unit B6 - Digital Economy
Via E. Fermi, 2749 - TP 262
21027 Ispra VA, Italy
https://ec.europa.eu/jrc/
----

From: Martin Brümmer [mailto:martin.bruemmer@eccenca.com]
Sent: Thursday, June 22, 2017 11:35 AM
To: public-dxwg-wg@w3.org
Cc: Marvin Frommhold
Subject: Re: Question for DCAT "experts"

Hi there,

a colleague of mine, Marvin Frommhold, is researching versioning in the context of RDF and Linked Data.
He contributes the following points:

The following two documents provide a basic introduction to versioning of datasets:

* Papakonstantinou, Vassilis et al. “Versioning for Linked Data: Archiving Systems and Benchmarks.” BLINK@ISWC, 2016. http://ceur-ws.org/Vol-1700/paper-05.pdf
  * Section 2 of this paper provides an introduction to different archiving strategies.
* Gray, Alasdair J. G. et al. “Dataset Descriptions: HCLS Community Profile.” W3C Interest Group Note, May 2015. https://www.w3.org/TR/hcls-dataset/
  * A W3C Interest Group Note that, among other things, discusses requirements for dataset versioning.
  * "The Data Catalog Vocabulary (DCAT) [DCAT] is used to describe datasets in catalogs, but does not deal with the issue of dataset evolution and versioning."

He agrees that change sets are related to versioning, in that a version can be described as a set of changes. Fully realized, this allows very granular tracking of dataset evolution. Makx's point is important here: these changes are granular descriptions of the evolving content of a dataset, whereas DCAT so far does little to describe the data itself. If DCAT started to describe the content and structure of the data, this would be a considerable expansion of its scope.

The question of whether a set of changes constitutes a new dataset, or whether a whole database is a dataset, is complicated for me, because I understand instances of dcat:Dataset as conceptual descriptions of datasets, largely independent of the structure of the underlying data. In that sense, a database, or a web service independent of any particular query, can also be a dataset.
Limiting the data retrieved from it by some API call or SQL query could then create a new dataset fully contained in the first one.

cheers,
Martin

On 22/06/17 at 11:00, Makx Dekkers wrote:

Yes, I agree it is. Updating 'in place' is a case where the publisher decides that a change does not create a new Dataset. I find Karen's suggestion to treat a 'database' as a 'dataset' interesting; I have always thought of a database as closer to a dcat:Catalog.

Makx.

-----Original Message-----
From: Peter.Winstanley@gov.scot
Sent: 22 June 2017 10:52
To: mail@makxdekkers.com; public-dxwg-wg@w3.org
Subject: RE: Question for DCAT "experts"

Isn't a change set (like a diff) just a special case of versioning?

-----Original Message-----
From: Makx Dekkers [mailto:mail@makxdekkers.com]
Sent: 22 June 2017 09:47
To: public-dxwg-wg@w3.org
Subject: RE: Question for DCAT "experts"

As far as I remember from the initial work on DCAT, a Dataset is considered to be a kind of blob. Nothing is said about what goes on 'inside' a Dataset. The only thing you see on the outside is the modification date, but you don't know what has changed inside.

Makx

-----Original Message-----
From: Karen Coyle [mailto:kcoyle@kcoyle.net]
Sent: 21 June 2017 17:31
To: public-dxwg-wg@w3.org
Subject: Question for DCAT "experts"

Many of you know DCAT quite well, and I'm new to it, so I'm taking the lazy way and directing this as a question to you. I see in DCAT that there are properties that define frequency and update dates. The update date is "Most recent date on which the dataset was changed, updated or modified." The library world has a number of databases that are updated "in place". For anyone receiving updates, the updates do not include the entire file, only those records added, changed, or deleted since some set time.
Is this covered by DCAT? If not, I will add a use case and we can discuss.

Thanks,
kc

--
Karen Coyle
kcoyle@kcoyle.net
http://kcoyle.net
m: 1-510-435-8234 (Signal)
skype: kcoylenet/+1-510-984-3600
--
Martin Brümmer
Linked Data Consultant
phone +49 341 26508028
martin.bruemmer@eccenca.com
Postal address: eccenca GmbH | Hainstraße 8 | 04109 Leipzig | Germany
Received on Thursday, 22 June 2017 13:40:17 UTC