- From: D'Haenens Thomas <thomas.dhaenens@kb.vlaanderen.be>
- Date: Thu, 22 Jun 2017 15:18:54 +0000
- To: "public-dxwg-wg@w3.org" <public-dxwg-wg@w3.org>
- Message-ID: <AM5PR0901MB1473563FB964392065B2640EC4DB0@AM5PR0901MB1473.eurprd09.prod.outlook.>
Hi Peter, all,

I think this is a very general discussion: when do we have a new instance of something, and when are we talking about another version of the same instance? In my experience, there isn’t a magic solution. When it comes to organizational dynamics, I usually rely on the formal regulations that give birth to a change. Especially in the public sector, one normally has a decree/law, an organizational decision or some kind of formal document. In a classic physical (printed/written) work, you have editions, versions, and the like. I believe a dataset isn’t so different.

I believe clarity is crucial. IMHO, first of all, at the heart of things we have to decide what really identifies a concept, let’s say a dataset. We omit anything that’s not crucial (‘what is core’). When that core changes, you have a new dataset (with a new identifier). When the core stays intact, but something around it changes (lastUpdated, publishedBy, …), we define a new life phase of the object (a ‘version’ of an instance).

Starting from that core we have many relations to other things. Relating to other classes implies we have both a history on the part of the class and on the part of the relationship. The same goes for circular relationships (organisations, services, datasets can be ‘adopted’ in their child-parent-relationship).

So, e.g., if the structure of a dataset changes (and we could have decided the structure is a critical element of a dataset), that implies we have a new instance. If data is added to a dataset – with no structural changes – it’s simply a new version with some metadata that changed.

To go a step further, in the spirit of any base registry, you enable keeping track of history on any relationship and on any class. Then, within any implementation, you choose what to capture (and you are aware of what you don’t capture).

Taking it back to classic printed books, a 4th edition of Goethe’s Faust isn’t a new instance with regards to the 3rd ed. It’s simply a new version.
You could as easily rebind the book, also creating a new version. The same goes, as said, for organizational dynamics, …

Always happy to have some discussion regarding this topic ☺.

Cheers,
Thomas

From: Peter.Winstanley@gov.scot [mailto:Peter.Winstanley@gov.scot]
Sent: Thursday 22 June 2017 16:37
To: andrea.perego@ec.europa.eu
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; kcoyle@kcoyle.net; david.browning@thomsonreuters.com; public-dxwg-wg@w3.org
Subject: RE: Question for DCAT "experts"

Thanks for pointing this out Andrea … it certainly could be helpful. Although this is something that crops up frequently in the spatial data world, you’ll see from the school illustration that I gave that it is a regular occurrence in organisational dynamics too.

From: andrea.perego@ec.europa.eu [mailto:andrea.perego@ec.europa.eu]
Sent: 22 June 2017 15:15
To: Winstanley FP (Peter)
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; kcoyle@kcoyle.net; david.browning@thomsonreuters.com; public-dxwg-wg@w3.org
Subject: RE: Question for DCAT "experts"

Hi, Peter.

> the other provenance pattern that we have which is tricky relates to things like property
> or schools or fields (land parcel units) where we see splitting or clumping and so the
> update involves new entities that can include the old one (e.g. one house gets split into
> two. The identifier for the old house is retained, but it now contains two new houses. It
> is possible that at some later date the two are joined back into one. Same happens with
> schools, fields and so on) I really don’t know the best practice for this and would
> appreciate a side discussion if anyone’s interested.
Just to note that this issue – versioning of the things being described by data – was subject to discussion in the Spatial Data on the Web WG, and it is addressed by a specific Best Practice (BP11):

https://www.w3.org/TR/sdw-bp/#desc-changing-properties

I wonder whether this BP could provide some guidance.

Andrea

----
Andrea Perego, Ph.D.
Scientific / Technical Project Officer
European Commission DG JRC
Directorate B - Growth and Innovation
Unit B6 - Digital Economy
Via E. Fermi, 2749 - TP 262
21027 Ispra VA, Italy

https://ec.europa.eu/jrc/

----
The views expressed are purely those of the writer and may not in any circumstances be regarded as stating an official position of the European Commission.

From: Peter.Winstanley@gov.scot [mailto:Peter.Winstanley@gov.scot]
Sent: Thursday, June 22, 2017 4:04 PM
To: david.browning@thomsonreuters.com; public-dxwg-wg@w3.org
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; kcoyle@kcoyle.net; PEREGO Andrea (JRC-ISPRA)
Subject: RE: Question for DCAT "experts"

the other provenance pattern that we have which is tricky relates to things like property or schools or fields (land parcel units) where we see splitting or clumping and so the update involves new entities that can include the old one (e.g. one house gets split into two. The identifier for the old house is retained, but it now contains two new houses. It is possible that at some later date the two are joined back into one. Same happens with schools, fields and so on) I really don’t know the best practice for this and would appreciate a side discussion if anyone’s interested.
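
As a rough illustration of the split/merge pattern described above, one option is to keep the old identifier and record dated containment relationships, so that both the entities and the relationships between them carry a history. This is only a sketch under stated assumptions: it is not an established best practice and not the approach of BP11, and the `Containment` class, the `parts_at` function, the identifiers and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Containment:
    """A dated 'parent contains child' relationship (hypothetical model)."""
    parent: str
    child: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the relationship is still current


def parts_at(relations, parent, when):
    """Return the identifiers contained in `parent` on the date `when`."""
    return sorted(
        r.child
        for r in relations
        if r.parent == parent
        and r.valid_from <= when
        and (r.valid_to is None or when < r.valid_to)
    )


# Illustrative data only: house-1 is split into two units in 2010,
# and the two units are joined back into one in 2015.
relations = [
    Containment("house-1", "house-1a", date(2010, 1, 1), date(2015, 1, 1)),
    Containment("house-1", "house-1b", date(2010, 1, 1), date(2015, 1, 1)),
]

before_split = parts_at(relations, "house-1", date(2009, 6, 1))
during_split = parts_at(relations, "house-1", date(2012, 6, 1))
after_merge = parts_at(relations, "house-1", date(2016, 6, 1))
print(before_split, during_split, after_merge)
```

The identifier `house-1` stays stable throughout; only the set of dated relationships changes, which is one way of keeping the history on the relationship rather than on the entity.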
From: david.browning@thomsonreuters.com [mailto:david.browning@thomsonreuters.com]
Sent: 22 June 2017 15:00
To: Winstanley FP (Peter); public-dxwg-wg@w3.org
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; kcoyle@kcoyle.net; andrea.perego@ec.europa.eu
Subject: RE: Question for DCAT "experts"

Yes, we have that pattern/behaviour/feature too. Generally we support that by retaining a history of changes within the data that’s being exchanged – it’s sometimes important to know that you were wrong (or misinformed) so you can explain the decision you made..... That does risk it getting very complex and probably domain specific.

David Browning
Platform Technology Architect
Thomson Reuters
Phone: +41(058) 3065054
Mobile: +41(079) 8126123
david.browning@thomsonreuters.com
thomsonreuters.com

From: Peter.Winstanley@gov.scot [mailto:Peter.Winstanley@gov.scot]
Sent: 22 June 2017 15:45
To: Browning, David (TRGR); public-dxwg-wg@w3.org
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; kcoyle@kcoyle.net; andrea.perego@ec.europa.eu
Subject: RE: Question for DCAT "experts"

In the statistical areas we have lots of the ‘more data’ situation (all these longitudinal datasets) but sometimes (e.g.
with pupil statistics where there can be an exam result that on appeal gets adjusted in grade) the ‘more data’ can be accompanied by a revision of what was previously there.

From: david.browning@thomsonreuters.com [mailto:david.browning@thomsonreuters.com]
Sent: 22 June 2017 14:39
To: public-dxwg-wg@w3.org
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; Winstanley FP (Peter); kcoyle@kcoyle.net; andrea.perego@ec.europa.eu
Subject: RE: Question for DCAT "experts"

In my experience, this scenario – where a data publisher makes available a sequence of update files that encode incremental changes from some initialising “complete” publication – is the predominant pattern of data exchange in the financial information domain (outside the streaming/realtime data delivery), so it’s one that we’re extremely interested in. We’d like to be able to leverage DCAT as much as possible, though clearly we may be extending into a more specialised vocabulary beyond what’s appropriate for inclusion in the DCAT standard itself. As an example, we’d like to be able to automatically process the sequence of files by using the metadata as configuration information, so that a data consumer could create a local replica of the latest state of the published data. Our current thinking actually aims to handle this at the dcat:Distribution level – though this is still active work and we haven’t yet settled on a definite approach.

As Andrea says, from a data consumer’s perspective, there’s some kind of link here with how service-based access might be modelled, since in my experience that’s often used to give access to a ‘latest state/current value’ copy of the data. [Of course, that might not be the direction the WG takes....]
So I’d like to see this considered as a potential use case for discussion till we have a fuller understanding of the landscape.

Last comment: we tend to think of this kind of change - “more data” – as somewhat distinct from version changes. A version change typically indicates something potentially more disruptive to a consumer than an update file. And, yes, there is some ambiguity in that distinction – but in practice it’s been a useful distinction in our environment.

David Browning
Platform Technology Architect
Thomson Reuters
Phone: +41(058) 3065054
Mobile: +41(079) 8126123
david.browning@thomsonreuters.com
thomsonreuters.com

From: andrea.perego@ec.europa.eu [mailto:andrea.perego@ec.europa.eu]
Sent: 22 June 2017 14:15
To: public-dxwg-wg@w3.org
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; Peter.Winstanley@gov.scot; kcoyle@kcoyle.net
Subject: RE: Question for DCAT "experts"

I think it may be worth considering whether the scenario outlined by Karen ("incremental updates"?) relates also to the notion of dcat:Distribution.

In particular, for the mechanism used for synchronising data versions, I see some relationship with the use cases concerning the modelling of service-based data access.

Andrea

----
Andrea Perego, Ph.D.
Scientific / Technical Project Officer
European Commission DG JRC
Directorate B - Growth and Innovation
Unit B6 - Digital Economy
Via E.
Fermi, 2749 - TP 262
21027 Ispra VA, Italy

https://ec.europa.eu/jrc/

----
The views expressed are purely those of the writer and may not in any circumstances be regarded as stating an official position of the European Commission.

From: Martin Brümmer [mailto:martin.bruemmer@eccenca.com]
Sent: Thursday, June 22, 2017 11:35 AM
To: public-dxwg-wg@w3.org
Cc: Marvin Frommhold
Subject: Re: Question for DCAT "experts"

Hi there,

a colleague of mine, Marvin Frommhold, is researching versioning in the context of RDF and Linked Data. He contributes the following points:

The following two documents provide a basic introduction to versioning of datasets:

* Papakonstantinou, Vassilis et al. “Versioning for Linked Data: Archiving Systems and Benchmarks.” BLINK@ISWC, 2016. http://ceur-ws.org/Vol-1700/paper-05.pdf
  * Section 2 of this paper provides an introduction to different archiving strategies.
* Gray, Alasdair J. G. et al. “Dataset Descriptions: HCLS Community Profile.” W3C Interest Group Note, May 2015. https://www.w3.org/TR/hcls-dataset/
  * A W3C Interest Group Note that, among other things, discusses requirements for dataset versioning.
  * "The Data Catalog Vocabulary (DCAT) [DCAT] is used to describe datasets in catalogs, but does not deal with the issue of dataset evolution and versioning."

He agrees that change sets are related to versioning in that a version can be described as a set of changes. Fully realized, this allows very granular tracking of dataset evolution. Makx's point is important here: these changes are granular descriptions of the evolving content of a dataset, whereas DCAT so far does little to describe the data itself. If DCAT started to describe the content and structure of the data, this would be a considerable expansion of its scope.

The question of whether a set of changes constitutes a new dataset, or whether a whole database is a dataset, is complicated to me, because I understand instances of dcat:Dataset as conceptual descriptions of datasets, largely independent of the structure of the underlying data. In that sense, a database or a web service, independent of the query, can also be a dataset. Limiting the data retrieved from it by some API call or SQL query could then create a new dataset fully contained in the first one.

cheers,
Martin

On 22/06/17 at 11:00, Makx Dekkers wrote:

Yes, I agree it is. Updating 'in place' is a case where the publisher decides that a change does not create a new Dataset.

I find Karen's suggestion to treat a 'database' as a 'dataset' interesting -- I have always thought of a database as closer to a dcat:Catalog.

Makx.

-----Original Message-----
From: Peter.Winstanley@gov.scot [mailto:Peter.Winstanley@gov.scot]
Sent: 22 June 2017 10:52
To: mail@makxdekkers.com; public-dxwg-wg@w3.org
Subject: RE: Question for DCAT "experts"

isn't a change set (like a diff) just a special case of versioning?
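
One way to make the "change set as a special case of versioning" reading concrete is the sketch below, which treats each update file as a set of added, changed and deleted records and replays the sequence on top of an initial complete publication, yielding one state ("version") per change set. It is a minimal sketch under assumptions: the `ChangeSet` class, the `apply_change_set` and `replay` functions, and the record keys and values are hypothetical, and nothing here is defined by DCAT.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """One incremental update file, with records keyed by a record identifier."""
    added: dict = field(default_factory=dict)
    changed: dict = field(default_factory=dict)
    deleted: set = field(default_factory=set)


def apply_change_set(state, change_set):
    """Return a new state; the previous state is left untouched."""
    new_state = dict(state)
    for key in change_set.deleted:
        new_state.pop(key, None)
    new_state.update(change_set.added)
    new_state.update(change_set.changed)
    return new_state


def replay(initial, change_sets):
    """Return the initial state followed by one state per applied change set."""
    versions = [dict(initial)]
    for change_set in change_sets:
        versions.append(apply_change_set(versions[-1], change_set))
    return versions


# Illustrative data only: an initial full publication and two update files.
initial = {"r1": "Faust, 3rd ed.", "r2": "Hamlet"}
updates = [
    ChangeSet(added={"r3": "Ulysses"}),
    ChangeSet(changed={"r1": "Faust, 4th ed."}, deleted={"r2"}),
]

versions = replay(initial, updates)
latest = versions[-1]
print(latest)
```

A consumer that only wants a local replica of the latest state needs just `versions[-1]`; keeping the intermediate states is what turns the same sequence of change sets into a version history.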
-----Original Message-----
From: Makx Dekkers [mailto:mail@makxdekkers.com]
Sent: 22 June 2017 09:47
To: public-dxwg-wg@w3.org
Subject: RE: Question for DCAT "experts"

As far as I remember from the initial work on DCAT, a Dataset is considered to be a kind of blob. Nothing is said about what goes on 'inside' a Dataset. The only thing you see on the outside is the modification date but you don't know what has changed inside.

Makx

-----Original Message-----
From: Karen Coyle [mailto:kcoyle@kcoyle.net]
Sent: 21 June 2017 17:31
To: public-dxwg-wg@w3.org
Subject: Question for DCAT "experts"

Many of you know DCAT quite well, and I'm new to it, so I'm taking the lazy way and directing this as a question to you.

I see in DCAT that there are properties that define frequency and update dates. The update date is "Most recent date on which the dataset was changed, updated or modified."

The library world has a number of databases that are updated "in place". For anyone receiving updates, the updates do not include the entire file, only those records added, changed, or deleted since some set time.

Is this covered by DCAT? If not, I will add a use case and we can discuss.

Thanks,
kc

--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234 (Signal)
skype: kcoylenet/+1-510-984-3600

______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________

This email has been received from an external party and has been swept for the presence of computer viruses.

**********************************************************************
This e-mail (and any files or other attachments transmitted with it) is intended solely for the attention of the addressee(s). Unauthorised use, disclosure, storage, copying or distribution of any part of this e-mail is not permitted. If you are not the intended recipient please destroy the email, remove any copies from your system and inform the sender immediately by return.

Communications with the Scottish Government may be monitored or recorded in order to secure the effective operation of the system and for other lawful purposes. The views or opinions contained within this e-mail may not necessarily reflect those of the Scottish Government.

Tha am post-d seo (agus faidhle neo ceanglan còmhla ris) dhan neach neo luchd-ainmichte a-mhàin. Chan eil e ceadaichte a chleachdadh ann an dòigh sam bith, a’ toirt a-steach còraichean, foillseachadh neo sgaoileadh, gun chead. Ma ’s e is gun d’fhuair sibh seo le gun fhiosd’, bu choir cur às dhan phost-d agus lethbhreac sam bith air an t-siostam agaibh, leig fios chun neach a sgaoil am post-d gun dàil.

Dh’fhaodadh gum bi teachdaireachd sam bith bho Riaghaltas na h-Alba air a chlàradh neo air a sgrùdadh airson dearbhadh gu bheil an siostam ag obair gu h-èifeachdach neo airson adhbhar laghail eile.
Dh’fhaodadh nach eil beachdan anns a’ phost-d seo co-ionann ri beachdan Riaghaltas na h-Alba.
**********************************************************************

--
Martin Brümmer
Linked Data Consultant
phone +49 341 26508028
martin.bruemmer@eccenca.com

Postanschrift / Postal address: eccenca GmbH | Hainstraße 8 | 04109 Leipzig | Germany
Geschäftsführer / Board of Directors: Hans-Chr. Brockmann
Sitz und Registergericht / Domicile and Court of Registry: Leipzig
HRB-Nr. / Commercial Register No.: 29201
USt-ID / VAT registration No.: DE 289172708

Diese Mail kann vertrauliche Informationen enthalten. Wenn Sie nicht Adressat sind, sind Sie nicht zur Verwendung der in dieser Mail enthaltenen Informationen befugt. Bitte benachrichtigen Sie uns sofort über den irrtümlichen Empfang.

This e-mail may contain confidential information. If you are not the addressee you are not authorized to make use of the information contained in this e-mail. Please inform us immediately that you have received it by mistake.
Received on Thursday, 22 June 2017 15:19:35 UTC