RE: Question for DCAT "experts"

From: <Peter.Winstanley@gov.scot>
Date: Thu, 22 Jun 2017 14:04:06 +0000
To: <david.browning@thomsonreuters.com>, <public-dxwg-wg@w3.org>
CC: <marvin.frommhold@eccenca.com>, <martin.bruemmer@eccenca.com>, <makx@makxdekkers.com>, <kcoyle@kcoyle.net>, <andrea.perego@ec.europa.eu>
Message-ID: <BEA9D5BE2C1C76448E2955B1FD8769E10174290A0C@s0393g.scotland.gov.uk>
The other provenance pattern we have, which is tricky, relates to things like properties, schools or fields (land parcel units), where we see splitting or clumping, so the update involves new entities that can include the old one. For example, one house gets split into two: the identifier for the old house is retained, but it now contains two new houses. It is possible that at some later date the two are joined back into one. The same happens with schools, fields and so on. I really don’t know the best practice for this and would appreciate a side discussion if anyone’s interested.
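To make the split-and-rejoin pattern concrete, here is a minimal sketch in Python. It is not a proposed best practice: the `Unit` and `Registry` classes, the identifiers and the dates are all hypothetical, and it assumes only what is described above (the old identifier is retained, the new units sit inside it, and a later merge should not erase the history).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Unit:
    identifier: str
    parent: Optional[str] = None  # identifier of the containing unit, if any
    active: bool = True           # retired units stay in the registry


@dataclass
class Registry:
    units: Dict[str, Unit] = field(default_factory=dict)
    # Each history entry: (date, event, (old identifier, part identifiers...)).
    history: List[Tuple[str, str, Tuple[str, ...]]] = field(default_factory=list)

    def register(self, identifier: str) -> None:
        self.units[identifier] = Unit(identifier)

    def parts_of(self, identifier: str) -> List[str]:
        """Active units currently contained in the given unit."""
        return sorted(u.identifier for u in self.units.values()
                      if u.parent == identifier and u.active)

    def split(self, identifier: str, new_ids: List[str], date: str) -> None:
        """Keep the old identifier and record new units contained in it."""
        if identifier not in self.units:
            raise KeyError(identifier)
        for new_id in new_ids:
            self.units[new_id] = Unit(new_id, parent=identifier)
        self.history.append((date, "split", (identifier, *new_ids)))

    def merge(self, identifier: str, date: str) -> None:
        """Rejoin the parts: they are retired rather than deleted."""
        parts = self.parts_of(identifier)
        for part in parts:
            self.units[part].active = False
        self.history.append((date, "merge", (identifier, *parts)))


# Hypothetical example: one house is split into two and later rejoined.
registry = Registry()
registry.register("house-1")
registry.split("house-1", ["house-1a", "house-1b"], "2015-03-01")
parts_after_split = registry.parts_of("house-1")
registry.merge("house-1", "2017-06-22")
parts_after_merge = registry.parts_of("house-1")
```

Retiring the parts instead of deleting them is one possible design choice; it keeps older statements about `house-1a` and `house-1b` resolvable after the merge.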

From: david.browning@thomsonreuters.com [mailto:david.browning@thomsonreuters.com]
Sent: 22 June 2017 15:00
To: Winstanley FP (Peter); public-dxwg-wg@w3.org
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; kcoyle@kcoyle.net; andrea.perego@ec.europa.eu
Subject: RE: Question for DCAT "experts"

Yes, we have that pattern/behaviour/feature too. Generally we support it by retaining a history of changes within the data that’s being exchanged: it’s sometimes important to know that you were wrong (or misinformed) so you can explain the decision you made. That does risk it getting very complex and probably domain specific.

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
David Browning
Platform Technology Architect

Thomson Reuters

Phone: +41(058) 3065054
Mobile: +41(079) 8126123

david.browning@thomsonreuters.com
http://thomsonreuters.com/


From: Peter.Winstanley@gov.scot
Sent: 22 June 2017 15:45
To: Browning, David (TRGR); public-dxwg-wg@w3.org
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; kcoyle@kcoyle.net; andrea.perego@ec.europa.eu
Subject: RE: Question for DCAT "experts"

In the statistical areas we have lots of the ‘more data’ situation (all these longitudinal datasets), but sometimes the ‘more data’ can be accompanied by a revision of what was previously there (e.g. with pupil statistics, where an exam result can be adjusted in grade on appeal).

From: david.browning@thomsonreuters.com
Sent: 22 June 2017 14:39
To: public-dxwg-wg@w3.org
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; Winstanley FP (Peter); kcoyle@kcoyle.net; andrea.perego@ec.europa.eu
Subject: RE: Question for DCAT "experts"

In my experience, this scenario, where a data publisher makes available a sequence of update files that encode incremental changes from some initialising “complete” publication, is the predominant pattern of data exchange in the financial information domain (outside streaming/realtime data delivery), so it’s one that we’re extremely interested in. We’d like to be able to leverage DCAT as much as possible, though clearly we may be extending into a more specialised vocabulary beyond what’s appropriate for inclusion in the DCAT standard itself. As an example, we’d like to be able to automatically process the sequence of files by using the metadata as configuration information, so that a data consumer could create a local replica of the latest state of the published data. Our current thinking actually aims to handle this at the dcat:Distribution level, though this is still active work and we haven’t yet settled on a definite approach.
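As a rough illustration of the replay step only, the following Python sketch applies a sequence of update files to an initial complete publication to obtain the latest state. Everything in it is an assumption made for the example: the `apply_updates` function, the `add`/`change`/`delete` operation names, the keyed-record shape and the sample prices are hypothetical, and it does not show how DCAT metadata would drive the process.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]
# One change: (operation, record key, new record, or None for a delete).
Change = Tuple[str, str, Optional[Record]]


def apply_updates(baseline: Dict[str, Record],
                  update_files: Iterable[List[Change]]) -> Dict[str, Record]:
    """Replay update files, in publication order, over a copy of the baseline."""
    replica = {key: dict(record) for key, record in baseline.items()}
    for changes in update_files:
        for op, key, record in changes:
            if op in ("add", "change"):
                if record is None:
                    raise ValueError(f"{op} for {key!r} carries no record")
                replica[key] = dict(record)
            elif op == "delete":
                replica.pop(key, None)  # deleting a missing key is a no-op
            else:
                raise ValueError(f"unknown operation: {op!r}")
    return replica


# Hypothetical initial "complete" publication and two later update files.
baseline = {"A": {"price": "10"}, "B": {"price": "20"}}
updates = [
    [("change", "A", {"price": "11"}), ("add", "C", {"price": "30"})],
    [("delete", "B", None)],
]
latest = apply_updates(baseline, updates)
```

The order of the update files matters here, so in practice the metadata would at least need to convey the sequence and the baseline each file applies to.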

As Andrea says, from a data consumer’s perspective, there’s some kind of link here with how service-based access might be modelled, since in my experience that’s often used to give access to a ‘latest state/current value’ copy of the data. [Of course, that might not be the direction the WG takes.]

So I’d like to see this considered as a potential use case for discussion until we have a fuller understanding of the landscape.

Last comment: we tend to think of this kind of change (“more data”) as somewhat distinct from version changes. A version change typically indicates something potentially more disruptive to a consumer than an update file. And, yes, there is some ambiguity in that distinction, but in practice it’s been a useful one in our environment.


From: andrea.perego@ec.europa.eu
Sent: 22 June 2017 14:15
To: public-dxwg-wg@w3.org
Cc: marvin.frommhold@eccenca.com; martin.bruemmer@eccenca.com; makx@makxdekkers.com; Peter.Winstanley@gov.scot; kcoyle@kcoyle.net
Subject: RE: Question for DCAT "experts"

I think it may be worth considering whether the scenario outlined by Karen ("incremental updates"?) also relates to the notion of dcat:Distribution. In particular, for the mechanism used for synchronising data versions, I see some relationship with the use cases concerning the modelling of service-based data access.

Andrea

----
Andrea Perego, Ph.D.
Scientific / Technical Project Officer
European Commission DG JRC
Directorate B - Growth and Innovation
Unit B6 - Digital Economy
Via E. Fermi, 2749 - TP 262
21027 Ispra VA, Italy

https://ec.europa.eu/jrc/

----
The views expressed are purely those of the writer and may
not in any circumstances be regarded as stating an official
position of the European Commission.

From: Martin Brümmer [mailto:martin.bruemmer@eccenca.com]
Sent: Thursday, June 22, 2017 11:35 AM
To: public-dxwg-wg@w3.org
Cc: Marvin Frommhold
Subject: Re: Question for DCAT "experts"


Hi there,

a colleague of mine, Marvin Frommhold, is researching versioning in the context of RDF and Linked Data. He contributes the following points:

The following two documents provide a basic introduction to versioning of datasets:

  *   Papakonstantinou, Vassilis et al. “Versioning for Linked Data: Archiving Systems and Benchmarks.” BLINK@ISWC, 2016. <http://ceur-ws.org/Vol-1700/paper-05.pdf>

     *   Section 2 of this paper provides an introduction of different archiving strategies.

  *   Gray, Alasdair J. G. et al. “Dataset Descriptions: HCLS Community Profile.” W3C Interest Group Note, May 2015. <https://www.w3.org/TR/hcls-dataset/>

     *   A W3C Interest Group Note that, among other things, discusses requirements for dataset versioning.
     *   "The Data Catalog Vocabulary (DCAT) [DCAT<https://www.w3.org/TR/hcls-dataset/#DCAT>] is used to describe datasets in catalogs, but does not deal with the issue of dataset evolution and versioning."
He agrees that change sets are related to versioning in that a version can be described as a set of changes. Fully realized, this allows very granular tracking of dataset evolution. Makx’s point is important here: these changes are granular descriptions of the evolving content of a dataset, whereas DCAT so far does little to describe the data itself. If DCAT started to describe the content and structure of the data, this would be a considerable expansion of its scope.
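The idea that a version can be described as a set of changes can be sketched in a few lines of Python. This is an illustration only, not any particular archiving system from the papers above: the `ChangeSet` type, the two functions and the `ex:` triples are hypothetical, and a dataset version is simplified to a set of (subject, predicate, object) tuples.

```python
from typing import FrozenSet, NamedTuple, Tuple

Triple = Tuple[str, str, str]


class ChangeSet(NamedTuple):
    added: FrozenSet[Triple]
    removed: FrozenSet[Triple]


def diff(old: FrozenSet[Triple], new: FrozenSet[Triple]) -> ChangeSet:
    """Describe the step from one version to the next as a set of changes."""
    return ChangeSet(added=new - old, removed=old - new)


def apply_change_set(old: FrozenSet[Triple], change: ChangeSet) -> FrozenSet[Triple]:
    """Rebuild the newer version from the older one plus its change set."""
    return (old - change.removed) | change.added


# Two hypothetical versions of a tiny dataset.
v1 = frozenset({
    ("ex:house1", "ex:status", "occupied"),
    ("ex:house1", "ex:rooms", "6"),
})
v2 = frozenset({
    ("ex:house1", "ex:status", "vacant"),
    ("ex:house1", "ex:rooms", "6"),
})
delta = diff(v1, v2)
rebuilt = apply_change_set(v1, delta)
```

Storing only the first version and the successive change sets is enough to rebuild any later version, which is what makes the tracking granular, and also why it describes the content of the data rather than the dataset as a whole.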

The question of whether a set of changes constitutes a new dataset, or whether a whole database is a dataset, is complicated to me, because I understand instances of dcat:Dataset as conceptual descriptions of datasets, largely independent of the structure of the underlying data. In that sense, a database or a web service, independent of the query, can also be a dataset. Limiting the data retrieved from it by some API call or SQL query could then create a new dataset fully contained in the first one.

cheers,
Martin
On 22/06/17 at 11:00, Makx Dekkers wrote:

Yes, I agree it is. Updating 'in place' is a case where the publisher decides that a change does not create a new Dataset.



I find Karen's suggestion to treat a 'database' as a 'dataset' interesting -- I have always thought of a database as closer to a dcat:Catalog.



Makx.





-----Original Message-----

From: Peter.Winstanley@gov.scot

Sent: 22 June 2017 10:52

To: mail@makxdekkers.com; public-dxwg-wg@w3.org

Subject: RE: Question for DCAT "experts"



isn't a change set (like a diff) just a special case of versioning?



-----Original Message-----

From: Makx Dekkers [mailto:mail@makxdekkers.com]

Sent: 22 June 2017 09:47

To: public-dxwg-wg@w3.org

Subject: RE: Question for DCAT "experts"



As far as I remember from the initial work on DCAT, a Dataset is considered to be a kind of blob. Nothing is said about what goes on 'inside' a Dataset. The only thing you see on the outside is the modification date but you don't know what has changed inside.

Makx



-----Original Message-----

From: Karen Coyle [mailto:kcoyle@kcoyle.net]

Sent: 21 June 2017 17:31

To: public-dxwg-wg@w3.org

Subject: Question for DCAT "experts"



Many of you know DCAT quite well, and I'm new to it, so I'm taking the lazy way and directing this as a question to you.



I see in DCAT that there are properties that define frequency and update dates. The update date is



"Most recent date on which the dataset was changed, updated or modified."



The library world has a number of databases that are updated "in place".

For anyone receiving updates, the updates do not include the entire file, only those records added, changed, or deleted since some set time.



Is this covered by DCAT? If not, I will add a use case and we can discuss.



Thanks,

kc

--

Karen Coyle

kcoyle@kcoyle.net http://kcoyle.net

m: 1-510-435-8234 (Signal)

skype: kcoylenet/+1-510-984-3600





______________________________________________________________________

This email has been scanned by the Symantec Email Security.cloud service.

For more information please visit http://www.symanteccloud.com

______________________________________________________________________



********************************************************************

This email has been received from an external party and has been swept for the presence of computer viruses.

********************************************************************



**********************************************************************

This e-mail (and any files or other attachments transmitted with it) is intended solely for the attention of the addressee(s). Unauthorised use, disclosure, storage, copying or distribution of any part of this e-mail is not permitted. If you are not the intended recipient please destroy the email, remove any copies from your system and inform the sender immediately by return.



Communications with the Scottish Government may be monitored or recorded in order to secure the effective operation of the system and for other lawful purposes. The views or opinions contained within this e-mail may not necessarily reflect those of the Scottish Government.






**********************************************************************










--

Martin Brümmer

Linked Data Consultant



phone +49 341 26508028

martin.bruemmer@eccenca.com



Postal address:

eccenca GmbH | Hainstraße 8 | 04109 Leipzig | Germany

Board of Directors: Hans-Chr. Brockmann

Domicile and Court of Registry: Leipzig

Commercial Register No.: 29201

VAT registration No.: DE 289172708



This e-mail may contain confidential information. If you are not the addressee you are not authorized to make use of the information contained in this e-mail. Please inform us immediately that you have received it by mistake.

Received on Thursday, 22 June 2017 14:04:40 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 19:41:56 UTC