- From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
- Date: Fri, 26 Jun 2015 11:53:10 -0300
- To: Laufer <laufer@globo.com>
- Cc: Makx Dekkers <mail@makxdekkers.com>, Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
- Message-ID: <CANx1PzzT18BkVgAKPpZEDY=Z2p9Ch9BYs_r0kwy2wQ40Gn8GDQ@mail.gmail.com>
Hi Laufer,

Thanks for the message! My comments are inline:

>
> I don't think that a Dataset is an abstract thing. But I agree that
> distributions of a Dataset (DCAT definition) are instances of the same
> Dataset source. And this is one of the possible relations between Datasets.
>

When I said "abstract thing", I meant that it is just a concept, a kind of framework for creating dataset instances, i.e. it is similar to the concept of a class in the OO paradigm.

> To publish is to establish a way of accessing data (strong statement...
> ?). When we buy a book, maybe the source of that book is in a .doc that has
> to be printed on paper and distributed. The source would be the .doc and
> the distribution the printed document. Or, maybe, we could also have a
> distribution in PDF.
>

In my opinion, the .doc and the printed version are two different distributions of the same instance. If there is a class Book, then there will be an object (a specific book) with two distributions: one is the file and the other is the printed version.

>
> Dataset versions (in the sense of software versions) have different
> sources that could each have different distributions. All these
> instances are related, but with different relations. Time series generate
> different sources that could have sets of distributions too.
>

I'm sorry, but I don't think I understand the concept of "source".

>
> Some data consumers (publishers?) could confuse versions with
> distributions. And also time series. The term version could be used by some
> people to define the relation between different distributions, saying that
> a CSV file and an XML file are different versions of the same Dataset. This
> is not the same as different sources.
>

I agree! But I think it's possible to make this distinction clearer.

>
> Maybe what we have to include in the BP document is the idea of relations
> between Dataset sources and between Dataset instances.
>

Could you explain what you mean by a dataset source?

Cheers,
Berna

>
> Cheers,
> Laufer
>
> 2015-06-26 8:47 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:
>
>> Hi Makx and Annette,
>>
>> Thanks for your messages and the great explanation!
>>
>> Concerning Makx's question, it is also not clear to me whether distributions
>> contain the same data points. Initially, I thought that DCAT would allow
>> distributions that are just similar in nature but with different data
>> points. However, a dataset has the properties dct:temporal and dct:spatial,
>> which may restrict the data. In that case, considering the annual
>> budget example, different datasets will be created for different years, but
>> the distributions will contain the same data from the spatial and temporal
>> perspective.
>>
>> IMO there is something missing between a dataset and its distributions.
>> In the DWBP document, I used the notion of version; however, after the
>> discussions with the group I think version is not the right concept. Maybe
>> something like "dataset instance" is more suitable. If we consider a
>> dataset as an abstract concept (I think it should be), then instances of a
>> dataset may be created according to different spatial and temporal
>> granularities. In the budget example, there will be a dataset, called
>> annual budget, and then there will be one instance of the dataset for each
>> year. When necessary, an instance may have a current version (the instance
>> itself) and one or more previous versions, where a version will represent
>> the state of the instance at a given moment. In this case, an instance will
>> have one or more distributions that should differ only in format or access
>> method/endpoint.
>>
>> Please let me know if this idea makes sense to you.
>>
>> Thanks!
>> Bernadette
>>
>> 2015-06-22 6:31 GMT-03:00 Makx Dekkers <mail@makxdekkers.com>:
>>
>>> Maybe to summarise a main question in my message of last Friday:
>>>
>>> Does DCAT (a) imply that all Distributions of a Dataset contain the *same*
>>> data points and only differ in format or access method/end point,
>>> or does it (b) allow Distributions of a Dataset to contain data that is
>>> *similar* in nature (such as annual budgets for different years)?
>>>
>>> This was the main question that a group I am involved in was not able to
>>> answer.
>>>
>>> Makx.
>>>
>>> _____________________________________________
>>> From: Makx Dekkers [mailto:mail@makxdekkers.com]
>>> Sent: 19 June 2015 11:41
>>> To: 'Data on the Web Best Practices Working Group'
>>> Subject: RE: reviewing the BP doc
>>>
>>> Just on the issue of data versioning:
>>>
>>> > * Data Versioning
>>> > The chart describes time series data, not versions of data. I would say that, if
>>> > released independently, the items in yellow each represent a different
>>> > dataset (they report different data points), not a different version. If you
>>> > revised any of them, then the original and the revision would be different
>>> > versions. I think by definition, versions attempt to report the same data.
>>>
>>> As I said in last week's call, this is related to the more
>>> general issue of relationships between data files.

--
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Friday, 26 June 2015 14:53:58 UTC