Re: reviewing the BP doc from Bernadette Farias Lóscio on 2015-06-26 (public-dwbp-wg@w3.org from June 2015)

From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Date: Fri, 26 Jun 2015 11:53:10 -0300
To: Laufer <laufer@globo.com>
Cc: Makx Dekkers <mail@makxdekkers.com>, Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
Message-ID: <CANx1PzzT18BkVgAKPpZEDY=Z2p9Ch9BYs_r0kwy2wQ40Gn8GDQ@mail.gmail.com>

Hi Laufer,

Thanks for the message! My comments are inline:

>
> I don't think that a Dataset is an abstract thing. But I agree that
> distributions of a Dataset (DCAT definition) are instances of the same
> Dataset source.  And this is one of the possible relations between Datasets.
>

When I said "abstract thing", I meant that it is just a concept, or a kind
of framework for creating dataset instances, i.e., it is similar to the
concept of a class in the OO paradigm.


> To publish is to establish a way of accessing data (strong statement...
> ?). When we buy a book, maybe the source of that book is in a .doc that has
> to be printed in paper and distributed. The source would be the .doc and
> the distribution the printed document. Or, maybe, we could also have a
> distribution in pdf.
>

In my opinion, the .doc and the printed version are two different
distributions of the same instance. If we assume there is a class
Book, then there will be an object (a specific book) that has two
distributions: one is the file and the other is the printed version.
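
To make the analogy concrete, here is a minimal Python sketch. It is only
an illustration under my own assumptions: the names Book, Distribution,
medium and the title "Example Book" are hypothetical and are not taken
from DCAT or from the BP document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One concrete form in which a specific book is made available (hypothetical type)."""
    medium: str  # e.g. ".doc file" or "printed copy"


@dataclass
class Book:
    """The class plays the role of the abstract concept; each object is one specific book."""
    title: str
    distributions: List[Distribution] = field(default_factory=list)


# One object (a specific book) that has two distributions of the same content.
book = Book(title="Example Book")
book.distributions.append(Distribution(medium=".doc file"))
book.distributions.append(Distribution(medium="printed copy"))

media = [d.medium for d in book.distributions]
print(media)
```

In this sketch a pdf would simply be a third Distribution of the same
Book object, not a new Book.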


>
> Dataset versions (in the sense of software versions) have different
> sources that could have, each one, different distributions. All these
> instances are related, but with different relations. Time series generate
> different sources that could have sets of distributions too.
>

I'm sorry, but I think I don't understand the concept of "source".


>
> Some data consumers (publishers?) could confuse versions with
> distributions. And also time series. The term version could be used by some
> people to define the relation between different distributions, saying that
> a csv file and an xml file are different versions of the same Dataset. This
> is not the same as having different sources.
>

I agree! But I think it's possible to make this distinction clearer.


>
> Maybe what we have to include in the BP document is the idea of relations
> between Dataset sources and between Dataset instances.
>

Could you explain what you mean by a dataset source?

Cheers,
Berna


>
> Cheers,
> Laufer
>
>
>
> 2015-06-26 8:47 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:
>
>> Hi Makx and Annette,
>>
>> Thanks for your messages and the great explanation!
>>
>> Concerning Makx's question, it is also not clear to me whether distributions
>> contain the same data points. Initially, I thought that DCAT would allow
>> distributions that are just similar in nature but with different data
>> points. However, a dataset has the properties dct:temporal and dct:spatial,
>> which may restrict the data. In that case, considering the annual
>> budget example, different datasets will be created for different years, but
>> the distributions will contain the same data from the spatial and temporal
>> perspective.
>>
>> IMO there is something missing between a dataset and its distributions.
>> In the DWBP document, I used the notion of version, however after the
>> discussions with the group I think version is not the right concept. Maybe
>> something like "dataset instance" is more suitable. If we consider a
>> dataset as an abstract concept (I think it should be), then instances of a
>> dataset may be created according to different spatial and temporal
>> granularities. In the budget example, there will be a dataset, called
>> annual budget, and then there will be one instance of the dataset for each
>> year. When necessary, an instance may have a current version (the instance
>> itself) and one or more previous versions, where a version will represent
>> the state of the instance at a given moment. In this case, an instance will
>> have one or more distributions that should differ just in format or access
>> method/endpoint.
>>
>> Please let me know if this idea makes sense to you.
>>
>> Thanks!
>> Bernadette
>>
>>
>>
>> 2015-06-22 6:31 GMT-03:00 Makx Dekkers <mail@makxdekkers.com>:
>>
>>>  Maybe to summarise a main question in my message of last Friday:
>>>
>>> Does DCAT (a) imply that all Distributions of a Dataset contain the
>>> *same* data points and only differ in format or access method/end point,
>>> or does it (b) allow Distributions of a Dataset to contain data that is
>>> *similar* in nature (such as annual budgets for different years)?
>>>
>>> This was the main question a group that I am involved in was not able to
>>> answer.
>>>
>>> Makx.
>>>
>>>
>>>    _____________________________________________
>>>       *From:* Makx Dekkers <mail@makxdekkers.com>
>>>       *Sent:* 19 June 2015 11:41
>>>       *To:* 'Data on the Web Best Practices Working Group'
>>>       *Subject:* RE: reviewing the BP doc
>>>
>>>       Just on the issue of data versioning:
>>>
>>>       >
>>>       > * Data Versioning
>>>       > The chart describes time series data, not versions of data. I would say that, if
>>>       > released independently, the items in yellow each represent a different
>>>       > dataset (they report different data points), not a different version. If you
>>>       > revised any of them, then the original and the revision would be different
>>>       > versions. I think by definition, versions attempt to report the same data.
>>>       >
>>>
>>>       As I said in last week's call, this is related to the more
>>>       general issue of relationships between data files.
>>>
>>>
>>
>>
>> --
>> Bernadette Farias Lóscio
>> Centro de Informática
>> Universidade Federal de Pernambuco - UFPE, Brazil
>>
>> ----------------------------------------------------------------------------
>>
>
>
>
> --
> .  .  .  .. .  .
> .        .   . ..
> .     ..       .
>



-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Friday, 26 June 2015 14:53:58 UTC