Re: reviewing the BP doc from Laufer on 2015-06-26 (public-dwbp-wg@w3.org from June 2015)

From: Laufer <laufer@globo.com>
Date: Fri, 26 Jun 2015 11:34:21 -0300
To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Cc: Makx Dekkers <mail@makxdekkers.com>, Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
Message-ID: <CA+pXJigM4Wb_bd-KYxSQ3qsCG7ABodj0s+=9Qc_ToKruYATUZw@mail.gmail.com>
Hi, Bernadette,

I don't think that a Dataset is an abstract thing. But I agree that
distributions of a Dataset (DCAT definition) are instances of the same
Dataset source.  And this is one of the possible relations between Datasets.

To publish is to establish a way of accessing data (strong statement... ?).
When we buy a book, maybe the source of that book is a .doc file that has to
be printed on paper and distributed. The source would be the .doc and the
distribution the printed document. Or, maybe, we could also have a
distribution as a PDF.

Dataset versions (in the sense of software versions) have different sources
that could have, each one, different distributions. All these instances are
related, but with different relations. Time series generate different
sources that could have sets of distributions too.

Some data consumers (publishers?) could confuse versions with
distributions, and also with time series. The term version could be used by
some people to describe the relation between different distributions, saying
that a CSV file and an XML file are different versions of the same Dataset.
This is not the same as having different sources.

Maybe what we have to include in the BP document is the idea of relations
between Dataset sources and between Dataset instances.
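
A minimal sketch of these relations, only to make the distinction concrete.
All names here (SourceRelation, DatasetSource, Distribution, same_source, and
the example.org URLs) are hypothetical and are not DCAT terms; the relation
kinds are an assumption about how "version" and "time series" links between
sources might be labelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class SourceRelation(Enum):
    """Hypothetical relation kinds between Dataset sources."""
    VERSION_OF = "version of"          # a revision that reports the same data
    NEXT_IN_SERIES = "next in series"  # a new period in a time series


@dataclass(frozen=True)
class Distribution:
    """One instance of a source; differs only in format or access method."""
    media_type: str
    url: str


@dataclass
class DatasetSource:
    """A Dataset source, its distributions, and its links to other sources."""
    title: str
    distributions: List[Distribution] = field(default_factory=list)
    relations: List[Tuple[SourceRelation, "DatasetSource"]] = field(default_factory=list)

    def relate(self, kind: SourceRelation, other: "DatasetSource") -> None:
        """Record a typed relation from this source to another source."""
        self.relations.append((kind, other))


def same_source(a: Distribution, b: Distribution, sources: List[DatasetSource]) -> bool:
    """True when both distributions are instances of one and the same source."""
    return any(a in s.distributions and b in s.distributions for s in sources)


# Hypothetical annual budget example: CSV and XML of 2014 share one source.
csv14 = Distribution("text/csv", "http://example.org/budget-2014.csv")
xml14 = Distribution("application/xml", "http://example.org/budget-2014.xml")
csv15 = Distribution("text/csv", "http://example.org/budget-2015.csv")
csv14r = Distribution("text/csv", "http://example.org/budget-2014-rev.csv")

budget_2014 = DatasetSource("Annual budget 2014", [csv14, xml14])
budget_2015 = DatasetSource("Annual budget 2015", [csv15])
budget_2014_rev = DatasetSource("Annual budget 2014 (revised)", [csv14r])

# Relations between sources, kept apart from the source/distribution relation.
budget_2015.relate(SourceRelation.NEXT_IN_SERIES, budget_2014)
budget_2014_rev.relate(SourceRelation.VERSION_OF, budget_2014)

sources = [budget_2014, budget_2015, budget_2014_rev]
```

In this sketch the CSV and XML files of 2014 are two instances of one source,
while the 2015 file and the revised 2014 file belong to other sources that are
linked by differently typed relations.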

Cheers,
Laufer



2015-06-26 8:47 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:

> Hi Makx and Annette,
>
> Thanks for your messages and the great explanation!
>
> Concerning Makx's question, it is also not clear to me whether distributions
> contain the same data points. Initially, I thought that DCAT would allow
> distributions that are just similar in nature but with different data
> points. However, a dataset has properties dct:temporal and dct:spatial that
> maybe will restrict the data. Then, in this case, considering the annual
> budget example, different datasets will be created for different years, but
> the distributions will contain the same data from the spatial and temporal
> perspective.
>
> IMO there is something missing between a dataset and its distributions. In
> the DWBP document, I used the notion of version, however after the
> discussions with the group I think version is not the right concept. Maybe
> something like "dataset instance" is more suitable. If we consider a
> dataset as an abstract concept (I think it should be), then instances of a
> dataset may be created according to different spatial and temporal
> granularities. In the budget example, there will be a dataset, called
> annual budget, and then there will be one instance of the dataset for each
> year. When necessary, an instance may have a current version (the instance
> itself) and one or more previous versions, where a version will represent
> the state of the instance at a given moment. In this case, an instance will
> have one or more distributions that should differ just in format or access
> method/endpoint.
>
> Please let me know if this idea makes sense to you.
>
> Thanks!
> Bernadette
>
>
>
> 2015-06-22 6:31 GMT-03:00 Makx Dekkers <mail@makxdekkers.com>:
>
>>  Maybe to summarise a main question in my message of last Friday:
>>
>> Does DCAT (a) imply that all Distributions of a Dataset contain the
>> *same* data points and only differ in format or access method/end point,
>> or does it (b) allow Distributions of a Dataset to contain data that is
>> *similar* in nature (such as annual budgets for different years)?
>>
>> This was the main question a group that I am involved in was not able to
>> answer.
>>
>> Makx.
>>
>>
>>    _____________________________________________
>>       *From:* Makx Dekkers [mailto:mail@makxdekkers.com
>>       <mail@makxdekkers.com>]
>>       *Sent:* 19 June 2015 11:41
>>       *To:* 'Data on the Web Best Practices Working Group'
>>       *Subject:* RE: reviewing the BP doc
>>
>>       Just on the issue of data versioning:
>>
>>       >
>>       > * Data Versioning
>>       > The chart describes time series data, not versions of data. I would say that, if
>>       > released independently, the items in yellow each represent a different
>>       > dataset (they report different data points), not a different version. If you
>>       > revised any of them, then the original and the revision would be different
>>       > versions. I think by definition, versions attempt to report the same data.
>>       >
>>
>>       As I said in last week's call, this is related to the more general
>>       issue of relationships between data files.
>>
>>
>
>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
>
> ----------------------------------------------------------------------------
>



-- 
.  .  .  .. .  .
.        .   . ..
.     ..       .
Received on Friday, 26 June 2015 14:34:52 UTC