Who is the Dataset? from Laufer on 2015-04-18 (public-dwbp-wg@w3.org from April 2015)

From: Laufer <laufer@globo.com>
Date: Sat, 18 Apr 2015 12:30:08 -0300
To: DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CA+pXJii7rZwtH3T7OPRF0va8ky8XOg+hkHnOn-LwcTSy0JAxgw@mail.gmail.com>
Hi Eric, Hi all.



First of all, I would like to apologize for this long text. But the
discussion about the usage of the Dataset is very interesting, and I feel
that it is one of the central things that could contribute to this new
world of data.



As I said in a previous message, I like to think of Data as a kind of
product and, as such, it demands a lot of things to help consumers find
and use it (reuse is also a use). One of the things that is very important
to the producer of a good is providing communication channels to foster
interaction with consumers.



Producers need feedback to better understand what is wrong and what is
right with their products, what the expectations are, which new features
are desired, etc. It is a kind of outside view: the real use versus the
expected use. If the product were a person, it might be a kind of therapy
in which you perceive how others see you, so that you can adjust certain
features or explain other things better. Who am I?



Besides the official communication channels that producers may provide,
nowadays people have a lot of other informal channels where they talk
about many things, including products. And these conversations become
more valuable every day. Tools like Twitter, for example, have huge value
because they hold a lot of opinions, views, etc., about a myriad of
subjects, which can provide information about these subjects (and about
these people, of course).



Back to the Dataset: a publisher provides a bunch of data and metadata
that she imagines could clarify what the Dataset being published is and,
sometimes, what the possible uses of this Dataset are. Collecting feedback
could give publishers information with which the quality of the Dataset
could (maybe) be enhanced, or new Datasets could be published, or the
quality of the metadata could be enhanced, or new metadata could be
published.



The initial DUV diagram has a Feedback class that has some specializations.
One specialization that is being discussed is Citation. In some sense, all
the specializations must have some kind of reference to the Dataset, so as
to connect the Opinion, Rating, etc., to the Dataset. How is this link
made?



If it is an official communication channel provided by the publisher, this
is automatic. But what if it is not? A citation is one such link. Using
the Dataset identifier (?) is another. Or it may be a less direct link,
one that could be extracted from a more informal reference.
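
To make the question a bit more concrete, here is a rough sketch in Python
of the Feedback class and its specializations, each one carrying a
reference to the Dataset. It is only an illustration: the field names
(dataset_id, channel, citing_work, reason) and the helper feedback_for are
assumptions of mine and are not taken from the DUV diagram.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feedback:
    # Reference that connects the feedback to the Dataset.
    # Assumed here to be a plain identifier string (hypothetical).
    dataset_id: str
    # "official" (channel provided by the publisher) or "informal".
    channel: str


@dataclass
class Rating(Feedback):
    score: int = 0


@dataclass
class Opinion(Feedback):
    text: str = ""


@dataclass
class Citation(Feedback):
    # The work that cites the Dataset (hypothetical field).
    citing_work: str = ""
    # The "why" of the citation, often unknown or hard to extract.
    reason: Optional[str] = None


def feedback_for(dataset_id: str, items: List[Feedback]) -> List[Feedback]:
    """Collect every piece of feedback that points at one Dataset."""
    return [item for item in items if item.dataset_id == dataset_id]


items = [
    Rating("ds-1", "official", score=4),
    Citation("ds-1", "informal", citing_work="paper-A"),
    Opinion("ds-2", "informal", text="useful"),
]
about_ds1 = feedback_for("ds-1", items)
```

With an official channel the dataset_id would be filled in automatically;
for informal channels it would have to be extracted somehow, which is the
hard part.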



I am not saying that it is an easy thing to get the feedback provided by
unofficial communication channels. But this is not easy for products in
general, and tools for this task appear when the importance of feedback is
perceived by the market. It is not in the scope of the group to provide
these tools.



What is a citation? It is a link that one work makes to a previous work.
Why? The “why” is the feedback. It could be, as Annette pointed out, a way
to build trust in the new work. Even in this case, the new work has
something that, in some sense, extends the previous one. Citation in
general applies to frozen works. But our Datasets can evolve. Besides the
“trust” use, a citation could be used in a work that compares previous
works and may contain criticisms of them. It is common for works like
theses to have a section called “Related Works”, where the author analyzes
these works to show the contributions of his own work. In summary, I don’t
think that citation is the (complete) feedback. It is feedback in numeric
terms, since it is an important fact that some work has a large number of
citations. But citation is the link between works. The “why” is important
feedback. Again, I don’t think that it is an easy task to extract this
information.



I see DUV as a way to give semantics for describing all the data generated
by the Dataset: a description of the (possible) universe that is created
from an origin Dataset. In rough terms, a kind of reverse provenance.



I have other issues in mind, but I think that they will be discussed
during the development of DUV.



Sorry again for this long (digressing) text.



Cheers,

Laufer




-- 
.  .  .  .. .  .
.        .   . ..
.     ..       .
Received on Saturday, 18 April 2015 15:30:36 UTC