Re: Who is the Dataset? from Eric Stephan on 2015-04-18 (public-dwbp-wg@w3.org from April 2015)

From: Eric Stephan <ericphb@gmail.com>
Date: Sat, 18 Apr 2015 10:23:09 -0700
To: Laufer <laufer@globo.com>
Cc: DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CAMFz4jjmu+QHPOHW+o6pKVpY1FBS+iHXiN1Ppb8TPyKg_4-J7g@mail.gmail.com>

Hi Laufer,

Once again you have given me pause to think of many things we haven't yet
discussed.   I've added my "feedback" to your thoughts :-)

Eric S.

On Sat, Apr 18, 2015 at 8:30 AM, Laufer <laufer@globo.com> wrote:

> Hi Eric, Hi all.
>
>
>
> First of all, I would like to apologize for this long text. But the
> discussion about the usage of the Dataset is very interesting, and I
> feel that it is one of the central things that could contribute to this new
> world of data.
>
>
>
> As I said in a previous message, I like to think of Data as a kind of
> product that, as such, demands a lot of things to help consumers find
> and use it (reuse is also a use). One of the things that is very important
> to the producer of a good is to provide communication channels to foster
> interaction with consumers.
>
>
+1 if you are referring to Dataset rather than Data. I do think of a Dataset
the same way as a product from a usage perspective.  Can we start a group
Bitcoin account with the requirement of paying into the account if we
incorrectly refer to a Dataset as Data?  We could have a drawing at the end
of the Working Group, winner takes all.   :-) :-)


>
>
> Producers need feedback to better understand what is wrong and what is
> right with their products, what are the expectations, new desired features,
> etc. It is a kind of outside view: the real use versus the expected use. If
> it was a person, maybe it would be a kind of therapy where someone could
> perceive how others see you, in a way that you could adjust certain
> features or explain better other things. Who is me?
>

+1 I really like the http://rdfs.org/sioc/spec/ User/Roles classes; I wonder
if they might help give context, in part, to expertise.  I also really like
the meta-model in http://stackoverflow.com/help/stackexchange that
helps track the behavior of users. From usage metadata, other users get a
real perspective on expertise based on the answers given, questions asked,
and the value the community places in the "expert".

I'd like to hear more ideas from you on this; it really needs to evolve
in the vocab.

>
>
> Besides the official communication channels, eventually provided by the
> producers, nowadays people have a lot of other informal channels where they
> talk about so many things, including the products. And these talks are each
> day more valuable. Tools like Twitter, for example, have huge value
> because there are a lot of opinions, visions, etc, about a myriad of
> subjects that can provide information about these subjects (and these
> people, of course).
>
>
>

+1 It would really be nice to capture these types of models.  Would be nice
to find surveys that have been published, or perhaps this is an opportunity
for us to publish on the subject?


> Back to Dataset, a publisher provides a bunch of data and metadata that she
> imagines could clarify what the Dataset being published is and, sometimes,
> what the possible usages of this Dataset are. Collecting feedback could
> provide information to the publishers in a way that the quality of the
> Dataset could (maybe) be enhanced, or new Datasets could be published, or
> the quality of metadata could be enhanced, or new metadata could be
> published.
>
>
>
> The initial DUV diagram has a Feedback class that has some
> specializations. One specialization that is being discussed is Citation. In
> some sense, all the specializations must have some kind of reference to the
> Dataset, in order to connect the Opinion, Rating, etc., to the Dataset. How
> is this link made?
>
>
>
> If it is an official communication channel provided by the publisher, this
> is automatic. But what if it is not? Citation is one way. Using the Dataset
> identifier (?) is another. Or a less direct link, one that
> could be extracted from a more informal reference.
>
>
>
> I am not saying that it is an easy thing to get the feedback provided by
> unofficial communication channels. But this is not easy for products in
> general, and tools for this task appear when the importance of feedback is
> perceived by the market. It is not in the scope of the group to provide
> these tools.
>
>
+1 I think you are addressing issues we haven't yet considered. Most
thought has gone into how to represent usage, but you are addressing search,
discovery, and access. Sumit is currently looking at mechanisms for using
the vocabulary. Also, there might be a connection to machine learning for
finding trends in relevant feedback.


>
>
> What is a citation? It is a link that one work makes to a previous work.
> Why? “Why” is the feedback. It could be, as Annette pointed out, a way to trust
> this new work. Even in this case, the new work has something that, in some
> sense, could extend the previous one. Citation, in general, applies to
> frozen works. But our Datasets could evolve. Besides the “trust” use, the
> citation could be used in a work that makes a comparison between previous
> works and could have some criticisms about these works. It is common to
> have in works such as theses a section called “Related Works”, where the
> author analyses these works to show the contributions of his own work.
> Summarizing, I don’t think that citation is the (complete) feedback. It is
> a feedback in numeric terms, as it is an important fact that some work has
> a big number of citations. But citation is the link between works. The
> “why” is an important feedback. Again, I don’t think that it is an easy
> task to extract this information.
>
>
>
Phil pointed us to the HCLS Community Profile, which references the
Citation Typing Ontology (CiTO).   I've just begun looking at it this
weekend. It may cover many aspects of what you are asking.


> I see DUV as a way to give semantics to describe all these data generated
> by the Dataset. To describe the (possible) universe that is created from an
> origin Dataset. In rough terms, a kind of reverse provenance.
>
>
>
+1


> I have other issues in my mind but I think that they will be discussed
> along the development of DUV.
>
>
>
> Sorry again for this long (digressing) text.
>
>
>
Thank you Bluebro, all good stuff here!


> Cheers,
>
> Laufer
>
>
>
>
> --
> .  .  .  .. .  .
> .        .   . ..
> .     ..       .
>
Received on Saturday, 18 April 2015 17:23:36 UTC