Re: Enrichment document from yaso@nic.br on 2015-06-24 (public-dwbp-wg@w3.org from June 2015)

From: <yaso@nic.br>
Date: Wed, 24 Jun 2015 11:08:48 -0300
To: public-dwbp-wg@w3.org
Message-ID: <558AB9F0.501@nic.br>
Hi everyone,

I agree with both Annette and Antoine, but I still think it is an important
issue for the group to discuss. I understand your fears of returning
to our endless scope discussion, but there are specific points,
like the one Antoine raised on enriching linked data, for example,
that are very useful for us considering the scenario of the web nowadays.

Keep in mind that our scope is already well defined:

"This document is concerned solely with best practices that:

1. are specifically relevant to data published on the Web;
2. encourage publication or re-use of data on the Web;
3. can be tested by machines, humans or a combination of the two.

As noted above, whether a best practice has or has not been followed
should be judged against the intended outcome, not the possible approach
to implementation which is offered as guidance.

A best practice is always subject to improvement as we learn and evolve
the Web together."

Maybe this note on data enrichment [1] can turn into a use case. I
think that reviewing it with an eye for the challenges that we might
raise from the InWeb work can be a good idea, since we already went thru
this process for the other Best Practices.



yaso



[1] https://w3c.github.io/dwbp/enrichment.html







On 06/23/2015 05:17 PM, Antoine Isaac wrote:
> Hi,
> 
> I fully support Annette's point about the fear of including
> recommendations about everything, even when not really specific to the web.
> 
> As far as the content of the document is concerned, I must confess I've
> never looked at it. And even though I've missed a couple of calls
> lately, I don't remember any formal request for review has been ever
> made...
> 
> It's a pity, because the document may contain some very good stuff. But
> it may also be very shaky in other places. For instance, I have the feeling it
> ignores many things done for enriching linked data. And work on
> evaluating the results - actually it's confusing to find that a fairly
> long document on data enrichment would only have three occurrences of
> 'quality' in it. Probably it will be good to discuss this also in the
> coming calls.
> 
> Finally, there are quite big typos, even in the header. For example,
> "Desirible".
> 
> Best,
> 
> Antoine
> 
> On 6/23/15 8:52 PM, Annette Greiner wrote:
>> Hi Steve,
>> I think you're right that "big data" gets used to mean just plain
>> data. If the distinction between the meaning of "big data" and "data"
>> is becoming an academic one, isn't that even more reason to avoid
>> trying to make the distinction in our own work? Let's just call data
>> data. If we want to talk about the Vs, we can use the V words.  I
>> actually work in academia, in the data science program at Berkeley,
>> and the consensus even there about the term is that it is not very
>> helpful. It is perceived as shallow and attention-seeking.
>>
>> Re the scoping issue, you misunderstand me, and looking back at the
>> placement of my last sentence, I can see why. (Sorry.) I don't think
>> that addressing the full meaning of data enrichment would throw this
>> piece out of scope. If anything, it would bring it back in. I think
>> its current failure to address the broader meaning of enrichment is a
>> serious problem, separate from the scoping issue. In addition, I 
>> think that we should always ask ourselves whether what we are writing
>> is relevant in particular to data on the web. I don't think there is
>> anything particularly web-based about machine learning.
>>
>> I worry that we are slowly trying to write something about every
>> aspect of the data lifecycle. It's difficult enough for me to accept
>> the extra BPs about how to create a vocabulary, and I worry about the
>> data preservation BPs on similar grounds. Machine learning strikes me
>> as further afield than either of those. Should we also write notes
>> about hadoop, database administration, data visualization, and survey
>> design? If we define our scope this broadly, what would we rule out?
>> -Annette
>>
>> On Jun 23, 2015, at 10:32 AM, Steven Adler <adler1@us.ibm.com> wrote:
>>
>>> Annette,
>>>
>>> At first I agreed but then I have to say that I don't...  because
>>> "Big Data" is over-used and somewhat amorphous it is becoming a term
>>> used by everyone for much of what we might also narrowly define as
>>> "just Data."  That is, the distinction is increasingly academic.
>>>
>>> Also, I think we did discuss in the past that unstructured text,
>>> image, audio, and other multi-media types are also data on the web
>>> that is published in open formats.
>>>
>>> So really, I don't see the harm in the inclusion on the basis of
>>> those objections because I hope that additional data types are not
>>> tangential to our standards.
>>>
>>> Best Regards,
>>>
>>> Steve
>>>
>>> Motto: "Do First, Think, Do it Again"
>>>
>>> From: Annette Greiner <amgreiner@lbl.gov>
>>> To: Phil Archer <phila@w3.org>
>>> Cc: Public DWBP WG <public-dwbp-wg@w3.org>, Bernadette Farias Lóscio
>>> <bfl@cin.ufpe.br>, Caroline Burle <cburle@nic.br>, Newton Calegari
>>> <newton@nic.br>, "glpappa@dcc.ufmg.br" <glpappa@dcc.ufmg.br>
>>> Date: 06/23/2015 12:30 PM
>>> Subject: Re: Enrichment document
>>>
>>>
>>>
>>> Hm, I had never seen that enrichment document and didn't even realize
>>> it was in development. It gives a nice review of machine learning
>>> techniques with a focus on text analysis. Very interesting stuff, but
>>> I have a few concerns. My primary concern is that it defines data
>>> enrichment much too narrowly. Data enrichment is helpful for all
>>> kinds of data, not just "big data" (a term I would encourage us to
>>> avoid, as it is overused and highly ambiguous). It is useful in image
>>> data as well as text, and in structured as well as unstructured data.
>>> I think we need to beware of putting out content that is tangential
>>> to the subject of publishing data on the web.
>>> -Annette
>>>
>>> Sent from a keyboard-challenged device
>>>
>>>> On Jun 23, 2015, at 7:00 AM, Phil Archer <phila@w3.org> wrote:
>>>>
>>>> I'm putting the DWBP doc through pubrules and, forgive me, I've just
>>>> noticed that it links to the enrichment document.
>>>>
>>>> For those unfamiliar with this, see
>>>> http://w3c.github.io/dwbp/enrichment.html
>>>>
>>>> The WG may well decide to publish this - it certainly deserves
>>>> attention and may well be published. However, we can't just include
>>>> it as a separate Note without going through the usual process
>>>> followed by other documents in the WG.
>>>>
>>>> For this week's publication I have therefore removed "... according
>>>> to the suggestions described in Data Enrichment Technical Note" from
>>>> the BP doc and the link to the enrichment doc.
>>>>
>>>> Let's put this on the agenda for a near future call.
>>>>
>>>> Phil.
>>>>
>>>> -- 
>>>>
>>>>
>>>> Phil Archer
>>>> W3C Data Activity Lead
>>>> http://www.w3.org/2013/data/
>>>>
>>>> http://philarcher.org
>>>> +44 (0)7887 767755
>>>> @philarcher1
>>>>
>>>
>>>
>>>
>>
>>
>>
>
Received on Wednesday, 24 June 2015 14:09:28 UTC