Re: Enrichment document

From: Caroline Burle <cburle@nic.br>
Date: Thu, 25 Jun 2015 10:07:26 -0300
Message-ID: <558BFD0E.1020109@nic.br>
To: public-dwbp-wg@w3.org
Dear all,

+1 to InWeb. We agree that data enrichment is an important issue to be
discussed and that the document should support the Data
Enrichment BP. On the other hand, we still have to decide as a group
what is the best way (and where) to include this content.

Kind regards,
Bernadette, Caroline and Newton

On 24/06/15 18:58, Wagner Meira Jr. wrote:
> Hi all,
>
> We, from InWeb, agree with the various observations and believe that
> refining the document is the way to achieve something relevant with
> respect to data enrichment and DWBP.
>
> First of all, it was never our intention to disrupt the usual process for
> generating and publishing W3C documents. We presented a very first
> version of the document in the last F2F meeting and since then
> have been extending it. We were not sure about the post-presentation
> process and are sorry for the confusion.
>
> Second, we agree that it may not be a good idea to use "big data",
> even as a background motivation for the document, as it is. The
> practices reported are applicable to any kind and volume of data.
>
> Third, extending the discussion beyond textual data, although
> relevant, may represent a much broader scope. We suggest
> to first focus on textual data and later evaluate whether the practices
> generalize to other types of data.
>
> Fourth, our rationale in building the document was not really to
> provide an exhaustive overview of the methods and techniques for
> data enrichment, but to describe the criteria any such method should
> satisfy. We understand that this strategy is more compatible with the
> current DWBP draft.
>
> Finally, we have discussed at length whether data enrichment
> makes sense in the DWBP draft, and so far, the conclusion has been
> that it fits there, but that outcome may change as we deepen the
> discussion in the group. We suggest that you take the current draft just
> as a suggestion on how to approach data enrichment in the context of
> DWBP.
>
> So, how would you all like to proceed on this issue?
>
> Best,
>
> Wagner
>
> On 24-06-2015 11:08, yaso@nic.br wrote:
>> Hi everyone,
>>
>> I agree with both Annette and Antoine, but still think it is an important
>> issue to be discussed by the group. I understand your fears about returning
>> to our endless scope discussion, but there are specific points,
>> like the one that Antoine raised on enriching linked data, for example,
>> that are very useful for us given the current state of the web.
>>
>> Keep in mind that our scope is already well defined:
>>
>> "This document is concerned solely with best practices that:
>>
>> 1. are specifically relevant to data published on the Web;
>> 2. encourage publication or re-use of data on the Web;
>> 3. can be tested by machines, humans or a combination of the two.
>>
>> As noted above, whether a best practice has or has not been followed
>> should be judged against the intended outcome, not the possible approach
>> to implementation which is offered as guidance.
>>
>> A best practice is always subject to improvement as we learn and evolve
>> the Web together."
>>
>> Maybe this data enrichment note [1] can be turned into a use case. I
>> think that reviewing it with an eye for the challenges that we might
>> raise from the InWeb work can be a good idea, since we already went through
>> this process for the other Best Practices.
>>
>> yaso
>>
>> [1] https://w3c.github.io/dwbp/enrichment.html
>>
>> On 06/23/2015 05:17 PM, Antoine Isaac wrote:
>>> Hi,
>>>
>>> I fully support Annette's point about the fear of including
>>> recommendations about everything, even when not really specific to 
>>> the web.
>>>
>>> As far as the content of the document is concerned, I must confess I've
>>> never looked at it. And even though I've missed a couple of calls
>>> lately, I don't remember any formal request for review ever being
>>> made...
>>>
>>> It's a pity, because the document may contain some very good stuff. But
>>> it may also be very shaky on other points. For instance, I have the
>>> feeling it ignores much of the work done on enriching linked data, and
>>> on evaluating the results; it's actually confusing to find that a fairly
>>> long document on data enrichment has only three occurrences of
>>> 'quality' in it. It would probably be good to discuss this in the
>>> coming calls too.
>>>
>>> Finally, there are quite big typos, even in the header. For example,
>>> "Desirible".
>>>
>>> Best,
>>>
>>> Antoine
>>>
>>> On 6/23/15 8:52 PM, Annette Greiner wrote:
>>>> Hi Steve,
>>>> I think you're right that "big data" gets used to mean just plain
>>>> data. If the distinction between the meaning of "big data" and "data"
>>>> is becoming an academic one, isn't that even more reason to avoid
>>>> trying to make the distinction in our own work? Let's just call data
>>>> data. If we want to talk about the Vs, we can use the V words.  I
>>>> actually work in academia, in the data science program at Berkeley,
>>>> and the consensus even there about the term is that it is not very
>>>> helpful. It is perceived as shallow and attention-seeking.
>>>>
>>>> Re the scoping issue, you misunderstand me, and looking back at the
>>>> placement of my last sentence, I can see why. (Sorry.) I don't think
>>>> that addressing the full meaning of data enrichment would throw this
>>>> piece out of scope. If anything, it would bring it back in. I think
>>>> its current failure to address the broader meaning of enrichment is a
>>>> serious problem, separate from the scoping issue. In addition, I
>>>> think that we should always ask ourselves whether what we are writing
>>>> is relevant in particular to data on the web. I don't think there is
>>>> anything particularly web-based about machine learning.
>>>>
>>>> I worry that we are slowly trying to write something about every
>>>> aspect of the data lifecycle. It's difficult enough for me to accept
>>>> the extra BPs about how to create a vocabulary, and I worry about the
>>>> data preservation BPs on similar grounds. Machine learning strikes me
>>>> as further afield than either of those. Should we also write notes
>>>> about Hadoop, database administration, data visualization, and survey
>>>> design? If we define our scope this broadly, what would we rule out?
>>>> -Annette
>>>>
>>>> On Jun 23, 2015, at 10:32 AM, Steven Adler <adler1@us.ibm.com> wrote:
>>>>
>>>>> Annette,
>>>>>
>>>>> At first I agreed but then I have to say that I don't... because
>>>>> "Big Data" is over-used and somewhat amorphous it is becoming a term
>>>>> used by everyone for much of what we might also narrowly define as
>>>>> "just Data," i.e., the distinction is increasingly academic.
>>>>>
>>>>> Also, I think we did discuss in the past that unstructured text,
>>>>> image, audio, and other multimedia types are also data on the web
>>>>> that is published in open formats.
>>>>>
>>>>> So really, I don't see the harm in the inclusion on the basis of
>>>>> those objections because I hope that additional data types are not
>>>>> tangential to our standards.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Steve
>>>>>
>>>>> Motto: "Do First, Think, Do it Again"
>>>>>
>>>>> From: Annette Greiner <amgreiner@lbl.gov>
>>>>> To: Phil Archer <phila@w3.org>
>>>>> Cc: Public DWBP WG <public-dwbp-wg@w3.org>, Bernadette Farias Lóscio
>>>>> <bfl@cin.ufpe.br>, Caroline Burle <cburle@nic.br>, Newton Calegari
>>>>> <newton@nic.br>, "glpappa@dcc.ufmg.br" <glpappa@dcc.ufmg.br>
>>>>> Date: 06/23/2015 12:30 PM
>>>>> Subject: Re: Enrichment document
>>>>>
>>>>>
>>>>>
>>>>> Hm, I had never seen that enrichment document and didn't even realize
>>>>> it was in development. It gives a nice review of machine learning
>>>>> techniques with a focus on text analysis. Very interesting stuff, but
>>>>> I have a few concerns. My primary concern is that it defines data
>>>>> enrichment much too narrowly. Data enrichment is helpful for all
>>>>> kinds of data, not just "big data" (a term I would encourage us to
>>>>> avoid, as it is overused and highly ambiguous). It is useful in image
>>>>> data as well as text, and in structured as well as unstructured data.
>>>>> I think we need to beware of putting out content that is tangential
>>>>> to the subject of publishing data on the web.
>>>>> -Annette
>>>>>
>>>>> Sent from a keyboard-challenged device
>>>>>
>>>>>> On Jun 23, 2015, at 7:00 AM, Phil Archer <phila@w3.org> wrote:
>>>>>>
>>>>>> I'm putting the DWBP doc through pubrules and, forgive me, I've just
>>>>>> noticed that it links to the enrichment document.
>>>>>>
>>>>>> For those unfamiliar with this, see
>>>>>> http://w3c.github.io/dwbp/enrichment.html
>>>>>>
>>>>>> The WG may well decide to publish this; it certainly deserves
>>>>>> attention. However, we can't just include it as a separate Note
>>>>>> without going through the usual process followed by other
>>>>>> documents in the WG.
>>>>>>
>>>>>> For this week's publication I have therefore removed "... according
>>>>>> to the suggestions described in Data Enrichment Technical Note" from
>>>>>> the BP doc and the link to the enrichment doc.
>>>>>>
>>>>>> Let's put this on the agenda for a near future call.
>>>>>>
>>>>>> Phil.
>>>>>>
>>>>>> -- 
>>>>>>
>>>>>>
>>>>>> Phil Archer
>>>>>> W3C Data Activity Lead
>>>>>> http://www.w3.org/2013/data/
>>>>>>
>>>>>> http://philarcher.org
>>>>>> +44 (0)7887 767755
>>>>>> @philarcher1
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>
>
Received on Thursday, 25 June 2015 13:08:07 UTC
