- From: Caroline Burle <cburle@nic.br>
- Date: Thu, 25 Jun 2015 10:07:26 -0300
- To: public-dwbp-wg@w3.org
Dear all,

+1 to InWeb. We agree that data enrichment is an important issue to be
discussed and the document written should give support to the Data
Enrichment BP. On the other hand, we still have to decide as a group what
is the best way (and where) to include this content.

Kind regards,
Bernadette, Caroline and Newton

On 24/06/15 18:58, Wagner Meira Jr. wrote:
> Hi all,
>
> We, from InWeb, agree with the various observations and believe that
> refining it is the way to achieve something relevant wrt data enrichment
> and DWBP.
>
> First of all, it was never our intention to disrupt the usual process for
> generating and publishing W3C documents. We presented a very first
> version of the document in the last F2F meeting and since then
> have been extending it. We were not sure about the post presentation
> process and are sorry for the confusion.
>
> Second, we agree that it may not be a good idea to use "big data",
> even as a background motivation for the document, as it is. The
> practices reported are applicable to any kind and volume of data.
>
> Third, extending the discussion beyond textual data, although
> relevant, may represent a much broader scope. We suggest
> to first focus on textual data and later evaluate whether the practices
> generalize to other types of data.
>
> Fourth, our rationale in building the document was not really to
> provide an exhaustive overview of the methods and techniques for
> data enrichment, but what criteria any such method should satisfy.
> We understand that this strategy is more compatible with the current
> DWBP draft.
>
> Finally, we have discussed quite a lot about whether data enrichment
> makes sense in the DWBP draft, and so far, the conclusion has been
> that it fits there, but such outcome may change as we deepen the
> discussion in the group. We suggest that you take the current draft just
> as a suggestion on how to approach data enrichment in the context of
> DWBP.
>
> Thus, how do you guys want to evolve on this issue?
>
> Best,
>
> Wagner
>
> On 24-06-2015 11:08, yaso@nic.br wrote:
>> Hi everyone,
>>
>> I agree with both Annette and Antoine, but still think it is an important
>> issue to be discussed by the group. I understand your fears on turning
>> back to our scope endless discussion, but there are specific points,
>> like the one that Antoine raised on enriching linked data, for example,
>> that are very useful for us considering the scenario of the web
>> nowadays.
>>
>> Keep in mind that our scope is already well defined:
>>
>> "This document is concerned solely with best practices that:
>>
>> 1. are specifically relevant to data published on the Web;
>> 2. encourage publication or re-use of data on the Web;
>> 3. can be tested by machines, humans or a combination of the two.
>>
>> As noted above, whether a best practice has or has not been followed
>> should be judged against the intended outcome, not the possible approach
>> to implementation which is offered as guidance.
>>
>> A best practice is always subject to improvement as we learn and evolve
>> the Web together."
>>
>> Maybe this note on data enrichment [1] can turn into a use case. I
>> think that reviewing it with an eye for the challenges that we might
>> raise from the InWeb work can be a good idea, since we already went thru
>> this process for the other Best Practices.
>>
>> yaso
>>
>> [1] https://w3c.github.io/dwbp/enrichment.html
>>
>> On 06/23/2015 05:17 PM, Antoine Isaac wrote:
>>> Hi,
>>>
>>> I fully support Annette's point about the fear of including
>>> recommendations about everything, even when not really specific to
>>> the web.
>>>
>>> As far as the content of the document is concerned, I must confess I've
>>> never looked at it. And even though I've missed a couple of calls
>>> lately, I don't remember any formal request for review has been ever
>>> made...
>>>
>>> It's a pity, because the document may contain some very good stuff. But
>>> it may also be very shaky on others. For instance, I have the feeling it
>>> ignores many things done for enriching linked data. And work on
>>> evaluating the results - actually it's confusing to find that a fairly
>>> long document on data enrichment would only have three occurrences of
>>> 'quality' in it. Probably it will be good to discuss this also in the
>>> coming calls.
>>>
>>> Finally, there are quite big typos, even in the header. For example,
>>> "Desirible".
>>>
>>> Best,
>>>
>>> Antoine
>>>
>>> On 6/23/15 8:52 PM, Annette Greiner wrote:
>>>> Hi Steve,
>>>> I think you're right that "big data" gets used to mean just plain
>>>> data. If the distinction between the meaning of "big data" and "data"
>>>> is becoming an academic one, isn't that even more reason to avoid
>>>> trying to make the distinction in our own work? Let's just call data
>>>> data. If we want to talk about the Vs, we can use the V words. I
>>>> actually work in academia, in the data science program at Berkeley,
>>>> and the consensus even there about the term is that it is not very
>>>> helpful. It is perceived as shallow and attention-seeking.
>>>>
>>>> Re the scoping issue, you misunderstand me, and looking back at the
>>>> placement of my last sentence, I can see why. (Sorry.) I don't think
>>>> that addressing the full meaning of data enrichment would throw this
>>>> piece out of scope. If anything, it would bring it back in. I think
>>>> its current failure to address the broader meaning of enrichment is a
>>>> serious problem, separate from the scoping issue. In addition, I
>>>> think that we should always ask ourselves whether what we are writing
>>>> is relevant in particular to data on the web. I don't think there is
>>>> anything particularly web-based about machine learning.
>>>>
>>>> I worry that we are slowly trying to write something about every
>>>> aspect of the data lifecycle. It's difficult enough for me to accept
>>>> the extra BPs about how to create a vocabulary, and I worry about the
>>>> data preservation BPs on similar grounds. Machine learning strikes me
>>>> as further afield than either of those. Should we also write notes
>>>> about hadoop, database administration, data visualization, and survey
>>>> design? If we define our scope this broadly, what would we rule out?
>>>> -Annette
>>>>
>>>> On Jun 23, 2015, at 10:32 AM, Steven Adler <adler1@us.ibm.com> wrote:
>>>>
>>>>> Annette,
>>>>>
>>>>> At first I agreed but then I have to say that I don't... because
>>>>> "Big Data" is over-used and somewhat amorphous it is becoming a term
>>>>> used by everyone for much of what we might also narrowly define as
>>>>> "just Data." ie, the distinction is increasingly academic.
>>>>>
>>>>> Also, I think we did discuss in the past that unstructured text,
>>>>> image, audio, and other multi-media types are also data on the web
>>>>> that is published in open formats.
>>>>>
>>>>> So really, I don't see the harm in the inclusion on the basis of
>>>>> those objections because I hope that additional data types are not
>>>>> tangential to our standards.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Steve
>>>>>
>>>>> Motto: "Do First, Think, Do it Again"
>>>>>
>>>>> From: Annette Greiner <amgreiner@lbl.gov>
>>>>> To: Phil Archer <phila@w3.org>
>>>>> Cc: Public DWBP WG <public-dwbp-wg@w3.org>, Bernadette Farias Lóscio
>>>>> <bfl@cin.ufpe.br>, Caroline Burle <cburle@nic.br>, Newton Calegari
>>>>> <newton@nic.br>, "glpappa@dcc.ufmg.br" <glpappa@dcc.ufmg.br>
>>>>> Date: 06/23/2015 12:30 PM
>>>>> Subject: Re: Enrichment document
>>>>>
>>>>> Hm, I had never seen that enrichment document and didn't even realize
>>>>> it was in development. It gives a nice review of machine learning
>>>>> techniques with a focus on text analysis. Very interesting stuff, but
>>>>> I have a few concerns. My primary concern is that it defines data
>>>>> enrichment much too narrowly. Data enrichment is helpful for all
>>>>> kinds of data, not just "big data" (a term I would encourage us to
>>>>> avoid, as it is overused and highly ambiguous). It is useful in image
>>>>> data as well as text, and in structured as well as unstructured data.
>>>>> I think we need to beware of putting out content that is tangential
>>>>> to the subject of publishing data on the web.
>>>>> -Annette
>>>>>
>>>>> Sent from a keyboard-challenged device
>>>>>
>>>>>> On Jun 23, 2015, at 7:00 AM, Phil Archer <phila@w3.org> wrote:
>>>>>>
>>>>>> I'm putting the DWBP doc through pubrules and, forgive me, I've just
>>>>>> noticed that it links to the enrichment document.
>>>>>>
>>>>>> For those unfamiliar with this, see
>>>>>> http://w3c.github.io/dwbp/enrichment.html
>>>>>>
>>>>>> The WG may well decide to publish this - it certainly deserves
>>>>>> attention and may well be published. However, we can't just include
>>>>>> it as a separate Note without going through the usual process
>>>>>> followed by other documents in the WG.
>>>>>>
>>>>>> For this week's publication I have therefore removed "... according
>>>>>> to the suggestions described in Data Enrichment Technical Note" from
>>>>>> the BP doc and the link to the enrichment doc.
>>>>>>
>>>>>> Let's put this on the agenda for a near future call.
>>>>>>
>>>>>> Phil.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Phil Archer
>>>>>> W3C Data Activity Lead
>>>>>> http://www.w3.org/2013/data/
>>>>>>
>>>>>> http://philarcher.org
>>>>>> +44 (0)7887 767755
>>>>>> @philarcher1
Received on Thursday, 25 June 2015 13:08:07 UTC