Re: Enrichment document

Hi,

I fully support Annette's point about the fear of including recommendations about everything, even when not really specific to the web.

As far as the content of the document is concerned, I must confess I've never looked at it. And even though I've missed a couple of calls lately, I don't remember any formal request for review ever being made...

It's a pity, because the document may contain some very good stuff. But it may also be very shaky in other respects. For instance, I have the feeling it ignores much of the work done on enriching linked data, and on evaluating the results. It is actually puzzling to find that a fairly long document on data enrichment has only three occurrences of 'quality' in it. It would probably be good to discuss this in the coming calls as well.

Finally, there are some quite glaring typos, even in the header. For example, "Desirible".

Best,

Antoine

On 6/23/15 8:52 PM, Annette Greiner wrote:
> Hi Steve,
> I think you're right that "big data" gets used to mean just plain data. If the distinction between the meaning of "big data" and "data" is becoming an academic one, isn't that even more reason to avoid trying to make the distinction in our own work? Let's just call data data. If we want to talk about the Vs, we can use the V words.  I actually work in academia, in the data science program at Berkeley, and the consensus even there about the term is that it is not very helpful. It is perceived as shallow and attention-seeking.
>
> Re the scoping issue, you misunderstand me, and looking back at the placement of my last sentence, I can see why. (Sorry.) I don't think that addressing the full meaning of data enrichment would throw this piece out of scope. If anything, it would bring it back in. I think its current failure to address the broader meaning of enrichment is a serious problem, separate from the scoping issue. In addition, I think that we should always ask ourselves whether what we are writing is relevant in particular to data on the web. I don't think there is anything particularly web-based about machine learning.
>
> I worry that we are slowly trying to write something about every aspect of the data lifecycle. It's difficult enough for me to accept the extra BPs about how to create a vocabulary, and I worry about the data preservation BPs on similar grounds. Machine learning strikes me as further afield than either of those. Should we also write notes about Hadoop, database administration, data visualization, and survey design? If we define our scope this broadly, what would we rule out?
> -Annette
>
> On Jun 23, 2015, at 10:32 AM, Steven Adler <adler1@us.ibm.com> wrote:
>
>> Annette,
>>
>> At first I agreed, but then I have to say that I don't... Because "Big Data" is over-used and somewhat amorphous, it is becoming a term used by everyone for much of what we might also narrowly define as "just Data," i.e., the distinction is increasingly academic.
>>
>> Also, I think we did discuss in the past that unstructured text, image, audio, and other multi-media types are also data on the web that is published in open formats.
>>
>> So really, I don't see the harm in the inclusion on the basis of those objections because I hope that additional data types are not tangential to our standards.
>>
>> Best Regards,
>>
>> Steve
>>
>> Motto: "Do First, Think, Do it Again"
>>
>> From: Annette Greiner <amgreiner@lbl.gov>
>> To: Phil Archer <phila@w3.org>
>> Cc: Public DWBP WG <public-dwbp-wg@w3.org>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Caroline Burle <cburle@nic.br>, Newton Calegari <newton@nic.br>, "glpappa@dcc.ufmg.br" <glpappa@dcc.ufmg.br>
>> Date: 06/23/2015 12:30 PM
>> Subject: Re: Enrichment document
>>
>>
>>
>> Hm, I had never seen that enrichment document and didn't even realize it was in development. It gives a nice review of machine learning techniques with a focus on text analysis. Very interesting stuff, but I have a few concerns. My primary concern is that it defines data enrichment much too narrowly. Data enrichment is helpful for all kinds of data, not just "big data" (a term I would encourage us to avoid, as it is overused and highly ambiguous). It is useful in image data as well as text, and in structured as well as unstructured data. I think we need to beware of putting out content that is tangential to the subject of publishing data on the web.
>> -Annette
>>
>> Sent from a keyboard-challenged device
>>
>>> On Jun 23, 2015, at 7:00 AM, Phil Archer <phila@w3.org> wrote:
>>>
>>> I'm putting the DWBP doc through pubrules and, forgive me, I've just noticed that it links to the enrichment document.
>>>
>>> For those unfamiliar with this, see
>>> http://w3c.github.io/dwbp/enrichment.html
>>>
>>> The WG may well decide to publish this - it certainly deserves attention and may well be published. However, we can't just include it as a separate Note without going through the usual process followed by other documents in the WG.
>>>
>>> For this week's publication I have therefore removed "... according to the suggestions described in Data Enrichment Technical Note" from the BP doc and the link to the enrichment doc.
>>>
>>> Let's put this on the agenda for a near future call.
>>>
>>> Phil.
>>>
>>> --
>>>
>>>
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>>
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
>>>

Received on Tuesday, 23 June 2015 20:30:49 UTC