- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Tue, 12 Apr 2011 09:48:10 -0400
- To: glenn mcdonald <glenn@furia.com>
- CC: Deborah MacPherson <debmacp@gmail.com>, "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <4DA4581A.9010908@openlinksw.com>
On 4/12/11 9:33 AM, glenn mcdonald wrote:

> > As part of conversations about data, you do need to be able to see
> > the "subjectively" bad to make it "subjectively" good. What you
> > can't do (which is what Glenn does repeatedly) is conflate the
> > tools that actually enable you to see the subjectively "good, bad, or
> > ugly" with said data.
>
> I'm a tool developer with "first hand" experience, as you put it, too.
> I'm not conflating the tools and the data. But the complete data
> experience is the product of the tools and the data.

But who ever told you, or implied to you, that any LOD demo is about the "Complete Linked Data Experience", let alone the "Complete Data Experience"? Who even knows, emphatically, what the so-called "Complete Data Experience" actually is? That's as subjective a statement as I've ever heard. It's the very line that continues to separate us.

I might have my own perception of the aforementioned experience, but I have no business enforcing that on anyone else; it's just my world view, end of story. Thus, I hold my position re. your subjective conflation of matters.

When people publish demos of their products, they aren't publishing the demos for "your world view"; they are publishing them from theirs, first. Of course, bearing in mind our similarities and disparities as cognitive beings, there is varied potential for intersection of world views, i.e., fusion. Naturally, fusion can occur with varying degrees of friction.

> > Is Excel rendered useless because a list of countries with
> > obvious errors was presented in the spreadsheet? To an audience of
> > Spreadsheet developers (programmers making a Spreadsheet product)
> > that's irrelevant
>
> That attitude is how Excel ended up with essentially no real
> data-cleaning tools, which is pathetic.

And your comments once again reflect the issues I have with your commentary. Excel the pathetic dominates the world of spreadsheets. Nuff said. Did you write an alternative?
Why isn't the world using your alternative, if such a thing exists? Bearing in mind the huge market share of Excel, why are you overlooking the massive opportunity to clean up via your superior product?

> The job of data tools is to mediate between people and computers, and
> thus helping people identify and understand and fix and improve data
> is just as much the tools' (and tool developers') responsibility as
> showing you a list of entity URIs.

What is a Data Tool? Again, 100% subjective. Some people might think of Excel as a Data Tool; others see it as something completely different.

> The list of data-quality metrics is also effectively a data-tool task
> list.
>
> > this is why my demos are oriented towards enabling the beholder
> > to disambiguate his/her/its quest via filtering applied to entity
> > types and other properties.
>
> Which is what I was talking about in Boundedness: does the data have
> the properties you need to extract the subset you want. E.g., Danny
> Ayers yesterday was trying to make a SPARQL query for Wordnet that
> found the planets in the solar system that aren't named after Roman
> gods. But neither he nor I could find any way in the data to
> distinguish actual planets in the list of solar bodies, so we couldn't
> quite make it right.

And did you post a callout here, on Twitter, or anywhere else for other folks to chime in?

> That was a data problem, not a tool problem. But the difficulty of
> figuring this out, /using/ the tools, was a tool problem.

But the tools (or your activity) unveiled a critical problem aligned to your specific goals. That's subjectively bad data laying the foundation for subjectively improved data. All you need to do is open up a conversation that eventually results in a linkset that fixes the problem and delivers the "context lenses" you seek. This is a common and expected issue re. Linked Data at any scale beyond your personal computer or personally curated data space.
> But of the 17 other qualities on my list + Dave's additions, at least
> 15 of them directly bear on the feasibility of using filtering to
> extract a good subset out of a flawed corpus.

In my world: knowledge starts by discovering what you don't know. The same rule applies to data quality: you have to find the broken data before you can fix it. Don't take issue with the mechanism that helps you find the broken data. Of course, take issue if there isn't a feedback loop or the loop is clogged with intransigence etc. Neither is the case in the Linked Data realms of interest to me.

-- 
Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Tuesday, 12 April 2011 13:48:34 UTC