- From: glenn mcdonald <glenn@furia.com>
- Date: Tue, 12 Apr 2011 09:33:05 -0400
- To: Kingsley Idehen <kidehen@openlinksw.com>
- Cc: Deborah MacPherson <debmacp@gmail.com>, "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <BANLkTi=7zinGTUv7O3WTxQ3kxH5C1N4Svw@mail.gmail.com>
> As part of conversations about data, you do need to able to see the
> "subjectively" bad to make it "subjectively" good. What you can't do (which
> is what Glenn does repeatedly) is conflate the tools that actually enable
> you see the subjectively "good, bad, or ugly" with said data.

I'm a tool developer with "first hand" experience, as you put it, too. I'm not conflating the tools and the data. But the complete data experience is the product of the tools and the data.

> Is Excel rendered useless because a list of countries with obvious errors
> was presented in the spreadsheet? To an audience of Spreadsheet developers
> (programmers making a Spreadsheet product) that's irrelevant

That attitude is how Excel ended up with essentially no real data-cleaning tools, which is pathetic. The job of data tools is to mediate between people and computers, and thus helping people identify and understand and fix and improve data is just as much the tools' (and tool developers') responsibility as showing you a list of entity URIs. The list of data-quality metrics is also effectively a data-tool task list.

> this is why my demos are oriented towards enabling the beholder disambiguate
> his/her/its quest via filtering applied to entity types and other
> properties.

Which is what I was talking about in Boundedness: does the data have the properties you need to extract the subset you want. E.g., Danny Ayers yesterday was trying to make a SPARQL query for Wordnet that found the planets in the solar system that aren't named after Roman gods. But neither he nor I could find any way in the data to distinguish actual planets in the list of solar bodies, so we couldn't quite make it right. That was a data problem, not a tool problem. But the difficulty of figuring this out, *using* the tools, was a tool problem.

But of the 17 other qualities on my list + Dave's additions, at least 15 of them directly bear on the feasibility of using filtering to extract a good subset out of a flawed corpus.
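
To make the Boundedness point concrete, here is a minimal sketch in Python rather than SPARQL. The records, the field names (`named_after`, `is_planet`), and both helper functions are hypothetical illustrations, not the actual Wordnet data or any real tool's API. The idea is only that a tool could check whether the data carries the property a filter depends on before running it, and say so when it does not.

```python
# Hypothetical records; field names are illustrative, not taken from Wordnet.
solar_bodies = [
    {"name": "Mars", "named_after": "Roman god"},
    {"name": "Earth", "named_after": None},
    {"name": "Ceres", "named_after": "Roman god"},
    {"name": "Uranus", "named_after": "Greek god"},
]


def coverage(records, prop):
    """Return the fraction of records that have the key `prop` at all."""
    if not records:
        return 0.0
    have = sum(1 for r in records if prop in r)
    return have / len(records)


def filter_subset(records, required_props, predicate):
    """Apply `predicate` only if every required property is present on every record."""
    missing = [p for p in required_props if coverage(records, p) < 1.0]
    if missing:
        # The data, not the query, is what blocks this filter.
        raise ValueError(f"data cannot support this filter; missing: {missing}")
    return [r for r in records if predicate(r)]


# Filtering on a property the data has works as expected.
not_roman = filter_subset(
    solar_bodies,
    ["named_after"],
    lambda r: r["named_after"] != "Roman god",
)
```

Asking the same helper for "planets not named after Roman gods" with `["is_planet", "named_after"]` as the required properties raises a `ValueError` naming `is_planet`, which is the kind of immediate feedback that would separate the data problem from the tool problem.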
Received on Tuesday, 12 April 2011 13:33:53 UTC