- From: Alexander Johannesen <alexander.johannesen@gmail.com>
- Date: Tue, 20 Apr 2010 09:08:17 +1000
- To: Michael Schneider <schneid@fzi.de>
- Cc: "Polleres, Axel" <axel.polleres@deri.org>, semantic-web@w3.org, paoladimaio10@googlemail.com
Michael Schneider <schneid@fzi.de> wrote:
> The quality-of-data question is not an easy one, and it's very vague what
> "good quality" means for data. What you are about there on pedantic-web.org
> [1] seems to be an effort to obtain some sort of "minimum practically
> achievable quality" for the data existing on the web. This is very important
> IMO, but other people won't probably be satisfied by this, because (amongst
> other things) this minimum standard won't match their tools' requirements.

Let me give you an example of just how bad these things can be. A few years ago I worked for a national library as a technology manager of sorts, and one of the things I brought with me to that position was my knowledge and love of Topic Maps. So, the natural idea was to take library information, otherwise known as MARC (MAchine Readable Cataloging), or the culture of MARC, and convert the metadata within into glorious semantic knowledge maps.

The library world has been tinkering with metadata, like, forever, and with MARC from the 80's. They've been polishing, refining and tinkering with their MARC data for over 30 years, and not only that, but catalogers are pedantic, thorough and neat. If any collection of metadata would be in a useful state, it would be this one.

But sadly this is not the case. There's no schema for the data and no typing, and the rules and tricks are upheld by humans, which means it's a huge hotch-potch of good records and bad records all mixed up in one, with no identity management, and with large and costly match / merge processes that continuously try to wash and clean these records. Even then, the records turned out to be too hard to do anything automatic with. It was a disaster on many levels, not least to me, who in the end chose to quit.

When your pedantic librarians can't get this right, let me just say that this problem is slightly bigger than what you might think or even imagine at times.
There's a reason the brilliant minds at Google aren't doing RDF or strongly typed data. Yet.

Regards,

Alex
--
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ ----------------------------------------------
------------------ http://www.google.com/profiles/alexander.johannesen ---
Received on Monday, 19 April 2010 23:08:49 UTC