- From: Jose Manuel Alonso <josema.alonso@fundacionctic.org>
- Date: Mon, 21 Dec 2009 19:33:26 +0100
- To: chris-beer@grapevine.net.au
- Cc: "Antti Poikola" <antti.poikola@gmail.com>, "Joe Carmel" <joe.carmel@comcast.net>, "'Jonathan Gray'" <jonathan.gray@okfn.org>, "'Steven Clift'" <clift@e-democracy.org>, public-egov-ig@w3.org, sunlightlabs@groups.google.com, "'Acar, Suzanne'" <suzanne.acar@ic.fbi.gov>
>> * If there is, let's say some thousand, datasets in data.gov, is there
>> any analysis or wild guesses of how many is missing 10 000, 50 000,
>> 100 000, 500 000?
>
> I'd say most, but that there is probably not as many as you'd think. And
> I'm including data.gov.* in that. (We have to as a group remain
> international in focus :) ) Many seem to include "views" of data in the
> term "missing datasets" - I think that if one could identify what datasets
> are primary (something I'll expand on once I find what you meant above),
> then we could generate a lot of other datasets from these. I guess what
> I'm saying is its probably just as important to ask how many datasets out
> there are dependent on other datasets for their data.

Ok, so I told you this one deserved its own separate message.

This is something we've been discussing at CTIC for quite a while: what is a dataset, and how would you define it? How would you count how many you've published?

Is the "2005 Toxics Release Inventory data for the state of Alaska" one dataset? Is "Toxics Release Inventory data for the state of Alaska" one dataset? Is "Toxics Release Inventory data for all the states" one dataset? If all of the above are datasets (even if not), how many is data.gov publishing?

In one of the projects I'm currently involved in, the government is about to publish information about all the public buildings. Is this one dataset? What if the government publishes just the information about the public schools? One dataset? Then, the one about hospitals... one dataset? But these two types of buildings (and several other types) are part of the big dataset, so is each really a dataset or a subset of the big one? How many should I count? One? Three?

Unfortunately, I believe I don't have a good answer. I tried for a long while, telling myself a dataset should be anything that is meaningful as a separate entity and that datasets can be combined into super-datasets.
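The counting ambiguity can be made concrete with a small sketch. It assumes a hypothetical model, not anything data.gov or CTIC actually uses, in which a dataset may contain sub-datasets; the `Dataset` class and the two counting functions are illustrative names only. The same catalogue then yields a different number depending on the counting rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """Hypothetical dataset that may be composed of smaller datasets."""
    name: str
    parts: List["Dataset"] = field(default_factory=list)


def count_all(ds: Dataset) -> int:
    """Count the dataset itself plus every nested sub-dataset."""
    return 1 + sum(count_all(p) for p in ds.parts)


def count_smallest(ds: Dataset) -> int:
    """Count only the datasets that have no sub-datasets."""
    if not ds.parts:
        return 1
    return sum(count_smallest(p) for p in ds.parts)


# The public buildings example, reduced to the two building types named.
public_buildings = Dataset(
    "public buildings",
    [Dataset("public schools"), Dataset("hospitals")],
)

print(count_all(public_buildings))       # counts the super-dataset and its parts
print(count_smallest(public_buildings))  # counts only the smaller ones
```

Counting only the top-level entry would give one, counting everything gives three, and counting only the smallest units gives two, so any published figure depends on which rule was chosen.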
Example: public schools is one dataset, hospitals is another one, public buildings is another one, but are those three datasets? Hmm... maybe we should only count the smaller ones? What if, instead of hospitals, we talk about "healthcare related centers" such as hospitals, ER, GPs, dentists, pharmacies and opticians (taken from NHS.UK)? Hey, do we now have six datasets? Or just a big one, with the six smaller ones each being just "a class of" the big one...

Btw, does the number really matter? Or would we be better off cataloging by knowledge area?

Unless we (at large) can agree on what a dataset is and how datasets should be counted, I believe talking about numbers does not make much sense.

Let the discussion (go on) begin... :)

-- Jose
Received on Monday, 21 December 2009 18:34:07 UTC