- From: Polleres, Axel <axel.polleres@deri.org>
- Date: Mon, 19 Apr 2010 11:09:50 +0100
- To: <paoladimaio10@googlemail.com>, <adam.saltiel@gmail.com>, <uk-government-data-developers@googlegroups.com>
- Cc: <semantic-web@w3.org>
- Message-ID: <316ADBDBFE4F4D4AA4FEEF7496ECAEF9035DF69D@EVS1.ac.nuigalway.ie>
Paola,

You may want to check http://www.pedantic-web.org/ for our efforts to improve data quality. We also have a paper on findings so far at LDOW [1].

Cheers,
Axel

1. Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, and Axel Polleres. Weaving the Pedantic Web. In 3rd International Workshop on Linked Data on the Web (LDOW2010) at WWW2010, Raleigh, USA, April 2010.

________________________________
From: semantic-web-request@w3.org
To: adasal; uk-government-data-developers@googlegroups.com
Cc: Semantic Web
Sent: Mon Apr 19 10:51:14 2010
Subject: data quality

Something else I wanted to add but forgot, as it was a late post: one of the issues coming up in the discussion below is the quality of data (which came up in the gov data list a while back, hence the cc).

A question then is: why (in some cases) is the data 'not fit for purpose'? Again, several possible hypotheses may need to be tested in each case.

Is the data inconsistent because the real world is inconsistent? (The world seems to hang together even when it does not make sense to us, while data models don't.) In which case maybe there is not much that we can do, other than to continue to attempt creating plausible models of the world.

Is the data any use before it is opened and RDFized, or does something happen in the RDFization process? Let's not forget that to obtain meaningful outputs from databases, a lot of work needs to go in: I am thinking of normalisation of schemas, but also of data cleaning, which constitutes the majority of effort in data mining. I don't think the fact that data is expressed in RDF automatically makes it good.

Again, a good digging through a significant set of examples of when 'data is not fit for purpose' could yield some clues as to what kind of work needs to be done. So when something doesn't work, I would be inclined not to just throw it away, but to study it systematically. After all, most of what we know in medicine has come from dissecting corpses.

PDM

On Sun, Apr 18, 2010 at 11:16 PM, Paola Di Maio <paola.dimaio@gmail.com> wrote:

In this thread, and the parallel ones, I see different problem spaces. It's a complex issue that should be broken down: one is query composition, another is the availability of data, and then the ease of use/utility of the tools (probably more).

Then there are some conflicts. For example, on the one hand the W3C produces standards (RDF, OWL); on the other hand, the tools and platforms that implement them do so without necessarily making the user tasks intuitive enough. Having had random conversations with platform developers, it looks like they want to monetize their work, and are not in a hurry to achieve any results until their finances are secured. Perhaps the W3C could act more as 'the customer', and promote the adoption of usability standards alongside technical ones (an argument that I occasionally try to make).

On the query composition front, what about a tool that would facilitate the generation of RDF data when it's not available? Assuming Danny eventually works out the optimal query, for example to include specific data in relation to his side of the valley, where humidity, wind, sun exposure, soil composition and other local properties make up a microclimate, shouldn't there be a (any) place where this data could be entered so that the query can be performed?
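By way of illustration, a minimal sketch of such an entry point, in Python with rdflib (the ex: vocabulary and the field names here are invented for illustration; a real tool would reuse an agreed agriculture/weather ontology):

# Turn one form submission about a site's microclimate into RDF triples.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/microclimate#")  # hypothetical vocabulary

def record_observation(graph, site_uri, form_fields):
    """Turn one form submission (field name -> value) into triples."""
    obs = URIRef(site_uri + "/obs/1")  # in practice: a fresh, dated URI
    graph.add((obs, RDF.type, EX.Observation))
    graph.add((obs, EX.site, URIRef(site_uri)))
    for field, value in form_fields.items():
        # No cleaning or validation happens here: garbage in, garbage out.
        graph.add((obs, EX[field], Literal(value)))
    return obs

g = Graph()
g.bind("ex", EX)
record_observation(g, "http://example.org/valley/south-side",
                   {"humidity": 0.72, "windSpeedKmh": 11,
                    "sunExposureHours": 6.5, "soilType": "clay-loam"})
print(g.serialize(format="turtle"))

The form-to-triples step itself is mechanical; as argued above, nothing in it makes the data good, so the cleaning still has to happen somewhere.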
(I still think for some answers the web may still not be the best place, but let's think hypothetically.)

The nearest thing I have seen to the SW has been a demo of Semantic MediaWiki (Denny Vrandečić). One could enter new data on the fly, and the table would update. I thought: that's cool, I could probably work with this (I may have to go through the examples a few times, but it looked doable to me when I saw the demo). Surely query manipulation can be made foolproof; after all, forms were invented for that purpose, if I remember: an interface that would allow adding more fields to the query (a sketch of what such a form might generate follows below). Last I heard of Semantic MediaWiki, 'there were issues'. I would have thought that's a good place to start. Does anyone know what happened, and whether there are any test implementations or tutorials? What can be so wrong with it?

Once tasks are defined, the data is reliable and good enough, datasets can be added on the fly as needed, the tools are straightforward, and the tasks (say, querying and manipulating queries) are made more intuitive, then I am sure it's all about setting up good enough pilot studies from different fields of application, with, for each, enough people and community involvement. Since everybody is already more or less working on different aspects of the above, I am sure that some magic can be done simply with a bit more coordination of the different efforts.

The cost/benefit issue is also complex, depending on what is calculated as cost and what as benefit, as there are different classes of both, IMHO: to society at large, and to the public pocket. The last ten years of publicly funded research have been a relatively quantifiable cost (one can work out some ballpark figure by looking at SW research expenditure, but I am afraid to do it). Among the benefits have been lots of PhDs, salaries, some careers, some new knowledge and innovation; but some (including myself) argue that visible 'public' benefits are not (yet) adequate to the public costs, which remain, IMHO, not fully justified. In my analysis this has turned out to be a problem with our research industry (very generalised statement), where research expenditure is often in a grey policy area: it is not clearly enough demarcated what public benefits should derive from research. And that is another can of worms altogether.

To an average organisation confronted with the option to invest in SW technologies today, it may just be too early: unquantifiable costs and risks, but also limited business/revenue models, etc. (How is giving my website users some SW functionality going to provide my customers with more value?) I think I have heard of some benefits being reaped in the non-public domain, but because of that, we don't know for sure what happens behind firewalls.

Assuming some cohesion of purpose can be arranged, and that research can provide a wide enough range of real-world, well-defined pilot schemes (where the cost/benefit analysis of each pilot project is clear upfront, and utility metrics are adopted, for example) with a sufficiently healthy stakeholder base not too easily alienated, I am sure it would be possible to make at least some sense of the work done so far. Anything anywhere near the direction above can probably only be achieved by a community, which it looks like is trying to pull itself together here? :-)

PDM

On Sun, Apr 18, 2010 at 10:22 PM, adasal <adam.saltiel@gmail.com> wrote:

> Agriculture oriented data spaces (ontology and instance data)

How could that ever be automatic?
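A hedged sketch of what the form-based route above might generate: each filled-in field adds one constraint to a SPARQL query, so adding a field to the form adds a triple pattern. The ex: property names (crop, region, plantingMonth) are invented for illustration:

# Map form fields (name -> value) to a simple SPARQL SELECT query.
def build_query(fields):
    patterns = ["?advice a ex:PlantingAdvice ."]
    for name, value in fields.items():
        # Each form field contributes one triple pattern to the WHERE clause.
        patterns.append(f'?advice ex:{name} "{value}" .')
    body = "\n  ".join(patterns)
    return ("PREFIX ex: <http://example.org/agri#>\n"
            f"SELECT ?advice WHERE {{\n  {body}\n}}")

print(build_query({"crop": "tomato", "region": "Tuscany",
                   "plantingMonth": "April"}))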
> Agriculture oriented data spaces (ontology and instance data)

One cannot anticipate every possible query, or even every broad area of interest, in DBpedia. There must be an impulse to make a query of some sort; the issue is how complex that query must be. Isn't the implicit question: why can't some small query be enough to draw out the information I want? Here the query terms should be enough to form a coherent query. In this example they should translate into a SPARQL query. But that is not enough, because DBpedia needs a schema and some instance data too. Erm.

Or perhaps it could be semi-automatic? Imagine that there is a repository with sample kinds of data in it. I think this would be easy to use. I want to build up a query about tomato seeds, planting, region, time of year. So some general data is classified along those lines. That would be combined into a schema. Maybe some of it would be a subset of other schemas, so in making the choice further useful suggestions could be made. I would then be asked to refine the parameters of the query by actual region, etc. I am assuming that interested parties would make available basic metadata sets with human-understandable sample data.

Am I making any sort of sensible suggestion here? Is this different from what already exists as available triples? I am unsure; there is something circular here. Even so, we are still left with the data that has not been classified, because there is no interested party to do so, or because the type of classification is new, complex or transient.

Adam

On 18 April 2010 21:56, Danny Ayers <danny.ayers@gmail.com> wrote:

Thanks Kingsley, still not automatic though, is it?

On 18 April 2010 22:38, Kingsley Idehen <kidehen@openlinksw.com> wrote:
> Danny Ayers wrote:
>>
>> Kingsley, how do I find out when to plant tomatoes here?
>>
>
> And you find the answer to that in Wikipedia via
> <http://en.wikipedia.org/wiki/Tomato>? Of course not.
>
> Re. DBpedia, if you have an Agriculture oriented data space (ontology and
> instance data) that references DBpedia (via linkbase), then you will have a
> better chance of an answer, since we would have temporal properties and
> associated values in the Linked Data Space (one that we can mesh with
> DBpedia, even via SPARQL).
>
> Kingsley
>>
>> On 17 April 2010 19:36, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>>>
>>> Danny Ayers wrote:
>>>>
>>>> On 16 April 2010 19:29, greg masley <roxymuzick@yahoo.com> wrote:
>>>>>
>>>>> What I want to know is: does anybody have a method yet to successfully
>>>>> extract data from Wikipedia using DBpedia? If so, please email the
>>>>> procedure to greg@masleyassociates.com
>>>>>
>>>> That is an easy one, the URIs are similar - you can get the pointer
>>>> from db and get into Wikipedia. Then you do your stuff.
>>>>
>>>> I'll let Kingsley explain.
>>>>
>>> Greg,
>>>
>>> Please add some clarity to your quest.
>>>
>>> DBpedia the project is comprised of:
>>>
>>> 1. Extractors for converting Wikipedia content into structured data
>>> represented in a variety of RDF based data representation formats
>>> 2. A live instance with the extracts from #1 loaded into a DBMS that
>>> exposes a SPARQL endpoint (which lets you query over the wire using the
>>> SPARQL query language).
>>>
>>> There is a little more, but I need additional clarification from you.
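To make #2 concrete, a minimal round trip against the public endpoint might look like this (assuming the SPARQLWrapper Python package as the client; the query just lists whatever triples are currently stored for the Tomato resource):

# Query the live DBpedia SPARQL endpoint and print a few raw triples.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT DISTINCT ?property ?value
    WHERE { <http://dbpedia.org/resource/Tomato> ?property ?value . }
    LIMIT 20
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["property"]["value"], "->", row["value"]["value"])

Whether anything about planting times comes back is exactly Danny's problem: as noted above, those temporal properties would live in the agriculture-oriented data space, not in DBpedia itself.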
>>> --
>>> Regards,
>>>
>>> Kingsley Idehen
>>> President & CEO
>>> OpenLink Software
>>> Web: http://www.openlinksw.com
>>> Weblog: http://www.openlinksw.com/blog/~kidehen
>>> Twitter/Identi.ca: kidehen
>
> --
> Regards,
>
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen

--
http://danny.ayers.name

--
Paola Di Maio
**************************************************
“Logic will get you from A to B. Imagination will take you everywhere.”
Albert Einstein
**************************************************
Received on Monday, 19 April 2010 10:10:42 UTC