Re: DBpedia Data Quality Evaluation Campaign

Hi Sören,

I understand. Still, with respect to wrong extractions it might be useful
to support the users, e.g. by proposing only suspicious cases rather than
arbitrary resources.

Freebase, as a very last resort, has also been (is?) using
crowdsourcing (e.g. Amazon Mechanical Turk) to resolve certain conflicts
that only humans can spot. But this usually comes (came?) into play only
after other tricks had prepared the field.

For example, statistical analysis can highlight suspicious cases first:
dates should statistically fall within a certain range, names should
statistically look like names, addresses like addresses, etc. If they
don't, send them to the Turks.

Proposing to the user just the cases that seem suspicious (and
highlighting which of the many fields is in question) might turn out to
help plenty; a rough sketch of what I mean is below.
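
Just to make it concrete, here is a minimal sketch (Python, with made-up
numbers and a hypothetical flag_suspicious helper, not anything DBpedia
actually runs) of the kind of range check I mean: gather the numeric values
of a property across resources, estimate a robust centre and spread, and
only forward the outliers to human reviewers.

from statistics import median

def flag_suspicious(values, k=5.0):
    """Return indices of values far outside the bulk of the distribution,
    using the median and MAD (median absolute deviation) as robust
    estimates of centre and spread."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0  # guard against zero spread
    return [i for i, v in enumerate(values) if abs(v - med) / mad > k]

# Hypothetical populationTotal values extracted for a set of communes;
# the last one looks like a parsing/unit error.
populations = [583, 1204, 890, 2310, 760, 58300000]
print(flag_suspicious(populations))  # -> [5]: only this triple goes to the Turks

The same idea applies to non-numeric fields: a cheap model of what a name,
date or address usually looks like is enough to decide what deserves a
human's attention.
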
cheers
Gio

On Thu, Nov 15, 2012 at 6:19 PM, Sören Auer
<auer@informatik.uni-leipzig.de> wrote:
> Am 15.11.2012 19:12, schrieb Giovanni Tummarello:
>> Am I really supposed to know if any of the facts below is wrong?
>> Really?
>
> It's not about factual correctness, but about correct extraction and
> representation. If Wikipedia contains false information, DBpedia will
> too, so we cannot change this (at that point). What we want to improve,
> however, is the quality of the extraction.
>
> Best,
>
> Sören
>
>> dbp-owl:PopulatedPlace/area
>> "10.63" (@type = http://dbpedia.org/datatype/squareKilometre)
>> dbp-owl:abstract
>> "La Chapelle-Saint-Laud is a commune in the Maine-et-Loire department
>> of western France." (@lang = en)
>> dbp-owl:area
>> "1.063e+07" (@type = http://www.w3.org/2001/XMLSchema#double)
>> dbp-owl:canton
>> dbpedia:Canton_of_Seiches-sur-le-Loir
>> dbp-owl:country
>> dbpedia:France
>> dbp-owl:department
>> dbpedia:Maine-et-Loire
>> dbp-owl:elevation
>> "85.0" (@type = http://www.w3.org/2001/XMLSchema#double)
>> dbp-owl:intercommunality
>> dbpedia:Pays_Loire-Angers
>> dbp-owl:intercommunality
>> dbpedia:Communauté_de_communes_du_Loir
>> dbp-owl:maximumElevation
>> "98.0" (@type = http://www.w3.org/2001/XMLSchema#double)
>> dbp-owl:minimumElevation
>> "28.0" (@type = http://www.w3.org/2001/XMLSchema#double)
>> dbp-owl:populationTotal
>> "583" (@type = http://www.w3.org/2001/XMLSchema#integer)
>> dbp-owl:postalCode
>> "49140" (@lang = en)
>> dbp-owl:region
>> dbpedia:Pays_de_la_Loire
>> dbp-prop:areaKm
>> "11" (@type = http://www.w3.org/2001/XMLSchema#integer)
>> dbp-prop:arrondissement
>> "Angers" (@lang = en)
>> dbp-prop:canton
>> dbpedia:Canton_of_Seiches-sur-le-Loir
>> dbp-prop:demonym
>> "Capellaudain, Capellaudaine" (@lang = en)
>> dbp-prop:department
>> dbpedia:Maine-et-Loire
>> dbp-prop:elevationM
>> "85" (@type = http://www.w3.org/2001/XMLSchema#integer)
>> dbp-prop:elevationMaxM
>> "98" (@type = http://www.w3.org/2001/XMLSchema#integer)
>> dbp-prop:elevationMinM
>> "28" (@type = http://www.w3.org/2001/XMLSchema#integer)
>> dbp-prop:insee
>> "49076" (@type = http://www.w3.org/2001/XMLSchema#integer)
>> dbp-prop:intercommunality
>> dbpedia:Pays_Loire-Angers
>> dbp-prop:intercommunality
>> dbpedia:Communauté_de_communes_du_Loir
>>
>> On Thu, Nov 15, 2012 at 4:58 PM,  <zaveri@informatik.uni-leipzig.de> wrote:
>>> Dear all,
>>>
>>> As we all know, DBpedia is an important dataset in Linked Data: it is not
>>> only linked to and from numerous other datasets, but it is also relied
>>> upon for useful information. However, since it contains information
>>> extracted from crowd-sourced content, quality problems are inherent in
>>> DBpedia, be it in terms of incorrectly extracted values or datatype
>>> problems.
>>>
>>> Not all of these data quality problems, however, are automatically
>>> detectable. We therefore aim to crowd-source the quality assessment of the
>>> dataset. For this assessment, we have developed a tool with which a user
>>> can evaluate a random resource by analyzing each triple individually and
>>> store the results. We would therefore like to ask you to help us by using
>>> the tool and evaluating a minimum of 3 resources. Here is the link to the
>>> tool: http://nl.dbpedia.org:8080/TripleCheckMate/, which also includes
>>> details on how to use it.
>>>
>>> To thank you for your contributions, one lucky winner will receive either
>>> a Samsung Galaxy Tab 2 or an Amazon voucher worth 300 Euro. So go ahead
>>> and start evaluating now! The deadline for submitting your evaluations is
>>> 9 December 2012.
>>>
>>> If you have any questions or comments, please do not hesitate to contact us
>>> at dbpedia-data-quality@googlegroups.com.
>>>
>>> Thank you very much for your time.
>>>
>>> Regards,
>>> DBpedia Data Quality Evaluation Team.
>>> https://groups.google.com/d/forum/dbpedia-data-quality
>>>

Received on Thursday, 15 November 2012 18:45:03 UTC