
RE: [Dbpedia-discussion] Advancing the DBpedia ontology

From: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
Date: Sun, 25 Jan 2015 20:41:23 +0200
To: <dbpedia-discussion@lists.sourceforge.net>, "'Linked Data community'" <public-lod@w3.org>, <semantic-web@w3.org>
Message-ID: <006f01d038ce$88569b20$9903d160$@alexiev@ontotext.com>
> http://mappings.dbpedia.org/index.php/DBpedia_Ontology_Committee

I've enlarged the goals as follows:
- set the future directions of the DBpedia ontology
- set best practices for mapping
- engage the community in meaningful discussions, eg see https://www.wikidata.org/wiki/Wikidata:Property_proposal/Authority_control 
- formulate and execute focused investigations that lead to best practices, eg What's in a Name (this page lists 68!! "name" properties we currently have), how to map Parent Places, etc
- improve the ontology and mapping editing workflow

Of course, each of these goals is up for discussion. 
But I strongly feel that working on the ontology in isolation from the mappings will not be productive.

IMHO the major problems are not with the ontology itself, but more in the mappings.
I've shared many weird and scary things, most are on https://github.com/dbpedia/mappings-tracker/issues?q= . 

E.g. how many defects can you find here? 
http://mappings.dbpedia.org/index.php/Mapping_el:Quote_box
No peeking in the Discussion tab :-)

Another quick quiz:
- what is vicePresident in DBO? What should it be a subproperty of?
- why should the VicePresident class be deleted?
- what's wrong with this mapping: 
  http://mappings.dbpedia.org/index.php?title=Mapping_pl:Polityk_infobox&action=edit

MAB> shared memory supercomputer and a large Hadoop appliance if anyone is interested
> in applying some new techniques for modeling the ontology.  I work for Cray.

Oh wow! 
I think what we need is a bunch of editors who know a bit about RDF, think clearly, and can spend time on editorial discussions and gardening.
Supercomputer powers won't help here (but superhuman powers might :-)

DK> DBpedia was a completely user-driven ontology and we plan to keep it that way.
> Now we will only set some editing workflow rules that will ensure a basic level of quality.

And hopefully educate the editors through discussions and gardening.
Thus far there's been very little discussion on the mapping wiki, which is the major problem.

In fact this thread proves it: why aren't we discussing the goals of that committee on the wiki?
http://mappings.dbpedia.org/index.php?title=Talk:DBpedia_Ontology_Committee&action=edit&redlink=1

--

PFPS> Is there going to be the possibility of at least listening in remotely?

** Even if not: Peter, your contributions will be much appreciated! Even a quick diagnosis like this 
http://sourceforge.net/p/dbpedia/mailman/dbpedia-discussion/thread/BF52E5F8-889B-40A4-92DD-CF82C20BAEEF%40nuance.com/#msg32207963 
can drive a number of investigations by industrious editors.

PFPS> As well I would like to know what expressive power is being considered for the ontology language.
> For example, will disjointness axioms be allowed, or local ranges, or constraints?

IMHO we need to discuss what these constructs will be used for. 
E.g. Wikidata has a bunch of Constraint Violation reports.
E.g. see here the report for ULAN id:
https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P245
- Charles Bridges (Q5075775) and Charles Bridges (Q18641990) probably need to be merged in Wikidata.
- ULAN records 500115493 and 500000031 probably need to be merged in ULAN,
  if my surmise that Albrecht Dürer = Master of the Martyrdom of the Ten Thousand is right.

We should also compare against the capabilities of the extraction framework.
E.g. rdfs:range provides very useful hints, but they can't yet be used by the extraction framework
(since it can't map back from cooked to raw props to take them into account). Which leads to e.g.
- [[1940]] and [[13 май]] and [[Switzerland]] extracted as firstAccentPerson
- 42.697556 getting truncated to 42.8 before being converted to geo:lat for https://bg.wikipedia.org/wiki/София
- Places etc extracted as parents:
  select ?x ?y {?x dbpedia-owl:parent ?y filter not exists {?y a dbpedia-owl:Person}}
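
To make the idea concrete, here is a minimal sketch of such a range check over a handful of made-up triples. It is not how the extraction framework works; the resource names, the `range_violations` helper and the toy data are all assumptions for illustration only.

```python
# Toy triples standing in for extracted data; every value here is illustrative.
triples = [
    ("dbr:Alice", "rdf:type", "dbo:Person"),
    ("dbr:Bob", "rdf:type", "dbo:Person"),
    ("dbr:Sofia", "rdf:type", "dbo:Place"),
    ("dbr:Bob", "dbo:parent", "dbr:Alice"),
    ("dbr:Carol", "dbo:parent", "dbr:Sofia"),
]


def range_violations(triples, prop, expected_class):
    """Return (subject, object) pairs where the object of `prop`
    is not typed as `expected_class` (hypothetical helper)."""
    typed = {s for s, p, o in triples if p == "rdf:type" and o == expected_class}
    return [(s, o) for s, p, o in triples if p == prop and o not in typed]


# Same intent as the query above: parents that are not known to be Persons.
violations = range_violations(triples, "dbo:parent", "dbo:Person")
print(violations)
```

A report like this is closer to the Wikidata constraint-violation pages than to a hard constraint: it only flags candidates for an editor to look at.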

NT> How will it relate to other ontologies, taxonomies and schemas?

Basic things like DC, DCT, FOAF, BIBO should be used more than currently (but domain & range carefully checked).

> Also, will it relate to Wikidata, Wikipedia, schema.org, Facebook OG, etc.

Bigger mappings to external ontologies should appear at some point in the future, but this is not so simple.
E.g. the current DBO<->Schema class mapping doesn't account for the different shape of the two hierarchies, with disastrous results.

MB> How will you address the problem that these changes could break existing applications?
> Will there be mappings from old to new? 
> Maybe an extended dump with the consequences of those mappings realized as extra triples?

These are very relevant questions, so changes should not be done willy-nilly.

But for many broken cases, I feel we shouldn't let considerations of "backward compatibility" tie our hands. E.g.

- if you have a fr.dbpedia app and use the prop path
   takePlace/sharingOut
to reach the parent of a place, see https://github.com/dbpedia/mappings-tracker/issues/29,
this will break when we replace it with isPartOf (as used in many other maps).

- if you have a bg.dbpedia app and use this to find females
  dbo:sex "a"  # that's a cyrillic "a"
this will break since we replaced it with dbo:gender dbr:Female

But is this good enough reason to keep emitting the old broken data?
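
For what it's worth, the "extended dump" idea from the question could look roughly like the sketch below: keep the old triples and add the new-style ones derived from a rule table. This is only an assumption about how it might be done; `OLD_TO_NEW`, `extend_dump`, and the sample resource are hypothetical, and the old literal is just the one quoted above.

```python
# Illustrative rule table: (old property, old value) -> (new property, new value).
OLD_TO_NEW = {
    ("dbo:sex", "a"): ("dbo:gender", "dbr:Female"),  # old literal as quoted above
}


def extend_dump(triples, rules):
    """Keep every original triple and append the new-style triple
    for each (property, value) pair that matches a rule."""
    extra = [
        (s, *rules[(p, o)])
        for s, p, o in triples
        if (p, o) in rules
    ]
    return list(triples) + extra


# Made-up sample data, not taken from any real dump.
old_dump = [
    ("dbr:Some_Person", "dbo:sex", "a"),
    ("dbr:Some_Person", "dbo:birthPlace", "dbr:Sofia"),
]
extended = extend_dump(old_dump, OLD_TO_NEW)
for triple in extended:
    print(triple)
```

Note that this keeps emitting the old data alongside the new, which is exactly the trade-off in question.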

Cheers!
--
Vladimir Alexiev, PhD, PMP
Lead, Data and Ontology Management Group
Ontotext Corp, www.ontotext.com
Sirma Group Holding, www.sirma.com
Email: vladimir.alexiev@ontotext.com, skype:valexiev1  
Mobile: +359 888 568 132, SMS: 359888568132@sms.mtel.net
Landline: +359 (988) 106 084, Fax: +359 (2) 975 3226
Received on Sunday, 25 January 2015 18:41:59 UTC
