- From: Paul Houle <ontology2@gmail.com>
- Date: Fri, 18 Sep 2015 11:15:46 -0400
- To: John Flynn <jflynn12@verizon.net>
- Cc: Marco Fossati <hell.j.fox@gmail.com>, Magnus Knuth <magnus.knuth@hpi.uni-potsdam.de>, Linked Data community <public-lod@w3.org>, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
- Message-ID: <CAE__kdQEQqDAxAXU6R+_2+7D-U8qiV4Jaq3-Sq4D1Q3E+YFDtA@mail.gmail.com>
It's an interesting point which gets very close to the issue of "what is the market for Linked Data?"

I think of the Sweet song "Love Is Like Oxygen" and what it is that "Oxygen" refers to here, and really there are a lot of entities here:

* The common diatomic gas (so far as the song goes: you can get "high" from hypoxia, and a few weeks of breathing 100% O2 at atmospheric pressure will destroy your lungs)
* Oxygen is classified as a medical treatment in many generic databases, which might be :sameAs the diatomic gas, but generally is not part of the lattice for other chemical elements -- say Tellurium or Polonium, just to look at the same column of the periodic table
* There are a number of allotropes of oxygen: ozone is the most familiar, but there is a kind of red sludge you get at high pressures, atomic oxygen in space, etc.
* There is the chemical element, which confounds "Open Mind Commonsense" -- not only does "the atmosphere contain oxygen" but the overwhelming fraction of the mass of the oceans is oxygen nuclei, as is close to half of the mass of the lithosphere -- compare that to the 1/5 of the atmosphere that is O2.
* Of course there are numerous stable and unstable isotopes of oxygen
* Then there are various electronic states, such as the triplet and singlet states of the diatomic oxygen molecule
* All of these things permute (Isotope 17 and Isotope 18 in a triplet state, etc.)

If you look at what is there in DBpedia you find there are "concepts" such as

https://en.wikipedia.org/wiki/Isotopes_of_oxygen

Not to mention all of the confusers like the XML Editor, Oprah's TV channel, etc.

When it comes down to it, nobody actually needs a knowledge base that can be accurate about everything from spectroscopy to medical billing, but people do need knowledge bases that do specific things.

Thus when it comes to things like DBTax, the exploration of a large knowledge base is just one scenario.

Another one is to use a tool like DBTax to develop a classification which is relevant to some particular domain. A few I have been involved with are:

* organizing DBpedia topics into categories relevant to media and marketing such as "sports", "science", "celebrities", etc. Of course you run into strange things (Alyssa Milano ends up in almost all the categories), so the refinement of this database is a matter of addressing particular pain points, howlers, etc.
* creating specific categories such as 'things in New York City that could be photographable'; in my mind the essential act of classification is binary -- a given instance is either in the category or not in the category. The creation of this category involved merging a few things such as (1) geographic coordinates, (2) words in the title, (3) categories, (4) link network analysis. The weasel words "could be photographable" are important to having an operational definition, because they mean that you don't waste time looking for strange abstract things: ultimately, if you can't find photographs of the thing, you are not going to include it in the collection.

Anyway, DBTax is interesting not so much in itself but as a parameterizable family of things that can create the classifications that various people need.

On Thu, Sep 17, 2015 at 7:25 PM, John Flynn <jflynn12@verizon.net> wrote:

> I guess this is a "point-of-view" comment, but attempting to assign "correct" types to entities seems upside-down. An ontology, consisting of specific classes, subclasses, properties, subproperties plus the specific relationships between these should describe a specific domain of interest. Once the domain of interest ontology is created, then the process of identifying and assigning entities/instances that belong within that domain of interest can begin. If the ontology is properly designed it should be very clear which entities fit within that domain of interest as well as where they fit.
>
> John Flynn
> http://semanticsimulations.com
>
> -----Original Message-----
> From: Marco Fossati [mailto:hell.j.fox@gmail.com]
> Sent: Thursday, September 17, 2015 11:26 AM
> To: Magnus Knuth
> Cc: public-lod@w3.org; dbpedia-discussion
> Subject: Re: [Dbpedia-discussion] DBtax questions
>
> Hi Magnus and thanks for your interest,
>
> Generally speaking, the challenge of assigning "correct" types to entities is always a highly subjective task.
> From a strictly linguistic point of view, a classification taxonomy is itself a very debatable way to describe the semantics of content expressed in natural language: one should always keep in mind contextual pieces of information to deeply understand the sense of e.g., some Wikipedia article.
>
> That said, the main goal of DBTax is to assign as many types as possible, provided that they are different from owl#Thing.
> In this way, we can cluster entities with more meaningful types and query the knowledge base accordingly.
>
> Of course, you can say that owl#Thing has 100% coverage, but does it make sense?
> The claimed 99% stems instead from a *set* of more specific types.
> Then high recall comes with a precision cost.
>
> On 9/17/15 4:04 PM, Magnus Knuth wrote:
> > One structural problem I recognized when seeing the approach [http://jens-lehmann.org/files/2015/semantics_dbtax.pdf], is that there is in most (non-complex) categories an article having exactly the same name, e.g. dbr:President dc:subject dbc:President. And indeed these resources are typed accordingly, e.g. http://it.dbpedia.org/resource/Presidente is a dbtax:President and http://it.dbpedia.org/resource/Pagoda is dbtax:Pagoda.
> That is obvious for a human, but is it the same for an algorithm? :-)
> >
> > A type coverage of more than 99 percent is very suspicious, because I’d expect much more resources in DBpedia not type-able. Why? A lot of articles in DBpedia describe very abstract concepts, e.g. Liberty, Nationality, Social_inequality (well, you have the class dbtax:Concept, but what is on the other hand not a concept?), or they describe classes by themselves, e.g. President, Country, Person, Plane (well, you have the class dbtax:Classification, but it is not used as such [http://it.dbpedia.org/sparql?default-graph-uri=&query=SELECT+*+%7B%3Fres+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fdbtax%2FClassification%3E%7D&format=text%2Fhtml&debug=on]).
> > For some articles it is arguable whether they are instance or class, e.g. Volkswagen_Polo, Horse.
> >
> > I see that the classes you extracted are truly valuable for enriching the DBpedia ontology, but it obviously needs some tidying up and disambiguation effort.
> I completely agree: I think we should merge DBTax into the DBpedia ontology mappings wiki to do so.
> BTW, DBTax overlaps with the DBpedia ontology by more than 20%.
>
> Cheers!

-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems, Classification and Text Mining and Data Lakes*

(607) 539 6254
paul.houle on Skype
ontology2@gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
Received on Friday, 18 September 2015 15:16:15 UTC