Re: Semantic web vs big data / ML

This is a very good question. I agree with the other responses that ML and
Semantic Web technologies are not in competition; they do very different
things. The question is whether and how they can work together.
ML is used in a lot of NLP which is, in turn, used to extract triples from
text.  The state of the art here is improving all the time, but it is still
not that great. A common approach is for an NLP engine to take as input both
an ontology and a text document and to extract triples using the classes
and properties in the ontology. This works best if the ontology has a lot
of text descriptions of the classes and properties. I'm not sure how much
these triple extractors make use of the ontology axioms; they probably use
the class hierarchy and the property domains and ranges.
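
As a rough illustration of how domains, ranges, and the class hierarchy
could be used, here is a minimal Python sketch. It assumes that some earlier
NLP step (not shown) has already proposed candidate triples and guessed a
type for each entity. The toy ontology, the entity names, and the helper
functions are all invented for this example and do not describe any
particular extractor.

```python
# Toy ontology: every class and property name here is invented for illustration.
SUBCLASS_OF = {"Company": "Organization", "University": "Organization"}
PROPERTIES = {
    # property: (domain class, range class)
    "worksFor": ("Person", "Organization"),
    "locatedIn": ("Organization", "Place"),
}

def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def filter_triples(candidates, entity_types):
    """Keep candidate (s, p, o) triples whose types fit the property's domain and range."""
    kept = []
    for s, p, o in candidates:
        if p not in PROPERTIES:
            continue  # property is not defined in the ontology
        domain, rng = PROPERTIES[p]
        if is_a(entity_types.get(s), domain) and is_a(entity_types.get(o), rng):
            kept.append((s, p, o))
    return kept

# Hypothetical output of an upstream entity-typing step.
entity_types = {"Alice": "Person", "Acme": "Company", "Boston": "Place"}
candidates = [
    ("Alice", "worksFor", "Acme"),     # Person -> Company (an Organization): kept
    ("Acme", "worksFor", "Alice"),     # subject and object swapped: dropped
    ("Acme", "locatedIn", "Boston"),   # Organization -> Place: kept
    ("Alice", "marriedTo", "Boston"),  # property not in the ontology: dropped
]
print(filter_triples(candidates, entity_types))
```

A real system would presumably score candidates rather than apply a hard
filter, but the sketch shows where the ontology axioms could come in.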

Another interesting possibility, which I have not seen much written about, is
using ontology vocabulary to express the features that are learned. I'm
not an ML person, but in ML they talk about models, which to some extent
describe the kinds of things in the subject area and their
attributes/features. Probably someone has looked into this.

Probably the most common use of ML is to put things into predetermined
categories. But it's just statistics. There is no human-grokkable way to
explain why something goes into one category vs. another. It would be nice
if there were a way to do that in terms of an ontology's classes and
properties. I don't know if that is being done.

Another possible link-up is where ML is used to create categories
automatically. Humans can look at the categories and give them meaningful
names, but to the computer they are just the result of statistics; they
have no meaning. It could be that the meaning of these categories could be
inferred and matched against an ontology. For general topics, one would
probably start with the DBpedia ontology.
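
One very simple way to picture the matching step is word overlap between a
cluster's top terms and the text descriptions of ontology classes. The
Python sketch below assumes a clustering step has already produced the top
terms. The class names and descriptions are placeholders and are not taken
from DBpedia or any other real ontology, and Jaccard overlap is just one
possible similarity measure.

```python
# Placeholder class descriptions; not taken from any real ontology.
CLASS_DESCRIPTIONS = {
    "Film": "movie film director actor cinema",
    "Athlete": "player team match season sport",
    "Politician": "election party minister parliament vote",
}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_class(cluster_terms):
    """Return (class name, score) for the description that overlaps most with the cluster."""
    terms = set(cluster_terms)
    scored = [
        (jaccard(terms, set(desc.split())), name)
        for name, desc in CLASS_DESCRIPTIONS.items()
    ]
    score, name = max(scored)
    return name, score

# Top terms of one cluster, as a clustering step might report them (invented here).
cluster_terms = ["team", "season", "player", "coach", "league"]
name, score = best_class(cluster_terms)
print(name, round(score, 3))
```

Here the cluster shares three words with the "Athlete" description and none
with the others, so "Athlete" is proposed as its label. A low best score
would suggest the cluster has no good counterpart in the ontology.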

Way back in 2005 there was a Dagstuhl workshop on ML and the Semantic Web
<https://www.dagstuhl.de/05071>; but there is not a lot of documentation on
that event.

Michael

On Fri, Aug 3, 2018 at 4:30 AM, John Leonard <john.leonard@incisivemedia.com> wrote:

> Can someone please fill me in on the latest state of play between the
> semantic web and using machine learning techniques with massive
> unstructured datasets to derive probabilistic links between data items? Are
> the two techniques in competition? Are they compatible? Or is it more a
> case of horses for courses? I refer to this 2009 paper
> https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf
> The Unreasonable Effectiveness of Data by Norvig et al.
> Thanks for any pointers
>
> <john.leonard@incisivemedia.com>
>



-- 

Michael Uschold
   Senior Ontology Consultant, Semantic Arts
   http://www.semanticarts.com
   LinkedIn: www.linkedin.com/in/michaeluschold
   Skype, Twitter: UscholdM

Received on Friday, 3 August 2018 22:58:39 UTC