Re: Semantic web vs big data / ML

I wouldn't say they are in competition at all; the approaches are
complementary.  You can use machine learning / AI techniques to, as
you say, derive assertions from unstructured data.  Once you have
those assertions, you can work with them using the normal SemWeb
stack in whatever way makes sense for the use case in question.

I'd start by looking at Apache Stanbol[1], a system for extracting
triples from unstructured data.  The built-in enhancement engines are
fairly simplistic, but you can extend the system by writing your own
engines, and there you could plug in deep learning or any other
technique you prefer.  You might also find the Any23 project[2]
useful; it covers some similar ground.  Whatever the extractor, the
end product is ordinary triples, roughly as in the sketch below.
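To make that concrete, here is a minimal sketch using Apache Jena.
The extractTriples() helper is hypothetical, standing in for whatever
NER / relation-extraction model you plug in, and the example.org URIs
are likewise made up:

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Property;
    import org.apache.jena.rdf.model.Resource;

    public class ExtractionSketch {

        // Hypothetical stand-in for your ML extractor: each String[] is
        // {subjectUri, predicateUri, objectUri} derived from raw text.
        static String[][] extractTriples(String text) {
            return new String[][] {
                { "http://example.org/AlanTuring",
                  "http://example.org/bornIn",
                  "http://example.org/London" }
            };
        }

        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            for (String[] t : extractTriples("Alan Turing was born in London.")) {
                Resource subject   = model.createResource(t[0]);
                Property predicate = model.createProperty(t[1]);
                Resource object    = model.createResource(t[2]);
                model.add(subject, predicate, object); // assert the extracted triple
            }
            model.write(System.out, "TURTLE"); // hand off to the rest of the stack
        }
    }

From there the triples are fair game for SPARQL, OWL reasoners, or
anything else in the stack.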

As for probabilistic reasoning on top of triples... you could, of
course, always hack up your own scheme by adding extra assertions to
the KB carrying your probabilistic weights, plus code that reads
those and does whatever kind of reasoning you want (see the sketch
below).  But there has been *some* work on integrating probabilistic
reasoning into the SemWeb stack in a standard way; check out the
PR-OWL (Probabilistic OWL) project[3].
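Here is a minimal sketch of the roll-your-own option, again with
Jena.  Standard RDF reification attaches the weight to a statement
*about* the triple rather than to the triple itself; the
ex:confidence property is something you would define yourself, not
part of any standard vocabulary:

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QuerySolution;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Property;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.rdf.model.Statement;
    import org.apache.jena.vocabulary.RDF;

    public class WeightedTriples {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            String ex = "http://example.org/";
            Property confidence = model.createProperty(ex + "confidence");

            // Assert the extracted triple itself.
            Statement stmt = model.createStatement(
                    model.createResource(ex + "AlanTuring"),
                    model.createProperty(ex + "bornIn"),
                    model.createResource(ex + "London"));
            model.add(stmt);

            // Reify it and attach the extractor's weight as extra assertions.
            Resource r = model.createResource(); // blank node for the reification
            r.addProperty(RDF.type, RDF.Statement);
            r.addProperty(RDF.subject, stmt.getSubject());
            r.addProperty(RDF.predicate, stmt.getPredicate());
            r.addProperty(RDF.object, stmt.getObject());
            r.addLiteral(confidence, 0.87);

            // At query time, keep only assertions above some threshold.
            String q = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
                     + "PREFIX ex: <" + ex + "> "
                     + "SELECT ?s ?p ?o WHERE { "
                     + "  ?r rdf:subject ?s ; rdf:predicate ?p ; rdf:object ?o ; "
                     + "     ex:confidence ?c . FILTER(?c >= 0.8) }";
            try (QueryExecution qe = QueryExecutionFactory.create(q, model)) {
                ResultSet results = qe.execSelect();
                while (results.hasNext()) {
                    QuerySolution sol = results.next();
                    System.out.println(sol.get("s") + " " + sol.get("p")
                            + " " + sol.get("o"));
                }
            }
        }
    }

Reification is verbose and the reasoning logic is all on you, which
is exactly why the PR-OWL line of work is worth a look.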

[1]: https://stanbol.apache.org

[2]: https://any23.apache.org/

[3]: http://www.pr-owl.org/


HTH.


Phil

This message optimized for indexing by NSA PRISM


On Fri, Aug 3, 2018 at 7:30 AM, John Leonard
<john.leonard@incisivemedia.com> wrote:
> Can someone please fill me in on the latest state of play between the
> semantic web and using machine learning techniques with massive unstructured
> datasets to derive probabilistic links between data items. Are the two
> techniques in competition? Are they compatible? Or is it more a case of
> horses for courses? I refer to this 2009 paper
> https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf
> The Unreasonable Effectiveness of Data by Halevy, Norvig and Pereira.
> Thanks for any pointers
