- From: John Leonard <john.leonard@incisivemedia.com>
- Date: Fri, 3 Aug 2018 20:03:55 +0000
- To: Hugh Glaser <hugh@glasers.org>, Phillip Rhodes <motley.crue.fan@gmail.com>
- CC: "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <DB3PR06MB0843204A1FED0BD7B0DDC15A87230@DB3PR06MB0843.eurprd06.prod.outlook.com>
Thanks Hugh and Phillip, I can feel more questions coming on, but they are as yet unformed, so I'll let them lie for now. Will read the linked articles over the weekend.

Cheers

________________________________
From: Hugh Glaser <hugh@glasers.org>
Sent: 03 August 2018 20:47
To: Phillip Rhodes
Cc: John Leonard; semantic-web@w3.org
Subject: Re: Semantic web vs big data / ML

I think I'll stick my neck out on a Friday night...

Yes. Quite right.

A challenge for ML etc. is how to unbundle the knowledge extraction from the applications that use and consume the knowledge, and then, hopefully, re-use it. This is what the Semantic Web and its technologies excel at.

So I think of ML, NLP, etc. as tools that generate knowledge that is the fodder of the Semantic Web. The Semantic Web makes no pretence at processing unstructured data - it is all about structure. But once you have made that step, SW is a (the?) tool to enable the knowledge to be enriched (from multiple sources in particular), deepened (by further inference and rules), and ultimately used and, perhaps most importantly, re-used. It is the glue and protocol without which knowledge lives in its silos, and ages and dies with the applications.

Have a good weekend!

> On 3 Aug 2018, at 15:34, Phillip Rhodes <motley.crue.fan@gmail.com> wrote:
>
> I wouldn't say they are in competition at all. I'd say the approaches
> are complementary. You can use machine learning / AI techniques to,
> as you say, derive assertions from unstructured data. Once you have
> those assertions, you can work with them using the normal SemWeb stack
> as makes sense for the use case in question.
>
> I'd start by looking at Apache Stanbol[1], which is a system for
> extracting triples from unstructured data. The built-in engines are
> fairly simplistic, but you can extend it by creating your own engines
> - and there you could build an engine based on deep learning, or any
> other technique you prefer. You might also find some use in the Any23
> project[2], which seems to cover some similar ground.
>
> As for probabilistic reasoning on top of triples... you could, of
> course, always hack up your own scheme by adding additional assertions
> to the KB with your probabilistic weights, and have code that reads
> those and does whatever kind of reasoning you want. But there has
> been *some* work on integrating probabilistic reasoning into the
> SemWeb stack in a standard way. Check out the PR-OWL (Probabilistic
> OWL) project[3].
>
> [1]: https://stanbol.apache.org/
> [2]: https://any23.apache.org/
> [3]: http://www.pr-owl.org/
>
> HTH.
>
> Phil
>
> This message optimized for indexing by NSA PRISM
>
> On Fri, Aug 3, 2018 at 7:30 AM, John Leonard
> <john.leonard@incisivemedia.com> wrote:
>> Can someone please fill me in on the latest state of play between the
>> semantic web and using machine learning techniques with massive
>> unstructured datasets to derive probabilistic links between data items.
>> Are the two techniques in competition? Are they compatible? Or is it
>> more a case of horses for courses? I refer to this 2009 paper,
>> "The Unreasonable Effectiveness of Data" by Halevy, Norvig and Pereira:
>> https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf
>> Thanks for any pointers

--
Hugh
023 8061 5652
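To make Phillip's Stanbol pointer concrete, here is a minimal sketch in Python of the extraction step he describes: POST unstructured text to the enhancer endpoint of a running Stanbol instance and get RDF triples back. The localhost URL, default port and sample sentence are assumptions - adjust them to your own installation.

# A sketch of the Stanbol extraction step: send plain text to the
# enhancer, receive triples. Assumes a Stanbol instance is running
# locally on its default port 8080; the sample sentence is made up.
import requests

text = "Paris is the capital of France."

resp = requests.post(
    "http://localhost:8080/enhancer",      # default enhancement chain
    data=text.encode("utf-8"),
    headers={
        "Content-Type": "text/plain",
        "Accept": "text/turtle",           # ask for Turtle-serialised triples
    },
)
resp.raise_for_status()

# The response is RDF - entity annotations with confidence values and
# links to referenced entities - ready to load into any SemWeb toolchain.
print(resp.text)

The point of Hugh's "unbundling" remark shows up here: the extractor's output is plain RDF, so it outlives whatever application requested it.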
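And a sketch of the roll-your-own weighting scheme Phillip mentions, using rdflib: each extracted assertion goes into the KB together with a reified statement carrying a confidence weight, and application code filters on that weight with SPARQL. The ex: namespace, the sample triples and the 0.8 threshold are all invented for illustration - PR-OWL does this in a principled way; this is just the hack.

# A sketch of the "roll your own" scheme: each extracted triple is
# asserted normally, plus a reified rdf:Statement carrying a confidence
# weight. The ex: namespace and all weights are invented for illustration.
from rdflib import Graph, Namespace, Literal, BNode
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/")
g = Graph()

def assert_weighted(s, p, o, confidence):
    """Add (s, p, o) to the graph, plus a reified statement holding its weight."""
    g.add((s, p, o))
    stmt = BNode()
    g.add((stmt, RDF.type, RDF.Statement))
    g.add((stmt, RDF.subject, s))
    g.add((stmt, RDF.predicate, p))
    g.add((stmt, RDF.object, o))
    g.add((stmt, EX.confidence, Literal(confidence, datatype=XSD.decimal)))

# Two competing assertions, as an ML extractor might produce them.
assert_weighted(EX.Paris, EX.capitalOf, EX.France, 0.9)
assert_weighted(EX.Paris, EX.capitalOf, EX.Texas, 0.4)

# The "reasoning" here is just a threshold filter; anything smarter
# (Bayesian combination, PR-OWL, etc.) would replace this query.
q = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/>

SELECT ?s ?p ?o WHERE {
    ?stmt a rdf:Statement ;
          rdf:subject ?s ;
          rdf:predicate ?p ;
          rdf:object ?o ;
          ex:confidence ?c .
    FILTER (?c >= 0.8)
}
"""
for s, p, o in g.query(q):
    print(s, p, o)   # only the Paris-capitalOf-France assertion survives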
Received on Friday, 3 August 2018 20:04:27 UTC