Re: Semantic web vs big data / ML

I think I'll stick my neck out on a Friday night...

Yes. Quite right.
A challenge for ML and the like is how to unbundle the knowledge extraction from the applications that use and consume that knowledge, so that the knowledge can, hopefully, be re-used.
This is what the Semantic Web and its technologies excel at.
So I think of ML, NLP, etc as tools that generate knowledge that is the fodder of the Semantic Web.
The Semantic Web makes no pretence at processing unstructured data - it is all about structure.
But once you have made that step, SW is a (the?) tool for enabling the knowledge to be enriched (from multiple sources in particular), deepened (by further inference and rules), and ultimately used and, perhaps most importantly, re-used.
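To make that enrich/deepen/re-use cycle concrete, here is a minimal sketch in plain Python (triples as tuples in a set, a toy stand-in for a real RDF store such as rdflib or a triplestore - all the names are illustrative, not from any real vocabulary):

```python
# Triples as plain (subject, predicate, object) tuples; a toy stand-in
# for an RDF graph, just to illustrate the merge/enrich/re-use pattern.

# Source 1: assertions extracted, say, by an NLP pipeline
extracted = {("Rex", "type", "Dog")}

# Source 2: schema knowledge maintained independently elsewhere
schema = {("Dog", "subClassOf", "Animal")}

# Enrich: merging independently produced triples is plain set union -
# this is the key property that lets knowledge outlive any one application
kb = extracted | schema

# Deepen: one round of RDFS-style subclass inference
inferred = {(s, "type", sup)
            for (s, p, c) in kb if p == "type"
            for (c2, p2, sup) in kb if p2 == "subClassOf" and c2 == c}
kb |= inferred

# Re-use: any consumer can now query the enriched graph
print(("Rex", "type", "Animal") in kb)  # True
```

In real SW terms the union step is just RDF graph merge, and the inference step is what an RDFS/OWL reasoner does for you at scale.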

It is the glue and protocol without which knowledge lives in its silos, and ages and dies with the applications.

Have a good weekend!


> On 3 Aug 2018, at 15:34, Phillip Rhodes <motley.crue.fan@gmail.com> wrote:
> 
> I wouldn't say they are in competition at all.  I'd say the approaches
> are complementary.  You can use machine learning / AI techniques to,
> as you say, derive assertions from unstructured data.  Once you have
> those assertions, you can work with them using the normal SemWeb stack
> as makes sense for the use-case in question.
> 
> I'd start by looking at Apache Stanbol[1], which is a system for
> extracting triples from unstructured data. The built-in engines are
> fairly simplistic, but you can extend it by creating your own engines
> - and there you could build an engine based on Deep Learning, or any
> other technique you prefer.  You might also find some use in the Any23
> project[2] which seems to cover some similar ground.
> 
> As for probabilistic reasoning on top of triples.... you could, of
> course, always hack up your own scheme by adding additional assertions
> to the KB with your probabilistic weights, and have code that reads
> those and does whatever kind of reasoning you want.  But there has
> been *some* work on integrating probabilistic reasoning into the
> SemWeb stack in a standard way.  Check out the PROWL (Probabilistic
> OWL) project[3].
> 
> [1]: https://stanbol.apache.org
> 
> [2]: https://any23.apache.org/
> 
> [3]: http://www.pr-owl.org/
> 
> 
> HTH.
> 
> 
> Phil
> 
> This message optimized for indexing by NSA PRISM
> 
> 
> On Fri, Aug 3, 2018 at 7:30 AM, John Leonard
> <john.leonard@incisivemedia.com> wrote:
>> Can someone please fill me in on the latest state of play between the
>> semantic web and using machine learning techniques with massive unstructured
>> datasets to derive probabilistic links between data items. Are the two
>> techniques in competition? Are they compatible? Or is it more a case of
>> horses for courses? I refer to this 2009 paper
>> https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf
>> The Unreasonable Effectiveness of Data by Norvig et al.
>> Thanks for any pointers
>> 
>> 
> 
> 

-- 
Hugh
023 8061 5652

Received on Friday, 3 August 2018 19:48:22 UTC