- From: John Leonard <john.leonard@incisivemedia.com>
- Date: Fri, 3 Aug 2018 20:03:55 +0000
- To: Hugh Glaser <hugh@glasers.org>, Phillip Rhodes <motley.crue.fan@gmail.com>
- CC: "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <DB3PR06MB0843204A1FED0BD7B0DDC15A87230@DB3PR06MB0843.eurprd06.prod.outlook.com>
Thanks Hugh and Phillip, I can feel more questions coming on, but they are as yet unformed, so I'll let them lie for now. Will read the linked articles over the weekend.

Cheers

________________________________
From: Hugh Glaser <hugh@glasers.org>
Sent: 03 August 2018 20:47
To: Phillip Rhodes
Cc: John Leonard; semantic-web@w3.org
Subject: Re: Semantic web vs big data / ML

I think I'll stick my neck out on a Friday night...

Yes. Quite right.

A challenge for ML etc. is how to unbundle the knowledge extraction from the applications that use and consume the knowledge, and then, hopefully, re-use it. This is what the Semantic Web and its technologies excel at.

So I think of ML, NLP, etc. as tools that generate knowledge that is the fodder of the Semantic Web. The Semantic Web makes no pretence at processing unstructured data - it is all about structure. But once you have made that step, SW is a (the?) tool to enable the knowledge to be enriched (from multiple sources in particular), deepened (by further inference and rules), and ultimately used and, perhaps most importantly, re-used. It is the glue and protocol without which knowledge lives in its silos, and ages and dies with the applications.

Have a good weekend!

> On 3 Aug 2018, at 15:34, Phillip Rhodes <motley.crue.fan@gmail.com> wrote:
>
> I wouldn't say they are in competition at all. I'd say the approaches
> are complementary. You can use machine learning / AI techniques to,
> as you say, derive assertions from unstructured data. Once you have
> those assertions, you can work with them using the normal SemWeb stack
> as makes sense for the use case in question.
>
> I'd start by looking at Apache Stanbol[1], which is a system for
> extracting triples from unstructured data. The built-in engines are
> fairly simplistic, but you can extend it by creating your own engines
> - and there you could build an engine based on deep learning, or any
> other technique you prefer. You might also find some use in the Any23
> project[2], which seems to cover some similar ground.
>
> As for probabilistic reasoning on top of triples... you could, of
> course, always hack up your own scheme by adding additional assertions
> to the KB with your probabilistic weights, and have code that reads
> those and does whatever kind of reasoning you want. But there has
> been *some* work on integrating probabilistic reasoning into the
> SemWeb stack in a standard way. Check out the PR-OWL (Probabilistic
> OWL) project[3].
>
> [1]: https://stanbol.apache.org/
> [2]: https://any23.apache.org/
> [3]: http://www.pr-owl.org/
>
> HTH.
>
> Phil
>
> This message optimized for indexing by NSA PRISM
>
> On Fri, Aug 3, 2018 at 7:30 AM, John Leonard
> <john.leonard@incisivemedia.com> wrote:
>> Can someone please fill me in on the latest state of play between the
>> semantic web and using machine learning techniques with massive
>> unstructured datasets to derive probabilistic links between data items.
>> Are the two techniques in competition? Are they compatible? Or is it
>> more a case of horses for courses? I refer to this 2009 paper,
>> "The Unreasonable Effectiveness of Data" by Halevy, Norvig and Pereira:
>> https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf
>> Thanks for any pointers

--
Hugh
023 8061 5652
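To make Phillip's Stanbol pointer concrete, here is a minimal sketch in Python of the extraction step he describes: POST unstructured text to the enhancer endpoint of a running Stanbol instance and get RDF triples back. The localhost URL, default port and sample sentence are assumptions - adjust them to your own installation.

# A sketch of the Stanbol extraction step: send plain text to the
# enhancer, receive triples. Assumes a Stanbol instance is running
# locally on its default port 8080; the sample sentence is made up.
import requests

text = "Paris is the capital of France."

resp = requests.post(
    "http://localhost:8080/enhancer",      # default enhancement chain
    data=text.encode("utf-8"),
    headers={
        "Content-Type": "text/plain",
        "Accept": "text/turtle",           # ask for Turtle-serialised triples
    },
)
resp.raise_for_status()

# The response is RDF - entity annotations with confidence values and
# links to referenced entities - ready to load into any SemWeb toolchain.
print(resp.text)

The point of Hugh's "unbundling" remark shows up here: the extractor's output is plain RDF, so it outlives whatever application requested it.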
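And a sketch of the roll-your-own weighting scheme Phillip mentions, using rdflib: each extracted assertion goes into the KB together with a reified statement carrying a confidence weight, and application code filters on that weight with SPARQL. The ex: namespace, the sample triples and the 0.8 threshold are all invented for illustration - PR-OWL does this in a principled way; this is just the hack.

# A sketch of the "roll your own" scheme: each extracted triple is
# asserted normally, plus a reified rdf:Statement carrying a confidence
# weight. The ex: namespace and all weights are invented for illustration.
from rdflib import Graph, Namespace, Literal, BNode
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/")
g = Graph()

def assert_weighted(s, p, o, confidence):
    """Add (s, p, o) to the graph, plus a reified statement holding its weight."""
    g.add((s, p, o))
    stmt = BNode()
    g.add((stmt, RDF.type, RDF.Statement))
    g.add((stmt, RDF.subject, s))
    g.add((stmt, RDF.predicate, p))
    g.add((stmt, RDF.object, o))
    g.add((stmt, EX.confidence, Literal(confidence, datatype=XSD.decimal)))

# Two competing assertions, as an ML extractor might produce them.
assert_weighted(EX.Paris, EX.capitalOf, EX.France, 0.9)
assert_weighted(EX.Paris, EX.capitalOf, EX.Texas, 0.4)

# The "reasoning" here is just a threshold filter; anything smarter
# (Bayesian combination, PR-OWL, etc.) would replace this query.
q = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/>

SELECT ?s ?p ?o WHERE {
    ?stmt a rdf:Statement ;
          rdf:subject ?s ;
          rdf:predicate ?p ;
          rdf:object ?o ;
          ex:confidence ?c .
    FILTER (?c >= 0.8)
}
"""
for s, p, o in g.query(q):
    print(s, p, o)   # only the Paris-capitalOf-France assertion survives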
Received on Friday, 3 August 2018 20:04:27 UTC