Developers don't use the Semantic Web because they shouldn't

I've expressed this opinion before in other venues, and it's gone over like a lead balloon, so why not again? :grin:

The "middle third" of developers don't generally use SemWeb technologies for the same reason that the "upper third" and "lower third" don't: they have no reason whatsoever to do so.

SemWeb technologies show their strength when crossing boundaries (between disciplines, between organizations, even between technical stacks or individual data sources). Most developers don't do that for a living. They work within relatively tightly-focussed areas, like building a single app for mobile phones that works off a single API, or a website that caters to one organization's users, or a management system for one business unit. RDF tooling delivers no value to such teams and costs a fortune compared with simpler approaches. Why would they use it? They shouldn't!
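
To make the "crossing boundaries" case concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: the IRIs, the data, and the helper names `match` and `collector_names` are made up for illustration, and tuples stand in for triples in a real RDF store. It only aims to show that two sources which share nothing but identifiers can be merged by set union and then queried as one.

```python
# Hypothetical IRIs and data; plain tuples stand in for RDF triples.
EX = "http://example.org/"

# Source A: one organization's collection records.
collection = {
    (EX + "specimen/1", EX + "collectedBy", EX + "person/42"),
    (EX + "specimen/2", EX + "collectedBy", EX + "person/7"),
}

# Source B: a separate database about people, kept by someone else.
people = {
    (EX + "person/42", EX + "name", "A. Collector"),
    (EX + "person/7", EX + "name", "B. Naturalist"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def collector_names(graph):
    """Map each specimen to its collector's name, when both facts are present."""
    names = {}
    for specimen, _, person in match(graph, p=EX + "collectedBy"):
        for _, _, name in match(graph, s=person, p=EX + "name"):
            names[specimen] = name
    return names


# Merging is just set union, because both sources use the same identifiers.
merged = collection | people
```

Calling `collector_names(merged)` answers a question that neither source can answer alone, while `collector_names(collection)` finds nothing. A team that only ever works with one of these sources gets no such payoff, which is the situation described above.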

On this view, technical changes like bnodes for predicates or better support for list constructs aren't to the purpose. (Whether or not they are good ideas on other grounds is a different question, of course.) But to my eye this view does disclose (at least) two potential avenues towards real change:

• I know of little OLAP work that is currently done with open semantic technologies, although OLAP frequently brings together multiple sources of data and the kinds of queries that people use for that work could benefit enormously from semantic lifting. It seems to me that that could change, if the perception of poor performance and intractable constructions changed. (I'm not making any argument about the _actual_ performance of semantic web tooling, which is of course a complex question that I have rarely heard discussed usefully without specific examples. The perception, however, is pretty clearly pretty awful.) This could mean work to clarify and publicize the real potential for performance, and to improve it.

• I believe that semantic technologies might really benefit so-called "data lake" approaches in which data is quickly ingested and indexed without normalization and then transformations are applied more-or-less dynamically to query or process different sections of data together. Again, the common factor is the need to bring together disparate data sources and the immediate obstacle (or at least, _an_ immediate obstacle) is perceived performance.

To be clear, I'm in no way opposed to technical improvements! (If nothing else, as a committer for Apache Jena, I'm excited to make our own work easier and to make it easier to involve and excite others.) And as someone who (substantially) makes his living applying linked data ideas for cultural heritage and scientific research, I want these ideas to spread widely! 

I see some pretty hopeful developments, like technologies that make it easier to use semantic tech in "big data" settings, be they open [1] or offered as a service [2], or the beginnings of work on using the power of statistical methods for semantic lifting [3].

All in all, my claim is that working to get a great bulk of developers using semantic tech may not be the right problem to work on. Working to get the much smaller number of developers with really on-point needs using (or able to use) semantic tech is a better task, and one for which this community is truly fitted.

---
Adam Soroka
Research Computing : Office of the CIO : the Smithsonian Institution

[1] http://sansa-stack.net/
[2] https://aws.amazon.com/neptune/
[3] http://www.semantic-web-journal.net/content/machine-learning-internet-things-semantic-enhanced-approach-1

Received on Thursday, 22 November 2018 16:17:41 UTC