Re: Linked Data Demand & Discussion Culture on this List, WAS: Introducing Semgel, a semantic database app for gathering & analyzing data from websites from Giovanni Tummarello on 2012-07-21 (public-lod@w3.org from July 2012)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Sat, 21 Jul 2012 18:21:23 +0200
To: Sebastian Schaffert <sebastian.schaffert@salzburgresearch.at>
Cc: Dave Reynolds <dave.e.reynolds@gmail.com>, public-lod@w3.org
Message-ID: <CAHHRs7jJghFSHrFbebt-YRt2FE+5ypPea1QPfbCdcHFqTp9u3A@mail.gmail.com>
In the past months I have worked a lot on the commercialization of
RDF-based knowledge technologies, so I feel like contributing.

We tried to understand what could be of interest to enterprise and
came up with the slogan - or let's say adopted - "enterprise linked
data clouds", with an internally matured understanding of what this
means and how it delivers value.

In our experience, Linked Data that can be of interest to enterprise
could not be further away from so many of the things that have been
preached and pushed with prominence (I'll mention a few things like
303s, "follow your nose", even "resolvable data URIs", "sameAs", "5
star data publishing", vocabulary x y that was never used outside
demos... insert here so much more).

Similarly, it is very far away from saying 'replace your existing
running system with anything RDF based'. I won't even speak about
preaching the value of publishing data as "LOD".

To find value that can be sold, I'd go back to basics a bit.

RDF is very nice at Knowledge Representation. As a matter of fact, it
might be the most solid industrial tool there is for this. It is a
great way to serialize knowledge with properties attached to the data,
a great way to merge, and a great way to ship it to others (and hope
they'll understand it) thanks to shared URIs of properties. It also
has a mature query language.
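
A minimal sketch of the merge point above, in Python, assuming triples
are modelled as plain (subject, predicate, object) tuples. The person
URI and the two sources are hypothetical, the property URIs are the
FOAF ones, and blank nodes are ignored. Because both sources use the
same URIs, merging is just a set union:

```python
# Triples are plain (subject, predicate, object) tuples.
# ALICE and both sources are hypothetical illustration data.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"
ALICE = "http://example.org/people/alice"

source_a = {(ALICE, FOAF_NAME, "Alice")}
source_b = {
    (ALICE, FOAF_NAME, "Alice"),
    (ALICE, FOAF_MBOX, "mailto:alice@example.org"),
}


def merge(*graphs):
    """Merge graphs by set union; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every (property, value) pair known about a subject."""
    return {(p, o) for s, p, o in graph if s == subject}


merged = merge(source_a, source_b)
```

No schema mapping step is needed here only because the two sources
already agree on the property URIs; that agreement is the assumption
doing the work.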

OK, so where does this come into use SPECIFICALLY? (That is, where can
you demonstrate superiority vs other existing technologies?)

I'd say only in environments / use cases / business sectors where

* knowledge can come from many sources, AND
* new sources pop up all the time, AND
* sources are complex and might have a lot of rich descriptions, AND
* time to explore and understand them is limited, AND
* of course, the operation/business has sufficient SCALE to support
the development / have time to learn and understand this etc.

The first sectors that come to mind with these needs (at least to my
mind) are pharmaceutical, defense-military, and scientific/technical
publishing. (They're the first that come to mind given that, in my own
little personal experience, these are the sectors that 'came to us'
and needed no pitching, or just minimal.)

One can say that, looking closely, a lot of others might potentially
have similar needs in the future.

True, but they might once you put other elements into this: data
scale (big data) and robustness AND (given the last point of the
previous list) enterprise-strength credibility.

Here we as a community, IMO, have not been shining:

* big data - just not there. Sorry, but "publishing" a big data set as
in LOD doesn't count as a difficult data operation. Semantic
technologies have notoriously been proposed by "academics" with very
often not even the slightest notion of what traditional data
processing systems do, even a basic RDBMS. Take the names of the
people who have published and been celebrated on the Semantic Web and
intersect them with the names at conferences that matter to industry
(and the world).

* robustness - all systems have been shaky at best, again due to being
too often just throwaway prototypes (when coming from academia). In
other cases, companies venturing into this field have been way too
distracted / pressured (and finally self-convinced) into implementing
and caring about features (see all those mentioned above and more)
that were unrequested to begin with, and whose value was just based on
a conjecture.

* missing obvious features. Other features were neglected because of
"not fitting with the pure original visions". Why restrict ourselves
to triples? Quads or quintuples, for example, make so much sense, but
oh my god, what would the community have said. And now systems that
have these features, e.g. certain graph stores, are the obvious
choices in certain cases.
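
To make the triples-versus-quads point concrete, here is a small
Python sketch, assuming the fourth element is used to record which
source asserted a statement (one common use of quads, not the only
one). All URIs and vendor names are hypothetical. The same statement
made by two sources collapses into one triple, while the quad form
keeps the two assertions apart:

```python
from collections import namedtuple

# A quad is a triple plus a fourth element, used here as a source label.
Quad = namedtuple("Quad", ["subject", "predicate", "object", "graph"])

EX = "http://example.org/"  # hypothetical namespace

quads = {
    Quad(EX + "drugX", EX + "treats", EX + "conditionY", EX + "source/vendorA"),
    Quad(EX + "drugX", EX + "treats", EX + "conditionY", EX + "source/vendorB"),
    Quad(EX + "drugX", EX + "interactsWith", EX + "drugZ", EX + "source/vendorB"),
}


def triples(quads):
    """Drop the fourth element; duplicate statements collapse."""
    return {(q.subject, q.predicate, q.object) for q in quads}


def sources_for(quads, subject, predicate, obj):
    """List, in sorted order, every source that asserted a statement."""
    return sorted(
        q.graph
        for q in quads
        if (q.subject, q.predicate, q.object) == (subject, predicate, obj)
    )
```

With plain triples, the question "who said this?" needs extra
machinery; with the fourth element it is an ordinary filter.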

Somebody mentioned "Garlik" as a success story earlier. They got this
right, but by concentrating on things that made sense for industry
(their industry), with the minimal features that were needed (their
5store - the production large-scale data processing triplestore -
really implements just a bare subset of SPARQL, they reason only with
some simple rules, etc.), but done with proper engineering.

So, my conclusion in short:

There are, in our opinion and analysis, reasons why semantic data
technologies / large-scale knowledge representation have a lot to give
to society. However, to have credibility and achieve some results, the
"community" must get humble and look at what's happening in the real
world of data integration and big data.
The community must honestly assess where semantic technologies don't
fit and, on the other hand, which features of the semantic web "stack"
make some sense and bring value to the scenarios that have (bring)
economic value.

Gio




On Sat, Jul 21, 2012 at 1:05 AM, Sebastian Schaffert
<sebastian.schaffert@salzburgresearch.at> wrote:
> Hi Dave,
>
> comments inline. :)
>
> On 20.07.2012 at 23:25, Dave Reynolds wrote:
>
>> Hi Sebastian,
>>
>> I completely agree with what you say about:
>>  o Harish's original post being relevant to linked data and this list
>>  o that the culture of this forum can be counter productive
>>  o that the evidence for linked data delivering business value needs
>>    to be a lot stronger
>>
>> However, just to balance the picture slightly ...
>>
>> There are *some* clear, well documented examples of semweb/RDF/LD delivering business value through data integration. The most famous of these being probably: Garlik (now Experian), Amdocs and arguably the BBC. In my experience for every publicised example there are several non-public or at least less visible examples of companies quietly using the technology internally while not shouting about it. I've come across examples in banking, publishing, travel and health care - at different levels of maturity.
>
> Yes, for me these are all great results. However, the problem for me is convincing other industries, and the toughest question I am always faced with is "and why could I not solve the issue with established technology XYZ, which my engineers already know?". As long as we cannot answer this question, it will not be easy.
>
>
>>
>> Not saying the business value story is perfectly articulated or the evidence is watertight, but it's not totally absent :)
>>
>> While it's not your main point, I would also say we have reasonable arguments for the value of linked data over just CSVs for publishing government statistics and measurement data. The benefits include safer use of data because it's self-describing (e.g. units!), ability to slice and dice through API calls making it easier to build apps, ability to address the data and thus annotate it and reference it. The more advanced government departments approach this as "publish once, use many". One pipeline that lets people access the data as dumps, through REST APIs, as Linked Data or via apps - all powered by a shared Linked Data infra-structure. It's not CSV or Linked Data it's CSV *and* Linked Data.
>
> Yes. It was actually not really an argument from my side, I just wanted to point out the kind of discussions I face with people out there. I totally agree with what you say.
>
> Greetings,
>
> Sebastian
> --
> | Dr. Sebastian Schaffert          sebastian.schaffert@salzburgresearch.at
> | Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
> | Head of Knowledge and Media Technologies Group          +43 662 2288 423
> | Jakob-Haringer Strasse 5/II
> | A-5020 Salzburg
>
Received on Saturday, 21 July 2012 16:22:35 UTC