Re: Linked Data Demand & Discussion Culture on this List, WAS: Introducing Semgel, a semantic database app for gathering & analyzing data from websites from Harish Kumar M. on 2012-07-22 (public-lod@w3.org from July 2012)

From: Harish Kumar M. <harish@semgel.com>
Date: Sun, 22 Jul 2012 09:56:14 +0530
To: Giovanni Tummarello <giovanni.tummarello@deri.org>, Sebastian Schaffert <sebastian.schaffert@salzburgresearch.at>, Dave Reynolds <dave.e.reynolds@gmail.com>, public-lod@w3.org
Message-ID: <CAN11D0=howy0yRnf11nG-ns0usM5DUEJcf+p4a6EuKOe3RX9Dw@mail.gmail.com>
Hi,

Thank you all for your observations on Semgel. I was really delighted to
see Sebastian taking it upon himself to articulate in some detail about how
Semgel aligns with the Linked Data vision. Much appreciated!

It's also been great to see some of the interesting thoughts and pointers
that have been shared in this thread. I would like to offer (albeit at
the risk of rehashing prior discussions in this group) clarifications and
observations on a few points:

- The need for Linked Data consuming apps to publish Linked Data URIs
(Kingsley's suggestion that served as a trigger for this thread!)
- Balancing idealism (i.e. dogma) and pragmatism (i.e. market-driven) in
realizing the vision of the Semantic Web (amplifying Bergman & Giovanni)
- The need for robust Linked Data use cases which can logically be shown to
be superior to other/traditional approaches (amplifying Sebastian)

-----------
*Linked-Data consuming apps should publish Linked-Data URIs*

First off, I want to clarify that I considered Kingsley's queries and
suggestions to be perfectly reasonable and did not perceive them in any way
to be negative. I just happened to disagree with him about priorities. And
if the cut and thrust of argument can lead to a discussion like this, we
don't have much to complain about!

Getting back to the point, Semgel's involvement with linked-data is a
strategic decision - it's a leap of faith. So, in no way am I trying to
debate whether there is a market for linked-data - after investing a bunch of
time and effort, I and most of us in this group are well past that point!

However, we would like our tactical decisions to be market-driven. I saw
Kingsley's suggestion that linked-data consuming apps too should publish
Linked Data URIs as something that should be market-driven.

Somewhere in the thread, Kingsley elegantly articulated the technical
rationale for doing this:

... "the application ingests structured data but emits HTML pages (reports)
where the actual data keys (URIs) for the data are now dislocated from the
value chain? If you consume Linked Data there's no reason to obscure access
to those data sources in a solution. There are a number of best practice
patterns for keeping URIs accessible and discoverable to user agents"

How could the geek in me not agree with this! However, wearing the business
hat, I need to silence the geek and recognize that this cannot be a
priority when we are still trying to firmly establish a basic ecosystem of
linked-data publishing and consuming apps.

Kingsley reached out to me privately (very gracious of him!) and indicated
there is indeed a business case for Semgel to do this. I intend to engage
with him with an open mind to better understand his point of view.

----------
*Balancing idealism and pragmatism in realizing the vision of the Semantic
web.*

Semweb has always had more than its fair share of idealism and dogma
associated with it. However, at the risk of stating the obvious, we do need
to balance it with an appropriate amount of pragmatism. We just don't want
to go down the path of becoming "architectural astronauts"! (
http://bit.ly/bFnrDG)

When Bergman speaks about seeing "linked data as a useful and often
desirable technique, but not a means" and Giovanni bemoans the fact that "
features are neglected because they do not fit with the pure original
visions" and insists that "The community must honestly assess where
semantic technologies don't fit and on the other hand which features of the
semantic web  "stack" make some sense and bring value to the scenarios that
have (bring)economic value", I could not agree more!

We want to focus on the value we deliver, not on how we deliver it. A user
of the Semgel app for instance is never made aware of its semweb roots -
although some of them do wonder why some simple ops are sometimes so very
slow :)

Given Semgel's focus on linked-data consumption in general and UI in
particular, we have primarily drawn our inspiration from the work done by
the MIT/Simile folks. What makes them stand out for me is their pragmatism.
Exhibit, Potluck, Parallax and Refine all have pioneered fundamental ideas
without necessarily embracing the full semweb stack. This is what we would
like to emulate.

We also have the brilliant sig.ma from Sindice (which does explicitly
expose the underlying URIs) and I am very much looking forward to
exploring Martynas's Graphity (discovered through this thread!).

----------
*The need for robust Linked Data Use Cases*

Sebastian wondered "if we could collect even a small set of convincing
business cases and describe what problems they are solving and how, and
also what problems they encountered, I think it would help many of us".

Again, I couldn't agree more.

When we describe Semgel's architecture to geeks (who have not consumed the
semweb Kool-Aid!), they can't help but wonder why we have chosen to perform
such elaborate acrobatics to build what is on the surface a relatively
straightforward app. Mashing up data? Why can't that be accomplished with
a few lines of Python code, they ask!

The fact is that the one use case where semweb really shines is when there
is a need to
- *integrate*
- *small*, but
- *diverse* datasets (as in schema diversity)
- in an *ad hoc* manner (as opposed to pre-determined)
- for *generic* analysis.

This will only come into play when there is a mature, global, distributed
data landscape. The unfortunate fact is that it's going to be a while before
we see this.
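
As a rough illustration of the integration use case described above, here
is a minimal Python sketch. It is not Semgel's implementation: the two
datasets, the example.org URI, the "shop:" / "review:" prefixes and the
helper names are all made up. It assumes nothing more than that each small
source can be flattened into (subject, property, value) triples, so that
sources with different schemas merge by plain set union and can be explored
with one generic pattern match.

```python
from typing import Iterable, List, Optional, Set, Tuple

# A triple is (subject, property, value); all names here are hypothetical.
Triple = Tuple[str, str, str]


def rows_to_triples(rows: Iterable[dict], key: str, prefix: str) -> Set[Triple]:
    """Flatten dict rows into triples.

    `key` is the column holding the shared identifier (a URI);
    `prefix` is a made-up namespace for this source's property names.
    """
    triples: Set[Triple] = set()
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key:
                triples.add((subject, prefix + column, str(value)))
    return triples


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two small sources with different schemas that share only an identifier.
shop = [{"id": "http://example.org/book/1", "title": "Example Book", "price": 12.5}]
reviews = [{"item": "http://example.org/book/1", "stars": 4}]

# Ad hoc integration: no target schema is designed up front.
graph = rows_to_triples(shop, "id", "shop:") | rows_to_triples(reviews, "item", "review:")

# Generic analysis: everything known about one resource, whatever its source.
for triple in match(graph, s="http://example.org/book/1"):
    print(triple)
```

The point of the sketch is only that adding a third source with yet another
schema needs no change to `match` or to the merged structure. A real system
would of course use proper RDF tooling rather than tuples in a set.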

So the big question is - what intermediate problems can we take on while we
wait for this data paradise to emerge? I suppose this is where we need to
act on Sebastian's suggestion and begin to catalog linked-data use cases
which we can logically and rigorously show are superior to traditional
approaches (we are not looking for use cases which simply demo how the tech
works). We really need to show how we can excel on the Variety dimension
of Big Data.

If there are such lists, please do share them. If there are none, I would
love to collaborate with some of you to put something together.

Thanks,
Harish
http://semgel.com

PS: did any of you really read this response in its entirety? :)

On Sat, Jul 21, 2012 at 9:51 PM, Giovanni Tummarello <
giovanni.tummarello@deri.org> wrote:

> In the past months I have worked a lot on the commercialization of
> RDF-based knowledge technologies so I feel like giving a contribution.
>
> We tried to understand what could be of interest to enterprise and
> came up with the slogan - or let's say adopted - "enterprise linked
> data clouds" with an internally matured understanding of what this
> means and how it delivers value.
>
> In our experience, Linked Data that can be of interest to enterprise
> cannot be further away from so many of the things that have been
> preached and pushed with prominence (i'll mention a few things like
> 303s,  "follow your nose" even  "resolvable data uris",  "sameAs" , "5
> star data publishing" , vocabulary x y that was never used outside
> demos... insert here so much more ).
>
> Similarly, it is very far away from saying 'replace your existing running
> system with anything RDF based'. Won't even speak about preaching the
> value of publishing data as "lod".
>
> To find value that can be sold I'd go back to the basics a bit.
>
>  RDF is very nice at Knowledge Representation.  Matter of fact might
> be the most solid industrial tool there is for this. Great way to
> serialize knowledge with properties attached to the data, great way to
> merge, great way to ship it to others (and hope they'll understand it)
> thanks to shared URIs of properties.  A mature query language.
>
> Ok so where does this come into use SPECIFICALLY? (that is you can
> demonstrate superiority vs other existing technologies)
>
> I'd say only in environments/use cases/ business sectors  where
>
> * knowledge can come from many sources, AND
> * new sources popping up all the time,  AND
> *  sources which are complex, might have a lot of rich descriptions,
> * time to explore and understand them is limited,
> * AND of course sufficient SCALE of the operation/business to support
> the development/ have time to learn and understand this etc.
>
> The first sectors that come to mind with these needs are (at least
> come to mind to me) pharmaceutical, defense-military, scientific
> technical publishing.  (they're the first that come to mind given that
> in my own little personal experience these are the sectors that 'came to
> us' and really didn't need pitching or just minimal)
>
> One can say that, looking well, a lot of others, potentially, in the
> future might have similar need.
>
> True.. but they might when you put other elements into this: data
> scale (bigdata)  and robustness AND (given the last point of the
> previous list which is) enterprise strength credibility.
>
> Here we as a community, IMO have not been shining:
>
> * bigdata - just not there. Sorry but "publishing" a big data set as
> in LOD doesn't count as a difficult data operation to do. Semantic
> technologies have notoriously been proposed by "academics" with very
> often not even the slightest notion of what traditional data
> processing systems do, even a basic RDBMS. Get the names of the
> people who have published and have been incensed on semantic web and
> intersect that with that of conferences that matter to industry (and
> the world)
>
> * robustness - all systems have been shaky at best again due to being
> too often just throwaway prototypes (when coming from academia). In
> other cases companies venturing into this field have been way too much
> distracted/ pressured/ (and finally got self convinced) into
> implementing and caring about features (see all those mentioned above
> and more)  that were unrequested to begin with, and whose value was
> just based on a conjecture.
>
> * missing obvious features. Other features were neglected because "not
> fitting with the pure original visions". Why restrict ourselves to
> triples? quads or quintuples for example make so much sense but oh my
> god what would the community have said. And now systems that have
> these features e.g. certain graph stores are the obvious choices in
> certain cases.
>
> Somebody mentioned "Garlik" as a success story earlier. They got this
> right, but by concentrating on things that made sense for industry
> (their industry) with minimal features that were needed (their 5store
> - the production large scale data processing triplestore really
> implements just a bare subset of SPARQL, they reason only with some
> simple rules etc) but done with proper engineering.
>
> So my conclusion in short.
>
> There are, in our opinion and analysis,  reasons why semantic data
> technologies/ large scale knowledge representation have a lot to give
> to society. However, to have credibility and some results, the
> "community" must get humble, look at what's happening in the real
> world of data integration and big data.
> The community must honestly assess where semantic technologies don't
> fit and on the other hand which features of the semantic web  "stack"
> make some sense and bring value to the scenarios that have (bring)
> economic value.
>
> Gio
>
>
>
>
> On Sat, Jul 21, 2012 at 1:05 AM, Sebastian Schaffert
> <sebastian.schaffert@salzburgresearch.at> wrote:
> > Hi Dave,
> >
> > comments inline. :)
> >
> > Am 20.07.2012 um 23:25 schrieb Dave Reynolds:
> >
> >> Hi Sebastian,
> >>
> >> I completely agree with what you say about:
> >>  o Harish's original post being relevant to linked data and this list
> >>  o that the culture of this forum can be counter productive
> >>  o that the evidence for linked data delivering business value needs
> >>    to be a lot stronger
> >>
> >> However, just to balance the picture slightly ...
> >>
> >> There are *some* clear, well documented examples of semweb/RDF/LD
> delivering business value through data integration. The most famous of
> these being probably: Garlik (now Experian), Amdocs and arguably the BBC.
> In my experience for every publicised example there are several non-public
> or at least less visible examples of companies quietly using the technology
> internally while not shouting about it. I've come across examples in
> banking, publishing, travel and health care - at different levels of
> maturity.
> >
> > Yes, for me these are all great results. However, the problem for me is
> convincing other industries, and the toughest question I am always faced
> with is "and why could I not solve the issue with established technology
> XYZ, which my engineers already know?". As long as we cannot answer this
> question, it will not be easy.
> >
> >
> >>
> >> Not saying the business value story is perfectly articulated or the
> evidence is watertight, but it's not totally absent :)
> >>
> >> While it's not your main point, I would also say we have reasonable
> arguments for the value of linked data over just CSVs for publishing
> government statistics and measurement data. The benefits include safer use
> of data because it's self-describing (e.g. units!), ability to slice and
> dice through API calls making it easier to build apps, ability to address
> the data and thus annotate it and reference it. The more advanced
> government departments approach this as "publish once, use many". One
> pipeline that lets people access the data as dumps, through REST APIs, as
> Linked Data or via apps - all powered by a shared Linked Data
> infra-structure. It's not CSV or Linked Data it's CSV *and* Linked Data.
> >
> > Yes. It was actually not really an argument from my side, I just wanted
> to point out the kind of discussions I face with people out there. I
> totally agree with what you say.
> >
> > Greetings,
> >
> > Sebastian
> > --
> > | Dr. Sebastian Schaffert
> sebastian.schaffert@salzburgresearch.at
> > | Salzburg Research Forschungsgesellschaft
> http://www.salzburgresearch.at
> > | Head of Knowledge and Media Technologies Group          +43 662 2288
> 423
> > | Jakob-Haringer Strasse 5/II
> > | A-5020 Salzburg
> >
>
>
Received on Sunday, 22 July 2012 04:26:44 UTC