Re: Linked Data Demand & Discussion Culture on this List, WAS: Introducing Semgel, a semantic database app for gathering & analyzing data from websites from Martynas Jusevicius on 2012-07-22 (public-lod@w3.org from July 2012)

From: Martynas Jusevicius <martynas@graphity.org>
Date: Sun, 22 Jul 2012 11:45:58 +0200
To: harish@semgel.com
Cc: Giovanni Tummarello <giovanni.tummarello@deri.org>, Sebastian Schaffert <sebastian.schaffert@salzburgresearch.at>, Dave Reynolds <dave.e.reynolds@gmail.com>, public-lod@w3.org
Message-ID: <CAE35Vmzzpqp=thWT4bzjqboPUKYKN+4PmXs=ioY9s=t8pXG2uA@mail.gmail.com>
Hey all,

speaking of (business) use cases for Linked Data, there are a number of
them on the W3C site:
http://www.w3.org/2001/sw/sweo/public/UseCases/

However I needed to present a few cases as a minimal slide deck, so
here it is -- maybe it will be helpful to someone:
http://www.slideshare.net/graphity/linked-data-success-stories
(Disclaimer: my project is mentioned at the end)

Martynas
graphity.org

On Sun, Jul 22, 2012 at 6:26 AM, Harish Kumar M. <harish@semgel.com> wrote:
> Hi,
>
> Thank you all for your observations on Semgel. I was really delighted to see
> Sebastian taking it upon himself to articulate in some detail about how
> Semgel aligns with the Linked Data vision. Much appreciated!
>
> It's also been great to see some of the interesting thoughts and pointers
> that have been shared in this thread. I would like to offer (albeit at
> the risk of rehashing prior discussions in this group) clarifications and
> observations on a few points.
>
> - The need for Linked Data consuming apps to publish Linked Data URIs
> (Kingsley's suggestion that served as a trigger for this thread!)
> - Balancing idealism (i.e. dogma) and pragmatism (i.e. market-driven) in
> realizing the vision of the Semantic Web. (amplifying Bergman & Giovanni)
> - The need for robust Linked Data use cases which can logically be shown to
> be superior to other/traditional approaches (amplifying Sebastian)
>
> -----------
> Linked Data consuming apps should publish Linked Data URIs
>
> First off, I want to clarify that I considered Kingsley's queries and
> suggestions to be perfectly reasonable and did not perceive them in any way
> to be negative. I just happened to disagree with him about priorities. And
> if the cut and thrust of argument can lead to a discussion like this, we
> don't have much to complain about!
>
> Getting back to the point, Semgel's involvement with linked data is a
> strategic decision - it's a leap of faith. So, in no way am I trying to
> debate whether there is a market for linked data - after investing a bunch of
> time and effort, I and most of us in this group are well past that point!
>
> However, we would like our tactical decisions to be market-driven. I saw
> Kingsley's suggestion that linked data consuming apps too should publish
> Linked Data URIs as something that should be market-driven.
>
> Somewhere in the thread, Kingsley elegantly articulated the technical
> rationale for doing this
>
> ... "the application ingests structured data but emits HTML pages (reports)
> where the actual data keys (URIs) for the data are now dislocated from the
> value chain? If you consume Linked Data there's no reason to obscure access
> to those data sources in a solution. There are a number of best practice
> patterns for keeping URIs accessible and discoverable to user agents"
>
> How could the geek in me not agree with this! However, wearing the business
> hat, I need to silence the geek and recognize that this cannot be a priority
> when we are still trying to firmly establish a basic ecosystem of
> linked-data publishing and consuming apps.
>
> Kingsley reached out to me privately (very gracious of him!) and indicated
> there is indeed a business case for Semgel to do this. I intend to engage
> with him with an open mind to better understand his point of view.
>
> ----------
> Balancing idealism and pragmatism in realizing the vision of the Semantic
> web.
>
> Semweb has always had more than its fair share of idealism and dogma
> associated with it. However, at the risk of stating the obvious, we do need
> to balance it with an appropriate amount of pragmatism. We just don't want to
> go down the path of becoming "architectural astronauts"!
> (http://bit.ly/bFnrDG)
>
> When Bergman speaks about seeing "linked data as a useful and often
> desirable technique, but not a means" and Giovanni bemoans the fact that
> "features are neglected because they do not fit with the pure original
> visions" and insists that "The community must honestly assess where semantic
> technologies don't fit and on the other hand which features of the semantic
> web "stack" make some sense and bring value to the scenarios that have
> (bring) economic value", I could not agree more!
>
> We want to focus on the value we deliver, not on how we deliver it. A user
> of the Semgel app for instance is never made aware of its semweb roots -
> although some of them do wonder why some simple ops are sometimes so very
> slow :)
>
> Given Semgel's focus on linked-data consumption in general and UI in
> particular, we have primarily drawn our inspiration from the work done by
> the MIT/Simile folks. What makes them stand out for me is their pragmatism.
> Exhibit, Potluck, Parallax and Refine all have pioneered fundamental ideas
> without necessarily embracing the full semweb stack. This is what we would
> like to emulate
>
> We also have the brilliant sig.ma from Sindice (which does explicitly expose
> the underlying URIs) and I am very much looking forward to exploring
> Martynas's graphity (discovered through this thread!)
>
> ----------
> The need for robust Linked Data use cases
>
> Sebastian wondered "if we could collect even a small set of convincing
> business cases and describe what problems they are solving and how, and also
> what problems they encountered, I think it would help many of us".
>
> Again, I couldn't agree more.
>
> When we describe Semgel's architecture to geeks (who have not consumed the
> semweb koolaid!), they can't help but wonder why we have chosen to perform
> such elaborate acrobatics to build what is on the surface a relatively
> straight-forward app. Mashing up data? Why cannot that be accomplished with
> a few lines of python code, they ask!
>
> The fact is that the one use case that semweb really shines at is when there
> is a need to
> - integrate
> - small, but
> - diverse datasets (as in schema diversity)
> - in an ad hoc manner (as opposed to predetermined)
> - for generic analysis.
>
> This will only come into play when there is a mature, global, distributed
> data landscape. The unfortunate fact is that it's going to be a while before we
> see this.
>
> So the big question is - what intermediate problems can we take on while we
> wait for this data paradise to emerge? I suppose this is where we need to
> act on Sebastian's suggestion and begin to catalog linked data use cases
> which we can logically and rigorously show are superior to traditional
> approaches (we are not looking for use cases which simply demo how the tech
> works). We really need to show how we can excel on the Variety dimension
> of Big Data.
>
> If there are such lists, please do share them. If there are none, I would
> love to collaborate with some of you to put something together.
>
> Thanks,
> Harish
> http://semgel.com
>
> ps : did any of you really read this response in its entirety? :)
>
> On Sat, Jul 21, 2012 at 9:51 PM, Giovanni Tummarello
> <giovanni.tummarello@deri.org> wrote:
>>
>> In the past months I have worked a lot on the commercialization of
>> RDF-based knowledge technologies, so I feel like giving a contribution.
>>
>> We tried to understand what could be of interest to enterprise and
>> came up with the slogan - or let's say adopted - "enterprise linked
>> data clouds" with an internally matured understanding of what this
>> means and how it delivers value.
>>
>> In our experience, Linked Data that can be of interest to enterprise
>> cannot be further away from so many of the things that have been
>> preached and pushed with prominence (I'll mention a few things like
>> 303s, "follow your nose", even "resolvable data URIs", "sameAs", "5
>> star data publishing", vocabulary x y that was never used outside
>> demos... insert here so much more).
>>
>> Similarly, it is very far away from saying 'replace your existing running
>> system with anything RDF based'. Won't even speak about preaching the
>> value of publishing data as "lod".
>>
>> To find value that can be sold I'd go back to the basics a bit.
>>
>> RDF is very nice at Knowledge Representation. As a matter of fact, it might
>> be the most solid industrial tool there is for this. Great way to
>> serialize knowledge with properties attached to the data, great way to
>> merge, great way to ship it to others (and hope they'll understand it)
>> thanks to shared URIs of properties. A mature query language.
>>
>> Ok so where does this come into use SPECIFICALLY? (that is you can
>> demonstrate superiority vs other existing technologies)
>>
>> I'd say only in environments/use cases/business sectors where
>>
>> * knowledge can come from many sources, AND
>> * new sources are popping up all the time, AND
>> * sources are complex and might have a lot of rich descriptions, AND
>> * time to explore and understand them is limited,
>> * AND of course sufficient SCALE of the operation/business to support
>> the development/have time to learn and understand this etc.
>>
>> The first sectors that come to mind with these needs are (at least
>> come to mind to me) pharmaceutical, defense-military, scientific
>> technical publishing (they're the first that come to mind given that
>> in my own little personal experience these are the sectors that 'came to
>> us' and really didn't need pitching, or just minimal).
>>
>> One can say that, looking well, a lot of others, potentially, in the
>> future might have similar needs.
>>
>> True... but they might when you put other elements into this: data
>> scale (bigdata) and robustness AND (given the last point of the
>> previous list) enterprise-strength credibility.
>>
>> Here we as a community, IMO, have not been shining:
>>
>> * bigdata - just not there. Sorry but "publishing" a big data set as
>> in LOD doesn't count as a difficult data operation to do. Semantic
>> technologies have notoriously been proposed by "academics" with very
>> often not even the slightest notion of what traditional data
>> processing systems do, even a basic RDBMS. Get the names of the
>> people who have published and have been incensed on semantic web and
>> intersect that with that of conferences that matter to industry (and
>> the world)
>>
>> * robustness - all systems have been shaky at best, again due to being
>> too often just throwaway prototypes (when coming from academia). In
>> other cases companies venturing into this field have been way too much
>> distracted/pressured (and finally got self-convinced) into
>> implementing and caring about features (see all those mentioned above
>> and more) that were unrequested to begin with, and whose value was
>> just based on a conjecture.
>>
>> * missing obvious features. Other features were neglected because "not
>> fitting with the pure original visions". Why restrict ourselves to
>> triples? Quads or quintuples for example make so much sense but oh my
>> god what would the community have said. And now systems that have
>> these features, e.g. certain graph stores, are the obvious choices in
>> certain cases.
>>
>> Somebody mentioned "Garlik" as a success story earlier. They got this
>> right, but by concentrating on things that made sense for industry
>> (their industry) with minimal features that were needed (their 5store
>> - the production large scale data processing triplestore - really
>> implements just a bare subset of SPARQL, they reason only with some
>> simple rules etc) but done with proper engineering.
>>
>> So my conclusion in short.
>>
>> There are, in our opinion and analysis, reasons why semantic data
>> technologies/large scale knowledge representation have a lot to give
>> to society. However, to have credibility and some results, the
>> "community" must get humble and look at what's happening in the real
>> world of data integration and big data.
>> The community must honestly assess where semantic technologies don't
>> fit and on the other hand which features of the semantic web "stack"
>> make some sense and bring value to the scenarios that have (bring)
>> economic value.
>>
>> Gio
>>
>>
>>
>>
>> On Sat, Jul 21, 2012 at 1:05 AM, Sebastian Schaffert
>> <sebastian.schaffert@salzburgresearch.at> wrote:
>> > Hi Dave,
>> >
>> > comments inline. :)
>> >
>> > Am 20.07.2012 um 23:25 schrieb Dave Reynolds:
>> >
>> >> Hi Sebastian,
>> >>
>> >> I completely agree with what you say about:
>> >>  o Harish's original post being relevant to linked data and this list
>> >>  o that the culture of this forum can be counter productive
>> >>  o that the evidence for linked data delivering business value needs
>> >>    to be a lot stronger
>> >>
>> >> However, just to balance the picture slightly ...
>> >>
>> >> There are *some* clear, well documented examples of semweb/RDF/LD
>> >> delivering business value through data integration. The most famous of these
>> >> being probably: Garlik (now Experian), Amdocs and arguably the BBC. In my
>> >> experience for every publicised example there are several non-public or at
>> >> least less visible examples of companies quietly using the technology
>> >> internally while not shouting about it. I've come across examples in
>> >> banking, publishing, travel and health care - at different levels of
>> >> maturity.
>> >
>> > Yes, for me these are all great results. However, the problem for me is
>> > convincing other industries, and the toughest question I am always faced
>> > with is "and why could I not solve the issue with established technology
>> > XYZ, which my engineers already know?". As long as we cannot answer this
>> > question, it will not be easy.
>> >
>> >
>> >>
>> >> Not saying the business value story is perfectly articulated or the
>> >> evidence is watertight, but it's not totally absent :)
>> >>
>> >> While it's not your main point, I would also say we have reasonable
>> >> arguments for the value of linked data over just CSVs for publishing
>> >> government statistics and measurement data. The benefits include safer use
>> >> of data because it's self-describing (e.g. units!), ability to slice and
>> >> dice through API calls making it easier to build apps, ability to address
>> >> the data and thus annotate it and reference it. The more advanced government
>> >> departments approach this as "publish once, use many". One pipeline that
>> >> lets people access the data as dumps, through REST APIs, as Linked Data or
>> >> via apps - all powered by a shared Linked Data infra-structure. It's not CSV
>> >> or Linked Data it's CSV *and* Linked Data.
>> >
>> > Yes. It was actually not really an argument from my side, I just wanted
>> > to point out the kind of discussions I face with people out there. I totally
>> > agree with what you say.
>> >
>> > Greetings,
>> >
>> > Sebastian
>> > --
>> > | Dr. Sebastian Schaffert          sebastian.schaffert@salzburgresearch.at
>> > | Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
>> > | Head of Knowledge and Media Technologies Group          +43 662 2288 423
>> > | Jakob-Haringer Strasse 5/II
>> > | A-5020 Salzburg
>> >
>>
>
>
>
Received on Sunday, 22 July 2012 09:46:26 UTC