- From: Semantics-ProjectParadigm <metadataportals@yahoo.com>
- Date: Mon, 9 Feb 2009 08:33:05 -0800 (PST)
- To: "public-lod@w3.org" <public-lod@w3.org>, Yves Raimond <yves.raimond@gmail.com>, semantic-web <semantic-web@w3.org>
- Message-ID: <337640.94364.qm@web45516.mail.sp1.yahoo.com>
Dear Yves,

This issue has every potential user of semantic web technologies, and more generally of linked data (LD, for the purists amongst us), frustrated. The problem in this discussion and thread (and I have read all the posts following this original one) is that one thing is missing: are we all talking about the same thing? And how do we measure costs?

Being the mathematician I am, I like to break everything down into models, and we need some generalized classes of end-user systems to look at. Then we need a model that defines the layers of informational structure; the layers of code, programs, applications, etc.; and the resources (e.g. storage, processing, networking, distribution), into which we incorporate growth of complexity and scalability. Once these exist, we need to develop basic costing metrics that allow for simulation, scalability and increasing complexity (a toy sketch of such a metric follows at the end of this message).

Next we need to look at existing business models. I personally found http://digitalenterprise.org/models/models.html quite refreshing to start out with.

Since we have to start somewhere, and since we all unanimously agree that the scientific community will be a principal user of semantic web technologies, we need to define something like a "Science Information Networking" concept, so that we can model how linked data sets are to be created, updated and managed, and what the maintenance costs of clouds will be.

Then we need to look at cloud computing. Why? Because, for one, the cost of storage has flattened and become practically nonexistent. Cloud computing is convenient because the provider of software, platform, storage or infrastructure as a service gets all the advantages of centralized control, and the user is spared the in-house platform and application costs. See http://en.wikipedia.org/wiki/Cloud_computing for the general idea.

Then we need to focus on three more things: the web interface (which is why I have been pounding the semantic-web-enabled Content Management Systems drum so consistently on the W3C SW and LOD lists), central storage, and publishing strategies. For the CMSs we have concluded that something needs to be done. For storage and cloud computing I have found that, for example, Sun Microsystems and Sybase are willing to listen to all ideas about open source reference architectures and cloud computing, including for the semantic web. See http://www.sun.com/service/refarch/index.html

And finally we need to create corporate strategies for publishing, covering the application, platform, storage, infrastructure, services and the linked data sets themselves. This will include factoring in open licenses, open source software, open access to digital repositories, and the intellectual property issues arising from aggregation in linked data sets.

I think that with the right classes of models for "scientific information networking", a cloud computing implementation, corporate strategies for publishing and the right business models, we can accommodate a fairly wide range of clouds. But the ideas and models need to be developed, or, if they are already out there somewhere, made available in some central place on the net.

Let's get out of this Babylonian confusion of tongues and start building the tower (from the base of the deep web towards the top, at the surface of the mainstream web).
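A toy sketch of such a costing metric, purely illustrative: the layers, unit costs and the complexity exponent below are my own assumptions, not figures from any real deployment.

```python
# Toy cost model for publishing a linked data set. Every layer, unit
# cost and the complexity exponent is an illustrative assumption.

LAYERS = {
    # layer name -> hypothetical cost per million triples per month
    "storage": 0.50,
    "processing": 4.00,
    "networking": 2.50,
    "maintenance": 12.00,  # curation, link minting, keeping sources in sync
}

def monthly_cost(millions_of_triples: float, complexity_exponent: float = 1.2) -> float:
    """Estimate monthly cost, letting maintenance grow super-linearly
    with dataset size to model increasing complexity."""
    total = 0.0
    for layer, unit_cost in LAYERS.items():
        scale = millions_of_triples
        if layer == "maintenance":
            scale = millions_of_triples ** complexity_exponent
        total += unit_cost * scale
    return total

if __name__ == "__main__":
    for size in (1, 10, 100):
        print(f"{size:>4}M triples: ~{monthly_cost(size):,.2f} per month")
```

The point of even a toy model like this is that it makes the super-linear maintenance term explicit, which is exactly where the "growing cost" Yves describes comes from.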
Milton Ponson
GSM: +297 747 8280
Rainbow Warriors Core Foundation
PO Box 1154, Oranjestad, Aruba, Dutch Caribbean
www.rainbowwarriors.net
Project Paradigm: A structured approach to bringing the tools for sustainable development to all stakeholders worldwide. www.projectparadigm.info
NGO-Opensource: Creating ICT tools for NGOs worldwide for Project Paradigm. www.ngo-opensource.org
MetaPortal: Providing online access to web sites and repositories of data and information for sustainable development. www.metaportal.info
SemanticWebSoftware: Part of NGO-Opensource, to enable SW technologies in the MetaPortal project. www.semanticwebsoftware.info

--- On Mon, 2/9/09, Yves Raimond <yves.raimond@gmail.com> wrote:

From: Yves Raimond <yves.raimond@gmail.com>
Subject: Semantic Web pneumonia and the Linked Data flu (was: Can we lower the LD entry cost please (part 1)?)
To: "public-lod@w3.org" <public-lod@w3.org>
Date: Monday, February 9, 2009, 10:40 AM

Hello!

Just to jump on the last thread: something has been bugging me lately. Please don't take the following as a rant against technologies such as voiD, Semantic Sitemaps, etc.; these are extremely useful pieces of technology. My rant is more about the order of our priorities, and about the growing cost (and I insist on the word "growing") of publishing linked data.

There are a lot of things the community asks linked data publishers to do (semantic sitemaps, stats on the dataset homepages, example SPARQL queries, a voiD description, and now a search function), and I really tend to think this makes linked data publishing much, much more costly. Richard just mentioned that it should take only 5 minutes to write such a search function, but 5 minutes + 5 minutes + 5 minutes + ... takes a long time. Maintaining a linked dataset is already *lots* of work: server maintenance, dataset maintenance, minting of new links, keeping up to date with the data sources. It *really* takes a lot of time to do properly. Honestly, I am starting to get quite frustrated, as a publisher of about 10 medium-size-ish datasets. I really have the feeling that the work I invested in them is never enough; every time, there seems to be something missing to make all these datasets a "real" part of the linked data cloud.

Now for the most tedious part of my rant :-) Most of the datasets published in the linked data world atm are built on open source technologies (so it is easy enough to send a patch over to the data publisher). Some of them provide SPARQL endpoints. What stops the advocates of new technologies or requirements from fulfilling their goals themselves? After all, that's what we have all done with this project since the beginning! If someone really wants a smallish search engine on top of some dataset, wrapping a SPARQL query, or a call to the web service that the dataset wraps, should be enough; I don't see why the data publisher is needed to achieve that aim (a sketch of such a wrapper follows below). The same holds for voiD and other technologies: detailed statistics are available on most dataset homepages, which (I think) provides enough data to write a good enough voiD description (again, see the sketch below).

To sum up, I am just increasingly concerned that we are building requirements on top of requirements for the sake of lowering the "LD entry cost", whereas I have the feeling that this cost is getting higher and higher... And all that doesn't make the data more linked :-)

Cheers!
y
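To make Yves's point about wrapping a SPARQL query concrete, here is a minimal sketch of such a client-side search function in Python. The use of the SPARQLWrapper package, the regex FILTER over rdfs:label, and the example endpoint are my assumptions about how one might do it, not a prescribed implementation.

```python
# A smallish "search engine" over an existing public SPARQL endpoint:
# a client-side wrapper around one query, requiring nothing extra from
# the data publisher. Needs the SPARQLWrapper package.
from SPARQLWrapper import SPARQLWrapper, JSON

def search(endpoint_url: str, text: str, limit: int = 10):
    """Return (resource, label) pairs whose rdfs:label matches `text`."""
    sparql = SPARQLWrapper(endpoint_url)
    # NB: naive string interpolation; fine for a sketch, not for
    # untrusted input. A full-text index (where the store offers one)
    # would be much faster than a regex over every label.
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?s ?label WHERE {{
            ?s rdfs:label ?label .
            FILTER(regex(str(?label), "{text}", "i"))
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [(row["s"]["value"], row["label"]["value"])
            for row in results["results"]["bindings"]]

# Example: search("http://dbpedia.org/sparql", "linked data")
```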
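And on the voiD point: a description assembled from the statistics a dataset homepage already publishes really is small. Here is a minimal sketch using Python's rdflib; the dataset URI, endpoint, homepage and triple count below are placeholders, not a real dataset.

```python
# Minimal voiD description built from homepage-style statistics.
# All concrete values are hypothetical placeholders. Needs rdflib.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF, XSD

VOID = Namespace("http://rdfs.org/ns/void#")

g = Graph()
g.bind("void", VOID)
g.bind("foaf", FOAF)

dataset = URIRef("http://example.org/dataset/mydata")  # hypothetical
g.add((dataset, RDF.type, VOID.Dataset))
g.add((dataset, FOAF.homepage, URIRef("http://example.org/")))
g.add((dataset, VOID.sparqlEndpoint, URIRef("http://example.org/sparql")))
g.add((dataset, VOID.triples, Literal(1500000, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```

Ten minutes with a text editor would produce the same dozen lines of Turtle by hand, which is rather the point.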
Received on Monday, 9 February 2009 16:33:53 UTC