Semantic Web pneumonia and the Linked Data flu (was: Can we lower the LD entry cost please (part 1)?) from Yves Raimond on 2009-02-09 (public-lod@w3.org from February 2009)

From: Yves Raimond <yves.raimond@gmail.com>
Date: Mon, 9 Feb 2009 10:40:08 +0000
To: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <82593ac00902090240m3199ed7bw10d07c5513db3095@mail.gmail.com>

Hello!

To jump on the last thread: something has been bugging me lately.
Please don't take the following as a rant against technologies such as
voiD, Semantic Sitemaps, etc. These are extremely useful pieces of
technology. My rant is more about the order of our priorities, and
about the growing cost (and I insist on the word "growing") of
publishing linked data.

There are a lot of things the community asks linked data publishers to
do (semantic sitemaps, stats on the dataset homepages, example SPARQL
queries, a voiD description, and now a search function), and I really
tend to think this makes linked data publishing much, much more
costly. Richard just mentioned that it should only take 5 minutes to
write such a search function, but 5 minutes + 5 minutes + 5 minutes +
... takes a long time. Maintaining a linked dataset is already *lots*
of work: server maintenance, dataset maintenance, minting of new
links, keeping up to date with the data sources; it *really* takes a
lot of time to do properly.
Honestly, I am beginning to get quite frustrated, as a publisher of
about 10 medium-size-ish datasets. I really have the feeling that the
work I have invested in them is never enough: every time, there seems
to be something missing to make all these datasets a "real" part of
the linked data cloud.

Now for the most tedious part of my rant :-) Most of the datasets
published in the linked data world at the moment use open source
technologies (so it is easy enough to send a patch to the data
publisher). Some of them provide SPARQL endpoints. So what stops the
advocates of new technologies or requirements from fulfilling their
goals themselves? After all, that's what we have all been doing with
this project since the beginning! If someone really wants a smallish
search engine on top of some dataset, wrapping a SPARQL query, or a
call to the web service that the dataset itself wraps, should be
enough. I don't see why the data publisher is required to achieve
that aim. The same holds for voiD and other technologies. Detailed
statistics are available on most dataset homepages, which (I think)
provide enough data to write a good enough voiD description.
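To make the point concrete, here is a rough sketch (in Python, chosen only for illustration) of what such third-party work might look like: building a label search as a SPARQL query sent to a dataset's public endpoint, and writing a minimal voiD description from figures copied off a dataset homepage. It assumes the dataset labels its resources with rdfs:label and that the endpoint supports SPARQL 1.1 (CONTAINS, LCASE). The void:sparqlEndpoint and void:triples terms are from the voiD vocabulary as later published. The function names and example.org URIs are hypothetical placeholders, not any publisher's actual service.

```python
import urllib.parse


# Hypothetical helper: assumes resources carry rdfs:label and the endpoint
# understands SPARQL 1.1 string functions (CONTAINS, LCASE).
def build_search_query(term: str, limit: int = 10) -> str:
    """Return a SPARQL SELECT finding resources whose label contains `term`."""
    # Escape backslashes and double quotes so `term` is a valid string literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def search_url(endpoint: str, query: str) -> str:
    """Encode `query` as a GET request using the SPARQL protocol's `query` parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def _turtle_string(value: str) -> str:
    # Quote a plain Turtle string literal, escaping backslashes and quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


# Hypothetical helper: the triple count would be copied by hand from the
# statistics published on the dataset homepage.
def void_description(dataset_uri: str, title: str, endpoint: str, triples: int) -> str:
    """Return a minimal voiD description in Turtle."""
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n\n"
        f"<{dataset_uri}> a void:Dataset ;\n"
        f"    dcterms:title {_turtle_string(title)} ;\n"
        f"    void:sparqlEndpoint <{endpoint}> ;\n"
        f"    void:triples {int(triples)} .\n"
    )


if __name__ == "__main__":
    # Placeholder endpoint and dataset URIs; nothing is fetched over the network.
    q = build_search_query("beatles")
    print(search_url("http://example.org/sparql", q))
    print(void_description("http://example.org/dataset", "Example dataset",
                           "http://example.org/sparql", 1000))
```

Fetching the generated URL (with any HTTP client) would return the SPARQL results, so none of this needs the publisher's involvement, only a public endpoint.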

To sum up, I am just increasingly concerned that we are building
requirements on top of requirements for the sake of lowering an "LD
entry cost", whereas I have the feeling that this cost is actually
getting higher and higher... And none of that makes the data any more
linked :-)

Cheers!
y

Received on Monday, 9 February 2009 10:40:49 UTC