Re: Exercise: LOD questions (R)+was ( Do we need another list(s)? ) from Giovanni Tummarello on 2008-12-06 (public-lod@w3.org from December 2008)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Sat, 6 Dec 2008 08:02:17 +0000
To: "Juan Sequeda" <juanfederico@gmail.com>
Cc: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <210271540812060002x6f7ac570x56a732acb76175f4@mail.gmail.com>
> - My company has recently released an API for access to structured
> (database) data about 55 million companies and 35 million people. Do
> you think I should release this in an LOD format? How would my
> customers benefit.

This could be tricky.

Usually such an API involves looking up and finding details about
records. That is not how LOD per se works, nor what it is concerned
with: LOD is about knowing the identifier in advance and accessing it.
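
To make the contrast concrete, here is a minimal Python sketch of what
"knowing the identifier in advance and accessing it" amounts to: an
HTTP GET on the URI with an Accept header asking for RDF. The DBpedia
URI and the choice of Turtle are illustrative assumptions, and the
request is only built, not sent.

```python
import urllib.request

def build_lod_request(uri, rdf_mime="text/turtle"):
    """Build (but do not send) a GET request for an already-known URI."""
    # Content negotiation: ask for an RDF serialization rather than HTML.
    return urllib.request.Request(uri, headers={"Accept": rdf_mime})

# Illustrative identifier; any resolvable LOD URI would do.
req = build_lod_request("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Note that nothing here searches: if you do not already hold the URI,
this style of access gives you no way to find it.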

A search engine might index such identifiers and provide the lookup
capability. E.g. if you go to Sindice.com you can find the LOD entry
points for items such as "Berlin".

In general, getting details in RDF via LOD access is a handy way to
process them, but only up to a point. E.g. you wouldn't easily be
able to get an ordered list in RDF (nobody really knows how to handle
that: modelling becomes tedious, and SPARQL breaks down or query
complexity blows up).
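
As an illustration of why ordered lists are tedious, here is a small
Python sketch of one common modelling, the RDF collection vocabulary
(rdf:first / rdf:rest / rdf:nil). The blank node labels ("_:n0", ...)
and the sample items are made up for the example. Each item costs two
triples, and the order can only be recovered by walking the chain.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def list_to_triples(items, prefix="_:n"):
    """Encode an ordered list as an RDF collection (rdf:first / rdf:rest)."""
    triples = []
    for i, item in enumerate(items):
        node = f"{prefix}{i}"
        is_last = i == len(items) - 1
        nxt = RDF + "nil" if is_last else f"{prefix}{i + 1}"
        triples.append((node, RDF + "first", item))
        triples.append((node, RDF + "rest", nxt))
    return triples

def triples_to_list(triples, head):
    """Recover the order by walking the rdf:rest chain one node at a time."""
    first = {s: o for s, p, o in triples if p == RDF + "first"}
    rest = {s: o for s, p, o in triples if p == RDF + "rest"}
    out, node = [], head
    while node != RDF + "nil":
        out.append(first[node])
        node = rest[node]
    return out

triples = list_to_triples(["gold", "silver", "bronze"])
print(len(triples))
print(triples_to_list(triples, "_:n0"))
```

A question like "what is the third item?" needs one hop per position,
which is the kind of query that gets awkward quickly.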

Advantages do come, however, if the customers plan to integrate your
results with data from others who also provide it in RDF, and possibly
as LOD. If you provide links to other data sources, the customer can
easily follow those links. You also provide disambiguation: linking to
another site, e.g. DBpedia, grounds your meaning in a specific entity,
so you beat ambiguity and the customer can be more sure of the answer.
Again, if your customer likes RDF, he/she will find accessing a
resolvable URI easier than writing a SPARQL query (though not
necessarily by much).

In doing so, however, you are taking some responsibility for the other
site, taking on the task of creating such meaningful external links,
and in general giving credit and value to other people.

A side effect of linking to other datasets is that you also get
indexed as referring to the external identifier. E.g. if you point to
DBpedia "Berlin", then someone who looks for that on Sindice will find
you (again, sorry for mentioning the search engine, but do you really
expect DBpedia and all the other sites to simply add links to every
provider that shows up?). Again, this is an advantage for you,
probably not necessarily for your customers.
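
A rough Python sketch of the indexing side effect, assuming a search
engine keeps something like an inverted index from link targets to the
documents that mention them. This is not a description of how Sindice
is actually implemented, and the example.com URI is hypothetical.

```python
from collections import defaultdict

def index_by_link_target(documents):
    """Map each linked-to URI to the set of documents that mention it."""
    index = defaultdict(set)
    for doc_uri, triples in documents.items():
        for _subject, _predicate, obj in triples:
            if obj.startswith("http://"):
                index[obj].add(doc_uri)
    return index

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
documents = {
    # Hypothetical provider document linking one of its records to DBpedia.
    "http://example.com/city/42": [
        ("http://example.com/city/42", SAME_AS,
         "http://dbpedia.org/resource/Berlin"),
    ],
}
index = index_by_link_target(documents)
print(index["http://dbpedia.org/resource/Berlin"])
```

The provider becomes findable under the DBpedia identifier without
DBpedia having to link back.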

>
> - Can you give a use case for mixing LOD with privately supplied data
> (from my companies own data sources or from user-generated content) to
> produce a useful application?

A nice example comes from the latest OpenCalais post on joining LOD.
They say: you send in a piece of news, we turn it into RDF and provide
you with links to other sites (I guess DBpedia mostly). You follow
these links and get more data about the entities, so you can do
something automatic with it.

E.g. I tell you "Citibank offices in London": you look Citibank up in
DBpedia and find out it is a financial company, you look up London and
find out it's in the UK, so you could trigger some automatic mail for
news on "UK financials".
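
The reasoning in this example can be sketched in a few lines of
Python. The FACTS table is a hand-written stand-in for whatever one
would actually retrieve by following the links, and its property names
("type", "country") are assumptions for the sake of the sketch.

```python
# Hypothetical, hand-written stand-in for facts one might fetch by
# following links to DBpedia; a real lookup would go over HTTP.
FACTS = {
    "Citibank": {"type": "financial company"},
    "London": {"country": "UK"},
}

def derive_topic(entities, facts=FACTS):
    """Combine facts about linked entities into a coarse alert topic."""
    country = next((facts[e]["country"] for e in entities
                    if "country" in facts.get(e, {})), None)
    is_financial = any(facts.get(e, {}).get("type") == "financial company"
                       for e in entities)
    if country and is_financial:
        return f"{country} financials"
    return None

print(derive_topic(["Citibank", "London"]))
```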

On the other hand, this specific use case, and most I can think of,
could be solved with a direct integration with Freebase, for example,
or with custom scripting against APIs. In general most of them
leverage DBpedia, which is a fantastic resource per se, but that holds
regardless of it being offered as LOD.

>
> - What commercial applications are there that use LOD?
>

I know of some that use the DBpedia dataset; I don't know about LOD.

> - What are some of the major limitations of today's software that
> would be improved upon by using LOD?
>

Again, I see a lot of potential for "integrating with large clean
structured datasets to provide a bit of intelligence to your
application".

However, again, I am not sure how to use LOD for this. Say you go to
dbpedia.org: then what do you do? Do you blindly follow all the
seeAlso links and hope to find... something else? What is that
something else, and how can you be sure it's the information you
needed for your use case (e.g. geolocation, product pricing or
reviews)?

So at the implementation level the idea is that people use LOD with
some sort of "smart agent" which in theory goes around and explores
until it finds interesting stuff.

So far there has not been even a remote example of such an agent (the
complexity would be big, the performance would be abysmal).

So what can be done in practice is to hardcode the query instead, that
is, tell your software: "go to DBpedia, then also fetch GeoNames, then
look for this and that property, that's it". Which does not
necessarily buy you much compared to direct API integration.
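
In Python, such a hardcoded plan might look like the sketch below. The
source names, the property names and the canned data are all
assumptions, and fake_fetch stands in for real HTTP access so the
sketch runs offline.

```python
def hardcoded_lookup(entity, fetch):
    """Fetch fixed properties from fixed sources; no exploration involved."""
    # The plan is spelled out by the programmer, source by source.
    plan = [
        ("dbpedia", ["abstract"]),
        ("geonames", ["lat", "long"]),
    ]
    result = {}
    for source, wanted in plan:
        record = fetch(source, entity)
        for prop in wanted:
            if prop in record:
                result[prop] = record[prop]
    return result

# Stand-in for real HTTP access, with made-up values.
CANNED = {
    ("dbpedia", "Berlin"): {"abstract": "Capital of Germany", "other": "ignored"},
    ("geonames", "Berlin"): {"lat": 52.52, "long": 13.41},
}

def fake_fetch(source, entity):
    return CANNED.get((source, entity), {})

print(hardcoded_lookup("Berlin", fake_fetch))
```

Structurally this is the same as calling two APIs in sequence, which
is the point being made above.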

On the plus side, receiving RDF, as opposed to an answer from a web
2.0 API, might provide some extra goodies if your software is smart
enough to make use of them. E.g. you might get some information you
were not really asking for but that you know how to process anyway.
Good in that case.
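
One way to read "smart enough" is a table of handlers keyed by
property, applied to whatever comes back. The short property names and
the sample response below are hypothetical; a real application would
key on full predicate URIs.

```python
# Hypothetical predicate names; a real application would use full URIs.
HANDLERS = {
    "population": lambda v: f"population is {int(v):,}",
    "homepage": lambda v: f"see {v}",
}

def process_known(properties, handlers=HANDLERS):
    """Apply a handler to every property we understand, skip the rest."""
    return [handlers[p](v) for p, v in properties.items() if p in handlers]

# We asked only for the population, but the RDF response carried more.
response = {
    "population": "3400000",
    "homepage": "http://www.berlin.de",
    "unknownProp": "ignored",
}
print(process_known(response))
```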

>
> - Okay, I'd like to use LOD for a pilot of a commercial project. I'm
> going to include 1 million triples. What production-environment
> resources will I need to set up. What will my architecture include?
> Will there just be a giant RDF file or a big set of them? Will they
> just be front-ended by a web server? Will a database be needed?
>

This side of the story is, I think, well covered now. Technically,
Virtuoso or others are definitely up to the job.
It's not a giant RDF dump you need to create; it's a thin layer on
your existing database or whatever you have already. Under certain
conditions this can become a triplestore, for added benefits.
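
A toy Python version of the "thin layer" idea, using an in-memory
SQLite table as the existing database. The table layout, the URI space
and the vocabulary are invented for the example; a production setup
would rely on a proper mapping tool or a store such as Virtuoso.

```python
import sqlite3

BASE = "http://example.com/company/"   # hypothetical URI space
VOCAB = "http://example.com/vocab#"    # hypothetical vocabulary

def describe(conn, company_id):
    """Answer a lookup for one URI by turning one row into triples."""
    row = conn.execute(
        "SELECT id, name, city FROM company WHERE id = ?", (company_id,)
    ).fetchone()
    if row is None:
        return []
    subject = f"{BASE}{row[0]}"
    return [
        (subject, VOCAB + "name", row[1]),
        (subject, VOCAB + "city", row[2]),
    ]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO company VALUES (1, 'Acme', 'Galway')")
for triple in describe(conn, 1):
    print(triple)
```

The triples are generated on request from rows that stay where they
are, so no separate dump has to be kept in sync.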

Short answer: once you want to do it you can do it well.

> - Can I build a proprietary closed source application that
> incorporates LOD? How would I combine free and fee-based data? I know
> how to do it with an API. How would I do it with linked data?
>

Feel free to use existing LOD data. Providing pay-per-use LOD data
yourself is something that has not been explored, AFAIK, but I don't
think it would work that well. In general LOD resources are URIs, and
they lose their value if they cannot be freely looked up.

---------------

That said, two clarifications again:

* There are great reasons for using RDF (especially embedded as RDFa)
and RDF databases for smart applications, so the fact that LOD itself
might have the above issues and uncertainties should not detract from
considering this technology and offering one's data in this format
anyway.

* On the other hand, some of the weak points of LOD in the open could
matter less inside intranets, for heterogeneous DB integration. I
have not really thought deeply about it, but I get this feeling. Has
someone explored this?

Giovanni
Received on Saturday, 6 December 2008 08:02:53 UTC