- From: Giovanni Tummarello <giovanni.tummarello@deri.org>
- Date: Sat, 6 Dec 2008 08:02:17 +0000
- To: "Juan Sequeda" <juanfederico@gmail.com>
- Cc: "public-lod@w3.org" <public-lod@w3.org>
> - My company has recently released an API for access to structured
> (database) data about 55 million companies and 35 million people. Do
> you think I should release this in an LOD format? How would my
> customers benefit?

It could be tricky. Usually an API like that involves looking up records and finding details about them. This is not how LOD per se works, nor what it is concerned with: LOD is about knowing the identifier in advance and accessing it. A search engine might index such identifiers and provide the lookup capability; e.g. if you go to Sindice.com you can find the LOD entry points for items such as "Berlin".

In general, getting details in RDF via LOD-style access is a handy way to process them, but only up to a point. For example, you wouldn't easily be able to get an ordered list in RDF (nobody really knows how to handle that: modelling becomes tedious, and SPARQL breaks down or query complexity blows up).

The advantages come if your customers plan to integrate your results with other sources that also provide their data in RDF, and possibly as LOD. If you provide links to other data sources, the customer can easily follow those links. You also provide disambiguation: linking to another site, e.g. DBpedia, grounds your meaning in a specific entity, so you beat ambiguity and the customer can be more sure of the answer. Again, if your customer likes RDF, he/she will find accessing a resolvable URI easier than writing a SPARQL query (though not necessarily by much).

In doing so, however, you take on responsibility with respect to the other site: the task of creating meaningful external links falls on you, and in general you are giving credit and value to other people. A side effect of linking to other datasets is that you also get indexed as referring to the external identifier, so, e.g.,
if you point to DBpedia "Berlin", then someone who looks for that on Sindice will find you. (Again, sorry for mentioning the search engine, but really, do you expect DBpedia and all the other sites to simply add links to every provider that shows up?) Again, this is an advantage for you, probably not necessarily for your customers.

> - Can you give a use case for mixing LOD with privately supplied data
> (from my company's own data sources or from user-generated content) to
> produce a useful application?

A nice example comes from the latest OpenCalais post on joining LOD. They say: you send in a piece of news, we RDF it and provide you with links to other sites (DBpedia mostly, I guess); you follow these links and get more data about the entities, so you can do something automatic with it. E.g. I tell you "Citibank offices in London": you look Citibank up in DBpedia and find out it is a financial company; you look up London and find out it's in the UK; so you could trigger an automatic mail for news on "UK financials".

On the other hand, this specific use case, and most I can think of, could be solved with a direct integration with Freebase, for example, or with custom scripting against APIs. In general, most of them leverage DBpedia, which is a fantastic resource per se, regardless of it being offered as LOD.

> - What commercial applications are there that use LOD?

I know some that use the DBpedia dataset; I don't know about LOD.

> - What are some of the major limitations of today's software that
> would be improved upon by using LOD?

Again, I see a lot of potential for "integrating with large clean structured datasets to provide a bit of intelligence to your application"; however, again, I am not sure how to use LOD for this. Say you go to dbpedia.org again: then what do you do? Do you blindly follow all the seeAlso links and hope to find... something else? What is that something else, and how can you be sure it's the information you needed for your use case (e.g.
geolocation, product pricing, or reviews)?

So, at the implementation level, the idea is that people use LOD with some sort of "smart agent" which, in theory, goes around and explores until it finds interesting stuff. So far there has not been even a remote example of such an agent (the complexity would be big and the performance would be abysmal), so what can be done in practice is to hardcode the query instead, that is, tell your software: "go to DBpedia, then also fetch GeoNames, then look for this and that property, that's it." That does not necessarily buy you much compared to direct API integration.

On the plus side, receiving RDF, as opposed to an answer from a Web 2.0 API, might provide some extra goodies if your software is smart enough to make use of them: e.g. you might get some information you were not really asking for but that you know how to process anyway. Good in that case.

> - Okay, I'd like to use LOD for a pilot of a commercial project. I'm
> going to include 1 million triples. What production-environment
> resources will I need to set up? What will my architecture include?
> Will there just be a giant RDF file or a big set of them? Will they
> just be front-ended by a web server? Will a database be needed?

This side of the story is, I think, well covered now. Technically, Virtuoso or others are definitely up to the job. It's not a giant RDF dump you need to create; it's a thin layer on your existing database or whatever you already have. Under certain conditions it can become a triplestore, for added benefits. Short answer: once you want to do it, you can do it well.

> - Can I build a proprietary closed source application that
> incorporates LOD? How would I combine free and fee-based data? I know
> how to do it with an API. How would I do it with linked data?

Feel free to use existing LOD data. Providing pay-per-use LOD data yourself is something that has not been explored AFAIK, but I don't think it would work that well.
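To illustrate the "thin layer on your existing database" answer to the architecture question, here is a minimal Python sketch. Its details are assumptions, not part of the discussion: a hypothetical `company` table with `id`, `name` and `dbpedia_uri` columns, a made-up base URI `http://data.example.com/company/`, and an in-memory SQLite database standing in for the production one. Instead of dumping everything to a file, each row is rendered as N-Triples only when its URI is requested, including an `owl:sameAs` link out to DBpedia. In practice you would more likely configure an existing tool such as Virtuoso for this; the hand-written code only shows the idea.

```python
import sqlite3

# Hypothetical base URI under which the company mints its own identifiers.
BASE = "http://data.example.com/company/"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal with the basic escapes."""
    escaped = (
        text.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def describe(conn: sqlite3.Connection, company_id: int) -> str:
    """Render one existing row as N-Triples, on demand.

    Roughly what a thin Linked Data layer would return when
    BASE + company_id is dereferenced; nothing is dumped in advance.
    """
    row = conn.execute(
        "SELECT name, dbpedia_uri FROM company WHERE id = ?", (company_id,)
    ).fetchone()
    if row is None:
        return ""  # a real server would answer 404 Not Found here
    name, dbpedia_uri = row
    subject = f"<{BASE}{company_id}>"
    triples = [f"{subject} <{LABEL}> {literal(name)} ."]
    if dbpedia_uri:
        # Outgoing link that grounds the record in another dataset.
        triples.append(f"{subject} <{SAME_AS}> <{dbpedia_uri}> .")
    return "\n".join(triples)


if __name__ == "__main__":
    # Stand-in for the existing production database (hypothetical schema).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, dbpedia_uri TEXT)"
    )
    db.execute(
        "INSERT INTO company VALUES (1, 'Citigroup', "
        "'http://dbpedia.org/resource/Citigroup')"
    )
    print(describe(db, 1))
```

Running it prints two triples for row 1, an `rdfs:label` and the `owl:sameAs` link. Generating triples per request keeps the existing database as the source of truth; loading them into a triplestore remains the optional step for added benefits.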
In general, LOD resources are URIs, and they lose their value if they cannot be freely looked up.

---------------

That said, two clarifications:

* There are great reasons for using RDF (especially embedded as RDFa) and RDF databases for smart applications, so the fact that LOD itself might have the issues and uncertainties above should not stop anyone from considering this technology and offering their data in this format anyway.

* On the other hand, some of the weak points of LOD in the open could be less of a problem inside intranets, for heterogeneous database integration. I have not really thought it through, but I get this feeling. Has anyone explored this?

Giovanni
Received on Saturday, 6 December 2008 08:02:53 UTC