- From: Giovanni Tummarello <g.tummarello@gmail.com>
- Date: Mon, 18 Aug 2014 12:52:05 +0200
- To: Christian Bizer <chris@bizer.de>
- Cc: Linking Open Data <public-lod@w3.org>
- Message-ID: <CAHHRs7jMpcDt-gf+L-TgDcKK2rqByvXXYkQpZeyc=b_xUKUDvg@mail.gmail.com>
Hi Chris,

this is interesting, and it's great you're also looking at the world of marked-up data. My 2c, briefly:

> If you build an application that requires
> DBpedia/YAGO/Freebase/UMBEL/Cyc-style general knowledge about entities, or
> you build an applications that requires geographic, live science, or
> linguistic data, the datasets can be quite useful for you and the fact that
> they are partly interlinked can save you quite some work as you need to
> invest less effort into integrating them yourself.

Sure, under the assumption that the interlinks (which are provided as best effort by the producers) are of good enough quality for your application. As we know, quality can depend strongly on the application: some applications need more precision, others more recall, etc.

What is there certainly provides a starting point, though, courtesy again of the best efforts of those few.

It is worth asking how much Linked Data (dereferenceable URIs etc.) really helps in fostering the quality of such interlinkage. E.g., do people really have mechanisms in place that resolve such URIs to check the entity on the other end, or do they just download the dataset, convert it to something way flatter, run disambiguation/interlinking processes and then publish back?

... but in fairness, the fact that you can look at a single "Record" and somehow see as a human that it has a link to another dataset is per se likely to have some positive effect on the willingness of people to indeed go and do such interlinking.

> Personally, I think it is quite interesting to compare the deployment of
> Microdata/RDFa/Microformats and Linked Data on the Web. We also
> investigated the deployment of Microdata/RDFa/Microformats [1][2] and the
> comparison currently looks like this:
>
> 1. The overall number of websites publishing
> Microdata/RDFa/Microformats is three orders of magnitude larger than the
> number of websites publishing Linked Data.
>
> 2. Topic wise, Microdata/RDFa/Microformats markup covers products,
> reviews, businesses, addresses, events, people, job postings and recipes.
> While Linked Data covers much more specific data from domains such as
> e-government, libraries, life science, linguistics or geography. So there
> is not too much overlap between the data that is published using the two
> technologies.
>
> 3. In the context of Microdata/RDFa/Microformats, data providers do
> not set links pointing at data items in other datasets. In the Linked Data
> context, data providers do set such links to a certain extend. Not setting
> links of course reduces the effort required for data publishers (you just
> need to add some semantic markup to the PHP template that renders your
> website and you are done). On the other hand without such links, using the
> data within applications is much more painful. For an example on how much
> effort it took to integrate some Microdata describing products from
> different websites, see [3] (we needed sophisticated information extraction
> techniques to generate features from the product names and descriptions and
> then sophisticated identity resolution techniques to guess which
> descriptions refer to the same product).
>
> 4. The Microdata/RDFa/Microformats are very shallow with usually
> only 3 or 4 attributes used to describe an entity and most interesting
> semantics only provided as free text (long product or job descriptions as
> text). In contrast, the data that is published as Linked Data is often much
> more structured (e-government, life science data, general-purpose KBs) and
> entities are described with more attributes (having kind of well-defined
> semantics) and is thus likely to enable more sophisticated applications.
>
> Looking at this comparison, I think the empirical results nicely reflect
> the strengths of both technologies.
> Microdata/RDFa/Microformats aim at
> being

This is quite interesting, but isn't this conclusion neglecting a huge fact? How many of the people who work professionally with "e-government, libraries, life science, linguistics or geography" use Linked Data technologies and formats vs. other formats that are relevant in that world? Could it again be around 3 orders of magnitude?

Let's take the simplest format, CSV. How would your answers 1, 2, 3, 4 look with CSV included? Wouldn't we say that 2 orders of magnitude more datasets are published in CSV, that they are also much more complete than those of Microdata/Microformats, definitely not less complete than those published in RDF, and that they might or might not include identifiers that can link them to others?

> a simple technology for annotating webpages that puts very little effort
> on webmasters in order to find wide-spread deployment (Guha made this point
> rather clear in his LDOW2014 keynote [4]).
>
> Linked Data on the other hand is a technology for sharing the data
> integration effort between data publishers and data consumers (the more
> effort publishers put into setting RDF links, the easier it becomes for
> data consumers to use the data).

This is the core point of what we should be "demonstrating": whether "linked data" as it is does indeed work, or whether we should be "working on" it, e.g. improving the standard to increase adoption.

So I think we might want to compare Linked Data effectiveness and adoption against the other ways in which, in these very fields and by these very people, data has traditionally been shared and consumed. That would be more relevant than comparing with Microformats/Microdata, which, as you say, start with a different idea in mind.
> Thus, it makes sense that we see Linked Data adoption within communities
> that have an interest in making their data easy to use and thus are willing
> to invest effort into this, like libraries, government and science (with
> life science and language processing being the first communities adopting
> the technologies) and social networking.

It might make sense to see, but it would have to be proven/compared against their previous standard, as above.

> Concerning your questions who did publish the datasets in the cloud (the
> data producers themselves or some third parties like interested hackers and
> other data enthusiasts), we did not investigate this in detail and I would
> be very happy if somebody else would do this. But my general feeling is
> that compared to 2011 more datasets are published by the actual data
> producers or parties close to them (for instance in the domains of
> e-government, libraries, or cross-domain knowledge bases).

Sure, there certainly was some amount of successful selling, especially in public bodies. Good thing or red herring..?

> This are my two cents to the overall discussion and I would be very happy
> to hear what others think about the message that can be drawn from the new
> diagram.

Thanks again for the effort and your reply!
Gio
Received on Monday, 18 August 2014 10:52:53 UTC